How is it that json serialization is so much faster than yaml serialization in python?
Posted
by guidoism
on Stack Overflow
See other posts from Stack Overflow
or by guidoism
Published on 2010-03-16T02:23:12Z
Indexed on
2010/03/16
17:01 UTC
Read the original article
Hit count: 377
I have code that relies heavily on yaml for cross-language serialization and while working on speeding some stuff up I noticed that yaml was insanely slow compared to other serialization methods (e.g., pickle, json).
So what really blows my mind is that json is so much faster that yaml when the output is nearly identical.
>>> import yaml, cjson; d={'foo': {'bar': 1}}
>>> yaml.dump(d, Dumper=yaml.SafeDumper)
'foo: {bar: 1}\n'
>>> cjson.encode(d)
'{"foo": {"bar": 1}}'
>>> import yaml, cjson;
>>> timeit("yaml.dump(d, Dumper=yaml.SafeDumper)", setup="import yaml; d={'foo': {'bar': 1}}", number=10000)
44.506911039352417
>>> timeit("yaml.dump(d, Dumper=yaml.CSafeDumper)", setup="import yaml; d={'foo': {'bar': 1}}", number=10000)
16.852826118469238
>>> timeit("cjson.encode(d)", setup="import cjson; d={'foo': {'bar': 1}}", number=10000)
0.073784112930297852
PyYaml's CSafeDumper and cjson are both written in C so it's not like this is a C vs Python speed issue. I've even added some random data to it to see if cjson is doing any caching, but it's still way faster than PyYaml. I realize that yaml is a superset of json, but how could the yaml serializer be 2 orders of magnitude slower with such simple input?
© Stack Overflow or respective owner