Serializing Data - JSON vs. Protocol Buffers
Posted on August 17th, 2009
JSON and Protocol Buffers are two methods for serializing data, primarily used for lightweight server-client and inter-server communication. In this post we look at the performance of the latest Protocol Buffers release (version 2.2.0) compared to JSON, using Java on a G1 Android phone and Python with CJSON and SimpleJSON on a typical Linux system.
- JSON serializes data as text in UTF-8, UTF-16 or UTF-32 encoding, with turn-key libraries for most programming languages. Any data structure can be serialized with JSON, although binary data (such as an image) first has to be encoded with an algorithm like Base64.
- Protocol Buffers is an open-source project developed by Google and released under the BSD license. It uses a binary encoding, which makes the serialized data a bit smaller and does not require binary data to be encoded first. The data structures have to be described before serialization by creating a .proto file, compiling it with protoc and including the generated source files in the project. The libraries from Google are available for Java, Python and C++, with third-party implementations for most other programming languages.
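To make the Base64 requirement concrete, here is a minimal sketch using only Python's standard json and base64 modules. The field names mirror the news item used later in this post; the values and the stand-in image bytes are made up for illustration.

```python
import base64
import json

# Stand-in for raw image data (assumption: any byte string will do here).
image_bytes = bytes(range(256))

# Hypothetical news item; values are invented for this example.
item = {
    "title": "Example title",
    "text": "Example text",
    "link": "http://example.com/news/1",
    # JSON has no binary type, so the bytes travel as Base64 text.
    "image": base64.b64encode(image_bytes).decode("ascii"),
}

encoded = json.dumps(item)       # a str that can be sent as UTF-8
decoded = json.loads(encoded)    # back to a dict
restored = base64.b64decode(decoded["image"])  # back to raw bytes
```

With a binary format the extra encode and decode step around the image field is not needed, since the raw bytes can be written as they are.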
For the performance tests we imagine a news aggregator application, which requests 10 news items from a server. Each item has the following data structure (using around 2.3 kilobytes with a 48×48px image):
class NewsItem {
    String title;
    String text;
    String link;
    byte[] image; /* 48x48px */
}
The app could be faster by receiving just the text fields in a first request and the images in a second, but to compare the serialization speed of JSON and Protocol Buffers we use one combined response (JSON: 23.6 kb with Base64-encoded images, PB: 18.3 kb due to binary encoding).
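Part of the size difference probably comes from Base64 itself, which turns every 3 input bytes into 4 output characters. The sketch below illustrates that ratio; the image size of 1,500 bytes is an assumption chosen for the example, not a value measured in this post.

```python
import base64
import math

RAW_IMAGE_SIZE = 1500  # hypothetical image size in bytes (assumption)

raw = bytes(RAW_IMAGE_SIZE)
b64 = base64.b64encode(raw)

# Base64 maps each group of 3 bytes to 4 characters, padding the last group.
expected = 4 * math.ceil(RAW_IMAGE_SIZE / 3)
overhead = len(b64) / len(raw)

print(len(raw), len(b64), round(overhead, 3))
```

The one-third growth applies only to the image field; the text fields take roughly the same space in both formats.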
1. Library Size
Let’s start with a look at the different libraries and their sizes in kilobytes:

Protocol Buffers isn’t the smallest library, which might not matter on many systems. Using it in an Android app, however, would often increase the application size five- to tenfold, whereas JSON is integrated in the Android stack by default.
2. Serialization Performance - Python
The following chart shows the (de-)serialization time for one request with 10 news items, measured over 1,000,000 runs on a typical Linux system:

JSON clearly outperforms Protocol Buffers in this scenario using Python, with CJSON serializing almost 8 times faster than SimpleJSON. Adding the option optimize_for=SPEED to the .proto file increases the speed of Protocol Buffers by around 5% (about 100 µs).
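A measurement of this kind can be sketched with Python's timeit module. This is not the benchmark code used for the chart: it uses the standard library json module instead of CJSON or SimpleJSON, the helper make_items and all payload sizes are assumptions, and the run count is kept small so that it finishes quickly.

```python
import base64
import json
import timeit


def make_items(count=10, image_size=1500):
    """Build a list of made-up news items (all sizes are assumptions)."""
    image = base64.b64encode(bytes(image_size)).decode("ascii")
    return [
        {
            "title": "Title %d" % i,
            "text": "Some text " * 20,
            "link": "http://example.com/%d" % i,
            "image": image,
        }
        for i in range(count)
    ]


items = make_items()
payload = json.dumps(items)

RUNS = 1000  # the post used 1,000,000 runs; kept small for this sketch

# Total wall-clock time for RUNS serializations and deserializations.
ser = timeit.timeit(lambda: json.dumps(items), number=RUNS)
de = timeit.timeit(lambda: json.loads(payload), number=RUNS)

print("serialize:   %.1f us per run" % (ser / RUNS * 1e6))
print("deserialize: %.1f us per run" % (de / RUNS * 1e6))
```

Swapping in another library only requires replacing the two lambdas, which keeps the comparison on identical input data.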
3. Serialization Performance - Java / Android
The next test measures the time needed to de-serialize one request of 10 items into a usable data structure on the Android platform with Java. Here we notice how easy the Protocol Buffers data structure is to use, since the only step required is:
ItemList itemlist = ItemList.parseFrom(buffer.toByteArray());
With JSON we have to extract the fields into a custom class and decode the image with Base64:
Item[] itemlist = new Item[10];
JSONArray items = new JSONArray(new String(buffer.toByteArray()));
final int max = items.length();
for (int i = 0; i < max; i++) {
    Item item = new Item();
    item.title = items.getJSONObject(i).getString("title");
    item.text = items.getJSONObject(i).getString("text");
    item.link = items.getJSONObject(i).getString("url");
    item.image = Base64.decode(items.getJSONObject(i).getString("image"));
    itemlist[i] = item;
}
The average time needed for these operations is quite different on Android:

In this case on the Android platform, Protocol Buffers outperform JSON by a factor of three (we’ve tried this with 1, 10, 100 and 200 de-serializations, resulting in Protocol Buffers being up to 5 times faster than JSON).
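For readers more at home in Python, the JSON unpacking step from the Java loop above can be sketched as follows. The Item dataclass and the parse_items helper are hypothetical names introduced for this example, and the sample payload is made up; the sketch only illustrates the extra work JSON requires and says nothing about Android timings.

```python
import base64
import json
from dataclasses import dataclass


@dataclass
class Item:
    title: str
    text: str
    link: str
    image: bytes


def parse_items(buffer: bytes) -> list:
    """Decode a UTF-8 JSON array into a list of Item objects."""
    items = []
    for obj in json.loads(buffer.decode("utf-8")):
        items.append(Item(
            title=obj["title"],
            text=obj["text"],
            link=obj["url"],  # the Java code above reads the "url" key
            image=base64.b64decode(obj["image"]),  # undo the Base64 step
        ))
    return items


# Made-up one-item payload for demonstration.
sample = json.dumps([{
    "title": "t",
    "text": "x",
    "url": "http://example.com",
    "image": base64.b64encode(b"\x00\x01").decode("ascii"),
}]).encode("utf-8")

parsed = parse_items(sample)
```

Each item is looked up once per iteration here, so the repeated lookups of the Java version do not occur.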
Summary
The performance of Protocol Buffers depends heavily on the scenario, including the platform, programming language and complexity of the data structure: in these tests it ranged from 15 times slower to 5 times faster than JSON. JSON generally performed very well, especially in Python, with CJSON outperforming SimpleJSON in serialization by a factor of almost 8.
On an Android system, Protocol Buffers de-serialized data around three times faster than JSON, with another major advantage being the ease of use of the data structures. You only need to call parseFrom() once to have a fully usable class, whereas JSON requires unpacking the data into a custom class. One downside of using Protocol Buffers with Android is the size of its Java library, which requires almost 800 kb (whereas JSON is available in the standard Android stack).
Update: As commenters pointed out, the Python implementation of Protocol Buffers is especially slow, and this comparison would only be complete if the performance with C++ were measured as well (which I’ll do in the coming days if nobody volunteers first :).
18 Responses to “Serializing Data - JSON vs. Protocol Buffers”
-
Very interesting article, thanks a lot! I would like to see whether the Base64 decode function makes a significant difference. What happens if you leave the image out? Are the results the same on the Android device?
Best Regards
-
Hey Dan! I’ve just tried it on my G1: the Base64 decoding takes around 5ms per image (pretty fast) — the speed without Base64 decoding is almost the same (avg. 148.5 vs. 153.3 ms). The results with C++ would be very interesting as well.
-
Marcus August 17th, 2009 at 2:04 pm
I thought the major point of protocol buffers was the size of the serialized message. So you might want to also look at that. How many bytes were each of the serialized objects?
-
Size of the serialized objects:
- JSON: 23.58 kb
- PB: 18.32 kb
-
You do not gain anything from binary serialization when you serialize strings. Add int and long fields, and you’ll see more difference.
-
Olivier de Rivoyre August 17th, 2009 at 3:33 pm
Hi,
Amazing results. Two remarks:
1- In your Java example, you call “items.getJSONObject(i)” four times instead of once. Maybe this getter does some deserialization work. (Maybe not, I don’t know.)
2- You should measure the (de-)serialization time for one request with 1,000,000 news items. Sometimes serialization does not scale very well (I am thinking of the .NET DataSet.Load(fileName), which gave me some surprises).
-
Jeff Sharkey August 17th, 2009 at 7:03 pm
It looks like you’re citing the original protocol buffer sizes. In the Android source there is a *much* lighter weight version that is optimized for mobile devices:
-
Okay, I’ve tried using “items.getJSONObject(i)” just once, which yields a performance increase of around 2%. Thus the JSON decoding takes 150ms instead of 153.
@Jeff: Thanks a lot for the info! Do you have an example implementation of the mobile PB libs? I could not figure out how to use it, since the files generated with protoc require com.google.protobuf.Descriptors, com.google.protobuf.GeneratedMessage, etc., which are not available there.
-
Isn’t there already a binary serialisation format for data? Perhaps you should have added ASN.1 to the list, along with XML.
-
Marcin Sciesinski August 18th, 2009 at 4:38 pm
One important factor of JSON is that the data is human readable, which helps with any debugging or problem solving.
-
It is widely known that the existing Protocol Buffer Python library is extremely sub-optimal.
I am working on an implementation of Protocol Buffers (and a corresponding Python implementation) that will almost certainly win all your benchmarks. The code will be <50k, and the decoding speed will be in the neighborhood of 200MB/s.
Stay tuned:
http://wiki.github.com/haberman/upb
http://blog.reverberate.org/category/upb/
-
@Joshua: Thanks for the status update and your work on upb. I look forward to giving it a try as soon as I can!
-
Hi Chris,
I’m the engineer in charge of Protocol Buffers at Google. Nice article. Just a couple comments:
1) Our Python implementation stinks, at least in terms of performance. Basically what happened is that our desire to get the C++ and Java versions out the door led us to release the Python version in an incomplete state, thinking we’d finish it up shortly thereafter. Unfortunately, the engineers working on it promptly discovered that they didn’t have time to finish it. I’m very unhappy about this but just haven’t been able to find any suitable replacements (either inside Google or in the open source community) to work on it. In retrospect we probably should have left it out of the release or made it more obvious that it wasn’t “done”. The README.txt file does warn that the performance is very bad, though.
The good news is that I finally have time to fix this situation myself so hopefully we’ll see some improvements in 2.3.0.
2) If you’re worried about library size, you should probably be using the new “lite” mode introduced in version 2.2.0. First, add this to your .proto file:
option optimize_for = LITE_RUNTIME;
In C++, link against libprotobuf-lite.so instead of libprotobuf.so, and in Java use libprotobuf-lite.jar instead of libprotobuf.jar. See java/README.txt for instructions on building the lite jar (the C++ one is built automatically).
-
Oh, and I’d love to see you add the C++ numbers in there. The C++ implementation is, if I do say so myself, ridiculously fast. Make sure to compile with NDEBUG defined (I’ve noticed Unix users often forget this), reuse the same message object for every parse if possible (to avoid re-allocating memory), and perhaps use tcmalloc (Google’s fast malloc implementation).
-
Hi Kenton,
Thanks a lot for dropping by and posting your replies! I will dig into it and add the numbers for the lite version in Java as well as for the C++ implementation. It should be finished during the weekend at the latest. I really look forward to seeing Protocol Buffers performance with C++!
When reading the Java README, I didn’t realize that the lite version reduces the size, as there is no info on what the lite version actually does. Also, I didn’t want to use Maven for no apparent reason, though I am going to use it for the update.
-
The readme doesn’t explain what it is — you need to look at the online documentation for the optimize_for option.
I’m not really sure where best to document this feature. Obviously we can’t describe every protobuf feature in the readme, but the current online documentation for the lite library is easy to miss. Maybe I should add a page about performance tuning…
-
Afaik Protocol Buffers is, in theory, much faster than JSON because there is no escaping/unescaping. The sizes of elements are much easier to determine in Protocol Buffers: for example, strings carry an index to their end instead of a terminator as in JSON, other items such as fixed ints have fixed sizes, etc.
“One important factor of JSON is that the data is human readable, which helps with any debugging or problem solving.”
That’s only true if you don’t have tools to decode protocol buffers. There are plugins for wireshark, and in the Python protobuf library you can just print repr(buffer). I guess maybe if part of the protocol buffer was corrupt it would be harder to understand it, other than that I don’t see how any protocol written in plaintext is any easier to debug. After all, plaintext is just an abstraction over binary which also must be decoded, using a text editor/viewer that is set to the right encoding.
-
>That’s only true if you don’t have tools to decode protocol buffers.
Right, that’s what human readable means. As in, you don’t need any tools to decode it.
What I like about protobufs is that I can re-assemble them out-of-order. JSON doesn’t like that so much..