How do I read binary C++ protobuf data using Python protobuf?
- by nbolton
The Python version of Google protobuf gives us only:
SerializeAsString()
Where as the C++ version gives us both:
SerializeToArray(...)
SerializeAsString()
We're writing to our C++ file in binary format, and we'd like to keep it this way. That said, is there a way of reading the binary data into Python and parsing it as if it were a string?
Is this the correct way of doing it?
binary = get_binary_data()
binary_size = get_binary_size()
string = None
for i in range(len(binary_size)):
string += i
message = new MyMessage()
message.ParseFromString(string)
Update:
Here's a new example, and a problem:
message_length = 512
file = open('foobars.bin', 'rb')
eof = False
while not eof:
data = file.read(message_length)
eof = not data
if not eof:
foo_bar = FooBar()
foo_bar.ParseFromString(data)
When we get to the foo_bar.ParseFromString(data) line, I get this error:
Exception Type: DecodeError
Exception Value: Too many bytes when decoding varint.
Update 2:
It turns out, that the padding on the binary data was throwing protobuf off; too many bytes were being sent in, as the message suggests (in this case it was referring to the padding).
This padding comes from using the C++ protobuf function, SerializeToArray on a fixed-length buffer. To eliminate this, I have used this temproary code:
message_length = 512
file = open('foobars.bin', 'rb')
eof = False
while not eof:
data = file.read(message_length)
eof = not data
string = ''
for i in range(0, len(data)):
byte = data[i]
if byte != '\xcc': # yuck!
string += data[i]
if not eof:
foo_bar = FooBar()
foo_bar.ParseFromString(string)
There is a design flaw here I think. I will re-implement my C++ code so that it writes variable length arrays to the binary file. As advised by the protobuf documentation, I will prefix each message with it's binary size so that I know how much to read when I'm opening the file with Python.