Serialization

Although the above protocol format definition is simple, it covers a lot. The first line explicitly states that I'm using proto3 rather than proto2, which is the default assumed when no syntax is specified. The two lines beginning with option matter only when generating Java code from this protocol format: they specify the name of the outermost generated class and the package of that class, which Java applications will use to work with this protocol format.

The 'message' keyword indicates that the structure that follows, named 'Album' here, is what needs to be represented. This construct has four fields: three are strings and one is an integer (int32). Two of the four fields can occur more than once in a given message because they are annotated with the repeated keyword. Note that I created this definition without considering Java, except for the two options that specify details of generating Java classes from this format specification.

The album.proto file shown above now needs to be 'compiled' into the Java source file (AlbumProtos.java in the dustin.examples.protobuf package) that allows writing and reading the Protocol Buffers binary format corresponding to the defined protocol format. This Java source file is generated with the protoc compiler, which is included in the archive file for the appropriate operating system. In my case, because I'm running this example on Windows 10, I downloaded and unzipped protoc-3.5.1-win32.zip to get access to protoc. The next image depicts my running protoc against album.proto with the command protoc --proto_path=src --java_out=dist\generated album.proto.

To run the above, I had my album.proto file in the src directory pointed to by --proto_path, and I had created an empty directory called dist\generated, as specified by the --java_out flag, to hold the generated Java source code.

The generated source file AlbumProtos.java in the specified package has more than 1000 lines, and I won't list it here, but it's available on GitHub. One of several interesting things about this generated code is its lack of import statements (fully qualified names are used for all class references instead). More details regarding the Java source code generated by protoc are available in the Java Generated Code guide. It's important to note that the generated class AlbumProtos has not been influenced by any of my own Java application code; it is generated solely from the album.proto text file shown earlier in the post.

With the generated Java source code available for AlbumProtos, I now add the directory in which this class was generated to my IDE's source path because I'm treating it as a source code file now. I could have alternatively compiled it into a .class or .jar to use as a library. With this generated Java source code file now in my source path, I can build it alongside my own code.

Before going further in this example, we need a simple Java class to represent with Protocol Buffers. For this, I'll use the class Album that is defined in the next code listing (also available on GitHub).

Album.java
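
A minimal sketch of what such a data class might look like, assuming immutable fields that mirror the four fields of album.proto. The accessor names, the nested Builder, and its method names are illustrative assumptions and may differ from the version on GitHub.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Plain Java data class mirroring the four fields of album.proto (illustrative sketch). */
final class Album {
    private final String title;
    private final List<String> artists;
    private final int releaseYear;
    private final List<String> songsTitles;

    private Album(Builder builder) {
        this.title = builder.title;
        this.releaseYear = builder.releaseYear;
        // Defensive copies keep the instance immutable after build().
        this.artists = Collections.unmodifiableList(new ArrayList<>(builder.artists));
        this.songsTitles = Collections.unmodifiableList(new ArrayList<>(builder.songsTitles));
    }

    public String getTitle() { return title; }
    public List<String> getArtists() { return artists; }
    public int getReleaseYear() { return releaseYear; }
    public List<String> getSongsTitles() { return songsTitles; }

    @Override
    public String toString() {
        return "'" + title + "' (" + releaseYear + ") by " + artists
            + " features songs " + songsTitles;
    }

    /** Builder: title and release year are required; artists and songs can repeat. */
    static final class Builder {
        private final String title;
        private final int releaseYear;
        private final List<String> artists = new ArrayList<>();
        private final List<String> songsTitles = new ArrayList<>();

        Builder(String title, int releaseYear) {
            this.title = title;
            this.releaseYear = releaseYear;
        }

        Builder artist(String artist) { artists.add(artist); return this; }
        Builder songTitle(String songTitle) { songsTitles.add(songTitle); return this; }
        Album build() { return new Album(this); }
    }
}
```

With this sketch, an illustrative instance could be created with new Album.Builder("Songs from the Big Chair", 1985).artist("Tears For Fears").songTitle("Shout").build().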



With a Java 'data' class defined (Album) and a Protocol Buffers-generated Java class available for representing this album (AlbumProtos.java), I'm ready to write Java application code to 'serialize' the album information without using Java serialization. This demonstration code resides in the AlbumDemo class, which is available on GitHub and from which I'll highlight the relevant portions in this post.

We need to generate a sample instance of Album to use in this example and this is accomplished with the next hard-coded listing.

Generating Sample Instance of Album


The Protocol Buffers generated class AlbumProtos includes a nested AlbumProtos.Album class that I'll be using to store the contents of my Album instance in binary form. The next code listing demonstrates how this is done.

Instantiating AlbumProtos.Album from Album


As the previous code listing demonstrates, a 'builder' is used to populate the immutable instance of the class generated by Protocol Buffers. With a reference to this instance, I can now easily write the contents of the instance out in Protocol Buffers' binary form using the method toByteArray() on that instance as shown in the next code listing.
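
To make the 'binary form' more concrete, here is a hand-rolled sketch of the Protocol Buffers wire format for this message, independent of the generated class. It assumes that album.proto assigns field numbers 1 through 4 to title, artist, release_year, and song_title; that numbering and the class name AlbumWireSketch are assumptions for illustration. Application code should keep calling toByteArray() on the generated instance rather than encoding by hand.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hand-rolled illustration of the Protocol Buffers wire format; not the generated API. */
final class AlbumWireSketch {
    private static final int WIRE_VARINT = 0;            // int32 and other integer scalars
    private static final int WIRE_LENGTH_DELIMITED = 2;  // strings, bytes, nested messages

    /** Encodes an album using the ASSUMED field numbers 1..4. */
    static byte[] encode(String title, List<String> artists,
                         int releaseYear, List<String> songTitles) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!title.isEmpty()) {              // proto3 omits scalar fields holding their default
            writeString(out, 1, title);
        }
        for (String artist : artists) {      // a repeated string is written once per element
            writeString(out, 2, artist);
        }
        if (releaseYear != 0) {
            writeTag(out, 3, WIRE_VARINT);
            writeVarint(out, releaseYear);   // int widens to long, as int32 is sign-extended
        }
        for (String songTitle : songTitles) {
            writeString(out, 4, songTitle);
        }
        return out.toByteArray();
    }

    /** A tag is the field number shifted left three bits, combined with the wire type. */
    private static void writeTag(ByteArrayOutputStream out, int fieldNumber, int wireType) {
        writeVarint(out, (fieldNumber << 3) | wireType);
    }

    /** Strings are written as tag, byte length, then the UTF-8 bytes. */
    private static void writeString(ByteArrayOutputStream out, int fieldNumber, String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        writeTag(out, fieldNumber, WIRE_LENGTH_DELIMITED);
        writeVarint(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    /** Varint: seven payload bits per byte, high bit set on every byte except the last. */
    private static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }
}
```

Under those assumed field numbers, a title of "A", a single artist "B", and a release year of 1985 encode to the nine bytes 0A 01 41 12 01 42 18 C1 0F.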

Writing Binary Form of AlbumProtos.Album


Reading a byte[] array back into an instance of Album can be accomplished as shown in the next code listing.

Instantiating Album from Binary Form of AlbumProtos.Album


As indicated in the last code listing, the checked exception InvalidProtocolBufferException can be thrown when invoking the static method parseFrom(byte[]) defined in the generated class. Obtaining a 'deserialized' instance of the generated class is essentially a single line; the remaining lines get data out of that instance of the generated class and set it on an instance of the original Album class.
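
To illustrate why parsing can fail at all, here is a hand-rolled sketch of reading the wire format back. It is not the generated parseFrom(byte[]); it assumes field numbers 1 through 4 for title, artist, release_year, and song_title, the class name AlbumWireReader is hypothetical, and it uses an unchecked IllegalArgumentException as a stand-in for InvalidProtocolBufferException.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hand-rolled wire-format reader; shows what parsing involves, not the generated API. */
final class AlbumWireReader {
    String title = "";
    int releaseYear = 0;
    final List<String> artists = new ArrayList<>();
    final List<String> songTitles = new ArrayList<>();

    private final byte[] data;
    private int pos = 0;

    private AlbumWireReader(byte[] data) { this.data = data; }

    /** Parses bytes laid out with the ASSUMED field numbers 1..4; throws on malformed input. */
    static AlbumWireReader parse(byte[] data) {
        AlbumWireReader reader = new AlbumWireReader(data);
        while (reader.pos < data.length) {
            int tag = (int) reader.readVarint();
            int fieldNumber = tag >>> 3;   // upper bits: field number
            int wireType = tag & 0x7;      // lower three bits: wire type
            if (wireType == 0) {           // varint
                long value = reader.readVarint();
                if (fieldNumber == 3) reader.releaseYear = (int) value;
            } else if (wireType == 2) {    // length-delimited
                String value = reader.readString();
                if (fieldNumber == 1) reader.title = value;
                else if (fieldNumber == 2) reader.artists.add(value);
                else if (fieldNumber == 4) reader.songTitles.add(value);
                // Any other field number is unknown to this sketch and skipped.
            } else {
                throw new IllegalArgumentException("Unsupported wire type " + wireType);
            }
        }
        return reader;
    }

    /** Reads a varint: seven payload bits per byte, least significant group first. */
    private long readVarint() {
        long result = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            if (pos >= data.length) throw new IllegalArgumentException("Truncated varint");
            byte b = data[pos++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
        }
        throw new IllegalArgumentException("Varint too long");
    }

    /** Reads a length prefix followed by that many UTF-8 bytes. */
    private String readString() {
        long length = readVarint();
        if (length < 0 || length > data.length - pos) {
            throw new IllegalArgumentException("Truncated string field");
        }
        String value = new String(data, pos, (int) length, StandardCharsets.UTF_8);
        pos += (int) length;
        return value;
    }
}
```

A declared length that runs past the end of the array is one example of input this sketch rejects; the generated parser reports comparable problems through its checked exception.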

The demonstration class includes two lines that print out the contents of the original Album instance and the instance ultimately retrieved from the binary representation. These two lines include invocations of System.identityHashCode() on the two instances to prove that they are not the same instance even though their contents match. When this code is executed with the hard-coded Album instance details shown earlier, the output looks like this:


From this output, we see that the relevant fields are the same in both instances and that the two instances truly are distinct objects. This is a bit more work than using Java's 'nearly automatic' serialization mechanism, which only requires implementing the Serializable interface, but this approach has important advantages that can justify the cost. In Effective Java, Third Edition, Josh Bloch discusses the security vulnerabilities associated with deserialization in Java's default mechanism and asserts that 'There is no reason to use Java serialization in any new system you write.'
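
The instance-identity comparison described above can be sketched in isolation. The Track class below is a hypothetical stand-in for Album, used only to keep the example short; a reference comparison with != is shown next to System.identityHashCode() because it is the more direct check.

```java
import java.util.Objects;

final class IdentityVersusEquality {
    /** Tiny stand-in value class; the article compares two Album instances instead. */
    static final class Track {
        final String title;

        Track(String title) { this.title = title; }

        @Override
        public boolean equals(Object other) {
            return other instanceof Track && title.equals(((Track) other).title);
        }

        @Override
        public int hashCode() { return Objects.hash(title); }
    }

    /** True when the references hold equal content but point at different objects. */
    static boolean equalButDistinct(Object first, Object second) {
        return first != second && first.equals(second);
    }

    public static void main(String[] args) {
        Track original = new Track("Shout");
        Track copy = new Track(original.title);
        // Differing identity hash codes show that these are two separate objects.
        System.out.println("original identity hash: " + System.identityHashCode(original));
        System.out.println("copy identity hash:     " + System.identityHashCode(copy));
        System.out.println("equal but distinct:     " + equalButDistinct(original, copy));
    }
}
```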

Published at DZone with permission of Dustin Marx, DZone MVB. See the original article here.
I am receiving protobuf messages on Kafka, and the consumer is configured to deserialize the events using
If I use the parseFrom(byte[] data) method of com.google.protobuf.Parser, passing the byte array of the deserialized event string, the method throws the following exception:
If I instead deserialize Kafka events with
and directly pass the byte array thus received to parseFrom, the protobuf is parsed correctly without any exception.
Why does the second way work, but the first does not?
user87407

1 Answer


You are using a String deserializer, which decodes the raw bytes as text (UTF-8 by default). Protobuf's binary encoding is not text, so byte sequences that are not valid in that charset get replaced during decoding, and converting the resulting String back to bytes does not reproduce the original message. The parser then receives bytes in a format it doesn't expect at all.
You have a producer which serializes with ByteArraySerializer, so your consumer must deserialize it with a ByteArrayDeserializer.
Try producing with an org.apache.kafka.common.serialization.StringSerializer if you really want to automatically deserialize in String format.
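
A minimal sketch of the effect, using only the JDK and no Kafka classes. It assumes the String deserializer behaves like new String(bytes, UTF_8) and that you then call getBytes() on the result; the class name StringRoundTripSketch and the three sample bytes are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class StringRoundTripSketch {
    /** Decodes bytes as UTF-8 text and encodes the text again, as a String-based consumer would. */
    static byte[] throughString(byte[] original) {
        String asText = new String(original, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Sample protobuf bytes: int32 field number 3 holding 1985 (tag 0x18, varint 0xC1 0x0F).
        // 0xC1 can never appear in valid UTF-8, so decoding swaps it for U+FFFD.
        byte[] payload = {0x18, (byte) 0xC1, 0x0F};
        byte[] mangled = throughString(payload);
        System.out.println("original:          " + Arrays.toString(payload));
        System.out.println("after String trip: " + Arrays.toString(mangled));
        System.out.println("unchanged:         " + Arrays.equals(payload, mangled));
    }
}
```

Pure ASCII payloads survive this round trip, which is why the problem only shows up once a message contains bytes that are not valid text.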
aran
