Apache Avro Encoding types?

raihan26
Hello,

I have recently started using Apache Avro and have a few questions about it. I read in an article that:

There are two ways to encode data when serializing with Avro: binary or JSON.

When should we use JSON, and when should we use binary? What are the advantages and disadvantages of each, and which one is preferable in a production environment?

Our use case is pretty simple. We need to store data in Cassandra for a given user ID. Suppose the column name is `e1`; then I will be storing the value of column `e1` in Cassandra for a specific user ID.

Any help would be appreciated. Thanks.

Raihan Jamal

Re: Apache Avro Encoding types?

Doug Cutting
Avro's primary encoding is binary.  JSON is useful for debugging and
for interoperability with other JSON-based systems.  The binary
encoding is much smaller and faster than JSON.  Avro data files and
RPC support only the binary encoding.

Doug

On Tue, Sep 17, 2013 at 12:34 AM, Raihan Jamal <[hidden email]> wrote:

> So I was thinking when we should use JSON and when we should use Binary? And
> what are the advantages/disadvantages for this? And which one is mainly
> preferable to use in production environment.
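
To make the size difference concrete, here is a minimal Python sketch that hand-encodes one record following the binary rules in the Avro specification (zig-zag varints for `long`, a length prefix plus UTF-8 bytes for `string`, record fields written in schema order with no field names) and compares it with a plain JSON rendering of the same record. The `User` record, its `user_id` and `name` fields, and the helper names are assumptions invented for this illustration. This is not the Avro library's API; real code would use an Avro implementation rather than a hand-rolled encoder.

```python
import json


def encode_long(n: int) -> bytes:
    """Encode a signed 64-bit integer as Avro does for int and long:
    zig-zag mapping, then a base-128 varint (low 7 bits first)."""
    z = (n << 1) ^ (n >> 63)  # zig-zag: small magnitudes get small codes
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # 7 payload bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Avro string: byte length as a long, followed by the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_user(user_id: int, name: str) -> bytes:
    """Hypothetical record User {long user_id; string name}.
    Fields are concatenated in schema order; no names or type tags."""
    return encode_long(user_id) + encode_string(name)


record = {"user_id": 12345, "name": "raihan"}
binary = encode_user(record["user_id"], record["name"])
as_json = json.dumps(record).encode("utf-8")

print(len(binary), "bytes as binary")
print(len(as_json), "bytes as JSON")
```

The binary form is smaller mainly because the field names live in the schema instead of being repeated in every datum.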

Re: Apache Avro Encoding types?

raihan26
Thanks, Doug. That cleared up some of my doubts. If we decide to go the binary encoding route, does the actual schema also get stored in that binary encoding? And are there any other encodings apart from JSON and binary?

Raihan Jamal


On Tue, Sep 17, 2013 at 10:23 AM, Doug Cutting <[hidden email]> wrote:
> Avro's primary encoding is binary.  JSON is useful for debugging and
> for interoperability with other JSON-based systems.  The binary
> encoding is much smaller and faster than JSON.  Avro data files and
> RPC support only the binary encoding.
>
> Doug

Re: Apache Avro Encoding types?

Doug Cutting
On Tue, Sep 17, 2013 at 8:14 PM, Raihan Jamal <[hidden email]> wrote:
> And also are there any other encoding apart from JSON or Binary?

No, not at present.  There's been discussion of adding a memcmp'able
encoding however (AVRO-712).

Doug

Re: Apache Avro Encoding types?

raihan26
OK, thanks. If I go the binary encoding route instead of JSON, does the schema always get stored along with the binary-encoded data?

Raihan Jamal


On Wed, Sep 18, 2013 at 12:19 PM, Doug Cutting <[hidden email]> wrote:
> On Tue, Sep 17, 2013 at 8:14 PM, Raihan Jamal <[hidden email]> wrote:
> > And also are there any other encoding apart from JSON or Binary?
>
> No, not at present.  There's been discussion of adding a memcmp'able
> encoding however (AVRO-712).
>
> Doug


Re: Apache Avro Encoding types?

Doug Cutting
On Wed, Sep 18, 2013 at 1:53 PM, Raihan Jamal <[hidden email]> wrote:
> Suppose if I am going with Binary encoding route instead of JSON then does
> the schema also gets stored in that binary encoding format always?

The schema is stored in Avro data files.  Avro RPC also manages
schemas for you.  But if you store Avro binary-encoded data in some
other container then you may need to keep track of the schema
yourself.

Doug
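
A small sketch of why the schema has to be tracked separately when the bytes live outside an Avro data file: a raw binary-encoded datum is just field values back to back, so a decoder can only make sense of it by walking a schema. The code below hand-decodes ten bytes that represent `user_id` 12345 and `name` "raihan" under a hypothetical `User` schema, then decodes the same bytes with a different, equally hypothetical schema. The schemas and helper names are assumptions for illustration; this is a hand-rolled decoder covering only `long` and `string`, not the Avro library.

```python
from io import BytesIO


def read_long(buf: BytesIO) -> int:
    """Read a base-128 varint, then undo the zig-zag mapping."""
    z = 0
    shift = 0
    while True:
        b = buf.read(1)[0]
        z |= (b & 0x7F) << shift
        if not (b & 0x80):  # continuation bit clear: last byte
            break
        shift += 7
    return (z >> 1) ^ -(z & 1)


def read_string(buf: BytesIO) -> str:
    """Read a length-prefixed UTF-8 string."""
    length = read_long(buf)
    return buf.read(length).decode("utf-8")


READERS = {"long": read_long, "string": read_string}


def decode_record(schema: dict, payload: bytes) -> dict:
    """Decode a flat record by reading one value per schema field, in order."""
    buf = BytesIO(payload)
    return {f["name"]: READERS[f["type"]](buf) for f in schema["fields"]}


# Hypothetical writer schema for the payload below.
USER_SCHEMA = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "user_id", "type": "long"},
        {"name": "name", "type": "string"},
    ],
}

# A different hypothetical schema: two longs instead of long + string.
OTHER_SCHEMA = {
    "type": "record",
    "name": "Pair",
    "fields": [
        {"name": "a", "type": "long"},
        {"name": "b", "type": "long"},
    ],
}

payload = bytes([0xF2, 0xC0, 0x01, 0x0C]) + b"raihan"

right = decode_record(USER_SCHEMA, payload)
wrong = decode_record(OTHER_SCHEMA, payload)
print(right)
print(wrong)
```

Decoding with the second schema does not fail; it quietly produces different values and ignores the trailing bytes. Nothing in the payload itself says which schema wrote it.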

Re: Apache Avro Encoding types?

raihan26
Thanks a lot, Doug. So if I store Avro binary-encoded data in a Cassandra column family, the actual schema won't be stored with it, and I will need to obtain the schema through some external means whenever I read that data. Is that right?

Raihan Jamal


On Wed, Sep 18, 2013 at 2:28 PM, Doug Cutting [via Apache Avro] <[hidden email]> wrote:
> On Wed, Sep 18, 2013 at 1:53 PM, Raihan Jamal <[hidden email]> wrote:
> > Suppose if I am going with Binary encoding route instead of JSON then does
> > the schema also gets stored in that binary encoding format always?
>
> The schema is stored in Avro data files.  Avro RPC also manages
> schemas for you.  But if you store Avro binary-encoded data in some
> other container then you may need to keep track of the schema
> yourself.
>
> Doug

Re: Apache Avro Encoding types?

Connor Doyle
Hi Raihan,

One strategy is to store a cache of schemata elsewhere and then write just enough information (a version number, an MD5 hash, or similar) next to your Avro binary blob column to look up the right schema version.  This imposes little overhead and still lets you take advantage of Avro's built-in schema resolution to read old serialized data.
--
Connor

On Sep 18, 2013, at 20:03, raihan26 <[hidden email]> wrote:

> Thanks a lot Doug. So that means, If I need to store avro binary encoded data in Cassandra column family, then the actual schema won't get stored with that? Right? I need to have the schema through some external means whenever I am reading that binary encoded data. Right?
>
> Raihan Jamal
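
The schema-cache idea can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `SchemaCache` class, the in-memory dict standing in for wherever the schemata would really be kept, and the choice to hash a key-sorted JSON dump with MD5. The Avro specification defines its own Parsing Canonical Form and schema fingerprints; this sketch does not implement them, it only shows the general "small ID next to the blob" pattern.

```python
import hashlib
import json


class SchemaCache:
    """In-memory stand-in for an external schema store, keyed by fingerprint."""

    def __init__(self) -> None:
        self._by_id: dict[str, dict] = {}

    def register(self, schema: dict) -> str:
        # Serialize deterministically so the same schema always hashes the same.
        # NOTE: a simplification, not Avro's Parsing Canonical Form.
        text = json.dumps(schema, sort_keys=True, separators=(",", ":"))
        schema_id = hashlib.md5(text.encode("utf-8")).hexdigest()
        self._by_id[schema_id] = schema
        return schema_id

    def lookup(self, schema_id: str) -> dict:
        return self._by_id[schema_id]


# Hypothetical schema for the stored values.
USER_V1 = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "user_id", "type": "long"},
        {"name": "name", "type": "string"},
    ],
}

cache = SchemaCache()
schema_id = cache.register(USER_V1)

# What would be written per row: the short ID next to the binary blob.
row = {"schema_id": schema_id, "value": b"\xf2\xc0\x01\x0craihan"}

# On read: the ID alone is enough to recover the writer's schema.
schema_for_row = cache.lookup(row["schema_id"])
print(schema_id, schema_for_row["name"])
```

Each row then carries a fixed 32-character ID instead of the full schema text, and the reader can pass the recovered writer schema to whatever Avro decoder it uses.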



Re: Apache Avro Encoding types?

raihan26
Thanks, Connor. That makes sense, and it is what we are planning to do. We will have a Cassandra column made up of composite columns, and each composite column will hold:

Avro-Binary-Encoded-Value   Avro-Schema-Name-For-This-Particular-Column   Last-Modified-Date

When we retrieve the data for a particular user ID from that column, we will know its schema name. We can then look up the actual schema by that name and deserialize the Avro binary-encoded value using that schema.

Raihan Jamal


On Wed, Sep 18, 2013 at 7:15 PM, Connor Doyle <[hidden email]> wrote:
> Hi Raihan,
>
> One strategy may be to store a cache of schemata elsewhere and then write just enough information (version number, md5, or similar) to look up the right schema version adjacent to your avro/binary blob column.  This would impose little overhead and still allow you to take advantage of Avro's built-in schema resolution to read old serialized data.
> --
> Connor
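
The read path described in this post (value, schema name, last-modified date stored together; schema looked up by name at read time) might look roughly like the Python sketch below. The `Cell` type, the `SCHEMAS_BY_NAME` dict, the `user_v1` name, and `fake_decode` are all hypothetical stand-ins: there is no Cassandra client here, and `fake_decode` only reports which schema it was handed so the flow can be run without an Avro library.

```python
from datetime import datetime, timezone
from typing import Callable, NamedTuple


class Cell(NamedTuple):
    """One composite column: the three parts listed in the post."""
    value: bytes             # Avro binary-encoded value
    schema_name: str         # name used to find the writer's schema
    last_modified: datetime


# Hypothetical schema store, keyed by schema name.
SCHEMAS_BY_NAME = {
    "user_v1": {
        "type": "record",
        "name": "User",
        "fields": [{"name": "name", "type": "string"}],
    },
}

Decoder = Callable[[dict, bytes], dict]


def read_cell(cell: Cell, decode: Decoder) -> dict:
    """Find the schema by the name stored in the cell, then decode the value."""
    schema = SCHEMAS_BY_NAME[cell.schema_name]
    return decode(schema, cell.value)


def fake_decode(schema: dict, value: bytes) -> dict:
    # Stand-in for a real Avro decoder; it only echoes what it was given.
    return {"decoded_with": schema["name"], "num_bytes": len(value)}


cell = Cell(
    value=b"\x0craihan",
    schema_name="user_v1",
    last_modified=datetime(2013, 9, 18, tzinfo=timezone.utc),
)
result = read_cell(cell, fake_decode)
print(result)
```

One consequence of keying on a name rather than a hash is that the name has to change (for example by carrying a version suffix) whenever the schema does, otherwise old values would be read with the wrong writer schema.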
