[jira] [Commented] (AVRO-1704) Standardized format for encoding messages with Avro

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/AVRO-1704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16033068#comment-16033068 ]

Jacob Rideout commented on AVRO-1704:
-------------------------------------

Hmmm ... It looks like this change is in branch-1.8, but I am confused because it is NOT listed in https://s.apache.org/avro-release-note-1.8.2

> Standardized format for encoding messages with Avro
> ---------------------------------------------------
>
>                 Key: AVRO-1704
>                 URL: https://issues.apache.org/jira/browse/AVRO-1704
>             Project: Avro
>          Issue Type: Improvement
>          Components: java, spec
>            Reporter: Daniel Schierbeck
>            Assignee: Niels Basjes
>             Fix For: 1.9.0, 1.8.2
>
>         Attachments: AVRO-1704-20160410.patch, AVRO-1704-2016-05-03-Unfinished.patch, AVRO-1704.3.patch, AVRO-1704.4.patch
>
>
> I'm currently using the Datafile format for encoding messages that are written to Kafka and Cassandra. This seems rather wasteful:
> 1. I only encode a single record at a time, so there's no need for sync markers and other metadata related to multi-record files.
> 2. The entire schema is inlined every time.
> However, the Datafile format is the only one that has been standardized, meaning that I can read and write data with minimal effort across the various languages in use in my organization. If there were a standardized format for encoding single values that was optimized for out-of-band schema transfer, I would much rather use that.
> I think the necessary pieces of the format would be:
> 1. A format version number.
> 2. A schema fingerprint type identifier, e.g. Rabin, MD5, or SHA256.
> 3. The actual schema fingerprint (according to the type).
> 4. Optional metadata map.
> 5. The encoded datum.
> The language libraries would implement a MessageWriter that would encode datums in this format, as well as a MessageReader that, given a SchemaStore, would be able to decode datums. The reader would decode the fingerprint and ask its SchemaStore to return the corresponding writer's schema.
> The idea is that SchemaStore would be an abstract interface that allows library users to inject custom backends. A simple, file-system-based implementation could be provided out of the box.
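
The five-part layout and the MessageWriter / MessageReader / SchemaStore roles proposed above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the format that was eventually specified or any Avro library API: the byte widths, the numeric fingerprint type identifiers, the length-prefixed metadata encoding, and the names `InMemorySchemaStore`, `write_message`, and `read_message` are all hypothetical. The datum is treated as already-encoded Avro binary, and the fingerprint is taken over the raw schema JSON for brevity (a real implementation would more likely hash the schema's Parsing Canonical Form).

```python
import hashlib
import struct

FORMAT_VERSION = 1  # hypothetical format version number

# Hypothetical fingerprint type identifiers (Rabin omitted for brevity).
FP_MD5 = 1
FP_SHA256 = 2
_HASHES = {FP_MD5: hashlib.md5, FP_SHA256: hashlib.sha256}


def fingerprint(schema_json, fp_type):
    """Hash the schema text with the algorithm selected by fp_type."""
    return _HASHES[fp_type](schema_json.encode("utf-8")).digest()


class InMemorySchemaStore:
    """Hypothetical SchemaStore backend: maps fingerprints to schemas."""

    def __init__(self):
        self._schemas = {}

    def add(self, schema_json, fp_type):
        fp = fingerprint(schema_json, fp_type)
        self._schemas[(fp_type, fp)] = schema_json
        return fp

    def find(self, fp_type, fp):
        return self._schemas[(fp_type, fp)]


def write_message(schema_json, fp_type, datum, metadata=None):
    """Frame an already-encoded datum: version, type, fingerprint, metadata, datum."""
    fp = fingerprint(schema_json, fp_type)
    out = bytearray(struct.pack(">BBB", FORMAT_VERSION, fp_type, len(fp)))
    out += fp
    meta = metadata or {}
    out += struct.pack(">H", len(meta))  # number of metadata entries
    for key, value in meta.items():
        raw_key = key.encode("utf-8")
        out += struct.pack(">H", len(raw_key)) + raw_key
        out += struct.pack(">H", len(value)) + value
    out += datum
    return bytes(out)


def read_message(buf, store):
    """Parse the frame and look up the writer's schema in the store."""
    version, fp_type, fp_len = struct.unpack_from(">BBB", buf, 0)
    if version != FORMAT_VERSION:
        raise ValueError("unsupported format version: %d" % version)
    pos = 3
    fp = buf[pos:pos + fp_len]
    pos += fp_len
    (count,) = struct.unpack_from(">H", buf, pos)
    pos += 2
    meta = {}
    for _ in range(count):
        (key_len,) = struct.unpack_from(">H", buf, pos)
        pos += 2
        key = buf[pos:pos + key_len].decode("utf-8")
        pos += key_len
        (value_len,) = struct.unpack_from(">H", buf, pos)
        pos += 2
        meta[key] = buf[pos:pos + value_len]
        pos += value_len
    writer_schema = store.find(fp_type, fp)
    return writer_schema, meta, buf[pos:]


if __name__ == "__main__":
    store = InMemorySchemaStore()
    schema = '{"type":"string"}'
    store.add(schema, FP_SHA256)
    # b"\x06foo" is the Avro binary encoding of the string "foo".
    message = write_message(schema, FP_SHA256, b"\x06foo", {"source": b"kafka"})
    print(read_message(message, store))
```

In this sketch the per-message overhead is the header plus the fingerprint rather than the full inlined schema, which is the saving the proposal is after; swapping `InMemorySchemaStore` for a file-system or registry-backed class with the same `find` method would leave the reader unchanged.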



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)