appending to Object Container Files

appending to Object Container Files

Matthew Stowe

Hello Avro User Community,

I am reviewing the specification for Object Container Files (http://avro.apache.org/docs/1.8.2/spec.html#Object+Container+Files) and the available Avro libraries from Apache and Microsoft.

I am hoping the community can help clarify a question I have with respect to Object Container Files.

My question is…

Is it compliant with the specification to append objects to an Object Container File after the header and some number of initial objects have been written and the file closed?

The reason I ask this question is that while the specification does not explicitly state that appending to an Object Container File is not allowed, none of the libraries I have evaluated support doing so. The libraries I have looked at support creating a new Object Container File, writing the header, some number of objects, and then closing it… but they do not support coming back to that file at a later time and appending more objects.

Is this simply a gap in the libraries or am I missing something in the specification that states appending to Object Container Files is prohibited? Or is this rather against some best practice that I am not aware of, such as in HDFS?

Cheers,

Matt


Re: appending to Object Container Files

Doug Cutting-2
The Avro file format supports appends. However, some filesystems
(e.g., HDFS and S3) may not support append, so applications generally
avoid depending on it. Also, it can complicate application semantics
when the contents of files change after they are first created.
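
A minimal sketch of why the container format tolerates appends, using only the Python standard library. It assumes the "null" codec and a schema of "string", and every helper name here is hypothetical; it is not how the Avro libraries implement append. Per the specification, the header ends with a 16-byte sync marker and every data block ends with that same marker, so an appender only needs to re-read the header and write further blocks at the end of the file.

```python
import io
import os
import tempfile

MAGIC = b"Obj\x01"   # four-byte magic at the start of every container file
SYNC_SIZE = 16       # length of the per-file sync marker


def encode_long(n):
    """Avro long: zigzag encoding followed by a base-128 varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def decode_long(f):
    """Inverse of encode_long, reading from a binary file-like object."""
    shift, z = 0, 0
    while True:
        b = f.read(1)[0]
        z |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1)


def encode_string(s):
    """Avro string: length as a long, then UTF-8 bytes."""
    raw = s.encode("utf-8")
    return encode_long(len(raw)) + raw


def write_header(f, schema_json, sync):
    """Magic, metadata map (schema and codec), then the sync marker."""
    f.write(MAGIC)
    meta = {"avro.schema": schema_json.encode("utf-8"), "avro.codec": b"null"}
    f.write(encode_long(len(meta)))
    for key, value in meta.items():
        f.write(encode_string(key))
        f.write(encode_long(len(value)) + value)
    f.write(encode_long(0))  # a zero count terminates the map
    f.write(sync)


def read_header(f):
    """Return (metadata, sync marker), leaving f positioned at the first block."""
    if f.read(4) != MAGIC:
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:
        count = decode_long(f)
        if count == 0:
            break
        if count < 0:  # a negative count is followed by the block's byte size
            count = -count
            decode_long(f)
        for _ in range(count):
            key = f.read(decode_long(f)).decode("utf-8")
            meta[key] = f.read(decode_long(f))
    return meta, f.read(SYNC_SIZE)


def write_block(f, encoded_objects, sync):
    """Object count, payload size in bytes, payload, then the sync marker."""
    payload = b"".join(encoded_objects)
    f.write(encode_long(len(encoded_objects)))
    f.write(encode_long(len(payload)))
    f.write(payload)
    f.write(sync)


def append_block(path, encoded_objects):
    """Reopen a closed file, reuse its sync marker, and add one more block."""
    with open(path, "r+b") as f:
        _, sync = read_header(f)
        f.seek(0, os.SEEK_END)
        write_block(f, encoded_objects, sync)


def read_blocks(path):
    """Return a list of (object count, payload) pairs, checking each sync marker."""
    with open(path, "rb") as f:
        data = io.BytesIO(f.read())
    _, sync = read_header(data)
    end = len(data.getvalue())
    blocks = []
    while data.tell() < end:
        count = decode_long(data)
        payload = data.read(decode_long(data))
        if data.read(SYNC_SIZE) != sync:
            raise ValueError("sync marker mismatch")
        blocks.append((count, payload))
    return blocks


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "example.avro")
    sync = os.urandom(SYNC_SIZE)
    with open(path, "wb") as f:
        write_header(f, '"string"', sync)
        write_block(f, [encode_string("first"), encode_string("second")], sync)
    append_block(path, [encode_string("third")])  # file was already closed
    print(read_blocks(path))
```

The appended block is indistinguishable from one written before the file was closed, which is why a reader needs no special handling. A real appender would also need to honor the codec recorded in the header metadata and encode objects against the stored schema.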

The Java API supports append:

https://avro.apache.org/docs/1.8.2/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)

There is also an open feature request to extend the concat tool to
support append:

https://issues.apache.org/jira/browse/AVRO-1856

Doug

On Fri, Aug 25, 2017 at 12:10 PM, Matthew Stowe
<[hidden email]> wrote:

> Hello Avro User Community,
>
> I am reviewing the specification for Object Container Files
> (http://avro.apache.org/docs/1.8.2/spec.html#Object+Container+Files) and the
> available Avro libraries from Apache and Microsoft.
>
> I am hoping the community can help clarify a question I have with respect to
> Object Container files.
>
> My question is…
>
> Is it compliant with the specification to append objects to an Object
> Container File after the header and some number of initial objects have been
> written and the file closed?
>
> The reason I ask this question is that while the specification does not
> explicitly state that appending to an Object Container File is not allowed,
> none of the libraries I have evaluated support doing so.  The libraries I
> have looked at support creating a new Object Container File, writing the
> header, some number of objects, and then closing it… but they do not support
> coming back to that file at a later time and appending more objects.
>
> Is this simply a gap in the libraries or am I missing something in the
> specification that states appending to Object Container Files is prohibited?
> Or is this rather against some best practice that I am not aware of, such as
> in HDFS?
>
> Cheers,
>
> Matt

RE: appending to Object Container Files

Matthew Stowe
Thanks, Doug! Is there any planned effort or existing feature to extend the C# implementation (https://github.com/apache/avro/tree/master/lang/csharp) with support for append?

Thanks,
Matt

-----Original Message-----
From: Doug Cutting [mailto:[hidden email]]
Sent: Friday, August 25, 2017 6:18 PM
To: [hidden email]
Subject: Re: appending to Object Container Files

The Avro file format supports appends.  However some filesystems (e.g., HDFS and S3) may not support append, so applications generally avoid depending on it.  Also, it can complicate application semantics when the contents of files change after they are first created.

The Java API supports append:

https://avro.apache.org/docs/1.8.2/api/java/org/apache/avro/file/DataFileWriter.html#appendTo(java.io.File)

There is also an open feature request to extend the concat tool to support append:

https://issues.apache.org/jira/browse/AVRO-1856

Doug
