Flush Avro Files based on size

Nishanth S-2
Hello,
Is there a property that can be set on DataFileWriter that would flush the file to disk once it reaches a particular size? Is this something in the pipeline?


Thanks,
Nishanth

Re: Flush Avro Files based on size

Nishanth S-2
Hello all,

How can I flush data from a DataFileWriter based on size or number of records? Does similar functionality exist today? If not, does it make sense to write one? This relates to a batch job that closes its files before the JVM exits, which can leave a lot of small files on the destination file system. If that file system is something like HDFS, this leads to the small files problem. I could of course merge these files at intervals, but I want to see if there is a better solution. My plan is to change the batch into something like a listener that watches an input directory and keeps the DataFileWriter running, doing a flush at a preconfigured interval based on time, size, or number of records.
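
To make the plan above concrete, here is a minimal sketch of the bookkeeping such a listener might need. The FlushPolicy class, its thresholds, and the byte estimate passed to it are all hypothetical names and assumptions for illustration, not part of Avro. The only Avro call assumed is DataFileWriter.flush(), shown in a comment so the snippet stays self-contained.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical helper (not part of Avro): tracks how much has been
 * appended since the last flush and reports when a flush is due.
 */
final class FlushPolicy {
    private final long maxRecords;    // flush after this many records
    private final long maxBytes;      // flush after this many (estimated) bytes
    private final long maxAgeMillis;  // flush pending records after this long
    private final LongSupplier clock; // injectable clock, e.g. System::currentTimeMillis

    private long records;
    private long bytes;
    private long lastFlushMillis;

    FlushPolicy(long maxRecords, long maxBytes, long maxAgeMillis, LongSupplier clock) {
        this.maxRecords = maxRecords;
        this.maxBytes = maxBytes;
        this.maxAgeMillis = maxAgeMillis;
        this.clock = clock;
        this.lastFlushMillis = clock.getAsLong();
    }

    /** Call after each append; approxBytes is the caller's own size estimate. */
    boolean recordAppended(long approxBytes) {
        records++;
        bytes += approxBytes;
        return shouldFlush();
    }

    /** True if any threshold is reached; the time rule only fires when records are pending. */
    boolean shouldFlush() {
        boolean tooOld = records > 0 && clock.getAsLong() - lastFlushMillis >= maxAgeMillis;
        return records >= maxRecords || bytes >= maxBytes || tooOld;
    }

    /** Call after the writer has actually been flushed. */
    void markFlushed() {
        records = 0;
        bytes = 0;
        lastFlushMillis = clock.getAsLong();
    }
}

// Assumed usage with an open DataFileWriter<GenericRecord> named writer:
//   writer.append(record);
//   if (policy.recordAppended(estimatedSize)) {
//       writer.flush();
//       policy.markFlushed();
//   }
```

The same check could also decide when to close the current file and open a new one, if the goal is to cap file size rather than only to push data to disk.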

Thanks,
Nishanth

On Mon, Oct 23, 2017 at 2:26 PM, Nishanth S <[hidden email]> wrote:
Hello,
Is there a property that can be set on DataFileWriter that would flush the file to disk once it reaches a particular size? Is this something in the pipeline?


Thanks,
Nishanth