avro compression using snappy and deflate


snikhil0
I must be doing something wrong. I am writing out Avro files with three options:

a. no codec
b. deflate codec
c. snappy codec

I am measuring the size of the final Avro file. In my observation, the snappy file is larger than the original Avro file? duh?

Code snippet:

    File fs = new File("$DATA/log_snappy.avro");
    DatumWriter writer = new GenericDatumWriter(schema);
    dataFileWriter = new DataFileWriter(writer);
    dataFileWriter.setCodec(CodecFactory.snappyCodec());
    dataFileWriter.create(schema, fs);
    while (...) {
        GenericRecord datum = // get datum from object
        dataFileWriter.append(datum);
        dataFileWriter.flush();
    }

Thanks,
Nikhil
Re: avro compression using snappy and deflate

Serge Blazhievsky
How big is the original data you are trying to compress? 

On Mar 30, 2012, at 12:43 AM, snikhil0 <[hidden email]> wrote:

> I must be doing something wrong. I am writing out Avro files with three options:
>
> a. no codec
> b. deflate codec
> c. snappy codec
>
> I am measuring the size of the final Avro file. In my observation, the snappy file is larger than the original Avro file? duh?
>
> Code snippet:
>
>     File fs = new File("$DATA/log_snappy.avro");
>     DatumWriter writer = new GenericDatumWriter(schema);
>     dataFileWriter = new DataFileWriter(writer);
>     dataFileWriter.setCodec(CodecFactory.snappyCodec());
>     dataFileWriter.create(schema, fs);
>     while (...) {
>         GenericRecord datum = // get datum from object
>         dataFileWriter.append(datum);
>         dataFileWriter.flush();
>     }
>
> Thanks,
> Nikhil

Re: avro compression using snappy and deflate

snikhil0
The original data file (a text file) is 40GB, the avro file is about 12GB, avro snappy is 13GB!

Thanks,
Nikhil
Re: avro compression using snappy and deflate

snikhil0
Hello All,

I think I figured out where I goofed up.

I was flushing on every record, so basically each record was compressed on
its own and carried its own block metadata. This added more data to the
output than plain Avro.

So now I have better figures. They at least look realistic; I still need to
find out if the output is map-reduceable.
Avro = 12G
Avro+Deflate = 4.5G
Avro+Snappy = 5.5G
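
The per-record effect described above can be reproduced without Avro. The following is only a minimal sketch using the JDK's `java.util.zip.Deflater`, not Avro's own writer: the class name, the helper methods, and the made-up log lines are all assumptions for illustration. It compresses the same records once as a single block and once record by record, which is roughly what a `flush()` after every `append()` forces.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Hypothetical demo class; it does not use or imitate the Avro API.
class FlushOverheadDemo {

    /** Deflates one buffer as a single unit and returns the compressed size in bytes. */
    public static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] chunk = new byte[8192];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(chunk); // only the byte count matters here
        }
        deflater.end();
        return total;
    }

    /** Builds n made-up log lines; the field layout is purely illustrative. */
    public static String[] sampleRecords(int n) {
        String[] records = new String[n];
        for (int i = 0; i < n; i++) {
            records[i] = "ts=" + (1333094400L + i) + " level=INFO host=web" + (i % 4)
                    + " msg=request served\n";
        }
        return records;
    }

    /** Total output when every record is compressed separately (like a flush per record). */
    public static int perRecordTotal(String[] records) {
        int total = 0;
        for (String record : records) {
            total += deflatedSize(record.getBytes(StandardCharsets.UTF_8));
        }
        return total;
    }

    /** Total output when all records are compressed together as one block. */
    public static int singleBlockTotal(String[] records) {
        return deflatedSize(String.join("", records).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String[] records = sampleRecords(1000);
        // The sample lines are pure ASCII, so the character count equals the byte count.
        int raw = String.join("", records).length();
        System.out.println("raw bytes:            " + raw);
        System.out.println("one block, deflated:  " + singleBlockTotal(records));
        System.out.println("per record, deflated: " + perRecordTotal(records));
    }
}
```

Compressed record by record, the codec never sees the repetition between records and pays its fixed per-stream overhead every time, so the total can end up close to, or even above, the uncompressed size. In the Avro snippet from the first post, the equivalent change is dropping the `flush()` call from the loop so the writer can fill whole blocks before compressing them.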

Have others tried Avro + LZO?

Thanks,
Nikhil


On 3/30/12 12:54 AM, "Shirahatti, Nikhil" <[hidden email]> wrote:

>The original data file (a text file) is 40GB, the avro file is about 12GB,
>avro snappy is 13GB!
>
>Thanks,
>Nikhil
>

Re: avro compression using snappy and deflate

Tatu Saloranta
On Fri, Mar 30, 2012 at 12:08 PM, Shirahatti, Nikhil
<[hidden email]> wrote:

> Hello All,
>
> I think I figured out where I goofed up.
>
> I was flushing on every record, so basically each record was compressed on
> its own and carried its own block metadata. This added more data to the
> output than plain Avro.
>
> So now I have better figures. They at least look realistic; I still need to
> find out if the output is map-reduceable.
> Avro = 12G
> Avro+Deflate = 4.5G
> Avro+Snappy = 5.5G
>
> Have others tried Avro + LZO?

Have you checked out the jvm-compressor-benchmark page?
(https://github.com/ning/jvm-compressor-benchmark)
It has a comparison of quite a few native open source compression codecs.
While the test data does not include Avro, I would not expect the results to
differ all that much.

LZO isn't a particularly compelling codec in any of the combinations
tested. Snappy, LZF and LZ4 (not yet included in the public results, but
there's code, and preliminary results are very good) are the fastest
Java codecs.
Gzip (deflate) produces more compact results, and is the fastest of the "high
compression" codecs (although significantly slower than lzf/snappy/lz4).

-+ Tatu +-

ps. If anyone has a publicly available set of Avro data, it would be
quite easy to add an Avro-data test to the jvm-compressor-benchmark.
Re: avro compression using snappy and deflate

Scott Carey-2

On 3/30/12 12:08 PM, "Shirahatti, Nikhil" <[hidden email]> wrote:

>Hello All,
>
>I think I figured out where I goofed up.
>
>I was flushing on every record, so basically each record was compressed on
>its own and carried its own block metadata. This added more data to the
>output than plain Avro.
>
>So now I have better figures. They at least look realistic; I still need to
>find out if the output is map-reduceable.
>Avro = 12G
>Avro+Deflate = 4.5G

Deflate is affected quite a bit by the compression level selected (1 to 9)
in both performance and level of compression.  However, in my experience
anything past level 6 is only very slightly smaller and much slower, while
the difference between levels 1 to 3 is large on both fronts.
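
To see the level trade-off on your own data, a rough measurement like the one below may help. It is only a sketch using the JDK's `java.util.zip.Deflater` rather than Avro itself; the class name, helper methods, and synthetic input are assumptions for illustration, and single-run timings like these are noisy. In Avro, as far as I know, the level is passed when choosing the codec, e.g. `CodecFactory.deflateCodec(level)`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.Deflater;

// Hypothetical demo class; sizes and timings depend heavily on the input data.
class DeflateLevelDemo {

    /** Deflates the input at the given level (1 to 9) and returns the compressed size. */
    public static int deflatedSize(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(input);
        deflater.finish();
        byte[] chunk = new byte[8192];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(chunk);
        }
        deflater.end();
        return total;
    }

    /** Generates synthetic, log-like text from a small vocabulary with a fixed seed. */
    public static byte[] sampleData(int lines) {
        String[] words = {"GET", "POST", "/index.html", "/api/items", "200", "404",
                "500", "web0", "web1", "cache-miss", "cache-hit"};
        Random random = new Random(42);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lines; i++) {
            for (int w = 0; w < 6; w++) {
                sb.append(words[random.nextInt(words.length)]).append(' ');
            }
            sb.append(random.nextInt(100000)).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = sampleData(200000);
        System.out.println("raw bytes: " + data.length);
        for (int level = Deflater.BEST_SPEED; level <= Deflater.BEST_COMPRESSION; level++) {
            long start = System.nanoTime();
            int size = deflatedSize(data, level);
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("level %d: %d bytes, %d ms%n", level, size, millis);
        }
    }
}
```

Running it on a sample of the real records, instead of the synthetic lines used here, gives a better basis for picking a level than any general rule.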

>Avro+Snappy = 5.5G
>
>Have others tried Avro + LZO?

I have not heard of anyone doing this.  LZO is not Apache license
compatible, and there are now several alternatives that are in the same
class of compression algorithm available, including Snappy.

>
>Thanks,
>Nikhil
>
>
>On 3/30/12 12:54 AM, "Shirahatti, Nikhil" <[hidden email]> wrote:
>
>>The original data file (a text file) is 40GB, the avro file is about
>>12GB,
>>avro snappy is 13GB!
>>
>>Thanks,
>>Nikhil
>>
>

