
Decode without using DataFileReader

Decode without using DataFileReader

Gaurav
The author has deleted this message.

Re: Decode without using DataFileReader

Matt Stevenson
No, the schema needs to be present in some form to tell the reader how to decode the data.
You can generate classes from the schema and pass in the class, but that is just a different way of passing in the schema.

On Mon, Dec 5, 2011 at 9:33 AM, Gaurav <[hidden email]> wrote:
Hi,

I am trying to read a byte stream of encoded data that comes from a source
other than a file, so I cannot use DataFileReader.

I wrote the following code to do that, but here I have to specify the schema
myself, when ideally it should come from the data itself. Is there any way to
decode the data without explicitly specifying the schema and without using
DataFileReader?
----------------------
       private static void DecodeData(byte[] buf) throws IOException {
               // The schema comes from the caller's own helper; raw binary
               // data written by a BinaryEncoder does not carry it.
               Schema schema = createSchema();
               GenericDatumReader<GenericData.Record> datum =
                               new GenericDatumReader<GenericData.Record>(schema);

               ByteArrayInputStream in = new ByteArrayInputStream(buf);
               BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(in, null);

               // Passing null lets the reader allocate the record itself.
               GenericData.Record record = datum.read(null, decoder);

               System.out.println(record.get("trade"));
       }
---------------------

Thanks,
Gaurav Nanda

--
View this message in context: http://apache-avro.679487.n3.nabble.com/Decode-without-using-DataFileReader-tp3561722p3561722.html
Sent from the Avro - Users mailing list archive at Nabble.com.



--
Matt Stevenson.

Re: Decode without using DataFileReader

Gaurav
The author has deleted this message.

Re: Decode without using DataFileReader

Harsh J-2
The DataFile format stores the schema as part of its header.
That's one of its advantages.

The encoder/decoder are lower-level and do not do that. You need to
manage the schema yourself if you choose to use the encoder/decoder
instead of the datafile format (why?). The source stream can't contain
the schema unless you store it there, and it would make no sense for the
encoder to write the schema into the stream with every record.
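
One way to picture "managing the schema yourself" is to write the schema text once at the start of the stream and the encoded data after it. The sketch below uses only the JDK. The class SchemaHeaderSketch, its frame and readSchema helpers, and the length-prefixed layout are all hypothetical; this is not Avro's container file format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical framing: schema JSON once, then the encoded payload. */
final class SchemaHeaderSketch {

    /** Writes [int schemaLength][schema UTF-8 bytes][payload bytes]. */
    static byte[] frame(String schemaJson, byte[] payload) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DataOutputStream data = new DataOutputStream(out);
        byte[] schemaBytes = schemaJson.getBytes(StandardCharsets.UTF_8);
        data.writeInt(schemaBytes.length);
        data.write(schemaBytes);
        data.write(payload);
        data.flush();
        return out.toByteArray();
    }

    /** Reads the schema text back, leaving the stream positioned at the payload. */
    static String readSchema(DataInputStream in) throws IOException {
        int length = in.readInt();
        byte[] schemaBytes = new byte[length];
        in.readFully(schemaBytes);
        return new String(schemaBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String schemaJson = "{\"type\":\"long\"}";
        byte[] payload = {0x02}; // Avro binary encoding of the long value 1
        byte[] framed = frame(schemaJson, payload);

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(framed));
        System.out.println("schema: " + readSchema(in));
        System.out.println("payload bytes remaining: " + in.available());
    }
}
```

On the reading side, the recovered JSON could then be parsed with Avro's Schema.Parser and given to a GenericDatumReader before the payload is decoded.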

On Mon, Dec 5, 2011 at 11:18 PM, Gaurav Nanda <[hidden email]> wrote:

> I guess I did not put it the right way.
>
> See this sample code:
> ---------------------------------------
> public static void testRead(File file) throws IOException {
>    GenericDatumReader<GenericData.Record> datum =
>        new GenericDatumReader<GenericData.Record>();
>    DataFileReader<GenericData.Record> reader =
>        new DataFileReader<GenericData.Record>(file, datum);
>
>    GenericData.Record record = new GenericData.Record(reader.getSchema());
>    while (reader.hasNext()) {
>      reader.next(record);
>      System.out.println("Name " + record.get("name") + " Age " + record.get("age"));
>    }
>
>    reader.close();
>  }
> -------------------------------
> This takes a file as input, which contains both the schema and the actual data.
> In my case, instead of a file, I have some other stream of
> schema and data, which I am passing to the DecodeData() function.
>
> So the question now is: how do I extract the schema from there?
>
> Thanks,
> Gaurav Nanda
>
> On Mon, Dec 5, 2011 at 9:40 PM, Matt Stevenson
> <[hidden email]> wrote:
>> No, the schema needs to be present in some form to tell the reader how to
>> decode the data.
>> You can generate classes from the schema and pass in the class, but that is
>> just a different way of passing in the schema.



--
Harsh J

Re: Decode without using DataFileReader

Gaurav
The author has deleted this message.

Re: Decode without using DataFileReader

Harsh J-2
I do not understand what you're trying to achieve here.

Encoders work at the primitive level: they merely serialize the values of a given data structure (records, unions, for example) and do not write the schema. (Notice that you create a record with a schema, not an encoder with a schema.) Decoders can likewise read back bare primitives, but they need a schema to reassemble those primitives into properly structured data. Since encoders do not store the schema, decoders need it supplied externally.

DataFiles solve this for you by writing the schema itself into the file as a header. The reader loads this schema from the header and uses it to decode the records that follow.
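
To see why raw encoder output cannot be interpreted without a schema, consider the same two bytes read under two different schemas. The sketch below hand-rolls the zig-zag variable-length long and the length-prefixed string described in the Avro specification, using only the JDK. The class and method names are made up for illustration and are not part of the Avro API.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Illustration only: the same bytes mean different things under different schemas. */
final class SchemalessBytesSketch {

    /** Reads one zig-zag variable-length long (the encoding used for int and long). */
    static long readLong(InputStream in) throws IOException {
        long raw = 0;
        int shift = 0;
        int b;
        do {
            b = in.read();
            if (b < 0) {
                throw new EOFException("stream ended inside a varint");
            }
            raw |= ((long) (b & 0x7F)) << shift; // low 7 bits carry data
            shift += 7;
        } while ((b & 0x80) != 0);               // high bit set means more bytes follow
        return (raw >>> 1) ^ -(raw & 1);         // undo the zig-zag mapping
    }

    /** Reads a string: a long length followed by that many UTF-8 bytes. */
    static String readString(InputStream in) throws IOException {
        int length = (int) readLong(in);
        byte[] utf8 = new byte[length];
        new DataInputStream(in).readFully(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = {0x02, 0x41};

        // Interpreted as a single string value.
        System.out.println("as string: " + readString(new ByteArrayInputStream(bytes)));

        // Interpreted as two consecutive long values.
        InputStream in = new ByteArrayInputStream(bytes);
        System.out.println("as longs: " + readLong(in) + ", " + readLong(in));
    }
}
```

Nothing in the bytes says which reading is the intended one; that information lives only in the schema.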

On 05-Dec-2011, at 11:43 PM, Gaurav wrote:

>>> it makes no sense for the encoder to store schema for every given record,
>>> into a stream.
>
> Agreed. It's not even the encoder/decoder's job to store the schema.
>
> While writing data, I noticed that we don't even need DataFileWriter; all
> that is needed is a GenericDatumWriter, an Encoder, and any kind of output
> stream (which can also be a file output stream).
>
> Sample:
> ------------------------------------------------
> private static ByteArrayOutputStream EncodeData() throws IOException {
>     Schema schema = createMetaData();
>
>     GenericDatumWriter<GenericData.Record> datum =
>         new GenericDatumWriter<GenericData.Record>(schema);
>
>     // Build the nested "trade" record from that field's own schema.
>     GenericData.Record inner_record =
>         new GenericData.Record(schema.getField("trade").schema());
>     inner_record.put("inner_abc", 23490843L);
>
>     GenericData.Record record = new GenericData.Record(schema);
>     record.put("abc", 1050324);
>     record.put("trade", inner_record);
>
>     ByteArrayOutputStream out = new ByteArrayOutputStream();
>     BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
>
>     // Only the field values are written; the schema is not part of the output.
>     datum.write(record, encoder);
>
>     encoder.flush();
>     out.close();
>
>     return out;
> }
> ------------------------------------------------
>
> Then why can't I just reuse the same output stream to read back the metadata
> and the data? It should not be the responsibility of the stream reader (which
> in this case is DataFileReader) to parse out the schema.
>
> Thanks,
> Gaurav Nanda
>
> --
> View this message in context: http://apache-avro.679487.n3.nabble.com/Decode-without-using-DataFileReader-tp3561722p3562127.html
> Sent from the Avro - Users mailing list archive at Nabble.com.
