|
This post was updated on .
CONTENTS DELETED
The author has deleted this message.
|
|
No, the schema needs to be present in some form to tell the reader how to decode the data.
You can generate classes from the schema and pass in the class, but that is just a different way of passing in the schema.

On Mon, Dec 5, 2011 at 9:33 AM, Gaurav <[hidden email]> wrote:
> Hi,

--
Matt Stevenson.
|
|
|
|
The DataFile file format stores the schema as part of its header.
That's one of its advantages. The encoder/decoder work at a lower level and do not do that. If you choose to use the encoder/decoder instead of the datafile format (why?), you need to manage the schema yourself: the source stream can't contain the schema if you did not store it, and it makes no sense for the encoder to write the schema into the stream for every record.

On Mon, Dec 5, 2011 at 11:18 PM, Gaurav Nanda <[hidden email]> wrote:
> I guess I did not put it the right way.
>
> See this sample code:
> ---------------------------------------
> public static void testRead(File file) throws IOException {
>     GenericDatumReader<GenericData.Record> datum =
>         new GenericDatumReader<GenericData.Record>();
>     DataFileReader<GenericData.Record> reader =
>         new DataFileReader<GenericData.Record>(file, datum);
>
>     GenericData.Record record = new GenericData.Record(reader.getSchema());
>     while (reader.hasNext()) {
>         reader.next(record);
>         System.out.println("Name " + record.get("name") + " Age " +
>             record.get("age"));
>     }
>
>     reader.close();
> }
> -------------------------------
> This takes a file as input, which contains both the schema and the actual data.
> In my case, instead of a file, I have some other stream of
> schema & data which I am passing to the DecodeData() function.
>
> So, the question now is, how do I extract the schema from there?
>
> Thanks,
> Gaurav Nanda
>
> On Mon, Dec 5, 2011 at 9:40 PM, Matt Stevenson
> <[hidden email]> wrote:
>> No, the schema needs to be present in some form to tell the reader how to
>> decode the data.
>> You can generate classes from the schema and pass in the class, but that is
>> just a different way of passing in the schema.
>>
>>
>> On Mon, Dec 5, 2011 at 9:33 AM, Gaurav <[hidden email]> wrote:
>>>
>>> Hi,
>>>
>>> I am trying to read a byte stream of encoded data, which is coming from some
>>> source other than a file, so I should not use DataFileReader.
>>>
>>> I wrote the following code to do that, but here I have to specify the schema
>>> on my own, which ideally should come from the data itself. Is there any other
>>> way to decode the data without explicitly specifying the schema and without
>>> using DataFileReader?
>>> ----------------------
>>> private static void DecodeData(byte[] buf) throws IOException {
>>>     // TODO Auto-generated method stub
>>>     Schema schema = createSchema();
>>>     GenericDatumReader<GenericData.Record> datum =
>>>         new GenericDatumReader<GenericData.Record>(schema);
>>>
>>>     ByteArrayInputStream in = new ByteArrayInputStream(buf);
>>>     BinaryDecoder decoder = DECODER_FACTORY.binaryDecoder(in, null);
>>>
>>>     GenericData.Record record = new GenericData.Record(datum.getSchema());
>>>     datum.read(record, decoder);
>>>
>>>     System.out.println(record.get("trade"));
>>> }
>>> ---------------------
>>>
>>> Thanks,
>>> Gaurav Nanda
>>>
>>> --
>>> View this message in context:
>>> http://apache-avro.679487.n3.nabble.com/Decode-without-using-DataFileReader-tp3561722p3561722.html
>>> Sent from the Avro - Users mailing list archive at Nabble.com.
>>
>> --
>> Matt Stevenson.

--
Harsh J
|
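To make the "schema in the header" idea concrete, here is a minimal sketch in plain Java. It is a toy framing, not Avro's actual container format: the class `SchemaHeaderSketch`, its method names, and the byte layout (length-prefixed schema text, a record count, then fixed-width longs) are all assumptions made up for illustration. It only shows why a reader can recover the schema from a stream when, and only when, the writer put it there once up front.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Toy illustration only: this is NOT Avro's container file layout.
class SchemaHeaderSketch {

    // Writer side: store the schema text once as a header, then the raw values.
    static byte[] writeWithHeader(String schemaJson, long[] values) {
        byte[] schemaBytes = schemaJson.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + schemaBytes.length + 4 + 8 * values.length);
        buf.putInt(schemaBytes.length);   // header: schema length
        buf.put(schemaBytes);             // header: schema text, written once
        buf.putInt(values.length);        // body: record count
        for (long v : values) {
            buf.putLong(v);               // body: values only, no per-record schema
        }
        return buf.array();
    }

    // Reader side: recover the schema from the stream itself.
    // Advances the buffer past the header.
    static String readSchema(ByteBuffer buf) {
        byte[] schemaBytes = new byte[buf.getInt()];
        buf.get(schemaBytes);
        return new String(schemaBytes, StandardCharsets.UTF_8);
    }

    // Reads the values that follow the header; call after readSchema.
    static long[] readValues(ByteBuffer buf) {
        long[] values = new long[buf.getInt()];
        for (int i = 0; i < values.length; i++) {
            values[i] = buf.getLong();
        }
        return values;
    }

    public static void main(String[] args) {
        byte[] framed = writeWithHeader("{\"type\":\"long\"}", new long[] {23490843L, 1050324L});
        ByteBuffer buf = ByteBuffer.wrap(framed);
        System.out.println("schema from stream: " + readSchema(buf));
        System.out.println("values read: " + readValues(buf).length);
    }
}
```

For a non-file source in real Avro code, `org.apache.avro.file.DataFileStream` takes a `java.io.InputStream` plus a `DatumReader` and exposes `getSchema()`, so the container format is not tied to `java.io.File`. That only helps if the producer wrote the data with `DataFileWriter`; bytes produced by a bare `BinaryEncoder` carry no header to read.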
|
|
|
I do not understand what you're trying to achieve here.
Encoders work at the primitive level: they merely serialize a given data structure (records and unions, for example) and do not look at the schema (notice that you create a record with a schema, not an encoder with a schema). Decoders could do the same and read back primitives, but if they had a schema they'd read back properly packed data structures. Since encoders do not store the schema, decoders need it supplied externally. DataFiles solve this for you by writing the schema itself into the file as a header. The reader loads this schema into the decoder when it attempts to read the data back.

On 05-Dec-2011, at 11:43 PM, Gaurav wrote:

>> it makes no sense for the encoder to store schema for every given record,
>> into a stream.
>
> Agreed. It's not even the encoder/decoder's job to store the schema.
>
> While writing data, I noticed that we don't even need DataFileWriter; all it
> needs is a GenericDatumWriter, an Encoder and any kind of output stream (which
> can also be a file output stream).
>
> Sample:
> ------------------------------------------------
> private static ByteArrayOutputStream EncodeData() throws IOException {
>     // TODO Auto-generated method stub
>     Schema schema = createMetaData();
>
>     GenericDatumWriter<GenericData.Record> datum =
>         new GenericDatumWriter<GenericData.Record>(schema);
>
>     GenericData.Record inner_record =
>         new GenericData.Record(schema.getField("trade").schema());
>     inner_record.put("inner_abc", new Long(23490843));
>
>     GenericData.Record record = new GenericData.Record(schema);
>     record.put("abc", 1050324);
>     record.put("trade", inner_record);
>
>     ByteArrayOutputStream out = new ByteArrayOutputStream();
>     BinaryEncoder encoder = ENCODER_FACTORY.binaryEncoder(out, null);
>
>     datum.write(record, encoder);
>
>     encoder.flush();
>     out.close();
>
>     return out;
> }
> ------------------------------------------------
>
> Then why can't I just use the same output stream to read back the metadata
> and data? It should not be the responsibility of the stream reader (which in
> this case is served by DataFileReader) to parse out the schema.
>
> Thanks,
> Gaurav Nanda
>
> --
> View this message in context: http://apache-avro.679487.n3.nabble.com/Decode-without-using-DataFileReader-tp3561722p3562127.html
> Sent from the Avro - Users mailing list archive at Nabble.com.
|
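The point that encoders work at the primitive level can be illustrated with a small stand-alone sketch. The Avro specification encodes int and long values with variable-length zig-zag coding, and a record as just the concatenation of its field encodings in declaration order. The class `ZigZagSketch` and its methods below are hypothetical, written for this illustration rather than taken from the Avro library, and the sketch assumes both `abc` and `inner_abc` from the sample above are integer-typed fields (the thread never shows the schema).

```java
import java.io.ByteArrayOutputStream;

// Hand-rolled sketch of zig-zag varint coding; not the Avro library's code.
class ZigZagSketch {

    // Encode one long: zig-zag maps small negatives to small unsigned values,
    // then 7 bits are emitted per byte, low bits first, high bit = "more follows".
    static void writeLong(long n, ByteArrayOutputStream out) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Decode one long starting at pos[0]; pos[0] is advanced past it.
    static long readLong(byte[] buf, int[] pos) {
        long z = 0;
        int shift = 0;
        int b;
        do {
            b = buf[pos[0]++] & 0xFF;
            z |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (z >>> 1) ^ -(z & 1);   // undo zig-zag
    }

    // A record is only its field values back to back: no names, no types.
    static byte[] encodeRecord(long abc, long innerAbc) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(abc, out);       // field "abc"
        writeLong(innerAbc, out);  // nested field "trade.inner_abc"
        return out.toByteArray();
    }
}
```

Nothing in the output of `encodeRecord(1050324L, 23490843L)` says that there are two fields, what they are called, or that the second one is nested. A reader that does not already know the schema sees a run of bytes it cannot split correctly, which is why the decoder has to be given the schema from outside or from a container header.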
