Deserialize with different schema

Mehrez Alachheb
Hi,

I am working on a project in which I have to deserialize Avro files provided by an external company.
The problem is that the schema of the serialized Avro files (example below) doesn't contain a namespace, but I need to add a namespace to the Avro schema.
I can't re-serialize the Avro files with another schema because we receive them from the other company.
I created a new schema (example below) with a namespace and generated the associated Java classes.

How can I deserialize the Avro files with my new schema?

Thanks,
Mehrez.

Company schema:
{"type":"record", "name":"AvroData", "fields": [.....] }

The schema that I want:
{"type":"record", "name":"AvroData", "namespace":"company.avro", "fields": [.....] }



Re: Deserialize with different schema

Sam Groth
You should be able to specify a reader schema with the namespace and the writer schema without it. See https://avro.apache.org/docs/1.7.7/api/java/org/apache/avro/specific/SpecificData.html#createDatumReader(org.apache.avro.Schema, org.apache.avro.Schema)


Sam



On Thursday, August 6, 2015 3:31 PM, Mehrez Alachheb <[hidden email]> wrote:


Hi,

I am working on a project in which I have to deserialize Avro files provided by an external company.
The problem is that the schema of the serialized Avro files (example below) doesn't contain a namespace, but I need to add a namespace to the Avro schema.
I can't re-serialize the Avro files with another schema because we receive them from the other company.
I created a new schema (example below) with a namespace and generated the associated Java classes.

How can I deserialize the Avro files with my new schema?

Thanks,
Mehrez.

Company schema:
{"type":"record", "name":"AvroData", "fields": [.....] }

The schema that I want:
{"type":"record", "name":"AvroData", "namespace":"company.avro", "fields": [.....] }





Re: Deserialize with different schema

Mehrez Alachheb
Thanks, Sam, for your help.
When I specify a reader schema like this (in Scala):

// DataAvroPacket.getClassSchema(): schema of the generated class (with namespace)
// serialization.Schemas.sw: the original (writer) schema of the Avro file

  val specificD = new SpecificData
  val datumReader = specificD.createDatumReader(serialization.Schemas.sw, DataAvroPacket.getClassSchema())
  val fileReader = new DataFileReader(new File("/tmp/test.avro"), datumReader)
  while (fileReader.hasNext()) { val user = fileReader.next()}

I get an AvroTypeException:

org.apache.avro.AvroTypeException: Found MainPacket, expecting union
at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:292)
at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:267)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:155)
at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)
at org.apache.avro.file.DataFileStream.next(DataFileStream.java:220)
at .<init>(<console>:19)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:734)
at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:983)
at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:573)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:604)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:568)
at scala.tools.nsc.interpreter.ILoop.reallyInterpret$1(ILoop.scala:760)
at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:805)
at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:717)
at scala.tools.nsc.interpreter.ILoop.processLine$1(ILoop.scala:581)
at scala.tools.nsc.interpreter.ILoop.innerLoop$1(ILoop.scala:588)
at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:591)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:882)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:837)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:837)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:837)
at scala.tools.nsc.interpreter.ILoop.main(ILoop.scala:904)
at xsbt.ConsoleInterface.run(ConsoleInterface.scala:62)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at sbt.compiler.AnalyzingCompiler.call(AnalyzingCompiler.scala:101)
at sbt.compiler.AnalyzingCompiler.console(AnalyzingCompiler.scala:76)
at sbt.Console.sbt$Console$$console0$1(Console.scala:22)
at sbt.Console$$anonfun$apply$2$$anonfun$apply$1.apply$mcV$sp(Console.scala:23)
at sbt.Console$$anonfun$apply$2$$anonfun$apply$1.apply(Console.scala:23)
at sbt.Console$$anonfun$apply$2$$anonfun$apply$1.apply(Console.scala:23)
at sbt.Logger$$anon$4.apply(Logger.scala:85)
at sbt.TrapExit$App.run(TrapExit.scala:248)
at java.lang.Thread.run(Thread.java:745)

On 07 Aug 2015, at 00:02, Sam Groth <[hidden email]> wrote:

You should be able to specify a reader schema with the namespace and the writer schema without it. See https://avro.apache.org/docs/1.7.7/api/java/org/apache/avro/specific/SpecificData.html#createDatumReader(org.apache.avro.Schema, org.apache.avro.Schema)


Sam




Re: Deserialize with different schema

julianpeeters
Hi Mehrez,

Can I guess? You're reading some Python/Pig AvroStorage output? Hate that.

I get the same error when the reader schema has a namespace but the writer has none. But only when a record is in a union.


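Here's a pair of small runnable examples <https://github.com/julianpeeters/avro-namespace-issues/tree/master/reading> that show errors with reading and writing across namespaces.

For the sake of being complete, here's my question <http://apache-avro.679487.n3.nabble.com/Issues-reading-and-writing-namespace-less-schemas-from-namespaced-Specific-Records-td4032092.html>, and it looks like Vitaly Gordon ran into this issue as well, here <http://apache-avro.679487.n3.nabble.com/Unable-to-compile-a-namespace-less-schema-td4028318.html>.

IMHO this is a bug that hinders Avro's utility as a data interchange format. I don't think the technical issue is in trying to import a class from the default package (which succeeds outside of unions). Instead, it comes from trying to resolve a union reflectively when the writer schema's fullname doesn't match the class's fullname.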
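
To make the diagnosis above concrete, here is a toy sketch in plain Java (no Avro dependency). It only illustrates how a full name is built from namespace and name, and why a lookup of union branches by full name would miss when one side has no namespace. The class `FullNameMismatch` and its methods are hypothetical illustrations, not Avro's actual resolution code.

```java
import java.util.Arrays;
import java.util.List;

public class FullNameMismatch {

    // Per the Avro specification, a full name is the namespace, a dot, and the name;
    // with no namespace, the full name is just the name.
    static String fullName(String namespace, String name) {
        return (namespace == null || namespace.isEmpty()) ? name : namespace + "." + name;
    }

    // Simplified stand-in for "pick the union branch with this full name".
    // Returns the branch index, or -1 when no branch matches.
    // This is an illustration only, not how Avro is implemented internally.
    static int findBranch(List<String> unionBranchFullNames, String wantedFullName) {
        return unionBranchFullNames.indexOf(wantedFullName);
    }

    public static void main(String[] args) {
        String writerName = fullName(null, "AvroData");           // "AvroData"
        String readerName = fullName("company.avro", "AvroData"); // "company.avro.AvroData"

        // A reader-side union such as ["null", "company.avro.AvroData"].
        List<String> readerUnion = Arrays.asList("null", readerName);

        System.out.println(findBranch(readerUnion, writerName)); // -1: no branch matches
        System.out.println(findBranch(readerUnion, readerName)); // 1: exact full-name match
    }
}
```

Under this simplified model, the namespace-less writer name never equals the namespaced reader name, which is consistent with the error appearing only when the record sits inside a union.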

The fix for now:
You could try using the Generic API instead, and then map the Generic Records to your Specific Records manually. Here's a start in Java:

        import java.io.File;
        import org.apache.avro.Schema;
        import org.apache.avro.file.DataFileReader;
        import org.apache.avro.generic.GenericDatumReader;
        import org.apache.avro.generic.GenericRecord;

        // `schema` is the writer's (namespace-less) Schema; `file` is the java.io.File to read.
        GenericDatumReader<GenericRecord> datumReader = new GenericDatumReader<>(schema);
        DataFileReader<GenericRecord> fileReader = new DataFileReader<>(file, datumReader);
        GenericRecord record = fileReader.next();

Cheers,
Julian
Re: Deserialize with different schema

Mehrez Alachheb
Thanks, Julian, for your reply.
Yes, with GenericRecord I can deserialize my data easily.

Mehrez.
 

> On 13 Aug 2015, at 10:49, julianpeeters <[hidden email]> wrote:
>
> Hi Mehrez,
>
> Can I guess? You're reading some Python/Pig AvroStorage output? Hate that.
>
> I get the same error when the reader schema has a namespace but the writer
> has none. But only when a record is in a union.
>
>
> Here's a pair of small runnable  examples
> <https://github.com/julianpeeters/avro-namespace-issues/tree/master/reading>  
> that show errors with reading and writing across namespaces.
>
> For the sake of being complete, here's my  question
> <http://apache-avro.679487.n3.nabble.com/Issues-reading-and-writing-namespace-less-schemas-from-namespaced-Specific-Records-td4032092.html>
> , and it looks like Vitaly Gordon ran into this issue as well,  here
> <http://apache-avro.679487.n3.nabble.com/Unable-to-compile-a-namespace-less-schema-td4028318.html>
> .
>
> IMHO this is a bug that hinders Avro's utility as a data interchange format.
> I don't think the technical issue is in trying to import a class from the
> default package (which succeeds outside of unions), but instead it's from
> trying to resolve a union reflectively and the writer schema's fullname
> doesn't match the class' fullname.
>
> The fix for now:
> You could try using the Generic API instead, and then map the Generic
> Records to your Specific Records manually. Here's a start in Java:
>
>        import org.apache.avro.Schema;
>        import org.apache.avro.file.DataFileReader;
>        import org.apache.avro.generic.GenericDatumReader;
>        import org.apache.avro.generic.GenericRecord;
>
>        GenericDatumReader<GenericRecord> datumReader = new
> GenericDatumReader<>(schema);
>        DataFileReader<GenericRecord> fileReader = new
> DataFileReader<>(file, datumReader);
>        GenericRecord record = fileReader.next();
>
>
> Cheers,
> Julian
>
>
>
> --
> View this message in context: http://apache-avro.679487.n3.nabble.com/Deserialize-with-different-schema-tp4032782p4032816.html
> Sent from the Avro - Users mailing list archive at Nabble.com.