Parsing a Pair's value - inherited namespace?

Parsing a Pair's value - inherited namespace?

nir_zamir
Hi,

I noticed that after calling:

AvroJob.setMapOutputSchema(conf, Pair.getPairSchema(Schema.create(Type.INT), schema));
(schema is parsed from an Avro file and has no namespace)

When the M/R job runs, AvroJob.getJobOutputSchema is called, which in turn calls Schema.parse to parse the schema I set in setMapOutputSchema.

The problem: my schema (the value in the Pair returned by getJobOutputSchema) has no namespace, so it inherits the Pair's namespace ("org.apache.avro.mapred"), and its full name becomes "org.apache.avro.mapred.MySchema".
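
To make the inheritance concrete, here is a minimal sketch of the name-resolution rule from the Avro specification: a name with no dots and no namespace of its own takes the namespace of the nearest enclosing schema, while an explicit empty namespace means "no namespace". The class NamespaceSketch and its fullName method are hypothetical names used only for illustration; this is a simplified model, not Avro's own code.

```java
// Simplified model of Avro's name-resolution rule (hypothetical class,
// not part of the Avro library).
final class NamespaceSketch {

    /**
     * Resolves a schema's full name.
     *
     * @param name               the "name" attribute of the schema
     * @param declaredNamespace  the schema's own "namespace" attribute;
     *                           null if absent, "" if explicitly empty
     * @param enclosingNamespace namespace of the enclosing schema, or null
     */
    static String fullName(String name, String declaredNamespace, String enclosingNamespace) {
        if (name.contains(".")) {
            return name; // a dotted name is already a full name
        }
        // No namespace of its own: fall back to the enclosing one.
        String ns = (declaredNamespace != null) ? declaredNamespace : enclosingNamespace;
        return (ns == null || ns.isEmpty()) ? name : ns + "." + name;
    }

    public static void main(String[] args) {
        String pairNamespace = "org.apache.avro.mapred";
        // Parsed on its own: prints MySchema
        System.out.println(fullName("MySchema", null, null));
        // Parsed as the value inside the Pair: prints org.apache.avro.mapred.MySchema
        System.out.println(fullName("MySchema", null, pairNamespace));
        // With an explicit empty namespace: prints MySchema
        System.out.println(fullName("MySchema", "", pairNamespace));
    }
}
```

The third call shows why writing an explicit empty namespace for the nested schema would keep its name unchanged after a round trip through JSON.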

Why do I care? When the schema is a union, the Avro items passed to AvroMapper.map have no namespace, so their full name is "MySchema" and they do not match the mapper's output schema (I get a "not in union" exception). Note that this happens even though I pass a Pair to collect(); in this case, the Pair's namespace is ignored.
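
The mismatch can be sketched as follows, assuming that a record is matched to a union branch by its full name. UnionLookupSketch and resolveUnion are hypothetical names, and the exception type is a stand-in; this models the behavior described above rather than reproducing Avro's implementation.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of matching a record against union branches by full name
// (hypothetical class, not part of the Avro library).
final class UnionLookupSketch {

    /** Returns the index of the branch whose full name equals the datum's full name. */
    static int resolveUnion(List<String> branchFullNames, String datumFullName) {
        int index = branchFullNames.indexOf(datumFullName);
        if (index < 0) {
            throw new IllegalArgumentException(
                "Not in union " + branchFullNames + ": " + datumFullName);
        }
        return index;
    }

    public static void main(String[] args) {
        // Branch names as they look after the re-parsed schema inherited the Pair's namespace.
        List<String> branches = Arrays.asList("null", "org.apache.avro.mapred.MySchema");

        // A datum carrying the namespaced name would match branch 1.
        System.out.println(resolveUnion(branches, "org.apache.avro.mapred.MySchema"));

        // The datum actually passed to the mapper has no namespace, so the lookup fails.
        try {
            resolveUnion(branches, "MySchema");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```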

I hope it's clear - will be happy to provide any more details.

Your help is much appreciated!!

Thanks,
Nir

Re: Parsing a Pair's value - inherited namespace?

Doug Cutting
This looks to be a bug.  I filed a Jira issue and attached a patch at:


Please let me know if this fixes things for you.

Thanks,

Doug


On Tue, Apr 9, 2013 at 9:02 AM, nir_zamir <[hidden email]> wrote:
> Hi,
>
> I noticed that after calling:
>
> AvroJob.setMapOutputSchema(conf, Pair.getPairSchema(Schema.create(Type.INT), schema));
> (schema is parsed from an avro file, and has no namespace)
>
> When the M/R job is run, there's a call to AvroJob.getJobOutputSchema
> which calls Schema.parse - which parses the schema I set in
> setMapOutputSchema.
>
> The problem: the schema (the value in the Pair returned by
> getJobOutputSchema) has no namespace, so it gets it from the Pair (which is
> "org.apache.avro.mapred"), so my schema's full name is
> "org.apache.avro.mapred.MySchema".
>
> Why do I care? Well, when the schema is a union, the avro items passed to
> the AvroMapper.map have no namespace (so their full name is "MySchema", and
> are not matching the mapper's output schema (I get a "not in union"
> exception). Note that this happens although I pass a Pair to collect(), but
> in this case, the Pair's namespace is ignored.
>
> I hope it's clear - will be happy to provide any more details.
>
> Your help is much appreciated!!
>
> Thanks,
> Nir
>
> --
> View this message in context: http://apache-avro.679487.n3.nabble.com/Parsing-a-Pair-s-value-inherited-namespace-tp4026810.html
> Sent from the Avro - Users mailing list archive at Nabble.com.


Re: Parsing a Pair's value - inherited namespace?

nir_zamir
Hi Doug,

I can confirm the patch did the job! Thanks a lot!
It also fixes the original issue I saw with the union.

Is there a place to see the ETA of the next Avro version release?

Thanks!
Nir

Re: Parsing a Pair's value - inherited namespace?

Doug Cutting
Thanks for confirming the fix.

I committed the patch.  I'm not sure when the next Avro release will
be, but probably sometime in the next month.

Doug

On Sun, Apr 14, 2013 at 1:35 AM, nir_zamir <[hidden email]> wrote:

> Hi doug,
>
> I can confirm the patch did the job! Thanks a lot!
> This is also fixing my  original issue
> <http://apache-avro.679487.n3.nabble.com/Union-in-AvroMapper-map-Not-in-Union-td4026706.html>
> I saw with the union.
>
> Is there a place to see the ETA of the next Avro version release?
>
> Thanks!
> Nir
>
>
>
> --
> View this message in context: http://apache-avro.679487.n3.nabble.com/Parsing-a-Pair-s-value-inherited-namespace-tp4026810p4026865.html
> Sent from the Avro - Users mailing list archive at Nabble.com.

Re: Parsing a Pair's value - inherited namespace?

nir_zamir
Thanks Doug,

Well, the fix worked for me in my unit tests, but when I tried to test it on a real Hadoop cluster, I ran into strange behavior: Hadoop was using the old Avro code for some reason (I know it was the old code because I modified the thrown exception and the change had no effect).

To 'start fresh' I deleted my local Maven repository, and now I can't compile Avro locally... I posted a separate topic about that.
Will update once I confirm it on the cluster.

Thanks,
Nir.



Re: Parsing a Pair's value - inherited namespace?

nir_zamir
UPDATE: Hadoop was loading Avro from its own classpath (amazing how many folders include the Avro jar...).
I changed the classpath so that it uses the patched Avro, and now I can confirm the fix works!
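
For anyone hitting the same stale-jar problem, one way to check which copy of a class is actually loaded is to ask the JVM for the class's code source. The following is only a sketch: JarLocator is a hypothetical helper name, and it relies solely on standard JDK calls (Class.getProtectionDomain and CodeSource.getLocation). Run inside the job's JVM with org.apache.avro.Schema as the argument, it should print the jar that Avro was loaded from.

```java
import java.security.CodeSource;

// Hypothetical helper that reports where a class was loaded from.
final class JarLocator {

    /** Returns the jar or directory a class was loaded from, or a placeholder if unknown. */
    static String locate(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        // Classes loaded by the bootstrap loader (e.g. java.lang.String) have no code source.
        if (source == null || source.getLocation() == null) {
            return "(bootstrap or unknown)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        if (args.length == 0) {
            // No class name given: report on this helper itself.
            System.out.println(JarLocator.class.getName() + " <- " + locate(JarLocator.class));
            return;
        }
        // Example argument: org.apache.avro.Schema (requires Avro on the classpath).
        for (String className : args) {
            System.out.println(className + " <- " + locate(Class.forName(className)));
        }
    }
}
```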

Thanks,
Nir.