(no subject)


(no subject)

svend frolund
Hello,

I cannot seem to get Avro union types to work properly in the C++ codebase that I pulled from your GitHub repo a couple of weeks ago. I want to specify that an object attribute can be either null or a string, in order to capture optional attributes in my JSON data. However, when decoding data that actually has a string value for the "optional" attribute in question, I get the following exception: "Incorrect token in the stream. Expected: Object start, found String". Here is a small program that reproduces the issue:

      std::string schema;
      schema += "{";
      schema += "   \"name\" : \"simple\", ";
      schema += "   \"type\" : \"record\", ";
      schema += "   \"fields\" : [ { \"name\" : \"last\", \"type\" : [ \"null\", \"string\"] } ] ";
      schema += "}";

      std::string value;
      value += "{";
      value += "   \"last\" : \"dog\" ";
      value += "}";

      std::istringstream schemass(schema);
      std::istringstream valuess(value);

      avro::ValidSchema cpxSchema;
      avro::compileJsonSchema(schemass, cpxSchema);

      std::unique_ptr<avro::InputStream> json_is = avro::istreamInputStream(valuess);

      /* JSON decoder */
      avro::DecoderPtr json_decoder = avro::jsonDecoder(cpxSchema);
      avro::GenericDatum *datum = new avro::GenericDatum(cpxSchema);

      try
      {
         /* Decode JSON to Avro datum */
         json_decoder->init(*json_is);
         avro::decode(*json_decoder, *datum);
      }
      catch(const avro::Exception &_e)
      {
          // throws Incorrect token in the stream. Expected: Object start, found String
      }

Do I need to configure the system in a particular way for this to work, or does the current implementation simply not support these types of unions?

I sincerely hope someone can help!

All the best,

   Svend

Re: (Nothing -> C++ Union Decoding)

John McClean
From the spec:

A union is encoded by first writing an int value indicating the zero-based position within the union of the schema of its value. The value is then encoded per the indicated schema within the union.

In other words, 'value' gives the decoder no way to figure out which type is in the union.
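To make the quoted spec sentence concrete, here is a minimal sketch of the binary form of the union ["null", "string"]. It does not use the Avro library; the helper names (writeLong, encodeStringBranch, encodeNullBranch) are hypothetical and only illustrate the layout, assuming Avro's usual zig-zag variable-length encoding for the branch index and the string length.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Zig-zag plus variable-length encoding, as Avro's binary format uses for int and long.
void writeLong(std::vector<std::uint8_t>& out, std::int64_t value) {
    const std::uint64_t sign = value < 0 ? ~std::uint64_t{0} : std::uint64_t{0};
    std::uint64_t n = (static_cast<std::uint64_t>(value) << 1) ^ sign;
    while (n >= 0x80) {
        out.push_back(static_cast<std::uint8_t>((n & 0x7F) | 0x80));
        n >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(n));
}

// Hypothetical helper: the "string" branch of ["null", "string"].
std::vector<std::uint8_t> encodeStringBranch(const std::string& s) {
    std::vector<std::uint8_t> out;
    writeLong(out, 1);                                    // zero-based position of "string"
    writeLong(out, static_cast<std::int64_t>(s.size()));  // string length in bytes
    out.insert(out.end(), s.begin(), s.end());            // the string bytes themselves
    return out;
}

// Hypothetical helper: the "null" branch is just the index, followed by no bytes.
std::vector<std::uint8_t> encodeNullBranch() {
    std::vector<std::uint8_t> out;
    writeLong(out, 0);  // zero-based position of "null"
    return out;
}
```

With these helpers, "dog" comes out as the bytes 02 06 64 6f 67: the branch index is always written explicitly, which is the piece of information a bare JSON string does not carry.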

(Where are you getting this example from?)

J

In reply to the post by svend frolund of Fri, May 21, 2021 at 12:36 AM

Re: (Nothing -> C++ Union Decoding)

svend frolund
Hello John,

Thanks for your reply! The example is home-brewed. I tried to construct a "minimal" example that illustrates my problem. As far as I understand the situation, the JSON object that I refer to as "value" ought to comply with the Avro schema in "schema". Another value object where the "last" attribute is null should also comply with the schema. I thought the decoder would parse the "value" object against the schema, determine that the actual type of the attribute "last" is indeed a string (as opposed to null), and decode the union as prescribed by the specification, by encoding a value of 1 (the position of "string" in the union) along with the actual string value. But are you telling me that it is expected behavior for the decoder to be unable to decode the "value" object according to "schema"?

--Svend

In reply to the post by John McClean of Fri, May 21, 2021 at 5:02 PM

Re:

Scott Reynolds
In reply to this post by svend frolund
This often confuses people using Avro's JSON decoding. JSON carries almost no type information, so when decoding JSON data for a union schema type, Avro's JSON decoder *requires* a JSON object naming the chosen type. I have updated your code to demonstrate this.

Here is an explanation from Doug: http://mail-archives.apache.org/mod_mbox/avro-user/201412.mbox/%3CCALEq1Z-sKNT-fBpMhAa%3DGTjLq5wuKf5mAuvLYos4Ba17hUi%2Bfw%40mail.gmail.com%3E

and here is this information in the spec: http://avro.apache.org/docs/current/spec.html#json_encoding

#+BEGIN_SRC c++
      std::string schema;
      schema += "{";
      schema += "   \"name\" : \"simple\", ";
      schema += "   \"type\" : \"record\", ";
      schema += "   \"fields\" : [ { \"name\" : \"last\", \"type\" : [ \"null\", \"string\"] } ] ";
      schema += "}";

      std::string value;
      value += "{";
      value += "   \"last\" : {\"string\": \"dog\" }";
      value += "}";

      std::istringstream schemass(schema);
      std::istringstream valuess(value);

      avro::ValidSchema cpxSchema;
      avro::compileJsonSchema(schemass, cpxSchema);

      std::unique_ptr<avro::InputStream> json_is = avro::istreamInputStream(valuess);

      /* JSON decoder */
      avro::DecoderPtr json_decoder = avro::jsonDecoder(cpxSchema);
      avro::GenericDatum *datum = new avro::GenericDatum(cpxSchema);

      try
      {
         /* Decode JSON to Avro datum */
         json_decoder->init(*json_is);
         avro::decode(*json_decoder, *datum);
      }
      catch(const avro::Exception &_e)
      {
          // not expected to be reached for this input: the union value is now wrapped as {"string": "dog"}
      }
#+END_SRC
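For the producing side, here is a minimal sketch of how the two branches of ["null", "string"] look in Avro's JSON encoding, as described in the spec section linked above: null stays a plain JSON null, while a string is wrapped in an object keyed by its type name. The helpers (jsonQuote, unionToJson, recordToJson) are hypothetical and not part of the Avro C++ API; a null pointer stands in for the null branch.

```cpp
#include <string>

// Hypothetical helper with minimal escaping: only quote, backslash, newline and tab
// are handled; other control characters are passed through unchanged.
std::string jsonQuote(const std::string& s) {
    std::string out = "\"";
    for (char c : s) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\t': out += "\\t";  break;
            default:   out += c;
        }
    }
    out += '"';
    return out;
}

// Renders a value of the union ["null", "string"]; nullptr selects the null branch.
std::string unionToJson(const std::string* value) {
    if (value == nullptr) {
        return "null";  // the null branch is written as a bare JSON null
    }
    return "{\"string\": " + jsonQuote(*value) + "}";  // non-null branches name their type
}

// Renders the record "simple" from this thread, with its single field "last".
std::string recordToJson(const std::string* last) {
    return "{\"last\": " + unionToJson(last) + "}";
}
```

Passing a pointer to "dog" yields {"last": {"string": "dog"}}, the shape used in the updated code above, and passing nullptr yields {"last": null}.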


Re:

svend frolund
Hello Scott,

This makes sense. Thanks a lot for the explanation!

--Svend

In reply to the post by Scott Reynolds of Fri, May 21, 2021 at 6:33 PM

Re: Re:

Brian Mcqueen
In reply to this post by Scott Reynolds

Yes, thanks for this. I had this problem with the C library too. I'm glad to see a recommendation and some links.

 

 

 
