setting default values in avro

Sarvagya Pant
I am trying to use Avro to replace some code that writes data as CSV. CSV cannot store field types, so every value is treated as a string when the data is consumed. I copied the Avro example code from its website and would like a default value to be used when a field is missing.

My Avro schema file (data.avsc) looks like this:

{
    "type" : "record",
    "name" : "data",
    "namespace" : "my.example",
    "fields" : [
        {"name" : "domain", "type" : "string", "default" : "EMPTY"},
        {"name" : "ip", "type" : "string", "default" : "EMPTY"},
        {"name" : "port", "type" : "int", "default" : 0},
        {"name" : "score", "type" : "int", "default" : 0}
    ]
}

I have written a simple Python script that I expected to work. It is given below:

import avro.schema
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter

schema = avro.schema.parse(open("data.avsc", "rb").read())

writer = DataFileWriter(open("users.avro", "wb"), DatumWriter(), schema)
writer.append({"domain": "hello domain", "score" : 20, "port" : 8080})
writer.append({"ip": "1.2.3.4", "port" : 80})
writer.append({"domain": "another domain", "score" : 100})
writer.close()

reader = DataFileReader(open("users.avro", "rb"), DatumReader())
for data in reader:
    print data
reader.close()

However, when I run this program, I get an error saying that the datum does not match the schema:

    Traceback (most recent call last):
  File "D:\arko.py", line 8, in <module>
    writer.append({"domain": "hello domain", "score" : 20, "port" : 8080})
  File "build\bdist.win32\egg\avro\datafile.py", line 196, in append
  File "build\bdist.win32\egg\avro\io.py", line 769, in write

avro.io.AvroTypeException: The datum {'domain': 'hello domain', 'score': 20, 'port': 8080} is not an example of the schema {
  "namespace": "my.example",
  "type": "record",
  "name": "userInfo",
  "fields": [
    {
      "default": "EMPTY",
      "type": "string",
      "name": "domain"
    },
    {
      "default": "EMPTY",
      "type": "string",
      "name": "ip"
    },
    {
      "default": 0,
      "type": "int",
      "name": "port"
    },
    {
      "default": 0,
      "type": "int",
      "name": "score"
    }
  ]
}
[Finished in 0.1s with exit code 1]

I am using avro v1.8.0 and python 2.7. What am I doing wrong here? Thanks.

--
Sarvagya Pant
Kathmandu, Nepal

Re: setting default values in avro

Stanislav Savulchik
Hi,

I believe default values only work for readers, not writers.

default: A default value for this field, used when reading instances that lack this field (optional).

Re: setting default values in avro

Sarvagya Pant
Hi Stanislav,

Thanks for the reply. What I want to achieve is this: data arriving at the Avro writer may not contain all the fields specified in the schema above. I would like either to save the default value at write time or to get the default value back when using DataFileReader. Is this possible? Must the data always contain all the keys specified in the schema? I tried using ["int", "null"] with "default" : 0. That let me save data with a field missing, but when reading with DataFileReader I got None instead of the default value 0. Any help will be much appreciated. Thanks.

Re: setting default values in avro

Stanislav Savulchik
I'm not familiar enough with Avro to propose an "Avro solution" for your problem :(

If you want default values serialized into Avro for some fields, you should supply them explicitly in code when writing to Avro. Another approach is to declare the fields as nullable using union types (e.g. ["null", "int"]) and apply the default values explicitly in code when reading from Avro.
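The first approach (supplying defaults in code before writing) can be sketched roughly as follows. This is only an illustration using the standard json module: with_defaults is a hypothetical helper, not part of the avro package, and the schema from the original post is inlined so the snippet stands alone.

```python
import json

# Schema from the original post, inlined so the sketch is self-contained.
SCHEMA_JSON = """
{
    "type": "record",
    "name": "data",
    "namespace": "my.example",
    "fields": [
        {"name": "domain", "type": "string", "default": "EMPTY"},
        {"name": "ip", "type": "string", "default": "EMPTY"},
        {"name": "port", "type": "int", "default": 0},
        {"name": "score", "type": "int", "default": 0}
    ]
}
"""


def with_defaults(record, schema):
    """Return a copy of record with each missing field set to its schema default.

    Hypothetical helper, not an avro API. Only top-level record fields are handled.
    """
    complete = dict(record)
    for field in schema["fields"]:
        name = field["name"]
        if name not in complete:
            if "default" not in field:
                raise ValueError("no value and no default for field %r" % name)
            complete[name] = field["default"]
    return complete


schema = json.loads(SCHEMA_JSON)
row = with_defaults({"ip": "1.2.3.4", "port": 80}, schema)
print(row)
```

The completed dict, rather than the raw one, would then be passed to writer.append(...), so that every datum contains all the keys the schema requires.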

I believe the "default" key you used in the Avro schema is meant for schema evolution; see http://avro.apache.org/docs/current/spec.html#Schema+Resolution
  • if the reader's record schema has a field that contains a default value, and writer's schema does not have a field with the same name, then the reader should use the default value from its field.
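A toy model of the rule quoted above may make the distinction clearer. This is not the avro library's code: resolve_record and its arguments are hypothetical, and it only mimics the record-field rule with plain dicts.

```python
def resolve_record(written, writer_field_names, reader_fields):
    """Toy version of the record-field rule quoted above (not avro's code).

    written:            dict decoded using the writer's schema
    writer_field_names: names of the fields in the writer's schema
    reader_fields:      field dicts from the reader's schema, each with a
                        "name" and optionally a "default"
    """
    result = {}
    for field in reader_fields:
        name = field["name"]
        if name in writer_field_names:
            # The writer's schema has this field, so the stored value is
            # used, even when that stored value is null (None).
            result[name] = written[name]
        elif "default" in field:
            # The writer's schema lacks the field: the reader's default applies.
            result[name] = field["default"]
        else:
            raise ValueError("field %r is missing and has no default" % name)
    return result


reader_fields = [{"name": "ip"}, {"name": "port"}, {"name": "score", "default": 0}]

# Data written with an older schema that has no "score" field at all.
evolved = resolve_record({"ip": "1.2.3.4", "port": 80}, ["ip", "port"], reader_fields)

# Data written with a schema that already has "score", stored as null.
unchanged = resolve_record(
    {"ip": "1.2.3.4", "port": 80, "score": None},
    ["ip", "port", "score"],
    reader_fields,
)

print(evolved["score"], unchanged["score"])
```

In this model the default only appears in the first case, where the field is absent from the writer's schema, not merely absent from one datum.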

Re: setting default values in avro

Yibing Shi
+ Sean Busbey

My understanding is that this problem is a limitation of the Python Avro library. Currently it seems that the only valid default value is null. Please try the schema below to see whether it works for you.

{
    "type" : "record",
    "name" : "data",
    "namespace" : "my.example",
    "fields" : [
        {"name" : "domain", "type" : ["null", "string"], "default" : null},
        {"name" : "ip", "type" : ["null", "string"], "default" : null},
        {"name" : "port", "type" : ["null", "int"], "default" : null},
        {"name" : "score", "type" : ["null", "int"], "default" : null}
    ]
}

The JIRAs below seem to be related:


I am pretty sure that the AVRO Java library supports using a non-null default value for record fields. You can try it in a Java program.


Yibing Shi
Customer Operations Engineer


Re: setting default values in avro

Arne Vogel
Dear Yibing Shi,

A default value for a union must match the schema of the first union member. Therefore, to set e.g. an int default value, use ["int", "null"] instead of ["null", "int"].
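As a rough illustration of this rule, the check below compares a field's default against the first union member only. default_matches_first_branch is a hypothetical helper, not an avro API, and it covers just the three primitive types that appear in this thread.

```python
def default_matches_first_branch(field):
    """Check a union field's default against the first union member only.

    Hypothetical helper; handles only "null", "string" and "int".
    """
    checks = {
        "null": lambda value: value is None,
        "string": lambda value: isinstance(value, str),
        # bool is a subclass of int in Python, so exclude it explicitly.
        "int": lambda value: isinstance(value, int) and not isinstance(value, bool),
    }
    first_branch = field["type"][0]
    return checks[first_branch](field["default"])


int_first = {"name": "score", "type": ["int", "null"], "default": 0}
null_first = {"name": "score", "type": ["null", "int"], "default": None}
mismatch = {"name": "score", "type": ["null", "int"], "default": 0}

for candidate in (int_first, null_first, mismatch):
    print(candidate["type"], candidate["default"], default_matches_first_branch(candidate))
```

Only the last field fails the check: its default is an int, but the first union member is null.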

For more details, see the spec:
http://avro.apache.org/docs/1.8.1/spec.html#schema_complex

Regards,
Arne Vogel

-- 
BENOCS GMBH
Arne Vogel
Winterfeldtstr. 21
10781 Berlin
Email: [hidden email]
www.benocs.com

Board of Management: Michael Wolz, Dr.-Ing. Oliver Holschke, Dr.-Ing. Ingmar Poese
Commercial Register: Amtsgericht Bonn HRB 19378

Re: setting default values in avro

Sarvagya Pant
I have tried both cases.
Case 1: using ["int", "null"] with "default" : 0. Writing the sample datum {"ip" : "1.1.1.1", "port" : 20} with DataFileWriter and reading it back yielded {u'ip': u'1.1.1.1', u'domain': None, u'score': None, u'port': 20}, where I expected score to be 0 instead of None.
Case 2: using ["null", "int"] with "default" : null. The result was the same as in Case 1.
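Both results are consistent with the missing keys being stored as explicit nulls at write time, although that is an assumption about the library's behavior rather than something verified here. Under that assumption, a reader-side workaround could look like the sketch below, where FALLBACKS and apply_fallbacks are hypothetical names and the fallback values are taken from the defaults in the original schema.

```python
# Application-level fallbacks, mirroring the defaults in the original schema.
FALLBACKS = {"domain": "EMPTY", "ip": "EMPTY", "port": 0, "score": 0}


def apply_fallbacks(record, fallbacks=FALLBACKS):
    """Return a copy of record with None values replaced by their fallback.

    Fields without an entry in fallbacks stay None.
    """
    return {
        name: fallbacks.get(name) if value is None else value
        for name, value in record.items()
    }


# The record from Case 1, as it came back from the reader.
read_back = {"ip": "1.1.1.1", "domain": None, "score": None, "port": 20}
cleaned = apply_fallbacks(read_back)
print(cleaned)
```

Each record yielded by DataFileReader would be passed through apply_fallbacks before use.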

Is this a limitation of the Python Avro library? If so, can someone recommend another Python library similar to Avro? Thanks.

Re: setting default values in avro

aqikhan
This post has NOT been accepted by the mailing list yet.
 {"name" : "domain", "type" : ["null", "string"], "default" : "NONE"},

The default value for a string field must be "NONE", as above.