Avro schema properties contention on multithread read

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Avro schema properties contention on multithread read

Fady
Hello,

We are using Avro Schema properties and while running concurrent tests,
we noticed a lot of contentions on
org.apache.avro.JsonProperties#getJsonProp.

In the attached screen shot, we have 4 concurrent threads all sharing
the same avro schema and reading from it simultaneously.

On this screen shot each red period is a contention between threads.
Most of these contentions are on getJsonProp.

This is due to getJsonProp being a synchronized method.

We have tried avro 1.7.7, 1.8.1 and 1.8.2. All have this problem
(getJsonProp is deprecated in 1.8 but the replacement method is also
synchronized).

We can work around this by not sharing the avro schemas between threads
(using ThreadLocal for instance) but this is ugly.

It seems that avro schemas are mostly immutable, which is great for
multithread read access, but it turns out Properties within these
schemas are mutable and, since they are stored in a LinkedHashMap,
synchronization is necessary.

Anyone having a similar issue?

Thank you

pastedImage.png (180K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Avro schema properties contention on multithread read

Zoltan Farkas-2
The synchronization in JsonProperties is curently inconsistent (see getObjectProps()) which makes current implementation @NotThreadSafe

I think it would be probably best to remove synchronization from those methods… and add @NotThreadSafe to the class…
Utilities like Schemas.synchronizedSchema(…) and Schemas.unmodifiableSchema(…) could be added to help with various use cases…


—Z


On Jun 29, 2017, at 2:21 AM, [hidden email] wrote:

Hello,

We are using Avro Schema properties and while running concurrent tests, we noticed a lot of contentions on org.apache.avro.JsonProperties#getJsonProp.

In the attached screen shot, we have 4 concurrent threads all sharing the same avro schema and reading from it simultaneously.

On this screen shot each red period is a contention between threads. Most of these contentions are on getJsonProp.

This is due to getJsonProp being a synchronized method.

We have tried avro 1.7.7, 1.8.1 and 1.8.2. All have this problem (getJsonProp is deprecated in 1.8 but the replacement method is also synchronized).

We can work around this by not sharing the avro schemas between threads (using ThreadLocal for instance) but this is ugly.

It seems that avro schemas are mostly immutable, which is great for multithread read access, but it turns out Properties within these schemas are mutable and, since they are stored in a LinkedHashMap, synchronization is necessary.

Anyone having a similar issue?

Thank you<pastedImage.png>

Reply | Threaded
Open this post in threaded view
|

Re: Avro schema properties contention on multithread read

Fady

On 05.07.2017 21:53, Zoltan Farkas wrote:

The synchronization in JsonProperties is curently inconsistent (see getObjectProps()) which makes current implementation @NotThreadSafe
 
I think it would be probably best to remove synchronization from those methods... and add @NotThreadSafe to the class...
Utilities like Schemas.synchronizedSchema(...) and Schemas.unmodifiableSchema(...) could be added to help with various use cases...
 
 
—Z
 

Thank you for your reply. I like your Schemas.unmodifiableSchema(...) a lot.

While what you are describing would be ideal, a simpler solution might be to change the LinkedHashMap that backs jsonProperties into something like a ConcurrentHashMap, avoiding the need for synchronization.

This being said ConcurrentHashMap itself does not preserve insertion order, so its not a mere replacement to LinkedHashMap.

Reply | Threaded
Open this post in threaded view
|

Re: Avro schema properties contention on multithread read

Zoltan Farkas-2
The order of attributes in Json might matter as far as I can remember, so LinkedHashMap might not be replaceable with a concurrenthashmap.
Plus concurrenthashmap is not exactly without concurrency overhead…
But you would have to use it it conjunction with a unsynchronized avro implementation. (which I do in my fork, and you can do as well).

I wonder if there is interest in merging this into the avro lib someday.

—Z


On Jul 6, 2017, at 12:20 PM, [hidden email] wrote:

On 05.07.2017 21:53, Zoltan Farkas wrote:

The synchronization in JsonProperties is curently inconsistent (see getObjectProps()) which makes current implementation @NotThreadSafe
 
I think it would be probably best to remove synchronization from those methods... and add @NotThreadSafe to the class...
Utilities like Schemas.synchronizedSchema(...) and Schemas.unmodifiableSchema(...) could be added to help with various use cases...
 
 
—Z
 

Thank you for your reply. I like your Schemas.unmodifiableSchema(...) a lot.

While what you are describing would be ideal, a simpler solution might be to change the LinkedHashMap that backs jsonProperties into something like a ConcurrentHashMap, avoiding the need for synchronization.

This being said ConcurrentHashMap itself does not preserve insertion order, so its not a mere replacement to LinkedHashMap.