[GitHub] avro pull request #230: Ruby encoding performance improvements

GitHub user tjwp opened a pull request:


    Ruby encoding performance improvements

    This change includes several optimizations to the validation performed during encoding in the Ruby implementation. For a use case with a few levels of nesting and unions in several places within the schema, we saw a 5x improvement in encoding performance with these changes.
    The main changes are:
    1. Avoid the exhaustive validation of schemas in a union. Previously a datum was tested against all schemas in a union even though the failures were unused if a compatible schema was found. Now validation stops when the first compatible schema is found, but all failures are still available if there is no compatible type.
    2. Avoid the repeated validation of nested schemas. Previously, the datum was recursively validated against the schema prior to encoding. Then during encoding, each complex field (record, array, map, union) was recursively validated again. Thus each field was validated a number of times equal to its level of nesting plus one. This change introduces an option for validation not to recurse. Since encoding proceeds recursively, validation is instead performed as each level is encoded.
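The two main changes can be illustrated with a small standalone sketch. It is not the code from the pull request and does not use the Avro gem: the symbol-and-array schema model, `failures_for`, `encode`, and the `stats` counter are all hypothetical names made up for this example, and it takes `recursive:` as a keyword argument rather than an options hash. It only aims to show a union check that stops at the first compatible branch, and how validating one level at a time reduces the number of validations when the encoder already recurses.

```ruby
# Toy schema model (hypothetical, not the Avro gem's classes):
#   :null, :int, :string          primitives
#   [:array, item_schema]         array whose items match item_schema
#   [:union, schema, schema, ...] datum must match one branch

# Returns failure messages for datum against schema (empty means valid).
# stats[:calls] counts how many validations were performed.
def failures_for(schema, datum, recursive: true, stats: Hash.new(0))
  stats[:calls] += 1
  type, *args = schema
  case type
  when :null   then datum.nil? ? [] : ["expected null"]
  when :int    then datum.is_a?(Integer) ? [] : ["expected int"]
  when :string then datum.is_a?(String) ? [] : ["expected string"]
  when :array
    return ["expected array"] unless datum.is_a?(Array)
    return [] unless recursive # leave the items to the encoder
    datum.flat_map { |item| failures_for(args.first, item, recursive: true, stats: stats) }
  when :union
    collected = []
    args.each do |branch|
      found = failures_for(branch, datum, recursive: recursive, stats: stats)
      return [] if found.empty? # first compatible branch: stop here
      collected.concat(found)
    end
    collected # no branch matched, so report every failure
  else
    raise ArgumentError, "unknown schema type: #{type.inspect}"
  end
end

# "Encodes" level by level (here it only copies the datum) and validates
# each level as it is reached, so recursive validation repeats work.
def encode(schema, datum, recursive_validation:, stats:)
  errors = failures_for(schema, datum, recursive: recursive_validation, stats: stats)
  raise ArgumentError, errors.join("; ") unless errors.empty?
  type, *args = schema
  return datum unless type == :array
  datum.map { |item| encode(args.first, item, recursive_validation: recursive_validation, stats: stats) }
end

nested = [:array, [:array, :int]]
data   = [[1, 2], [3]]

before = Hash.new(0)
encode(nested, data, recursive_validation: true, stats: before)
after = Hash.new(0)
encode(nested, data, recursive_validation: false, stats: after)
puts "validations with recursion: #{before[:calls]}, without: #{after[:calls]}"

union = [:union, :null, :int, :string]
hit = Hash.new(0)
failures_for(union, 5, stats: hit) # stops at :int, never tries :string
miss = failures_for(union, 1.5)    # no branch matches, all failures kept
puts "validations for a match: #{hit[:calls]}, failures on a miss: #{miss.size}"
```

In this toy model each integer in `data` sits two levels deep, so with recursive validation it is checked once per enclosing level plus once for itself, which is the "level of nesting plus one" cost described above.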
    Other minor improvements:
    - delay creating error messages until they are required
    - use explicit code instead of dynamic code (e.g. `&method(:is_a?)`)
    - additional use of constants
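As a rough illustration of the first two minor improvements, the sketch below contrasts an eager with a lazy error message and a `&method(:is_a?)` call with an explicit block. The method names are hypothetical and are not taken from the pull request; the code only shows the general Ruby patterns involved.

```ruby
# Eager: the message string is interpolated even when the check passes.
def check_eager(datum, klass)
  message = "expected #{klass}, got #{datum.inspect}"
  datum.is_a?(klass) ? nil : message
end

# Lazy: the message is built only when the check actually fails.
def check_lazy(datum, klass)
  return nil if datum.is_a?(klass)
  "expected #{klass}, got #{datum.inspect}"
end

# Dynamic: creates a Method object and converts it to a block per call.
def matches_any_dynamic?(datum, classes)
  classes.any?(&datum.method(:is_a?))
end

# Explicit: same result using a plain block.
def matches_any_explicit?(datum, classes)
  classes.any? { |klass| datum.is_a?(klass) }
end

puts check_lazy("abc", Integer)
puts matches_any_explicit?(5, [String, Integer])
```

Both variants in each pair return the same values; the difference is only in how much work is done on the common path where validation succeeds.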
    The only additional tests in this change demonstrate that validation without recursion returns the same results for "simple" fields and no validation errors for complex fields that would require recursion.
    The updated `Avro::Schema.validate` and `Avro::SchemaValidator.validate!` methods take an options hash with the new `:recursive` option. An options hash was chosen in anticipation of eventually combining this change with logical type support (https://github.com/apache/avro/pull/116), which would specify whether the datum is already `:encoded`.
    These changes have been tested against the following Ruby versions:
      - 1.9.3-p551
      - 2.0.0-p648
      - 2.1.10
      - 2.2.7
      - 2.3.4
      - 2.4.1

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/salsify/avro ruby-validation-perf

Alternatively you can review and apply these changes as the patch at:


To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #230

commit 97b350457b74a4b79b591f4e3d9b439a347fc5d7
Author: Tim Perkins <[hidden email]>
Date:   2017-06-12T16:34:59Z

    Ruby encoding performance improvements


If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.