Introducing SIREn, a plugin for richly nested data search on ElasticSearch

renaud1 · July 24, 2014, 10:44am

One of the coolest and most celebrated features of Elasticsearch is its
ability to index JSON in what we know to be a “quasi schemaless” fashion.
Elasticsearch does this by automatically flattening fields whenever
possible, while resorting to nested fields (“Blockjoin”) when objects are
truly nested.
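To make the difference concrete, here is a minimal Python sketch of the flattening idea for plain (non-nested) objects. The function name `flatten` and the sample document are hypothetical. The sketch illustrates the behavior and does not reproduce Elasticsearch's implementation. Flattening an array of objects loses which `first` name belongs with which `last` name, which is why nested fields are needed when objects are truly nested.

```python
def flatten(doc):
    """Flatten a JSON-like dict into dotted field paths (illustrative only).

    Values from arrays of objects are merged per path, so the association
    between fields of the same inner object is lost.
    """
    fields = {}

    def walk(value, path):
        if isinstance(value, dict):
            # Descend into objects, extending the dotted path.
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        elif isinstance(value, list):
            # Arrays keep the same path; their elements are merged together.
            for item in value:
                walk(item, path)
        else:
            fields.setdefault(path, []).append(value)

    walk(doc, "")
    return fields


# Hypothetical sample document with an array of inner objects.
doc = {
    "group": "fans",
    "user": [
        {"first": "John", "last": "Smith"},
        {"first": "Alice", "last": "White"},
    ],
}

print(flatten(doc))
# {'group': ['fans'], 'user.first': ['John', 'Alice'], 'user.last': ['Smith', 'White']}
```

In this flattened form, a query for first = "Alice" AND last = "Smith" would match the document, even though no single user has that name.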

While this works well for small documents and document collections, it
becomes unsustainable for larger ones: Blockjoin works by splitting the
original document into many documents, one per nested record. For example, a
single USPTO patent (XML converted to JSON) ends up as over 1500 documents
in the index. This has implications for both performance and scalability.
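One way to see this inflation is to count the documents a block-join layout produces. The sketch below assumes that every inner object is indexed as its own hidden document alongside the root, as happens with fields mapped as `nested`. The function `count_blockjoin_docs` and the sample patent fragment are hypothetical and are not taken from the USPTO data or the whitepaper.

```python
def count_blockjoin_docs(value):
    """Count index documents if the root and every inner object become one document each.

    Assumes every object in the tree is mapped as nested; scalar values
    are stored inside the document of their enclosing object.
    """
    if isinstance(value, dict):
        # One document for this object, plus the documents of its children.
        return 1 + sum(count_blockjoin_docs(child) for child in value.values())
    if isinstance(value, list):
        # Arrays add no document themselves; each element is counted.
        return sum(count_blockjoin_docs(item) for item in value)
    return 0  # Scalars do not create documents.


# Hypothetical, heavily simplified patent fragment.
patent = {
    "title": "Widget",
    "claims": [{"text": "A widget comprising a lever."}, {"text": "The widget of claim 1."}],
    "inventors": [{"name": "A. Smith", "address": {"city": "Galway"}}],
}

print(count_blockjoin_docs(patent))  # 5: root + 2 claims + 1 inventor + 1 address
```

Each additional inner object adds one more document, so large, deeply structured records multiply quickly.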

Introducing SIREn

SIREn is an open source plugin for Elasticsearch that enhances search over
nested data. SIREn uses a sophisticated "tree indexing" design that keeps
the index from being artificially inflated. As a result, many types of
nested queries can run up to 3x faster. Further, depending on the data,
memory requirements for faceting can be up to 10x lower. SIREn therefore
allows you to use Elasticsearch for larger and more complex datasets,
especially for sophisticated analytics. (You can read our whitepaper [1] to
find out more.)

SIREn is also truly schemaless: it even allows you to change the type of a
property between documents without being restricted by a defined mapping.
This can be very useful for data integration scenarios where data is
described in different ways in different sources.

You only need a few minutes to download and try SIREn [2]. It comes with a
detailed manual [3] and you have access to the code on GitHub [4].

We look forward to hearing your feedback.

[1] http://siren.solutions/siren/resources/whitepapers/comparing-siren-1-2-and-lucenes-blockjoin-performance-a-uspto-patent-search-scenario/
[2] http://siren.solutions/siren/downloads/
[3] http://siren.solutions/manual/preface.html
[4] https://github.com/sindicetech/siren

Renaud Delbru
CTO
SIREn Solutions
