SIREn plugin for nested documents


(Ivan Brusic) #1

Has anyone else seen this plugin? http://siren.solutions/siren/overview/

There was some discussion between one of the developers and Jörg a while
back, so I guess this is the outcome. I have not tried it yet, but I will
give it a shot this weekend. I am hoping that it can fix a longstanding
issue in Elasticsearch (and my biggest roadblock):
https://github.com/elasticsearch/elasticsearch/issues/3022

Cheers,

Ivan

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQA3-NCDz%2B-gzAd74Pq3-kiGTvEZDW_L-uuhRG6V_-BSvg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


(Jörg Prante) #2

I noticed SIREn has an example of 1000 library catalog records from the
British Library, prepared in JSON.

From what I can tell, SIREn indexes a tree (semi-structured data) using
positional node identifiers. You express a tree-node query in a JSON DSL,
and the result is something like a list of matching node IDs.
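To make the idea concrete, here is a minimal Python sketch of indexing a JSON tree with positional (Dewey-style) node identifiers and answering a "same node" query. It only illustrates the general technique and is not SIREn's actual encoding or query DSL; the names `index_tree` and `match_siblings`, and the sample record, are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterator, List, Set, Tuple

NodeId = Tuple[int, ...]
Leaf = Tuple[NodeId, str, Any]


def index_tree(node: Any, path: NodeId = (), field: str = "") -> Iterator[Leaf]:
    """Yield (node_id, field, value) for every scalar leaf of a JSON-like tree.

    Hypothetical numbering: a node id is the list of child positions from the
    root; object members are numbered in document order, array elements by index.
    """
    if isinstance(node, dict):
        for pos, (key, value) in enumerate(node.items()):
            yield from index_tree(value, path + (pos,), key)
    elif isinstance(node, list):
        for pos, item in enumerate(node):
            # Array elements inherit the field name of the array itself.
            yield from index_tree(item, path + (pos,), field)
    else:
        yield path, field, node


def match_siblings(leaves: List[Leaf], conditions: Set[Tuple[str, Any]]) -> List[NodeId]:
    """Return ids of nodes whose direct children satisfy all (field, value) conditions."""
    by_parent: Dict[NodeId, Set[Tuple[str, Any]]] = defaultdict(set)
    for node_id, field, value in leaves:
        by_parent[node_id[:-1]].add((field, value))
    return sorted(parent for parent, found in by_parent.items() if conditions <= found)


record = {
    "title": "Hamlet",
    "authors": [{"name": "Shakespeare", "born": 1564}, {"name": "Anon"}],
}
leaves = list(index_tree(record))
print(match_siblings(leaves, {("name", "Shakespeare"), ("born", 1564)}))  # [(1, 0)]
print(match_siblings(leaves, {("name", "Anon"), ("born", 1564)}))  # []
```

The second query shows why the node ids matter: a flat index would match this record, because "Anon" and 1564 both occur in it, but no single author object holds both values.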

Regarding the "inner hits" challenge, this seems to get very close, because
a JSON doc is always semi-structured. The open question is how to embed SIREn
documents into Elasticsearch documents (or vice versa), i.e. whether they can
co-exist and be queried by a single query, combining the power of both.

While this is interesting for nested hierarchical data models, I am
studying JSON-LD and graph search in ES, so that I can follow links
between docs (or even between ES docs and web resources, local or remote).

Jörg

(In reply to Ivan Brusic, Wed, Jul 23, 2014 at 7:52 PM)



(Brian Yoder) #3

Thanks for the link. Unfortunately, Chrome on Mac OS (latest versions of
each) causes this web page to blank and redisplay continually. Can't read
it; hope you can.

In a previous life, I created a search engine that handled parent/child
relationships with blindingly fast performance. One trick was that the
index didn't just contain the document ID; it contained the entire
hierarchy of IDs. For example (for brevity, the IDs are single letters):

Document ID and relationship    Fully qualified and indexed ID
A                               A
  B                             A.B
    C                           A.B.C
  D                             A.D
    E                           A.D.E
    F                           A.D.F

So, for example, it was nearly instantaneous to determine the following
just by looking at and comparing the fully qualified IDs:

A and F are in the same parent-child hierarchy, with F being a child of D
and a grandchild of A.

E and F are siblings under the same parent.

And so on.
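The ID scheme in the table can be sketched in a few lines of Python. This is an illustration under assumptions, not Brian's original engine: the parent map mirrors the table above, and `qualify`, `generations`, and `are_siblings` are hypothetical helper names.

```python
from typing import Dict, List, Optional

# Parent pointers for the example hierarchy in the table (None marks a root).
PARENT_OF: Dict[str, Optional[str]] = {
    "A": None, "B": "A", "C": "B", "D": "A", "E": "D", "F": "D",
}


def qualify(doc_id: str, parent_of: Dict[str, Optional[str]]) -> str:
    """Build the fully qualified ID by walking parent pointers up to the root."""
    chain: List[str] = []
    node: Optional[str] = doc_id
    while node is not None:
        chain.append(node)
        node = parent_of[node]
    return ".".join(reversed(chain))


def generations(ancestor: str, descendant: str) -> Optional[int]:
    """Levels from ancestor down to descendant (1 = child, 2 = grandchild), else None."""
    a, d = ancestor.split("."), descendant.split(".")
    if len(a) < len(d) and d[: len(a)] == a:
        return len(d) - len(a)
    return None


def are_siblings(x: str, y: str) -> bool:
    """True if two distinct non-root IDs share the same parent path."""
    px, py = x.split(".")[:-1], y.split(".")[:-1]
    return x != y and bool(px) and px == py


ids = {doc: qualify(doc, PARENT_OF) for doc in PARENT_OF}
print(ids["F"])                          # A.D.F
print(generations(ids["A"], ids["F"]))   # 2 (F is a grandchild of A)
print(generations(ids["D"], ids["F"]))   # 1 (F is a child of D)
print(are_siblings(ids["E"], ids["F"]))  # True
```

Comparing IDs component by component (after splitting on ".") rather than as raw string prefixes avoids false matches once IDs are longer than one character: "A.1" is not an ancestor of "A.10".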

Not sure how this would mesh with Lucene, though. But complex parent-child
relationships could be resolved just by intersecting the fully qualified IDs
that came out of the inverted index. Documents did not need to be fetched or
cached to perform this operation, and the result was breathtakingly fast.
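The claim that parent-child constraints can be resolved from the IDs alone can be illustrated with a toy inverted index whose postings carry fully qualified IDs. This is a hedged Python sketch: the terms, the index contents, and the helpers `with_child` and `with_descendant` are hypothetical, and a real engine would merge sorted postings lists rather than build Python sets.

```python
from typing import Dict, List, Set

# Hypothetical toy inverted index: term -> fully qualified IDs of matching documents.
POSTINGS: Dict[str, List[str]] = {
    "invoice": ["A", "A.D"],
    "paid": ["A.B.C", "A.D.E"],
    "overdue": ["A.D.F"],
}


def parent_id(fqid: str) -> str:
    """Drop the last component: 'A.D.E' -> 'A.D' ('' for a root)."""
    return fqid.rpartition(".")[0]


def with_child(parent_term: str, child_term: str, index: Dict[str, List[str]]) -> List[str]:
    """Documents matching parent_term that have a direct child matching child_term."""
    parents = set(index.get(parent_term, []))
    child_parents = {parent_id(c) for c in index.get(child_term, [])}
    return sorted(parents & child_parents)


def with_descendant(anc_term: str, desc_term: str, index: Dict[str, List[str]]) -> List[str]:
    """Documents matching anc_term that have a descendant (any depth) matching desc_term."""
    ancestors = set(index.get(anc_term, []))
    hits: Set[str] = set()
    for fqid in index.get(desc_term, []):
        parts = fqid.split(".")
        # Every proper prefix of a descendant's ID names one of its ancestors.
        for i in range(1, len(parts)):
            prefix = ".".join(parts[:i])
            if prefix in ancestors:
                hits.add(prefix)
    return sorted(hits)


print(with_child("invoice", "paid", POSTINGS))       # ['A.D']
print(with_descendant("invoice", "paid", POSTINGS))  # ['A', 'A.D']
```

Neither function loads a stored document: the parent and ancestor IDs are derived from the posting entries themselves.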

Just FYI. I can discuss off-line if anyone wishes.

Brian



(renaud) #4

Hi Ivan,

(P.S.: I am one of the developers of the SIREn plugin.)

It would be possible for SIREn to support such functionality (but it is not
yet implemented), as each element / node in the tree has a unique identifier
that is retrieved at search time. Therefore, one could use this identifier
to fetch and filter the relevant element from the original JSON document.
In both the stock Elasticsearch and the SIREn case, the main problem, from
what I understand, is that this would require a refactoring of the fetch
phase in Elasticsearch.
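The idea of using the node identifier returned at search time to pull the matching element out of the original JSON might look roughly like this Python sketch. The numbering convention (object members in document order, array elements by index) and the names `extract` and `inner_hits` are assumptions for illustration; SIREn's actual identifiers and any Elasticsearch fetch-phase integration are not shown.

```python
import json
from typing import Any, List, Sequence


def extract(source: Any, node_id: Sequence[int]) -> Any:
    """Follow a positional node id from the root of a parsed JSON document.

    Assumes object members are numbered in document order and array
    elements by index (a hypothetical numbering, not SIREn's own).
    """
    node = source
    for pos in node_id:
        if isinstance(node, dict):
            node = node[list(node)[pos]]  # pos-th key in document order
        elif isinstance(node, list):
            node = node[pos]
        else:
            raise ValueError(f"node id {tuple(node_id)} descends below a leaf")
    return node


def inner_hits(source: Any, matched: List[Sequence[int]]) -> List[Any]:
    """Filter the original source down to the elements the query matched."""
    return [extract(source, node_id) for node_id in matched]


source = json.loads(
    '{"title": "Hamlet", "authors": [{"name": "Shakespeare"}, {"name": "Anon"}]}'
)
print(inner_hits(source, [(1, 1)]))  # [{'name': 'Anon'}]
```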

Kind Regards

Renaud Delbru

(In reply to Ivan Brusic, Wednesday, July 23, 2014 6:53:00 PM UTC+1)



(renaud) #5

Hi Brian,

Our apologies for the issues with the web site; we had some problems with
our web server yesterday.

What you have described is very close to the indexing model in SIREn. SIREn
provides an optimised Lucene codec for this data structure, and provides
query operators on top of it.

Kind Regards

Renaud Delbru

(In reply to Brian, Wednesday, July 23, 2014 7:39:04 PM UTC+1)



(Ivan Brusic) #6

Thanks for chiming in, Renaud. Hopefully I will have a chance to test out
the plugin soon. My use case for nested documents is fairly simple.

--
Ivan

(In reply to renaud, Thu, Jul 24, 2014 at 4:00 AM)


