I noticed SIREn has an example of 1000 library catalog records from the
British Library, prepared in JSON.
From what I can tell, SIREn can index a tree (semi-structured data) using
positional nodes; you can then express a tree node query in its JSON DSL,
and the result is something like a list of matching node IDs.
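To make "positional nodes" concrete, here is a minimal Python sketch, assuming Dewey-style labels where a node's ID is the sequence of child positions from the root. It is not SIREn's codec or query DSL; `label_nodes`, `find_nodes` and the sample record are hypothetical. The point it shows is that the query returns node IDs inside the document rather than whole documents.

```python
from typing import Any, Iterator, List, Optional, Tuple

# Hypothetical positional (Dewey-style) node ID: child positions from the root.
NodeId = Tuple[int, ...]


def label_nodes(value: Any, path: NodeId = (), key: Optional[str] = None
                ) -> Iterator[Tuple[NodeId, Optional[str], Any]]:
    """Yield (node_id, key, value) for every node of a parsed JSON tree.

    Object fields are numbered in key order; array elements are numbered by
    position and inherit the key of the array that contains them.
    """
    yield path, key, value
    if isinstance(value, dict):
        for i, (k, v) in enumerate(value.items()):
            yield from label_nodes(v, path + (i,), k)
    elif isinstance(value, list):
        for i, v in enumerate(value):
            yield from label_nodes(v, path + (i,), key)


def find_nodes(doc: Any, key: str, term: str) -> List[NodeId]:
    """Return IDs of string nodes under `key` containing `term` (case-insensitive)."""
    return [node_id for node_id, k, v in label_nodes(doc)
            if k == key and isinstance(v, str) and term.lower() in v.lower()]


# Made-up record, loosely shaped like a catalogue entry.
record = {
    "title": "A Sample Book",
    "authors": [{"name": "Ann Smith"}, {"name": "Bob Jones"}],
    "year": 1900,
}

print(find_nodes(record, "name", "jones"))  # [(1, 1, 0)]
```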
Regarding the "inner hits" challenge, this seems to get very close, because
a JSON doc is always semi-structured. The question is how to embed SIREn
documents into Elasticsearch documents (or vice versa), i.e. whether they can
co-exist and be queried by a single query, combining the power of both.
While this is interesting for nested hierarchical data models, I am
studying JSON-LD and graph search in ES, to be able to follow links
between docs (or even between ES docs and web resources, local or remote).
Jörg
On Wed, Jul 23, 2014 at 7:52 PM, Ivan Brusic ivan@brusic.com wrote:
There was some discussion between one of the developers and Jorg a while
back, so I guess this is the outcome. Have not tried it yet, but I will
give it a shot this weekend. I am hoping that it can fix a longstanding
issue in Elasticsearch (and my biggest roadblock): https://github.com/elasticsearch/elasticsearch/issues/3022
Thanks for the link. Unfortunately, Chrome on Mac OS (latest versions of
each) causes this web page to blank and redisplay continually. Can't read
it; hope you can.
In a previous life, I created a search engine that handled parent/child
relationships with blindingly fast performance. One trick was that the
index didn't just contain the document ID, but it contained the entire
hierarchy of IDs. So, for example (and for brevity, the IDs are single letters):
Document ID and relationship    Fully qualified and indexed ID
A                               A
B                               A.B
C                               A.B.C
D                               A.D
E                               A.D.E
F                               A.D.F
So, for example, it was nearly instantaneous to determine, just by
looking at and comparing the fully qualified IDs, that:
- A and F are in the same parent-child hierarchy, with F being a child of D
  and a grandchild of A.
- E and F are siblings under the same parent.
And so on.
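A minimal Python sketch of the scheme described above, assuming dotted string IDs built from a simple parent map. `PARENTS`, `qualified_id`, `is_ancestor`, `generations` and `are_siblings` are illustrative names, not taken from any product. It derives the fully qualified IDs in the table and answers the two example questions purely by comparing strings, without touching the documents themselves.

```python
from typing import Dict, List, Optional

# Parent pointers for the example hierarchy above (None marks the root).
PARENTS: Dict[str, Optional[str]] = {
    "A": None, "B": "A", "C": "B", "D": "A", "E": "D", "F": "D",
}


def qualified_id(doc_id: str, parents: Dict[str, Optional[str]]) -> str:
    """Walk up the parent pointers to build the dotted ID, e.g. F -> 'A.D.F'."""
    parts: List[str] = []
    node: Optional[str] = doc_id
    while node is not None:
        parts.append(node)
        node = parents[node]
    return ".".join(reversed(parts))


def is_ancestor(anc: str, desc: str) -> bool:
    """True if `anc` is a strict ancestor of `desc`.

    Matching on `anc + "."` keeps 'A.D' from matching 'A.DX' once IDs are
    longer than one character.
    """
    return desc.startswith(anc + ".")


def generations(anc: str, desc: str) -> int:
    """Number of levels between an ancestor and a descendant (2 = grandchild)."""
    return desc.count(".") - anc.count(".")


def are_siblings(a: str, b: str) -> bool:
    """True if two distinct non-root IDs share the same parent prefix."""
    pa, pb = a.rpartition(".")[0], b.rpartition(".")[0]
    return a != b and pa != "" and pa == pb


ids = {doc: qualified_id(doc, PARENTS) for doc in PARENTS}
print(ids["F"])                          # A.D.F
print(is_ancestor(ids["A"], ids["F"]))   # True
print(generations(ids["A"], ids["F"]))   # 2
print(are_siblings(ids["E"], ids["F"]))  # True
```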
Not sure how this would mesh with Lucene, though. But complex parent-child
relationships could be intersected using just the fully qualified IDs that
came out of the inverted index. Documents did not need to be fetched or
cached to perform this operation, and the result was blindingly fast
performance.
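The ID-only intersection described above can be sketched in a few lines of Python. This is an illustration of the idea, assuming dotted IDs and an in-memory dict standing in for the inverted index; `INDEX`, the terms "report" and "draft", and the helper names are made up. The query chosen here (documents matching one term that have a descendant matching another) is just one example of a parent-child intersection.

```python
from typing import Dict, List, Set

# Hypothetical inverted index: term -> fully qualified IDs of the documents
# containing it. A plain dict stands in for real posting lists here.
INDEX: Dict[str, Set[str]] = {
    "report": {"A", "A.B", "A.D.E"},
    "draft": {"A.D.F"},
}


def ancestors(qid: str) -> List[str]:
    """All strict ancestors of a qualified ID, e.g. 'A.D.F' -> ['A', 'A.D']."""
    parts = qid.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts))]


def with_descendant_matching(parent_term: str, child_term: str,
                             index: Dict[str, Set[str]]) -> Set[str]:
    """IDs matching `parent_term` that have a descendant matching `child_term`.

    Only the IDs from the two posting lists are used; no document is fetched.
    """
    reachable: Set[str] = set()
    for qid in index.get(child_term, set()):
        reachable.update(ancestors(qid))
    return index.get(parent_term, set()) & reachable


print(with_descendant_matching("report", "draft", INDEX))  # {'A'}
```

Here "A.B" is excluded because it has no "draft" descendant, and "A.D.E" is excluded because "A.D.F" is its sibling, not its descendant.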
Just FYI. I can discuss off-line if anyone wishes.
(P.S.: I am one of the developers of the SIREn plugin.)
It would be possible for SIREn to support such functionality (but it is not
yet implemented), as each element / node in the tree has a unique identifier
that is retrieved at search time. Therefore, one could use this identifier
to fetch and filter the relevant element from the original JSON document.
In both the stock Elasticsearch and SIREn cases, the main problem, as far as
I understand, is that this would require refactoring the fetch phase
in Elasticsearch.
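A minimal Python sketch of that fetch-and-filter step, assuming node IDs are Dewey-style sequences of child positions and that object fields are numbered in the same key order as the stored JSON source. Neither assumption is confirmed for SIREn, and `resolve_node` and the sample hit are hypothetical; in Elasticsearch this logic would have to run inside the fetch phase, which is the refactoring mentioned above.

```python
import json
from typing import Any, Sequence


def resolve_node(doc: Any, node_id: Sequence[int]) -> Any:
    """Follow a positional node ID into a parsed JSON document.

    Object children are numbered in key order, array children by position.
    """
    node = doc
    for pos in node_id:
        if isinstance(node, dict):
            node = list(node.values())[pos]
        elif isinstance(node, list):
            node = node[pos]
        else:
            raise ValueError(f"node ID {tuple(node_id)!r} goes below a leaf value")
    return node


# Stored JSON source of one (made-up) document.
source = json.loads(
    '{"title": "A Sample Book",'
    ' "authors": [{"name": "Ann Smith"}, {"name": "Bob Jones"}]}'
)
hits = [(1, 1)]  # hypothetical node IDs returned by a tree query
inner_hits = [resolve_node(source, h) for h in hits]
print(inner_hits)  # [{'name': 'Bob Jones'}]
```

Relying on key order is the fragile part: if the source were re-serialised with a different field order, a positional ID would point at the wrong element.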
Kind Regards
Renaud Delbru
On Wednesday, July 23, 2014 6:53:00 PM UTC+1, Ivan Brusic wrote:
Our apologies for the issues with the web site; we had some problems on our
web server yesterday.
What you have described is very close to the indexing model in SIREn. SIREn
provides an optimised Lucene codec for such data structures, and provides
query operators on top of them.
Kind Regards
Renaud Delbru
On Wednesday, July 23, 2014 7:39:04 PM UTC+1, Brian wrote: