We store bibliographic metadata in ES and it's working fine. However, due
to our update process, we store multiple documents per book: every time
something gets updated (outside ES), we add a new document with a custom
timestamp.
Is there a way to perform queries, but restrict the searched set of
documents only to the latest version?
To illustrate: we have a field that represents the ISBN, which is
semantically unique. Suppose there are three documents with the same ISBN
but different timestamps (each representing some change in the associated
metadata). We would like to restrict the actual search (or filter, or any
operation you would normally perform) to the document with the latest
timestamp.
I looked into scripting and scripting filters, but it seems that a
scripting filter operates on a single document, so I can't say something
like: for each match, throw out all documents but the one with the
maximum timestamp.
> Is there a way to perform queries, but restrict the searched set of
> documents only to the latest version?

Not that I can think of.

> I looked into scripting and scripting filters, but it seems, that the
> scripting filter operates on a single document, so that I can't say
> something like: for each match, throw out all documents but the one
> with the maximum timestamp.

Correct - queries and filters operate on single docs.
I would suggest adding a 'latest' flag, which you then manually turn off
on the no-longer-latest version.
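
To make the 'latest' flag idea concrete, here is a minimal sketch in plain Python. It uses an in-memory dict as a stand-in for the index, so nothing here is an actual ES call; the names index, add_version and search_latest are made up for illustration, and the field names isbn, timestamp and latest are assumptions about the document layout.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for the index: doc id -> document.
index: Dict[str, dict] = {}


def add_version(doc_id: str, isbn: str, timestamp: int, **metadata) -> None:
    """Store a new version and turn off 'latest' on older versions of the same ISBN."""
    newest = True
    for doc in index.values():
        if doc["isbn"] != isbn or not doc["latest"]:
            continue
        if doc["timestamp"] <= timestamp:
            doc["latest"] = False  # superseded by the incoming version
        else:
            newest = False  # incoming version arrived out of order
    index[doc_id] = {"isbn": isbn, "timestamp": timestamp,
                     "latest": newest, **metadata}


def search_latest(**criteria) -> List[dict]:
    """Match on single documents only, restricted to those flagged as latest."""
    return [
        doc for doc in index.values()
        if doc["latest"] and all(doc.get(k) == v for k, v in criteria.items())
    ]
```

The point of the flag is that the restriction becomes a property of each single document, so an ordinary per-document filter on latest is enough at search time. The cost is the extra lookup of the previous version on every write.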
On Sunday, 10 June 2012 15:09:00 UTC+2, Clinton Gormley wrote:
> Hiya
>
> > Is there a way to perform queries, but restrict the searched set of
> > documents only to the latest version?
>
> Not that I can think of.
>
> > I looked into scripting and scripting filters, but it seems, that the
> > scripting filter operates on a single document, so that I can't say
> > something like: for each match, throw out all documents but the one
> > with the maximum timestamp.
>
> Correct - queries and filters operate on single docs.
> I would suggest adding a 'latest' flag which you then manually turn off
> on the no-longer-latest version

OK, I see. The trouble with this is that we would have to check for a
previous version every time we add a document. Conceptually this is not
unreasonable, just a bit too heavy in terms of performance for us at the
moment. Anyway, I could imagine a periodic task that sifts through all
docs and sets or removes a "latest" flag.
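
A rough sketch of what such a periodic pass could look like, again in plain Python over a list of dicts rather than against ES itself. The function name refresh_latest_flags and the field names isbn, timestamp and latest are assumptions for illustration only; against a real index the two passes would correspond to reading all docs and then writing back the ones whose flag changed.

```python
from typing import Dict, Iterable


def refresh_latest_flags(docs: Iterable[dict]) -> int:
    """Recompute the 'latest' flag on every document; return how many flags changed."""
    docs = list(docs)

    # First pass: find the maximum timestamp seen for each ISBN.
    newest: Dict[str, int] = {}
    for doc in docs:
        isbn = doc["isbn"]
        if isbn not in newest or doc["timestamp"] > newest[isbn]:
            newest[isbn] = doc["timestamp"]

    # Second pass: set the flag on the newest version, clear it elsewhere.
    # Documents that tie on the maximum timestamp are all flagged.
    changed = 0
    for doc in docs:
        should_be_latest = doc["timestamp"] == newest[doc["isbn"]]
        if doc.get("latest") != should_be_latest:
            doc["latest"] = should_be_latest
            changed += 1
    return changed
```

Between two runs of the task, a freshly added version has no flag yet (or an older one still carries it), so searches restricted to latest can be slightly stale. That is the trade-off for keeping the write path cheap.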