Hand-rolled document versioning and query?


(Martin Czygan) #1

Hello,

We store bibliographic metadata in ES and it's working fine. However, due
to our update process, we store multiple documents per book - basically we
add a new document with a custom timestamp every time something gets
updated (outside ES).

Is there a way to perform queries, but restrict the searched set of
documents only to the latest version?

To illustrate it a bit more: we have a field that represents the ISBN,
which is semantically unique. Now there might be, say, three documents with
this ISBN but with different timestamps (each representing some change in
the associated metadata). What we would like to do is to restrict the actual
search (or filter, or any operation you would normally perform) to the
document with the latest timestamp.

I looked into scripting and script filters, but it seems that a script
filter operates on a single document, so I can't say something like: for
each match, throw out all documents but the one with the maximum
timestamp.

Thanks,
Martin


(Clinton Gormley) #2

Hiya

> Is there a way to perform queries, but restrict the searched set of
> documents only to the latest version?

Not that I can think of.

> I looked into scripting and scripting filters, but it seems, that the
> scripting filter operates on a single document, so that I can't say
> something like: for each match, throw out all documents but the one
> with the maximum timestamp.

Correct - queries and filters operate on single docs.

I would suggest adding a 'latest' flag, which you then manually turn off
on the no-longer-latest version.

clint
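
A minimal sketch of how the 'latest' flag could be used, written as plain Python dicts so it runs without a cluster. The field names `latest`, `isbn` and `title` are assumptions for illustration, and the request bodies use the current query DSL (`bool` with a `filter` clause, and an update-by-query body), not the syntax that was available when this thread was written. The `term` clause on `isbn` also assumes that field is indexed as an exact, non-analyzed value.

```python
def latest_only(query):
    """Wrap a query clause so it only matches documents flagged as latest."""
    return {
        "query": {
            "bool": {
                "must": [query],
                # Filter context: restricts the result set without affecting scores.
                "filter": [{"term": {"latest": True}}],
            }
        }
    }


def demote_previous(isbn):
    """Body for an update-by-query request that clears the flag on the
    currently flagged version(s) of one ISBN."""
    return {
        "script": {"lang": "painless", "source": "ctx._source.latest = false"},
        "query": {
            "bool": {
                "filter": [
                    {"term": {"isbn": isbn}},
                    {"term": {"latest": True}},
                ]
            }
        },
    }


# Hypothetical usage: search titles, but only among the latest versions.
search_body = latest_only({"match": {"title": "moby dick"}})
```

The idea would be to send the `demote_previous` body first and then index the new version with `latest` set to true, so that at most one version per ISBN carries the flag.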


(Martin Czygan) #3

Hi,

Thanks. I already saw your (?) answer on Stack Overflow
(http://stackoverflow.com/questions/8218309/can-we-retrieve-previous-source-docs-with-elastic-search-versions/8226684#8226684),
which is related.

On Sunday, 10 June 2012 15:09:00 UTC+2, Clinton Gormley wrote:

> I would suggest adding a 'latest' flag which you then manually turn off
> on the no-longer-latest version

OK, I see. The trouble is that we would have to check for a previous
version every time we add a document. Conceptually this is not
unreasonable, just a bit too heavy in terms of performance for us at the
moment. Anyway, I could imagine a periodic task that sifts through all
docs and sets or removes a "latest" flag.

Thanks again for your quick answer!

Martin
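
The periodic task described above could be sketched roughly as follows. This is only an illustration under assumptions: documents are represented as plain dicts with hypothetical `id`, `isbn` and `timestamp` keys, and the function only computes which flag each document should carry. Reading the documents from the index and writing the flags back is left out.

```python
def plan_latest_flags(docs):
    """Return {doc_id: bool}, True only for the newest version of each ISBN."""
    docs = list(docs)  # iterated twice below
    newest = {}
    for doc in docs:
        best = newest.get(doc["isbn"])
        # Strict comparison: on equal timestamps the first document seen wins.
        if best is None or doc["timestamp"] > best["timestamp"]:
            newest[doc["isbn"]] = doc
    latest_ids = {doc["id"] for doc in newest.values()}
    return {doc["id"]: doc["id"] in latest_ids for doc in docs}


# Hypothetical sample: three versions of one book, one version of another.
sample = [
    {"id": "a", "isbn": "isbn-1", "timestamp": 1},
    {"id": "b", "isbn": "isbn-1", "timestamp": 3},
    {"id": "c", "isbn": "isbn-1", "timestamp": 2},
    {"id": "d", "isbn": "isbn-2", "timestamp": 5},
]
flags = plan_latest_flags(sample)
```

Between two runs of such a task, a freshly added version would not yet be flagged, so searches restricted to the flag would still see the previous version until the next pass.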



(Shay Banon) #4

Yea, a "latest" flag is the way to go...


