Facet that returns a value from the most recent matching document?


(Eric Jain) #1

I'd like to add a facet to an existing query that gives me the value of a
specific field in the most recent (as determined by a specified timestamp
field) document.

e.g. given these 3 documents:

timestamp:20120925T01:00 tag:foo score:500
timestamp:20120925T01:00 tag:bar score:200
timestamp:20120925T02:00 tag:bar score:300

...and the query tag:bar, I want to have the latest value of the score
field (i.e. 300) included in the result (regardless of how I'm sorting and
paging).
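
As a rough sketch of the semantics being asked for (plain Python over in-memory dicts, not an Elasticsearch API; the function name `latest_value` is made up for illustration): filter on the query, then return the field value from the matching document with the greatest timestamp, independent of how the hits are sorted or paged.

```python
# Illustrative only: plain Python over in-memory dicts, not an Elasticsearch API.
docs = [
    {"timestamp": "20120925T01:00", "tag": "foo", "score": 500},
    {"timestamp": "20120925T01:00", "tag": "bar", "score": 200},
    {"timestamp": "20120925T02:00", "tag": "bar", "score": 300},
]


def latest_value(docs, tag, value_field="score", ts_field="timestamp"):
    """Return value_field from the most recent document matching tag, or None."""
    matching = [d for d in docs if d["tag"] == tag]
    if not matching:
        return None
    # Timestamps in this fixed-width format compare correctly as strings.
    newest = max(matching, key=lambda d: d[ts_field])
    return newest[value_field]


print(latest_value(docs, "bar"))  # 300
```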

Is this possible, or would this require a custom facet? Speaking of which:
What's the status of the facet API?

--


(BillyEm) #2

Recency contribution to scoring is pretty common on news sites and
elsewhere.

That said, there are a number of ways in Lucene to boost on the value of a
field in a query. Field boosts aren't going to be useful given your query.
Two others might be a function query that reserves a score value, or the
docid of the most recent document, as it aggregates the result set. Function
queries, in my experience, have widely varying performance characteristics
and are not necessarily easy to adopt. That leaves payloads, as another
poster suggested.

I've been part of a team that implemented recency weighting directly in a
large-scale search engine. We used a Pade algorithm, for what it's worth,
since defining the decay of the weighting was clear. Getting it right was
not so clear.
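
For illustration only, a recency weight can be sketched with a simple exponential half-life decay. This is a stand-in, not the Pade-based scheme mentioned above, whose details aren't given in this thread; the name `recency_weight` and the 24-hour half-life are assumptions.

```python
from datetime import datetime, timedelta


def recency_weight(doc_time, now, half_life=timedelta(hours=24)):
    """Weight in (0, 1]: 1.0 for a brand-new document, halving every half_life."""
    age = max(now - doc_time, timedelta(0))  # clamp future-dated docs to age 0
    return 0.5 ** (age / half_life)


now = datetime(2012, 9, 26, 2, 0)
# A document exactly one half-life old gets half the weight of a new one.
print(recency_weight(datetime(2012, 9, 25, 2, 0), now))  # 0.5
```

A weight like this would typically be multiplied into the relevance score; how hard to decay is the part that takes tuning.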

Geez, I'm long-winded these days. Sorry, folks. Anyway, Eric: the two
approaches I would consider require either an indexing system that does
ETL on the documents so that you can readily get the hit you want, or a
custom scorer.
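
The index-time option could look roughly like this: as documents pass through the indexing pipeline, keep a side table with the newest value per tag, so the answer is a direct lookup at query time. This is a hypothetical in-memory sketch; `LatestByTag` and its methods are invented for illustration, and where such a table would actually be stored is left open.

```python
class LatestByTag:
    """Tracks, per tag, the value of the document with the newest timestamp."""

    def __init__(self):
        self._latest = {}  # tag -> (timestamp, value)

    def index(self, doc):
        """Call once per document as it passes through the indexing pipeline."""
        current = self._latest.get(doc["tag"])
        # Replace the entry only if this document is strictly newer.
        if current is None or doc["timestamp"] > current[0]:
            self._latest[doc["tag"]] = (doc["timestamp"], doc["score"])

    def lookup(self, tag):
        """Return the latest value for tag, or None if the tag was never seen."""
        entry = self._latest.get(tag)
        return entry[1] if entry is not None else None


table = LatestByTag()
for doc in [
    {"timestamp": "20120925T01:00", "tag": "foo", "score": 500},
    {"timestamp": "20120925T01:00", "tag": "bar", "score": 200},
    {"timestamp": "20120925T02:00", "tag": "bar", "score": 300},
]:
    table.index(doc)

print(table.lookup("bar"))  # 300
```

The trade-off is that this only answers queries anticipated at index time (here, keyed by tag); arbitrary queries still need a query-time approach.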

Two resources that might illuminate more current thinking: a)
javasoze/bobo (look at the breadth of the facets javadoc for ideas about
what might fit functionally, even if you're not up to using bobo), and b)
Lucene 4.0.x. The latter because the absence of a feature in 3.x does not
mean the Lucene team hasn't already identified a way to implement it,
albeit for future releases. (How do you spell "backport"?)

g'luck

--


(Eric Jain) #3

It should still be possible to accomplish this with a custom facet in
Elasticsearch, right? Doesn't seem more involved than the "date
histogram with value field" facet...
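
Conceptually, such a facet would have two parts: a per-shard collector that remembers the value with the greatest timestamp among the hits it sees, and a reduce step that merges the per-shard results. The Python below is only a sketch of that logic; the class and function names are made up, and this is not the Elasticsearch facet API (a real facet would presumably be written in Java as a plugin).

```python
class LatestCollector:
    """Per-shard state: the newest (timestamp, value) pair seen so far."""

    def __init__(self):
        self.timestamp = None
        self.value = None

    def collect(self, timestamp, value):
        """Called for every document on this shard that matches the query."""
        if self.timestamp is None or timestamp > self.timestamp:
            self.timestamp, self.value = timestamp, value


def reduce_shards(collectors):
    """Merge per-shard results; returns (timestamp, value) or None if no hits."""
    seen = [(c.timestamp, c.value) for c in collectors if c.timestamp is not None]
    # Compare on timestamp only so values never need to be orderable.
    return max(seen, key=lambda pair: pair[0]) if seen else None


shard_a, shard_b = LatestCollector(), LatestCollector()
shard_a.collect("20120925T01:00", 200.0)
shard_b.collect("20120925T02:00", 300.0)
print(reduce_shards([shard_a, shard_b]))  # ('20120925T02:00', 300.0)
```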

--


(dobe) #4

See the "Latest Facet" on
https://github.com/lovelysystems/elasticsearch-ls-plugins. It does
exactly this.

--


(Eric Jain) #5

I'll need to support float values (and don't need to aggregate on a
key field), but looking through that code should help me a
lot--thanks!

Does anyone know if there are significant changes to the facet API
planned for 0.20.0?

--


(Ivan Brusic) #6

Judging by the commits and issues, no work has been done on facets for
0.20.0.

--

