Frequently updated int field


(Andy-2) #1

I have an integer field "popularity" that is frequently updated. The
"popularity" of a document can either be increased or decreased. I use
the value of that field to help rank my search results.

A search engine might not be designed for frequently updated fields
like that. Any tips on how best to handle that in ElasticSearch?

Thanks.


(Mahendra M) #2

Hi Andy,

On Tue, Jun 7, 2011 at 8:42 AM, Andy selforganized@gmail.com wrote:

> I have an integer field "popularity" that is frequently updated. The
> "popularity" of a document can either be increased or decreased. I use
> the value of that field to help rank my search results.

I have the same use case too: a "popularity" field that is updated
based on usage of a document.

> A search engine might not be designed for frequently updated fields
> like that. Any tips on how best to handle that in ElasticSearch?

How frequent are your updates? ElasticSearch, I think, can handle
frequent updates pretty well.
Do have a look at this link:
http://engineering.socialcast.com/2011/05/realtime-search-solr-vs-elasticsearch/

Even so, we use some other tricks to reduce the frequency of updates.
Instead of updating ElasticSearch on every change, we collect the usage
of a document over a period of time (say, 10 minutes), aggregate the
result, and then push it to ElasticSearch in one go. Maybe you can look
at a similar approach.
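
To make the idea concrete, here is a minimal sketch of such a buffer in
Python. It is only an illustration, not our actual code: the class name
PopularityBuffer and the apply_update callback are hypothetical, and the
callback stands in for whatever call you use to write the new value to
ElasticSearch. Something like a cron job or a timer would call flush()
once per window.

```python
from collections import Counter
from typing import Callable, Dict


class PopularityBuffer:
    """Accumulates popularity deltas in memory between flushes."""

    def __init__(self) -> None:
        self._deltas: Counter = Counter()

    def record(self, doc_id: str, delta: int = 1) -> None:
        # delta may be negative: popularity can go down as well as up.
        self._deltas[doc_id] += delta

    def flush(self, apply_update: Callable[[str, int], None]) -> Dict[str, int]:
        # Swap the buffer out first so new events land in a fresh window.
        pending, self._deltas = self._deltas, Counter()
        # Skip documents whose increases and decreases cancelled out.
        changed = {doc_id: d for doc_id, d in pending.items() if d != 0}
        for doc_id, delta in changed.items():
            # apply_update is a hypothetical hook for the real index write.
            apply_update(doc_id, delta)
        return changed
```

With this, a document that is hit 500 times in one window costs one
write to the index instead of 500. Note that the buffer lives in memory,
so a crash loses at most one window of counts.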

Regards,
Mahendra

http://twitter.com/mahendra


(Andy-2) #3

Thanks Mahendra.

How do you implement your batched updates? Do you fire up a cron job
every x minutes to get the "popularity" values from a database and
then use them to update ElasticSearch?


(ppearcy) #4

Hey Andy/Mahendra,
We do exactly what Mahendra describes. We use a custom data processing
tool instead of cron, but it runs in a very similar fashion. It works
off relative values, though: we get the number of document requests
since the last run and only update documents that have changed. On top
of batching these updates at fixed intervals, we are also considering
ignoring documents with only a request or two.
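
Roughly, the selection step looks like the Python sketch below. This is
an illustration only, not our tool: the function name plan_updates, the
dict-based inputs, and the min_requests cutoff of 3 are all assumptions
made for the example.

```python
from typing import Dict


def plan_updates(
    current: Dict[str, int],
    requests_since_last_run: Dict[str, int],
    min_requests: int = 3,
) -> Dict[str, int]:
    """Return new popularity values only for documents worth re-indexing."""
    planned: Dict[str, int] = {}
    for doc_id, count in requests_since_last_run.items():
        # Ignore documents with only a request or two (assumed cutoff).
        if count < min_requests:
            continue
        # Relative value: add the count since the last run to the stored total.
        planned[doc_id] = current.get(doc_id, 0) + count
    return planned
```

Documents that do not appear in requests_since_last_run are never
touched. As written, counts below the cutoff are simply dropped; if
that loss matters, they could be carried over to the next run instead.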

We had hoped that parent/child documents would allow us to do this
more efficiently, but parent documents cannot be sorted by values in
child documents.

We haven't yet launched anything using this, but don't expect any
issues.

I had seen some interesting discussions around this at the Lucene
level, but don't believe any of it pertains to ES:
http://www.lucenerevolution.org/blog/2011/05/31/224/
http://www.mjohnston.com/2009/09/adding-external-datasources-to-lucene-scoring/
(A little older, so not sure if still relevant)

Thanks,
Paul

