I have an integer field "popularity" that is frequently updated. The
"popularity" of a document can either be increased or decreased. I use
the value of that field to help rank my search results.
A search engine might not be designed for frequently updated fields
like that. Any tips on how best to handle that in Elasticsearch?
I have the same use case: a "popularity" field that is updated
based on usage of a document.
Search engines may not be designed for that, but we use a trick to
reduce the frequency of updates. Instead of updating Elasticsearch
every time, we collect the usage of a document over a period of time
(say 10 minutes), aggregate the result, and then push it to
Elasticsearch. Maybe you can look at a similar approach.
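A minimal sketch of the aggregate-then-flush idea in Python, assuming a recent Elasticsearch version with the `_bulk` API and Painless scripting, a hypothetical index named `docs`, and an existing numeric `popularity` field on every document. `UsageBuffer` is a hypothetical helper, not part of any client library. It only builds the newline-delimited JSON body; sending it (for example as an HTTP POST to `/_bulk` with `Content-Type: application/x-ndjson`) is left out.

```python
import json
import time
from collections import Counter


class UsageBuffer:
    """Accumulates per-document popularity deltas and flushes them in batches.

    Hypothetical helper: the index name, the flush interval, and the use of
    a Painless script for the increment are assumptions for illustration.
    """

    def __init__(self, index, flush_interval=600.0, clock=time.monotonic):
        self.index = index
        self.flush_interval = flush_interval  # seconds; 600 = 10 minutes
        self.clock = clock
        self.deltas = Counter()
        self.last_flush = clock()

    def record(self, doc_id, delta=1):
        # Popularity can go up or down, so delta may be negative.
        self.deltas[doc_id] += delta

    def due(self):
        # True once the aggregation window has elapsed.
        return self.clock() - self.last_flush >= self.flush_interval

    def flush(self):
        """Return one _bulk body with a scripted update per changed document."""
        lines = []
        for doc_id, delta in sorted(self.deltas.items()):
            if delta == 0:
                continue  # increments and decrements cancelled out; skip the write
            lines.append(json.dumps({"update": {"_index": self.index, "_id": doc_id}}))
            # Assumes the document already has a numeric "popularity" field.
            lines.append(json.dumps({
                "script": {
                    "source": "ctx._source.popularity += params.delta",
                    "lang": "painless",
                    "params": {"delta": delta},
                }
            }))
        self.deltas.clear()
        self.last_flush = self.clock()
        # The _bulk API expects newline-delimited JSON ending with a newline.
        return "\n".join(lines) + "\n" if lines else ""
```

Sending a delta through a script means the batch job never has to read the current value first. If the absolute popularity is already kept elsewhere (such as a database), a partial update with `{"doc": {"popularity": N}}` is the simpler alternative.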
How do you implement your batched updates? Do you fire up a cron job
every x minutes to get the "popularity" values from a database and
then use them to update Elasticsearch?
Hey Andy/Mahendra,
We do exactly what Mahendra describes. We have a custom data
processing tool that we use instead of cron, but it runs in a very
similar fashion. It works off relative values, though: we get the
number of document requests since the last run and update only the
documents that have changed. On top of batching these updates at
certain intervals, we are also considering ignoring documents with
only a request or two.
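The relative-value approach with a minimum-request threshold could look roughly like this Python sketch. `plan_batch` and the default `min_requests=3` are hypothetical. One deliberate variation on ignoring low-traffic documents outright: their counts are carried over to the next run instead of being discarded, so popularity does not slowly drift from the true totals. The returned deltas could then be written as partial updates through the `_bulk` API.

```python
from collections import Counter


def plan_batch(requests_since_last_run, pending=None, min_requests=3):
    """Split per-document request counts into updates to send now and counts to defer.

    Hypothetical helper. requests_since_last_run maps doc id -> number of
    requests since the previous run; pending holds counts deferred earlier.
    Documents below min_requests are not written this run; their counts are
    returned in a Counter to be passed back in on the next run.
    """
    totals = Counter(pending or {})
    totals.update(requests_since_last_run)  # adds counts rather than replacing them

    to_update = {}
    deferred = Counter()
    for doc_id, count in totals.items():
        if count >= min_requests:
            to_update[doc_id] = count  # enough activity to justify a write
        elif count > 0:
            deferred[doc_id] = count  # too small for now; keep for the next run
    return to_update, deferred
```

Usage: call `updates, pending = plan_batch(counts, pending)` on each run, send `updates` to Elasticsearch, and keep `pending` in memory (or in the database) until the next run.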
We had hoped that parent/child documents would allow us to do this
more efficiently, but parent documents cannot be sorted by values in
child documents.
We haven't launched anything using this yet, but we don't expect any
issues.