Best strategy for frequently updated documents

I have a forum, and every topic has a viewCount field: how many times the topic has been viewed by forum users.

I wanted all topic fields (id, date, title, content and viewCount) to be taken from ES. However, in that case ES must reindex the entire document after every topic view. I asked about partial updates on Stack Overflow: http://stackoverflow.com/questions/28937946/partial-update-on-field-that-is-not-indexed

This means that if a topic is viewed 1000 times, ES will index it 1000 times, and with many users, many documents will be indexed again and again. This is the first strategy.
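For reference, a minimal sketch of what strategy 1 would send on every view, assuming Elasticsearch 7.x or later (the `POST /<index>/_update/<id>` endpoint and a Painless script). The index name `forum-topics` and the helper `view_increment_request` are hypothetical. The script keeps the increment on the server side, but Elasticsearch still rewrites the whole stored document for each call.

```python
import json


def view_increment_request(index: str, topic_id: str, by: int = 1):
    """Build (method, path, body) for a scripted partial update of viewCount.

    Assumes Elasticsearch 7.x+ URL and Painless script syntax.
    retry_on_conflict asks the server to retry when two views of the
    same topic update it concurrently and hit a version conflict.
    """
    path = f"/{index}/_update/{topic_id}?retry_on_conflict=3"
    body = {
        "script": {
            "source": "ctx._source.viewCount += params.by",
            "lang": "painless",
            "params": {"by": by},
        }
    }
    return "POST", path, body


# Hypothetical index name and topic id, for illustration only.
method, path, body = view_increment_request("forum-topics", "42")
print(method, path)
print(json.dumps(body, indent=2))
```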

The second strategy, as I see it, is to take some topic fields from the index and some from the database; in this case I take viewCount from the DB. However, then I could just as well store all fields in the DB and use the index purely as an index, to look up topic ids.
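A minimal sketch of strategy 2, using Python's built-in `sqlite3` as a stand-in for the forum database: the search hits supply id and title, and a single `IN (...)` query fetches the live counts for the whole page of results rather than one query per topic. The `topics` table, its `view_count` column, and `attach_view_counts` are hypothetical names.

```python
import sqlite3


def attach_view_counts(hits: list, conn: sqlite3.Connection) -> list:
    """Merge live viewCount values from the database into search hits.

    `hits` stands in for the _source of each search hit (id, title, ...).
    The `topics` table and its columns are hypothetical.
    """
    ids = [h["id"] for h in hits]
    if not ids:
        return []
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT id, view_count FROM topics WHERE id IN ({placeholders})", ids
    ).fetchall()
    counts = dict(rows)
    # Keep the search ranking order; default to 0 if the DB has no row yet.
    return [{**h, "viewCount": counts.get(h["id"], 0)} for h in hits]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE topics (id INTEGER PRIMARY KEY, view_count INTEGER NOT NULL)")
conn.executemany("INSERT INTO topics VALUES (?, ?)", [(1, 1000), (2, 7)])

hits = [{"id": 2, "title": "Mapping question"}, {"id": 1, "title": "Welcome"}]
for topic in attach_view_counts(hits, conn):
    print(topic["id"], topic["title"], topic["viewCount"])
```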

Are there better strategies? What is the best way to solve this problem?

--
Александр Свиридов

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/1426926256.618220498%40f392.i.mail.ru.
For more options, visit https://groups.google.com/d/optout.

One thing you might want to consider is whether or not you need your index
to stay perfectly in sync with your database. If a topic is viewed 1000
times over the course of 2 minutes, is it important that Elasticsearch
update after every one? Maybe after each update you queue a reindexing, but
you only reindex once a minute and ignore duplicates. In this example
instead of reindexing 1000 times you'd reindex twice, and your index would
still be near-realtime. But that depends on the business requirements.
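The batching idea above can be sketched as a small debouncer: a view only marks a topic as dirty, and dirty topics are reindexed at most once per interval. This is a minimal sketch, not a library API; `DebouncedReindexer` and the `reindex` callback (which would reload the topic from the database and index it) are hypothetical, and time is passed in explicitly so the behaviour is easy to check.

```python
class DebouncedReindexer:
    """Collect dirty topic ids and reindex each at most once per interval."""

    def __init__(self, reindex, interval_s: float = 60.0):
        self._reindex = reindex      # hypothetical: loads topic from DB, indexes it
        self._interval = interval_s
        self._pending = set()        # duplicate views collapse here
        self._last_flush = 0.0       # treat time 0 as the previous flush

    def topic_viewed(self, topic_id, now: float) -> None:
        self._pending.add(topic_id)
        if now - self._last_flush >= self._interval:
            self.flush(now)

    def flush(self, now: float) -> None:
        for topic_id in sorted(self._pending):
            self._reindex(topic_id)
        self._pending.clear()
        self._last_flush = now


calls = []
reindexer = DebouncedReindexer(reindex=calls.append, interval_s=60.0)
for i in range(1000):                      # 1000 views spread over ~120 s
    reindexer.topic_viewed("topic-1", now=i * 0.12)
reindexer.flush(now=120.0)                 # final flush at the end of the window
print(len(calls))  # 2
```

With one topic viewed 1000 times over two minutes, this reindexes it twice. A real deployment would also flush from a timer or job queue, since here pending ids are only written when the next view arrives or `flush` is called explicitly.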

If you absolutely need every update to be applied to your index ASAP you'd
have to use partial updates. If you don't need to search on viewCount at
all, then you shouldn't let an update to that field trigger a
reindex/update at all. We have a similar situation with a field that can be
incremented/decremented constantly and we ignore it in our ingestion
process. Search operates against the text fields and returns basic info -
item descriptions, etc. - and the URL to the canonical version of the
resource in our API. Clients that care about those dynamic fields can then
hit the API to get all fields including the up-to-the-millisecond count
fields.
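The ingestion filter described above could look roughly like this: a change that only touches the volatile field never reaches Elasticsearch, and the field is stripped from whatever does get indexed, so clients read it from the API instead. This is a minimal sketch; `VOLATILE_FIELDS`, `needs_reindex` and `to_search_doc` are hypothetical names.

```python
# Hypothetical set of fields that change constantly and are never searched.
VOLATILE_FIELDS = frozenset({"viewCount"})


def needs_reindex(old: dict, new: dict, volatile=VOLATILE_FIELDS) -> bool:
    """Return True only if a field that search cares about has changed."""
    keys = (old.keys() | new.keys()) - volatile
    return any(old.get(k) != new.get(k) for k in keys)


def to_search_doc(topic: dict, volatile=VOLATILE_FIELDS) -> dict:
    """Strip volatile fields before sending the topic to the index."""
    return {k: v for k, v in topic.items() if k not in volatile}


before = {"id": 1, "title": "Welcome", "viewCount": 10}
after_view = {**before, "viewCount": 11}
after_edit = {**before, "title": "Welcome!"}

print(needs_reindex(before, after_view))  # False
print(needs_reindex(before, after_edit))  # True
print(to_search_doc(after_edit))          # {'id': 1, 'title': 'Welcome!'}
```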

(In reply to ooo_saturn7, Saturday, March 21, 2015 at 4:24:25 AM UTC-4.)


Thank you for the reply. I don't need the viewCount field to be indexed; I need it only for display, because I take the data from ES. For example, when a user opens a forum section of the site, they see the topics (with how many times each of them was viewed), and that data comes from Elasticsearch. The problem is that this one field is updated very often (even if I reindex only every minute or every 5 minutes).
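For a field that is only displayed, the mapping can switch indexing off for it. A sketch of such a mapping, assuming Elasticsearch 7.x or later (typeless mappings; older versions used different syntax); the field list comes from the question and the field types are assumptions. The value stays in `_source`, so it can still be shown with search results, but this does not make updates cheaper: changing it still rewrites the whole document.

```python
import json

# Assumed field types; the point here is only viewCount's "index": False.
topic_mapping = {
    "mappings": {
        "properties": {
            "date": {"type": "date"},
            "title": {"type": "text"},
            "content": {"type": "text"},
            # Not searchable, but still returned in _source for display.
            "viewCount": {"type": "integer", "index": False},
        }
    }
}

# Body for PUT /<index> when creating the index.
print(json.dumps(topic_mapping, indent=2))
```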

There are two terms: partial update and partial reindex. As I understand it, even when you do a partial update (of only one field of a document), ES reindexes the entire document.
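That understanding matches how Elasticsearch describes its update API: the stored `_source` is fetched, the change is applied, and the complete document is indexed again, while the old version is marked as deleted until segments merge. The toy model below is plain Python, not Elasticsearch code (`ToyIndex` is hypothetical); it only counts the resulting work for 1000 views of one topic.

```python
import copy


class ToyIndex:
    """Toy model of update-as-reindex; counts work, stores nothing real."""

    def __init__(self):
        self.live = {}          # id -> current _source
        self.version = {}       # id -> version number
        self.docs_written = 0   # complete documents sent to the indexer
        self.deleted = 0        # superseded versions waiting for a merge

    def index(self, doc_id, source):
        if doc_id in self.live:
            self.deleted += 1   # old version is marked deleted, not edited
        self.live[doc_id] = copy.deepcopy(source)
        self.version[doc_id] = self.version.get(doc_id, 0) + 1
        self.docs_written += 1

    def update(self, doc_id, partial):
        merged = {**self.live[doc_id], **partial}  # get _source + merge
        self.index(doc_id, merged)                 # full reindex


idx = ToyIndex()
idx.index("1", {"title": "Welcome", "content": "Long post body", "viewCount": 0})
for n in range(1, 1001):
    idx.update("1", {"viewCount": n})
print(idx.docs_written, idx.deleted, idx.version["1"])  # 1001 1000 1001
```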

As I understand it, you took this approach: the frequently updated field is not displayed by default. But I need to show it all the time.

(In reply to Joel Potischman joel.potischman@beatport.com, Monday, March 23, 2015, 6:23 -07:00.)
