Hi!
I am trying to create a materialized view of relational data in Elasticsearch. Let's say I have a child table and multiple parent tables (all backed by separate microservices with an event queue). I mostly need to find the relevant children by parent properties, so I have denormalized my model like this:
PUT matview/foo/1
{
  "foo": "bar",
  "parent1": { "id": "1", ... },
  "parent2": { "id": "2", ... }
}
When a parent1 update event arrives, I run _update_by_query to update the embedded parent1 objects by id:
POST matview/_update_by_query
{
  "query": { "term": { "parent1.id": "1" } },
  "script": { "source": "ctx._source.parent1.name='foo'" }
}
The problem is that I am constantly getting version conflict errors on update_by_query and delete_by_query. The preferred way to avoid those seems to be using refresh=wait_for and client-side retries, as discussed here (github issue) and here (?refresh doc).
So I think I should decrease refresh_interval for this case, to reduce the probability of version conflicts and the retries that follow.
On the other hand, my index is under heavy indexing load, and as described here (tune for indexing speed doc) I should increase refresh_interval to increase indexing speed.
So I can't use refresh=wait_for, because it could wait too long with a large refresh_interval value.
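To make the trade-off concrete, the two settings in tension would look roughly like this. The 30s value and the trimmed document body are illustrative assumptions only, not the actual settings of this index:

```
PUT matview/_settings
{
  "index": { "refresh_interval": "30s" }
}

PUT matview/foo/1?refresh=wait_for
{
  "foo": "bar"
}
```

With an interval like that, the second request may block for up to roughly 30 seconds until the next scheduled refresh makes the document visible to search.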
The question is: should I increase or decrease it? -_- Or maybe there are other ways to overcome this problem? For example, maybe some hack to disable versioning (I know I can use external versioning when updating by id, but there seems to be no way to use external versioning in update_by_query).
As I said before, I use queues to populate Elasticsearch, so I have full control over the index (shard) update order.