I am using the bulk API to update 5 million documents in an index. I think I could use update_by_query to speed this up. In my case, I have an array of id field values (each value is unique), and I plan to use a bool query to find all matching documents and update their field X with a script.
For example, my bulk request currently looks like this:
{ "update" : { "_id" : "42348:1404408", "_index" : "v3_customers_42348", "retry_on_conflict" : 3} }
{ "script" : { "id": "cdp_upsert", "params" : {"data" : {"customer_id":"1404408"},"segment":{"include":["524837"]}}}}
{ "update" : { "_id" : "42348:1404413", "_index" : "v3_customers_42348", "retry_on_conflict" : 3} }
{ "script" : { "id": "cdp_upsert", "params" : {"data" : {"customer_id":"1404413"},"segment":{"include":["524837"]}}}}
{ "update" : { "_id" : "42348:1404414", "_index" : "v3_customers_42348", "retry_on_conflict" : 3} }
{ "script" : { "id": "cdp_upsert", "params" : {"data" : {"customer_id":"1404414"},"segment":{"include":["524837"]}}}}
{ "update" : { "_id" : "42348:1404415", "_index" : "v3_customers_42348", "retry_on_conflict" : 3} }
{ "script" : { "id": "cdp_upsert", "params" : {"data" : {"customer_id":"1404415"},"segment":{"include":["524837"]}}}}
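For reference, this is roughly how I generate that NDJSON body in code. This is just an illustrative sketch: `build_bulk_body` is my own helper name, not part of any client library, and I build the payload by hand with `json.dumps` rather than through the official client.

```python
import json

# Illustrative helper (hypothetical name) that builds the NDJSON body
# for the _bulk API: one action line plus one script line per customer id.
def build_bulk_body(tenant_id, customer_ids, index, segment_id):
    lines = []
    for cid in customer_ids:
        # Action metadata line: which doc to update, in which index.
        action = {"update": {"_id": f"{tenant_id}:{cid}",
                             "_index": index,
                             "retry_on_conflict": 3}}
        # Request body line: the stored script and its params.
        body = {"script": {"id": "cdp_upsert",
                           "params": {"data": {"customer_id": cid},
                                      "segment": {"include": [segment_id]}}}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(body))
    # The _bulk endpoint requires newline-delimited JSON with a trailing newline.
    return "\n".join(lines) + "\n"

payload = build_bulk_body("42348", ["1404408", "1404413"],
                          "v3_customers_42348", "524837")
print(payload)
```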
What I plan to do instead is:
POST v3_customers_33167/_update_by_query?refresh=true
{
  "query": {
    "bool": {
      "should": [
        {"match": {"customer_id": "558566653"}},
        // ..........................about 5k match.......................
        {"match": {"customer_id": "562488687"}}
      ]
    }
  },
  "script": {
    "source": "ctx._source.segment_ids = ctx._source.segment_ids + '|' + params['segment_id']",
    "params": {
      "segment_id": "2233"
    }
  }
}
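I have also been wondering whether the ~5k `match` clauses could be collapsed into a single `terms` clause, which I believe matches the same exact values more compactly (sketch only, assuming `customer_id` is indexed as a keyword field; the id values here are placeholders):

```python
import json

# Sketch: the same _update_by_query body expressed with one `terms`
# clause instead of ~5k individual `match` clauses.
customer_ids = ["558566653", "562488687"]  # in practice, about 5k values

request = {
    "query": {
        # `terms` matches any document whose customer_id is in the list.
        "terms": {"customer_id": customer_ids}
    },
    "script": {
        "source": "ctx._source.segment_ids = ctx._source.segment_ids"
                  " + '|' + params['segment_id']",
        "params": {"segment_id": "2233"},
    },
}

print(json.dumps(request, indent=2))
```

Would that form put less pressure on the query parser than a bool/should with thousands of clauses?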
Will this reduce the load on ES? When I use bulk with a batch size of about 5000, ES responds slowly and the load is high. Also, is it even possible to send a single request containing about 5000 match clauses in the query?