Bulk update in ES


(Gosforth) #1

How can I update millions of documents in ES?
I found this https://www.elastic.co/guide/en/elasticsearch/reference/current/_updating_documents.html but I cannot imagine doing this for millions of rows. Even writing a script to automate it is a pain. Can Logstash update documents? I want to add one field and update the existing documents.
Regards


(Christian Dahlqvist) #2

Have you looked into using the update by query API?


(Florian Kelbert) #3

_update_by_query, as suggested by @Christian_Dahlqvist, is certainly what you are looking for. It is quite powerful because it lets you update documents using Painless scripts. Bear in mind, however, that an update actually indexes a new version of each document and deletes the old one. Depending on your use case, you might therefore also want to consider reindexing into a new index and updating the documents (potentially with Painless scripts) during the reindex.
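A minimal sketch of the reindex route, assuming Python with only the standard library: it builds the JSON body for POST _reindex, with a Painless script that adds a field to each document as it is copied. The index names my_index and my_index_v2, the field new_field and the parameter default_value are hypothetical placeholders.

```python
import json


def reindex_with_script(source_index, dest_index, painless_source, params=None):
    """Build a _reindex request body that runs a Painless script on every
    document while copying it from source_index to dest_index."""
    script = {"source": painless_source, "lang": "painless"}
    if params:
        # Passing values as params keeps the script text constant.
        script["params"] = params
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
        "script": script,
    }


# Hypothetical index and field names, for illustration only.
body = reindex_with_script(
    "my_index",
    "my_index_v2",
    "ctx._source.new_field = params.default_value",
    {"default_value": "unknown"},
)
print("POST _reindex")
print(json.dumps(body, indent=2))
```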


(Gosforth) #4

Thanks for your suggestion; however, I don't understand this documentation at all.

POST twitter/_update_by_query?conflicts=proceed

Is this the HTTP POST method, or a POST command in the Kibana console (to be executed 100 million times)?
The same question applies to this example:

POST twitter/_update_by_query?conflicts=proceed
{
  "query": {
    "term": {
      "user": "kimchy"
    }
  }
}

Is POST the HTTP method and the JSON the payload, or is 'POST' a console command?
If it is a console command, then this is not an API.

Suppose I want to update 'some_field' in an index where this field has the value 'value_X'. What should I do?
Something like SQL's UPDATE table_name SET column_name = new_value [WHERE condition];

Regards
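As an illustrative sketch (not from the thread), the SQL statement above maps onto an _update_by_query body: the WHERE condition becomes a query and the SET clause becomes a Painless script. The helper update_where and the index name my_index are hypothetical. A term query matches exact values, so it suits keyword fields; for an analyzed text field a match query (or a .keyword subfield, if the mapping has one) is usually needed.

```python
import json


def update_where(field, new_value, where_field, where_value):
    """Sketch of SQL's
         UPDATE index SET field = new_value WHERE where_field = where_value
    expressed as an _update_by_query request body."""
    return {
        # WHERE where_field = where_value  ->  exact-match term query
        "query": {"term": {where_field: where_value}},
        # SET field = new_value  ->  Painless script run on each matching doc;
        # field name and value are passed as params, not pasted into the code.
        "script": {
            "source": "ctx._source[params.field] = params.value",
            "lang": "painless",
            "params": {"field": field, "value": new_value},
        },
    }


body = update_where("some_field", "new_value", "some_field", "value_X")
print("POST my_index/_update_by_query?conflicts=proceed")
print(json.dumps(body, indent=2))
```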


(Florian Kelbert) #5

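Yes, the commands in the documentation are meant to be run in the Kibana Dev Tools console, where each command is an HTTP method and a path, optionally followed by a JSON request body. They can always be translated into plain HTTP requests, e.g. curl commands.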
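To make the translation concrete, here is a small sketch using Python's standard library: the first line of a console snippet is the HTTP method and path, and the rest is the JSON payload. The helper console_to_request and the default host http://localhost:9200 are assumptions; the function only builds the request and does not send it.

```python
import json
import urllib.request


def console_to_request(snippet, host="http://localhost:9200"):
    """Turn a Kibana Dev Tools snippet ("METHOD path" followed by an
    optional JSON body) into an HTTP request object (not yet sent)."""
    first_line, _, rest = snippet.strip().partition("\n")
    method, path = first_line.split(None, 1)
    body = rest.strip()
    data = None
    if body:
        # Parse and re-serialise so invalid JSON fails here, not on the server.
        data = json.dumps(json.loads(body)).encode("utf-8")
    return urllib.request.Request(
        url=f"{host}/{path.lstrip('/')}",
        data=data,
        method=method.upper(),
        headers={"Content-Type": "application/json"},
    )


req = console_to_request("""
POST twitter/_update_by_query?conflicts=proceed
{ "query": { "term": { "user": "kimchy" } } }
""")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# urllib.request.urlopen(req) would actually send it to the cluster.
```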

Quoting the Update By Query API documentation:

Back to the API format, this will update [all] tweets from the twitter index:

POST twitter/_doc/_update_by_query?conflicts=proceed

The documentation also describes how to update documents for which a certain criterion matches:

You can also limit _update_by_query using the Query DSL. This will update all documents from the twitter index for the user kimchy:

POST twitter/_update_by_query?conflicts=proceed
{
  "query": { 
    "term": {
      "user": "kimchy"
    }
  }
}
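For millions of documents it may help to run the request asynchronously and in parallel. The sketch below (hypothetical helper, Python standard library only) builds the URL with the documented conflicts, wait_for_completion and slices parameters; with wait_for_completion=false Elasticsearch returns a task id that can be polled through the tasks API instead of holding the HTTP connection open. Check that slices=auto is supported by your Elasticsearch version.

```python
from urllib.parse import urlencode


def update_by_query_url(index, host="http://localhost:9200", **options):
    """Build an _update_by_query URL; keyword arguments become
    query-string parameters in the order they are given."""
    query = urlencode(options)
    return f"{host}/{index}/_update_by_query" + (f"?{query}" if query else "")


url = update_by_query_url(
    "twitter",
    conflicts="proceed",          # count version conflicts instead of aborting
    wait_for_completion="false",  # respond at once with a task id
    slices="auto",                # split the work into parallel slices
)
print(url)
```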

(Gosforth) #6

Thank you.
What about a REST web service? I'd like to use POST and send JSON as the payload.


(Florian Kelbert) #7

Just send a corresponding REST request:

curl -X POST "localhost:9200/twitter/_update_by_query?conflicts=proceed" -H 'Content-Type: application/json' -d'
{
  "query": { 
    "term": {
      "user": "kimchy"
    }
  }
}
'

(Gosforth) #8

Thank you, but what will this query do? Update ALL user fields to the value 'kimchy'?
Hmm... who needs such a query? I'd like to include a WHERE clause, i.e. update field 'something' where field 'user' has some value.

{
  "query": { 
    "term": {
      "user": "kimchy"
    }
  }
}

According to the documentation, this should work:

{"script": {"postal_code": "56789"},"query": { "match": {"client_name": "some company"}} }

But it does not.
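A likely reason the first body fails: script has to be either a Painless source string (the shorthand used in the attempt below) or an object with a source key; a bare map of field names to values is neither. If the goal is to set several fields from such a map, one option is to pass the map as params and merge it into _source. This sketch assumes Painless allows Map.putAll on ctx._source; the helper set_fields_body is hypothetical.

```python
import json


def set_fields_body(fields, query):
    """_update_by_query body that copies every key/value pair in `fields`
    into the _source of each matching document.
    Assumes Painless exposes Map.putAll on ctx._source."""
    return {
        "query": query,
        "script": {
            "source": "ctx._source.putAll(params.fields)",
            "lang": "painless",
            "params": {"fields": fields},
        },
    }


body = set_fields_body(
    {"postal_code": "56789"},
    {"match": {"client_name": "some company"}},
)
print(json.dumps(body, indent=2))
```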

I tried this instead (it doesn't look logical, but for some reason it works):
{"query": { "match": {"client_name": "some company"}},"script": "ctx._source.postal_code = '56789'" }

The response from ES is:
{"took":79,"timed_out":false,"total":1,"updated":1,"deleted":0,"batches":1,"version_conflicts":0,"noops":0,"retries":{"bulk":0,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0,"failures":[]}

But this is not true. It updates nothing.

How do I add a new field to the matching documents, i.e. with a WHERE clause?
And what do I need to do so that this field becomes visible in the index pattern?
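One way to check whether the update took effect, sketched in Python with the standard library: read the counters in the _update_by_query response, then search with an exists query for the new field. The helpers summarize and exists_query and the index name my_index are hypothetical. If the documents do contain the field, a possibility worth ruling out (an assumption, not confirmed in this thread) is that the Kibana index pattern's cached field list has not been refreshed yet.

```python
import json


def summarize(response_text):
    """Pull the main counters out of an _update_by_query response."""
    r = json.loads(response_text)
    if r.get("failures"):
        raise RuntimeError(f"update_by_query failures: {r['failures']}")
    return {k: r[k] for k in ("total", "updated", "noops", "version_conflicts")}


def exists_query(field):
    """Search body that matches documents which now contain `field`."""
    return {"query": {"exists": {"field": field}}}


# The response quoted above.
response = ('{"took":79,"timed_out":false,"total":1,"updated":1,"deleted":0,'
            '"batches":1,"version_conflicts":0,"noops":0,'
            '"retries":{"bulk":0,"search":0},"throttled_millis":0,'
            '"requests_per_second":-1.0,"throttled_until_millis":0,"failures":[]}')
print(summarize(response))
# Send this body to GET my_index/_search (hypothetical index name).
print(json.dumps(exists_query("postal_code")))
```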


(system) #9

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.