Best practice for handling _ids in get and search results

Hello,

Frontend engineer here, slowly improving at using Elasticsearch.

Does anyone have any advice on how to best work with document _ids, specifically with regards to sending them to the front end after a _search or _get along with the _source data?

The application I've inherited has had various approaches applied: storing copies of the _id as _source.id, mixing _id into the returned results, and so on.

Note that we have our own Express back end which calls Elasticsearch, then sends flattened results to the front end like so:

{ id: hit._id, ...hit._source }
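
To make that concrete, here is a minimal sketch of the read path (the route, index name, and client setup are illustrative only, assuming the v8 @elastic/elasticsearch client):

const express = require('express');
const { Client } = require('@elastic/elasticsearch');

const app = express();
const client = new Client({ node: 'http://localhost:9200' });

// GET /contacts: run a search and flatten each hit before it leaves the back end
app.get('/contacts', async (req, res) => {
  const result = await client.search({
    index: 'contacts',
    query: { match_all: {} },
  });

  // Same flattening as above: merge the _id in alongside the _source fields
  const contacts = result.hits.hits.map((hit) => ({ id: hit._id, ...hit._source }));

  res.json(contacts);
});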

Are there any best practices, specifically with regards to a fairly general CRUD app?

Clearly we need the _id in order to make updates, and we would like to find a balance between sending back clean data we can pass around the app, and not writing code which would make a more experienced Elastic practitioner wince.
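
For the update path, a sketch of what I mean (continuing the hypothetical setup above, again assuming the v8 JS client): the front end sends the flattened object back, and we peel the id off before calling the update API.

// PUT /contacts/:id: strip the id back out so it never gets stored in _source
app.put('/contacts/:id', express.json(), async (req, res) => {
  const { id, ...doc } = req.body; // discard any id mixed into the payload

  await client.update({
    index: 'contacts',
    id: req.params.id,
    doc, // partial document update with the remaining fields
  });

  res.sendStatus(204);
});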

It is clearly not as straightforward as it would have been with SQL.

Thanks,
Dave

I am no expert in coding at all, but I just used Python to update a few million documents, adding a field to each one. I used the _id to do the updates.

  1. Used elasticsearch_dsl to retrieve only part of the data from Elasticsearch.
  2. Calculated the new field.
  3. Created update documents and used the Elasticsearch bulk helper to update the documents in the same index I was reading from.

For example:

from elasticsearch import Elasticsearch, helpers
from elasticsearch_dsl import Search

# hostnames is your list of Elasticsearch hosts; one client for the bulk
# writes (elastic_output) and one for reading (client)
elastic_output = Elasticsearch([hostnames], http_auth=('elastic', 'elastic'), port=9200)
client = Elasticsearch([hostnames], http_auth=('elastic', 'elastic'), port=9200)

Test using one job first; once that works, remove the .query(...) call to process the whole index:

s = Search(using=client, index=input_index).query("match", job="4139835950")
search = s.source(["job", "minutes", "nodes", "exechost"])  # retrieve only these four fields

This returns those four fields for that job number, which is also my _id.

Then I created a dictionary record like this (for a bulk update, the new fields go under a doc key):
mylist = [{"_index": "your_index", "_id": "4139835950", "_op_type": "update", "doc": {"my_new_field": "9.98654"}}]

I built up a list of such entries; once the list reached a count of 5000, I used the bulk helper to write back to the index:

helpers.bulk(elastic_output, mylist)

Thanks for the reply @elasticforme.

I'm good with the search and update APIs; the thrust of my question was really how folks structure the data returned from an Elastic _search or _get so it can be consumed by the front end application.

Do they:

1 ) Pass back the whole hit and reference _id and _source separately in the frontend code, i.e. model._id and model._source.name:

{
  "_index" : "contacts",
  "_type" : "_doc",
  "_id" : "VBj7bH0Bk8QNffIJSaXC",
  "_score" : 1.0,
  "_source" : {
    "name" : "Some contact",
    "phone" : "123456789"
  }
}

2 ) Merge the _id into _source and return that, referencing it all as one object, i.e. model._id and model.name:

{
  "_id" : "VBj7bH0Bk8QNffIJSaXC",
  "name" : "New contact",
  "phone" : "123456789"
}

3 ) Something else, for example passing _id as id? (One caveat with this kind of merge is sketched after these examples.)

{
  "id" : "VBj7bH0Bk8QNffIJSaXC",
  "name" : "New contact",
  "phone" : "123456789"
}
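
The caveat I've run into with the merged shapes in 2 and 3, given that our legacy data sometimes has a copy of the id stored in _source: spread order decides which value wins. A small sketch (hit being a single entry from hits.hits):

// If _source happens to contain a stale "id" field, it shadows the real _id here:
const risky = { id: hit._id, ...hit._source };

// Spreading _source first means the real _id always takes precedence:
const safe = { ...hit._source, id: hit._id };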

As mentioned, I just don't want to start reinventing the wheel or introducing bad practices.

Admittedly, this kind of thing is always a balance between back and front end constraints and motivations.

This is the correct method for updating documents.


I have three indices with 100 to 150 million records each (2018, 2019, and 2020 data), and I needed to add a field to every record.

I just finished the 2018 index; it took me 7 hours to extract and update all the documents in the index.

I did it the same way that you explained in your first example.

I'm talking about the data passed from Elastic to the front end application (perhaps in a _search request), not the data passed to Elastic via an _update request. I shall update the post to disambiguate.
