_update_by_query issue

Hi all,
I am new to Elasticsearch and have a question.
I appreciate your help!

I have an index named my_index. It has a field called reason, and I need to map each reason to a particular category. For example:

Reason - A, C, E
Category - car

Reason - B, D, F
Category - bike

This is the way I have done it:

```
POST /my_index/_update_by_query
{
  "script": {
    "source": """
      if (ctx._source.reason == 'A') {
        ctx._source.Category = 'car';
      } else if (ctx._source.reason == 'B') {
        ctx._source.Category = 'bike';
      } else if (ctx._source.reason == 'C') {
        ctx._source.Category = 'car';
      }
      // ... (remaining else-if branches elided)
      else {
        ctx._source.Category = 'Unknown';
      }
    """,
    "lang": "painless"
  },
  "query": {
    "terms": {
      "reason": ["A", "B", "C", "D", "E", "F"]
    }
  },
  "size": 1000
}
```

It created the new Category field, and every document got a category based on its reason.

But the problem is that after some time, some documents show '-' in the Category field even though they have a reason, and more and more records show '-' as time goes on. If I run the query again, the '-' values disappear, and then the cycle repeats.

What could be causing this, and how can I fix it? Thanks for the help!

Hi Megha, are you looking to generate the Category field at runtime, while performing a search, or to create it at ingest time?

It sounds to me like you are indexing new data for which the category is not set, or updating documents by overwriting them with documents that do not have the category set. Update by query only updates the documents currently in the index; future inserts and updates are not affected. If you want the category to always be populated, you could run your script in an ingest pipeline and have it apply to all changes, which could remove the need to run update-by-query.
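A minimal sketch of that setup, assuming an ingest pipeline named `set_category_pipeline` (the name used later in this thread; here it is simply assumed to exist and to work). Setting it as the index's `index.default_pipeline` makes every index and bulk request against `my_index` run through it without passing `?pipeline=` on each request, and running update-by-query with the same pipeline re-processes the documents that are already in the index.

```
# Every document written to my_index goes through the pipeline by default
# (a request that passes its own ?pipeline= overrides this)
PUT my_index/_settings
{
  "index.default_pipeline": "set_category_pipeline"
}

# One-off backfill: re-process the existing documents through the same pipeline
POST my_index/_update_by_query?pipeline=set_category_pipeline
```

With this in place the categorization logic lives in one place (the pipeline), so the update-by-query script and the ingest logic cannot drift apart.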

I believe doing it at ingest time would be ideal.

Great! Thank you!
I tried what you suggested, but I get runtime errors:

```
PUT _ingest/pipeline/set_category_pipeline
{
  "processors": [
    {
      "script": {
        "source": """
          if (ctx._source.reason == 'A') {
            ctx._source.Category = 'car';
          } else if (ctx._source.reason == 'B') {
            ctx._source.Category = 'bike';
          } else if (ctx._source.reason == 'C') {
            ctx._source.Category = 'car';
          }
          // ... (remaining else-if branches elided)
          else {
            ctx._source.Category = 'Unknown';
          }
        """
      }
    }
  ]
}

POST my_index/_doc?pipeline=set_category_pipeline
{ "reason": "A" }

POST my_index/_doc?pipeline=set_category_pipeline
{ "reason": "B" }

POST my_index/_doc?pipeline=set_category_pipeline
{ "reason": "C" }
```

Any help is appreciated. Thank you!
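For what it's worth, the runtime errors most likely come from `ctx._source`. Inside an ingest `script` processor, the document's source fields are exposed directly on `ctx` (for example `ctx.reason` or `ctx['reason']`), and the `ctx._source.<field>` syntax is not supported there, so the first field access fails. The following is a hedged rewrite, not a verified fix: it reads `ctx.reason`, replaces the if/else chain with a lookup map passed as script `params` (only the A to F mapping stated in the question is filled in), tests the pipeline with `_simulate` before indexing anything, and then indexes several test documents in one `_bulk` request, which accepts the same `pipeline` parameter as `_doc`. The reason `X` in the simulate request is a hypothetical value used only to exercise the fallback branch.

```
PUT _ingest/pipeline/set_category_pipeline
{
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": """
          // In an ingest pipeline, source fields live directly on ctx (no _source)
          def reason = ctx.reason;
          if (reason != null && params.categories.containsKey(reason)) {
            ctx.Category = params.categories[reason];
          } else {
            ctx.Category = 'Unknown';
          }
        """,
        "params": {
          "categories": {
            "A": "car",  "C": "car",  "E": "car",
            "B": "bike", "D": "bike", "F": "bike"
          }
        }
      }
    }
  ]
}

# Dry run: returns each test document as the pipeline would transform it, without indexing
POST _ingest/pipeline/set_category_pipeline/_simulate
{
  "docs": [
    { "_source": { "reason": "A" } },
    { "_source": { "reason": "B" } },
    { "_source": { "reason": "X" } }
  ]
}

# Index several documents through the pipeline in a single request
POST my_index/_bulk?pipeline=set_category_pipeline
{ "index": {} }
{ "reason": "A" }
{ "index": {} }
{ "reason": "B" }
{ "index": {} }
{ "reason": "C" }
```

Keeping the mapping in `params` means that adding a new reason is a change to the data in the pipeline definition rather than another `else if` branch.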

Also, can I generate the Category field at runtime instead? Would it then update dynamically?
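Generating the field at search time is what runtime fields are for. A minimal sketch, assuming Elasticsearch 7.11 or later (when runtime fields were introduced) and that `reason` is mapped as a `keyword` field; if it was dynamically mapped as `text`, read `doc['reason.keyword']` instead. The runtime field is named `Category` here to match the question, and the lookup map again contains only the A to F mapping from the question.

```
PUT my_index/_mapping
{
  "runtime": {
    "Category": {
      "type": "keyword",
      "script": {
        "source": """
          // Computed on every search from the stored reason value
          String category = 'Unknown';
          if (doc['reason'].size() > 0) {
            String reason = doc['reason'].value;
            if (params.categories.containsKey(reason)) {
              category = params.categories[reason];
            }
          }
          emit(category);
        """,
        "params": {
          "categories": {
            "A": "car",  "C": "car",  "E": "car",
            "B": "bike", "D": "bike", "F": "bike"
          }
        }
      }
    }
  }
}

# Runtime fields are not part of _source; request them with "fields"
GET my_index/_search
{
  "query": { "term": { "Category": "car" } },
  "fields": ["Category"]
}
```

Because the value is computed from `reason` whenever a search uses it, it always reflects the current data and nothing has to be reindexed or backfilled. The trade-off is that the script runs at query time, so searches and aggregations on `Category` cost more than they would on an indexed field.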