Bulk Update operation on multi field type and search

Previously, we had the following mapping in the index_1 index:

{
    "properties": {
        "offer_id": {
            "type": "long",
            "fields": {
                "keyword": {
                    "type": "text"
                }
            }
        },
        "seller_account_id": {
            "type": "long"
        }
    }
}

All the data was already indexed.
We were not able to search on seller_account_id when it was included in a multi_match query for full-text search,
so we decided to update the mapping to make seller_account_id a multi-field:

{
    "properties": {
        "seller_account_id": {
            "type": "long",
            "fields": {
                "keyword": {
                    "type": "text"
                }
            }
        }
    }
}

I believe no reindexing is required to update a mapping, but how do we make the existing data searchable? After we updated the mapping, the search did not work. We tried using the bulk API to update all documents, but the old documents are still not searchable; only new documents show up in search. I also tried the _bulk API with _refresh=true, but still no luck.

Do I need to create a new index and reindex all documents? (I would prefer not to.)
Our multi_match request:

{
    "query": {
         "multi_match": {
             "fields": [
                 "offer_id.keyword",
                 "seller_account_id.keyword"
             ],
             "operator": "and",
             "query": "9333 8029425",
             "type": "cross_fields"
         }
    }
}   

This request is not working.

Have you tried using the update by query API to process old documents so the new mapping takes effect?

I was just looking at the examples in that documentation:

POST test/_update_by_query?refresh&conflicts=proceed
POST test/_search?filter_path=hits.total
{
  "query": {
    "match": {
      "flag": "foo"
    }
  }
}

Are these two separate requests, or one?
Do I just need to run POST test/_update_by_query?refresh&conflicts=proceed without a body?

I did not really understand the example, so any help would be appreciated.

I would identify a few old documents that need to be updated and run an update by query with a filter to target just a few documents in order to verify that it works and resolves the issue. Once that is done you should be able to run the task without a body and process all documents. To be more selective you might also be able to write a query clause to select only documents that do not have the seller_account_id.keyword field defined.
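A trial run along those lines could use an ids query so that only a handful of known old documents get rewritten. The sketch below only builds and prints the request, it does not send anything. The index name index_1 comes from the question; the document IDs are placeholders, and the helper name is made up for illustration.

```python
import json

def trial_update_request(index, doc_ids):
    """Build an _update_by_query request that touches only the given documents."""
    path = f"POST {index}/_update_by_query?refresh&conflicts=proceed"
    body = {"query": {"ids": {"values": list(doc_ids)}}}
    return path, body

# "1001" and "1002" are placeholder IDs of documents indexed before the mapping change.
path, body = trial_update_request("index_1", ["1001", "1002"])
print(path)
print(json.dumps(body, indent=2))
```

If the multi_match search finds those documents afterwards, the same request without the query body should handle the rest of the index.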

How can I check whether the seller_account_id.keyword field is defined? Can you share an example?
If the mapping is updated, don't all documents that have seller_account_id also have seller_account_id.keyword?

You can add a boolean query with a must_not exists clause.

Elasticsearch stores data in immutable segments. The new field will therefore not be added for existing documents unless you update them, causing them to be written to a new segment.

So the _reindex API is not needed anywhere in the overall solution? In general, what steps should be followed when changing a field to a multi-field in an existing index so that existing data is also searchable?

If we use update_by_query, how can we resolve conflicts in existing documents?

I believe that is correct.

Yes. Run a refresh before starting the update as the query clause relies on all old data being searchable.
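As a rough outline of the steps discussed so far, the sequence of console requests might look like the list below. This is a sketch that only prints the request lines. The index name index_1 is taken from the question, the helper name is hypothetical, and the bodies mentioned in the comments are the ones already shown in this thread.

```python
def remap_workflow(index):
    """Return, in order, the console requests for making old documents searchable."""
    return [
        f"PUT {index}/_mapping",    # body: the new multi-field mapping
        f"POST {index}/_refresh",   # make all existing documents visible to the query
        f"POST {index}/_update_by_query?refresh&conflicts=proceed",  # rewrite every document
        f"GET {index}/_search",     # body: the multi_match query, to verify the result
    ]

for step in remap_workflow("index_1"):
    print(step)
```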

If you have conflicts, I believe you should skip those documents as it indicates that a new version was indexed after you started the operation and that means it will automatically get the new mappings.
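To see how many documents were skipped that way, the counters in the update by query response can be inspected. The sketch below assumes the response has already been parsed into a dictionary. The sample values are made up, and the function name is hypothetical.

```python
def summarize_update(response):
    """Pull out the counters that matter when running with conflicts=proceed."""
    return {
        "total": response.get("total", 0),
        "updated": response.get("updated", 0),
        "skipped_on_conflict": response.get("version_conflicts", 0),
        "failures": len(response.get("failures", [])),
    }

# Made-up response values, for illustration only.
sample = {"total": 120, "updated": 118, "version_conflicts": 2, "failures": []}
print(summarize_update(sample))
```

A non-zero version_conflicts count with an empty failures list is the expected outcome when other writers were indexing during the run.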

When I used

{
    "query": {
        "bool": {
            "must_not": {
                "exists": {
                    "field": "seller_account_id.keyword"
                }
            }
        }
    }
}

it returns only documents where seller_account_id is null, and there are only a few of those. But the issue affects many more records.

Then you may need to update all documents.

I am thinking of another solution:

adding a temporary field such as temp_id and bulk-updating all documents with it, so that each document's version is updated and it becomes searchable. Is that right?

After that, I would remove the field by updating the mapping.

Let me know your thoughts.

I would just run an update by query on all documents and skip on conflicts.

Adding a field requires an update to every document, and you cannot remove fields from mappings without reindexing. Sounds a lot more complicated...

OK, thanks. Running update_by_query for all documents means there are no criteria to filter on. Does update_by_query require a body?

Is this the correct way to update all documents?

POST test/_update_by_query?refresh&conflicts=proceed
{
  "query": {
    "match_all": {}
  }
}

That should work but the first example in the docs I linked to seems to do exactly what you want.

This one?

POST my-index-000001/_update_by_query?conflicts=proceed

Yes.

Curious whether you tried a runtime field? That would avoid updating the existing docs.
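For reference, a runtime field defined in the search request might look roughly like the sketch below, assuming an Elasticsearch version that supports runtime fields. Runtime fields have no text type, so this uses keyword with a term query rather than the cross_fields multi_match from the question. The field name seller_account_id_str is hypothetical, and the Painless script is untested.

```python
import json

def runtime_field_search(value):
    """Build a search body that derives a keyword runtime field from the long value."""
    script = (
        "if (doc['seller_account_id'].size() > 0) "
        "{ emit(Long.toString(doc['seller_account_id'].value)); }"
    )
    return {
        "runtime_mappings": {
            # seller_account_id_str is a hypothetical field name.
            "seller_account_id_str": {"type": "keyword", "script": {"source": script}}
        },
        "query": {"term": {"seller_account_id_str": value}},
    }

print(json.dumps(runtime_field_search("8029425"), indent=2))
```

The value is computed at query time for every matching document, so this trades the one-off cost of update by query for slower searches.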
