_id value is getting changed in Elasticsearch

I have a unique_id value in my index that is also used as the document _id. As my document count increased drastically, I changed the field type from double, because that type was causing truncation of my values. I changed it to "keyword", since many Elastic articles suggest using numeric data types only when you need range searches, aggregations, etc., which I am not performing on the unique_id field. But after that change, I noticed that the whole value is being changed: for example, I send the value "432375692312511746" and in the Elasticsearch document it shows up as "795807590142069243".
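For context on the truncation mentioned above: an IEEE 754 double has a 53-bit significand, so integers larger than 2^53 (about 9×10^15) cannot all be represented exactly, and a double-mapped field rounds them to the nearest representable value. A quick Python sketch of that rounding on the value in question (the exact rounded value shown is specific to this input):

```python
# A double (IEEE 754 binary64) has a 53-bit significand, so integers
# above 2**53 get rounded to the nearest representable double.
original = 432375692312511746
as_double = float(original)      # what a "double" field would effectively hold
print(int(as_double))            # 432375692312511744 -- trailing digits lost
print(original > 2**53)          # True: outside the exact-integer range
```

Note that rounding to the nearest double only perturbs the trailing digits (here by 2); it would not turn 432375692312511746 into a completely different value like 795807590142069243, so that symptom points at a different cause.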

I would like to understand why this is happening, and also which data type to use for such long numbers so the values are neither truncated nor changed.

Waiting on the community for a response.

Hi @gbandasha

Can you please provide the following:

- The version of Elasticsearch you're working with
- The mapping for the document
- A sample input document
- What the document looks like after it has been indexed / when you GET it back

And point out the differences or changes.

Sure, please find the details below:

1. v7.11.2.

2. I am using the legacy mapping template; the field I am facing the issue with is unique_id, and its mapping is attached below in the snippet.

3. Sample JSON data:

{"unique_id" : "432375692312511746","client_name" : "test"}

4. In Kibana Discover, and even in the JSON tab, it shows up as:

{"unique_id" : "795807590142069243","client_name" : "test"}

Let me know if you need anything else

Hmmmm I just ran this on 7.11.1 and 7.12.0 (I didn't have a 7.11.2 handy)
Are you sure you are retrieving the same documents?

DELETE test

PUT /test
{
  "mappings": {
    "properties": {
      "name" : {"type" : "keyword"},
      "unique_id": {
        "type" : "keyword",
        "eager_global_ordinals": false,
        "norms": false,
        "index": true,
        "store": false,
        "index_options": "docs",
        "split_queries_on_whitespace" : false,
        "doc_values": true
      }
    }
  }
}



POST test/_doc
{
  "name" : "stephen",
  "unique_id" : "432375692312511746"
}

POST test/_doc
{
  "name" : "jeffery",
  "unique_id" : "1293847209184720193847"
}


POST test/_doc
{
  "name" : "dude",
  "unique_id" : "09870987356409586734059867"
}

Results: they all look correct to me, including in Kibana Discover.

GET test/_search

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "scmTqXgBZuwJvuVN3t8C",
        "_score" : 1.0,
        "_source" : {
          "name" : "stephen",
          "unique_id" : "432375692312511746"
        }
      },
      {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "ssmTqXgBZuwJvuVN3t8m",
        "_score" : 1.0,
        "_source" : {
          "name" : "jeffery",
          "unique_id" : "1293847209184720193847"
        }
      },
      {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "s8mTqXgBZuwJvuVN3t9J",
        "_score" : 1.0,
        "_source" : {
          "name" : "dude",
          "unique_id" : "09870987356409586734059867"
        }
      }
    ]
  }
}

Thanks, @stephenb, for the quick response. You are right: if I add a record through Dev Tools, or even if I try 10-20 records through Logstash, it works, but when I have millions of docs it changes the value.

Elasticsearch shouldn't be changing values for no reason. That is not to say there couldn't be some strange defect, but let's check a few other things first.

Perhaps it is something in your Logstash pipeline when it gets overwhelmed, or a wrong type setting there.
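One way to rule out a type coercion in the pipeline is to force the field to a string in Logstash before output. The filter below is an illustrative sketch, not taken from the actual pipeline in question:

```
filter {
  # Force unique_id to a string so no numeric (float) conversion
  # can round the value on its way to Elasticsearch.
  mutate {
    convert => { "unique_id" => "string" }
  }
}
```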

I am curious about a few of the non-default settings. If you make it just a keyword, without any of the other settings, do you see the same behavior? i.e. just:

DELETE test

PUT /test
{
  "mappings": {
    "properties": {
      "name" : {"type" : "keyword"},
      "unique_id": { "type" : "keyword"}
    }
  }
}

Also, a silly question: how do you know you're lining up the input document with what you're seeing in Elasticsearch? Is there some other unique ID? In other words, how do you know you're comparing the same documents from source to Elasticsearch when you have millions?

@stephenb I will try the default settings and let you know.

Regarding the second question: I am dealing with transactional data, and I have many other fields, like account_number, transaction date/time, and the amount, that help me drill down to the specific record to compare.


Ok, good to know. When you inspect the document in Kafka, is the unique_id still correct?

Is there any processing between Kafka and Elasticsearch?

There is no processing in Kafka, and the value there is correct; I checked the topic data and compared it.

What sits between Kafka and Elasticsearch, and did you try the default mapping?

There is something going on... but if Elasticsearch were randomly changing keyword values, I think we would be getting a lot of reports about it. Let's keep looking.

@stephenb I set up the default mapping and resent the data through Logstash, and now the IDs are not getting changed.

Thanks a lot for your help.


Good to know... interesting. Thanks for letting us know it is working.