ES is rounding keyword values

I've got an issue with a keyword field.

The mapping for it looks like:

          "user_id": {
            "type": "keyword"
          }

The value for user_id is:
9221662934816997716

but ES is returning it in Kibana, or even in curl, like:
9221662934816997000

If I change the type to long, I get in Kibana:
9,221,662,934,816,997,716
Does anyone know why ES is rounding this ID?

Piotr

Can you please provide a full, reproducible, but minimal example, including the mapping and the indexing of the documents? That would help a lot!

--Alex

Query payload:

{"query":{"ids":{"type":"log","values":["XgxhCWdmZVtRAAAA"]}},"stored_fields":["*"],"_source":true,"script_fields":{},"docvalue_fields":["data.ctime","data.master.ctime","data.master.time","data.time","data.timeouttime","insertion_time","time"]}

Mapping: (linked as a gist)

Response:

{
  "took": 151,
  "timed_out": false,
  "_shards": {
    "total": 2,
    "successful": 2,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "prod-app-logs-2018-08-04",
        "_type": "log",
        "_id": "XgxhCWdmZVtRAAAA",
        "_score": 1,
        "_source": {
          "component": "app-prod-app",
          "time": "2018-08-04T08:40:07.872245807Z",
          "data": {
            "session_id": 0,
            "api_version": 2,
            "ip": "X.X.X.X",
            "handler_id": "91ec63ab60f072e6",
            "user_group_id": 9197237133460630000,
            "user_id": 9221662934816997000
          },
          "domain": [],
          "insertion_order": 81,
          "insertion_time": "2018-08-04T08:40:07.938462727Z",
          "message": "API call",
          "level": "info"
        },
        "fields": {
          "insertion_time": [
            "2018-08-04T08:40:07.938Z"
          ],
          "time": [
            "2018-08-04T08:40:07.872Z"
          ]
        }
      }
    ]
  }
}

We are indexing directly from Haskell application, what we have in raw logs for this is:

2018-08-04 08:40:07 INFO app-prod-app: API call {
    "session_id": 0,
    "api_version": 2,
    "ip": "X.X.X.X",
    "handler_id": "91ec63ab60f072e6",
    "user_group_id": 9197237133460629465,
    "user_id": 9221662934816997716
}

It is not possible to reproduce this example, as you have not provided the document you indexed. (I could of course make assumptions about what you indexed, but that makes it very hard, as I might be plain wrong.) :slight_smile:

Also, please format the code snippets (you can use markdown), as it makes something like a mapping snippet infinitely easier to read.

I've moved the mapping into a gist, so the formatting is much better now, and I've added the raw logs of what was indexed.
We are indexing directly from a Haskell lib, so this is all I have.

Can you use curl to run a GET against the document and check if the source and the id are properly returned? I just noticed that the output in the developer console in Kibana is broken, but it works in curl.

PUT foo/doc/1
{
    "id": 9197237133460629465
}

GET foo/doc/1

This returns a seemingly modified _source:

{
  "_index": "foo",
  "_type": "doc",
  "_id": "1",
  "_version": 1,
  "found": true,
  "_source": {
    "id": 9197237133460630000
  }
}

However, running curl in a terminal returns this:

# curl localhost:9200/foo/doc/1
{"_index":"foo","_type":"doc","_id":"1","_version":1,"found":true,"_source":{
    "id": 9197237133460629465
}}

Is it possible that your application is already indexing wrong data because the number is not considered a safe integer? It might make sense to just wrap it in double quotes...
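To illustrate the safe-integer point, here is a quick Node.js sketch (not from the original thread): JavaScript numbers are IEEE 754 doubles, so integers above `Number.MAX_SAFE_INTEGER` cannot be represented exactly, and the user_id from this thread is well above that limit.

```javascript
// Integers are only exact up to Number.MAX_SAFE_INTEGER (2^53 - 1).
console.log(Number.MAX_SAFE_INTEGER);      // 9007199254740991

// The user_id from the thread is far larger, so it is silently
// rounded to the nearest representable double as soon as it is parsed:
const userId = 9221662934816997716;
console.log(Number.isSafeInteger(userId)); // false
console.log(userId);                       // 9221662934816997000 (rounded)

// Keeping the id as a string preserves every digit exactly:
const safeUserId = "9221662934816997716";
console.log(safeUserId);                   // "9221662934816997716"
```

Any JSON-consuming UI that runs in the browser (Kibana's console, Cerebro) goes through this conversion, which is why curl shows the correct digits while the browser tools do not.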

One more addition here: Elasticsearch never modifies the _source unless you are using pipelines.

With curl I've got:
curl localhost:9200/prod-app-logs-2018-08-04/log/XgxhCWdmZVtRAAAA

{"_index":"prod-app-logs-2018-08-04","_type":"log","_id":"XgxhCWdmZVtRAAAA","_version":1,"found":true,"_source":{"component":"app-prod-app","time":"2018-08-04T08:40:07.872245807Z","data":{"session_id":0,"api_version":2,"ip":"X.X.X.X","handler_id":"91ec63ab60f072e6","user_group_id":9197237133460629465,"user_id":9221662934816997716},"domain":[],"insertion_order":81,"insertion_time":"2018-08-04T08:40:07.938462727Z","message":"API call","level":"info"}}

So the numbers are correct there, but when I tried the same in the Cerebro tool, I got the wrong output:

{
  "_index": "prod-app-logs-2018-08-04",
  "_type": "log",
  "_id": "XgxhCWdmZVtRAAAA",
  "_version": 1,
  "found": true,
  "_source": {
    "component": "app-prod-app",
    "time": "2018-08-04T08:40:07.872245807Z",
    "data": {
      "session_id": 0,
      "api_version": 2,
      "ip": "X.X.X.X",
      "handler_id": "91ec63ab60f072e6",
      "user_group_id": 9197237133460630000,
      "user_id": 9221662934816997000
    },
    "domain": [],
    "insertion_order": 81,
    "insertion_time": "2018-08-04T08:40:07.938462727Z",
    "message": "API call",
    "level": "info"
  }
}

It's a JavaScript issue (Cerebro is most likely running JavaScript in your browser, just like Kibana)...

# node -e 'console.log(9197237133460629465)'
9197237133460630000
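The precision is lost only when `JSON.parse` turns the numeric literal into a double; the raw response text from Elasticsearch still contains the exact digits. A small sketch of the string-wrapping workaround suggested above (the field name is just an example):

```javascript
// Unquoted large id: JSON.parse produces a rounded double.
const raw = '{"user_id": 9221662934816997716}';
const parsed = JSON.parse(raw).user_id;
console.log(String(parsed) === "9221662934816997716"); // false -- digits lost

// Quoted on the producer side: the value survives as an exact string.
const quoted = '{"user_id": "9221662934816997716"}';
console.log(JSON.parse(quoted).user_id); // "9221662934816997716"

// Modern runtimes can also round-trip such values via BigInt:
console.log(BigInt("9221662934816997716").toString()); // "9221662934816997716"
```

This is why indexing the id as a string (and mapping it as `keyword`, as in the original mapping) sidesteps the whole problem for any browser-based client.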

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.