I am asking for another opinion... BUT
My conclusion is that Elasticsearch is in fact handling the data properly, and that this is a JSON / JavaScript parsing issue of the kind @Christian_Dahlqvist mentioned earlier.
In short, I would recommend wrapping your hashes in double quotes, or making sure there is no JSON / JavaScript parsing going on when you ingest your data or on the display side.
At the bottom, I show it can work without wrapping the hashes in double quotes, but I would recommend it anyway so that the _source version of the hashes looks correct in browser-based apps.
Here are a couple of experiments to show you; see if they make sense...
Basically, the data is not shown correctly in the browser (Dev Tools, Discover, etc.), and if you are ingesting longs that are too large for JSON / JavaScript number handling, they get rounded.
All of the above was done through Kibana Dev Tools.
But now let's try something else.
Still in Kibana Dev Tools, I created a mapping with a runtime field that computes the modulus:
DELETE discuss-test
PUT /discuss-test
{
"mappings": {
"properties": {
"name": {
"type": "keyword"
},
"hash": {
"type": "long"
}
},
"runtime": {
"modulus": {
"type": "long",
"script": {
"source": "long modulus = (doc['hash'].value) % 21; emit(modulus);"
}
}
}
}
}
Next, still in Kibana Dev Tools, I added the documents with the hashes double-quoted and then ran the query.
Note that the modulus is correct, but the hash field value still looks wrong because it is processed by the JSON / JavaScript parser in the browser (Kibana Dev Tools).
POST discuss-test/_doc
{
"name" : "test1",
"hash" : "9048716794795431"
}
POST discuss-test/_doc
{
"name" : "test2",
"hash" : "9475189089572885"
}
POST discuss-test/_doc
{
"name" : "test3",
"hash" : "9584341227278131"
}
GET discuss-test/_search
{
"fields": [ "*"]
}
# Result: notice the correct modulus, BUT the hash field still looks wrong because the JSON parser in the browser is rounding it
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 3,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "discuss-test",
"_id": "fO36bIYBNTELl2kaPLS-",
"_score": 1,
"_source": {
"name": "test1",
"hash": "9048716794795431"
},
"fields": {
"name": [
"test1"
],
"modulus": [
12 <!---- CORRECT
],
"hash": [
9048716794795432 <!- Looks wrong but this is the JSON / Javascript Parser.
]
}
},
{
"_index": "discuss-test",
"_id": "fe36bIYBNTELl2kaPLTS",
"_score": 1,
"_source": {
"name": "test2",
"hash": "9475189089572885"
},
"fields": {
"name": [
"test2"
],
"modulus": [
14
],
"hash": [
9475189089572884
]
}
},
{
"_index": "discuss-test",
"_id": "fu36bIYBNTELl2kaPLTi",
"_score": 1,
"_source": {
"name": "test3",
"hash": "9584341227278131"
},
"fields": {
"name": [
"test3"
],
"modulus": [
13
],
"hash": [
9584341227278132
]
}
}
]
}
}
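For what it's worth, the off-by-one hash values in the Dev Tools output are exactly what you would get if each number passed through a 64-bit float, which is how JavaScript represents numbers. Here is a minimal Python sketch of that. Python is just my choice for illustration, `parse_int=float` stands in for the browser's number handling, and `as_js_number` is a made-up helper name.

```python
import json

# JavaScript numbers are IEEE 754 doubles: integers are exact only up to 2**53 - 1.
MAX_SAFE_INTEGER = 2**53 - 1

def as_js_number(raw_json: str) -> int:
    """Parse a JSON document the way a double-only parser would and return 'hash'."""
    doc = json.loads(raw_json, parse_int=float)  # force JSON integers through a double
    return int(doc["hash"])

hashes = [9048716794795431, 9475189089572885, 9584341227278131]
for h in hashes:
    seen = as_js_number('{"hash": %d}' % h)
    note = "(beyond the safe range)" if h > MAX_SAFE_INTEGER else ""
    print(h, "->", seen, note)
```

A quoted hash is a JSON string, so it never goes through the number path and comes back unchanged.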
But if I run the exact same query directly from my command line, so there is no JSON / JavaScript parsing in a browser at all, the data all looks correct.
$ curl -H "Content-Type: application/json" localhost:9200/discuss-test/_search?pretty -d '{"fields" : ["*"]}'
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "discuss-test",
"_id" : "fO36bIYBNTELl2kaPLS-",
"_score" : 1.0,
"_source" : {
"name" : "test1",
"hash" : "9048716794795431"
},
"fields" : {
"name" : [
"test1"
],
"modulus" : [
12 <!---- CORRECT
],
"hash" : [
9048716794795431 <!---- CORRECT
]
}
},
{
"_index" : "discuss-test",
"_id" : "fe36bIYBNTELl2kaPLTS",
"_score" : 1.0,
"_source" : {
"name" : "test2",
"hash" : "9475189089572885"
},
"fields" : {
"name" : [
"test2"
],
"modulus" : [
14
],
"hash" : [
9475189089572885
]
}
},
{
"_index" : "discuss-test",
"_id" : "fu36bIYBNTELl2kaPLTi",
"_score" : 1.0,
"_source" : {
"name" : "test3",
"hash" : "9584341227278131"
},
"fields" : {
"name" : [
"test3"
],
"modulus" : [
13
],
"hash" : [
9584341227278131
]
}
}
]
}
}
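The correct modulus is the useful clue here. The small Python sketch below (for illustration only, not something Elasticsearch runs) compares `% 21` on the exact hashes with `% 21` on the rounded values that Dev Tools displayed. If Elasticsearch had stored the rounded numbers, the runtime field would have come out differently.

```python
# Exact hashes as indexed, and the values shown in Kibana Dev Tools.
exact = [9048716794795431, 9475189089572885, 9584341227278131]
displayed = [9048716794795432, 9475189089572884, 9584341227278132]

def modulus(value: int) -> int:
    """Same arithmetic as the runtime field script: value % 21."""
    return value % 21

exact_mod = [modulus(v) for v in exact]
displayed_mod = [modulus(v) for v in displayed]

print(exact_mod)      # compare with the modulus values returned by the search
print(displayed_mod)  # what the runtime field would return for the rounded values
```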
What makes this even MORE confusing (but also confirms that Elasticsearch IS in fact handling the data correctly): if I POST the documents directly from the command line with NO double quotes, that also works. This is consistent, as there is no JSON / JavaScript parsing in a browser, just a direct POST of the data.
I cleaned up the index and re-POSTed the mapping above.
Then I posted a few docs without double quotes around the hash...
hyperion:~ sbrown$ curl -X POST -H "Content-Type: application/json" http://localhost:9200/discuss-test/_doc -d '{ "name" : "test1", "hash" : 9048716794795431 }'
hyperion:~ sbrown$ curl -X POST -H "Content-Type: application/json" http://localhost:9200/discuss-test/_doc -d '{ "name" : "test2", "hash" : 9475189089572885 }'
hyperion:~ sbrown$ curl -X POST -H "Content-Type: application/json" http://localhost:9200/discuss-test/_doc -d '{ "name" : "test3", "hash" : 9584341227278131 }'
Now run the search from the command line, and everything looks as expected:
hyperion:~ sbrown$ curl -H "Content-Type: application/json" localhost:9200/discuss-test/_search?pretty -d '{"fields" : ["*"]}'
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "discuss-test",
"_id" : "f-0IbYYBNTELl2kaErS2",
"_score" : 1.0,
"_source" : {
"name" : "test1",
"hash" : 9048716794795431
},
"fields" : {
"name" : [
"test1"
],
"modulus" : [
12
],
"hash" : [
9048716794795431
]
}
},
{
"_index" : "discuss-test",
"_id" : "gO0IbYYBNTELl2kaMLT3",
"_score" : 1.0,
"_source" : {
"name" : "test2",
"hash" : 9475189089572885
},
"fields" : {
"name" : [
"test2"
],
"modulus" : [
14
],
"hash" : [
9475189089572885
]
}
},
{
"_index" : "discuss-test",
"_id" : "ge0IbYYBNTELl2kaS7Qj",
"_score" : 1.0,
"_source" : {
"name" : "test3",
"hash" : 9584341227278131
},
"fields" : {
"name" : [
"test3"
],
"modulus" : [
13
],
"hash" : [
9584341227278131
]
}
}
]
}
}
hyperion:~ sbrown$
So this is all correct.
But if I go into Dev Tools, it still looks wrong... even though it is correct.
Wrap the hashes in double quotes; that will make what you see in _source correct.
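If you ingest from your own code, a sketch of the double-quote approach might look like this in Python. `quote_unsafe_ints` is a hypothetical helper, not part of any Elasticsearch client, and it assumes the target field is mapped as `long` so that Elasticsearch converts the quoted string back to a number, as it did in the Dev Tools experiment above.

```python
import json

MAX_SAFE_INTEGER = 2**53 - 1  # largest integer a JavaScript number holds exactly

def quote_unsafe_ints(value):
    """Recursively turn integers outside the JS-safe range into strings."""
    if isinstance(value, bool):  # bool is a subclass of int; leave it alone
        return value
    if isinstance(value, int) and abs(value) > MAX_SAFE_INTEGER:
        return str(value)
    if isinstance(value, dict):
        return {k: quote_unsafe_ints(v) for k, v in value.items()}
    if isinstance(value, list):
        return [quote_unsafe_ints(v) for v in value]
    return value

doc = {"name": "test1", "hash": 9048716794795431}
body = json.dumps(quote_unsafe_ints(doc))
print(body)  # request body with the hash wrapped in double quotes
```

Small integers are left as numbers, so only the values a browser would round get quoted.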
And if you really wanted to, you could use a multi-field with both a keyword and a long so that you could always see it correctly in the fields as well... If you want to do that, let me know and I'll show you.
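A rough, untested sketch of what such a mapping could look like, written as a Python dict so it can be printed as a request body. It uses the standard `fields` (multi-fields) mapping parameter; the sub-field name `as_keyword` is arbitrary, and I am assuming the keyword sub-field comes back as a string in the `fields` section of the response.

```python
import json

# Hypothetical mapping: "hash" stays a long (so numeric scripts still work),
# with a keyword sub-field intended to return the value as a string.
mapping = {
    "mappings": {
        "properties": {
            "name": {"type": "keyword"},
            "hash": {
                "type": "long",
                "fields": {
                    "as_keyword": {"type": "keyword"}  # sub-field name is arbitrary
                },
            },
        }
    }
}

# Body that could be pasted after `PUT /discuss-test` in Dev Tools.
print(json.dumps(mapping, indent=2))
```

The sub-field would then be requested as `hash.as_keyword` in the `fields` list of the search.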
I know this is long, but I hope it makes sense!