Long JSON number lost precision

Elasticsearch Version:
v5.0.2

Index Info:

GET /_cat/indices/dblog-2017.01.19?v

health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open dblog-2017.01.19 Lis_GE3nTVutbUZMJYXDZw 10 1 1976053299 4 5.3tb 2.6tb

Problem Description:
When searching on a keyword field traceId with TermQuery, for example:
POST /dblog-2017.01.19/_search
{
  "query": {
    "term": {
      "traceId": {
        "value": "6226230315557965000"
      }
    }
  },
  "_source": "traceId"
}

It returned no hits.

If I use a prefix query (or a wildcard query) instead, the doc can be matched as long as the queried value is 15 characters or fewer, like below:

POST /dblog-2017.01.19/_search
{
  "query": {
    "prefix": {
      "traceId": {
        "value": "622623031555796"
      }
    }
  },
  "_source": "traceId"
}

{
      "took": 4,
      "timed_out": false,
      "_shards": {
        "total": 10,
        "successful": 10,
        "failed": 0
      },
      "hits": {
        "total": 1,
        "max_score": 1,
        "hits": [
          {
            "_index": "dblog-2017.01.19",
            "_type": "logs",
            "_id": "AVm0Cv1DARfp1s5Erl20",
            "_score": 1,
            "_source": {
              "traceId": 6226230315557965000
            }
          }
        ]
      }
    }

If I index only a few docs to a test index, the same TermQuery matches the doc correctly.

I am not sure whether this is a bug or a constraint of Elasticsearch/Lucene. Since the traceId field has very high cardinality and the index holds almost 2 billion docs, those factors looked like they might be contributing to the problem.

ES/Lucene should support very high cardinality fields just fine, so this sounds like a possible bug.

Are many of your id values affected, or just a small subset?

Can you figure out which shard this document was routed to, zip up that entire Lucene index, and post it somewhere? I can pull it down and try to dig into it.

Mike McCandless

Hi, Mike

Thanks a lot for responding!

When it happened, all id values were affected. I built a test index with one shard and 21,000 sample docs, on which the problem can be reproduced. The index files can be downloaded from
https://drive.google.com/file/d/0B6UVCJccYns8VVpwYURTbTlSMGM/view?usp=sharing

Thanks, I was able to download the index ... I'll dig into it.

Mike McCandless

The index has 21,000 documents, no deletions.

It has 20,154 unique traceIds (i.e., some documents share a traceId).

Then I made a small Java app to count the number of docs for each traceId:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// javac -cp ../build/core/lucene-core-6.2.0-SNAPSHOT.jar TraceID.java; java -cp .:../build/core/lucene-core-6.2.0-SNAPSHOT.jar TraceID

public class TraceID {
  public static void main(String[] args) throws IOException {
    Path path = Paths.get("broken");
    Directory dir = FSDirectory.open(path);
    IndexReader r = DirectoryReader.open(dir);
    System.out.println("maxDoc=" + r.maxDoc() + " numDocs=" + r.numDocs());
    IndexSearcher s = new IndexSearcher(r);
    // traceIds.txt holds the traceId values to verify, one per line
    try (BufferedReader br = new BufferedReader(new FileReader(new File("/l/62/traceIds.txt")))) {
      String line;
      while((line = br.readLine()) != null) {
        // count docs matching this traceId as an exact keyword term
        int count = s.count(new TermQuery(new Term("traceId", line.trim())));
        if (count != 1) {
          System.out.println("GOT: " + count + " for traceId=" + line.trim());
        }
      }
    }
    r.close();
    dir.close();
  }
}

And while some of the traceIds had > 1 count, none of them had count 0. So as far as I can tell, the Lucene side is working correctly.

Also, most traceIds are length 19, but some are length 15-18 as well. Is that expected?

Do you have an example traceId that fails to come back for this index?

Mike McCandless

High cardinality and the doc count are red herrings. After probing a bit more today, it looks like the JSON parser rounded the JSON long value when converting it to a string.

The screenshot below illustrates the problem: the traceId in _source is already rounded, which is why the attempt to filter by a traceId copied from _source always failed. However, the terms aggregation still shows the correct key.

I also tested defining traceId as long in Elasticsearch. In that scenario, both the _source value and the aggregation key lose precision.
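
For concreteness, a minimal sketch of that effect, assuming the rounding happens wherever the response JSON is parsed into ordinary double-precision numbers (the simplified body below is not a real Elasticsearch response, and the 19-digit value is the one used in the later test in this thread):

// Minimal sketch: the numeric _source value is rounded to the nearest double
// when the JSON is parsed, while the terms-aggregation key, which a keyword
// field returns as a string, keeps all 19 digits.
const body = '{"_source":{"traceId":1026314602185330712},"key":"1026314602185330712"}';
const parsed = JSON.parse(body);
console.log(parsed._source.traceId);  // 1026314602185330700  (number, precision lost)
console.log(parsed.key);              // 1026314602185330712  (string, intact)

A double keeps only about 15-17 significant decimal digits, which also fits the earlier observation that prefixes of at most 15 characters still matched.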

Kenny Wu

Hmm, I just tested ES 5.1.2 using your first example, but the value is correct (not truncated).

Can you test ES 5.1.2? Maybe there was a bug that's been fixed.

Mike McCandless

I tested ES 5.1.2 and the problem remains. Please be aware that traceId in the source JSON is not a string but a number, i.e. { "traceId": 1026314602185330712 }. It looks like the precision is lost when this long number is parsed.

GET /

{
  "name": "cAewfKa",
  "cluster_name": "kenny_test",
  "cluster_uuid": "JUY_mFLJSOmwCfxCZRPdsA",
  "version": {
    "number": "5.1.2",
    "build_hash": "c8c4c16",
    "build_date": "2017-01-11T20:18:39.146Z",
    "build_snapshot": false,
    "lucene_version": "6.3.0"
  },
  "tagline": "You Know, for Search"
}

GET /test_me/_mappings

{
  "test_me": {
    "mappings": {
      "logs": {
        "properties": {
          "traceId": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

POST /test_me/logs
{
  "traceId": 1026314602185330712
}

POST /test_me/_search

 {
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "test_me",
        "_type": "logs",
        "_id": "AVnOGVpfDdPNLqd8e68H",
        "_score": 1,
        "_source": {
          "traceId": 1026314602185330700
        }
      }
    ]
  }
}

I changed the title of this post as the original one is misleading.

Hmm, how are you submitting your JSON requests? It works for me with ES 5.1.2 when I use curl:

curl -XPUT 'localhost:9200/test_me?pretty' -d'
{
  "mappings": {
    "logs": {
      "properties": {
        "traceId": {
          "type": "keyword"
        }
      }
    }
  }
}
'
-->
{
  "acknowledged" : true,
  "shards_acknowledged" : true
}




curl -XPOST 'localhost:9200/test_me/logs?pretty' -d'
{
  "traceId": 1026314602185332762
}
'
-->
{
  "_index" : "test_me",
  "_type" : "logs",
  "_id" : "AVnOUV3bXuDudHHhdSuV",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "created" : true
}




curl -XPOST 'localhost:9200/test_me/logs/_search?pretty' -d'
{
  "aggs": {
    "test": {
      "terms": {
        "field": "traceId",
        "size": 10
      }
    }
  }
}
'
-->
{
  "took" : 27,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test_me",
        "_type" : "logs",
        "_id" : "AVnOUV3bXuDudHHhdSuV",
        "_score" : 1.0,
        "_source" : {
          "traceId" : 1026314602185332762
        }
      }
    ]
  },
  "aggregations" : {
    "test" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "1026314602185332762",
          "doc_count" : 1
        }
      ]
    }
  }
}

Mike McCandless

I just used curl for the same request and can confirm the result is good.

The previous tests were done in the Dev console (Sense) within Kibana, so this now looks like a problem with Kibana (or the JS library it uses to parse long JSON numbers).

Ahh, I see ... so something in Kibana's dev console is maybe truncating the long value. I assume a workaround here is for you to make this long value a string instead?

Can you open a Kibana issue, linking to this discussion? Thanks.

Mike McCandless

Yes, I've already made this traceId a string to get around the problem. But I think Kibana needs to fix this, as it affects not only the Dev console but all other functionality where a long value is used for filtering and graphing. I'll raise the issue with the Kibana team via GitHub.

Thanks very much for helping me find the root cause!

Regards
Kenny

I also spent some time trying to figure it out. But it's just the nature of JavaScript:
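
A minimal sketch of that behaviour (using the 19-digit traceId from the examples above): every number in JavaScript is an IEEE-754 double, so integers above Number.MAX_SAFE_INTEGER (2^53 - 1) are silently rounded to the nearest representable value.

// Numbers in JavaScript/TypeScript are IEEE-754 doubles, so integers above
// Number.MAX_SAFE_INTEGER cannot be represented exactly.
console.log(Number.MAX_SAFE_INTEGER);         // 9007199254740991
const traceId: number = 1026314602185330712;  // 19-digit id from the example above
console.log(Number.isSafeInteger(traceId));   // false
console.log(traceId);                         // 1026314602185330700
console.log(traceId === 1026314602185330700); // true - both round to the same double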

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.