Long JSON number lost precision


(Kenny Wu) #1

Elasticsearch Version:
v5.0.2

Index Info:

GET /_cat/indices/dblog-2017.01.19?v

health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open dblog-2017.01.19 Lis_GE3nTVutbUZMJYXDZw 10 1 1976053299 4 5.3tb 2.6tb

Problem Description:
When searching on the keyword field traceId with a TermQuery, for example:
POST /dblog-2017.01.19/_search
{
  "query": {
    "term": {
      "traceId": {
        "value": "6226230315557965000"
      }
    }
  },
  "_source": "traceId"
}

It returned no hits.

With a prefix query (or a wildcard query), the doc can be matched as long as the queried value is 15 characters or fewer, like below:

POST /dblog-2017.01.19/_search
{
  "query": {
    "prefix": {
      "traceId": {
        "value": "622623031555796"
      }
    }
  },
  "_source": "traceId"
}

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 10,
    "successful": 10,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "dblog-2017.01.19",
        "_type": "logs",
        "_id": "AVm0Cv1DARfp1s5Erl20",
        "_score": 1,
        "_source": {
          "traceId": 6226230315557965000
        }
      }
    ]
  }
}

If I index only a few docs into a test index, the same TermQuery matches the doc correctly.

I am not sure whether this is a bug or a constraint of Elasticsearch/Lucene. Since the traceId field has very high cardinality and the index holds almost 2 billion docs, those two factors looked like likely contributors to the problem.


(Michael McCandless) #2

ES/Lucene should support very high cardinality fields just fine, so this sounds like a possible bug.

Are many of your id values affected, or just a small subset?

Can you figure out which shard this document was routed to and zip up that entire Lucene index and post somewhere? I can pull it down and try to dig into it.

Mike McCandless


(Kenny Wu) #3

Hi, Mike

Thanks a lot for responding!

When it happens, all id values are affected. I built a test index with one shard and 21,000 sample docs, on which the problem can be reproduced. The index files can be downloaded from
https://drive.google.com/file/d/0B6UVCJccYns8VVpwYURTbTlSMGM/view?usp=sharing


(Michael McCandless) #4

Thanks, I was able to download the index ... I'll dig into it.

Mike McCandless


(Michael McCandless) #5

The index has 21,000 documents, no deletions.

It has 20,154 unique traceId values (i.e., some documents share a traceId).

Then I wrote a small Java app to count the number of docs for each traceId:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// javac -cp ../build/core/lucene-core-6.2.0-SNAPSHOT.jar TraceID.java; java -cp .:../build/core/lucene-core-6.2.0-SNAPSHOT.jar TraceID

public class TraceID {
  public static void main(String[] args) throws IOException {
    Path path = Paths.get("broken");
    Directory dir = FSDirectory.open(path);
    IndexReader r = DirectoryReader.open(dir);
    System.out.println("maxDoc=" + r.maxDoc() + " numDocs=" + r.numDocs());
    IndexSearcher s = new IndexSearcher(r);
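    // traceIds.txt lists one traceId per line; count the docs matching each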
    try (BufferedReader br = new BufferedReader(new FileReader(new File("/l/62/traceIds.txt")))) {
      String line;
      while ((line = br.readLine()) != null) {
        int count = s.count(new TermQuery(new Term("traceId", line.trim())));
        if (count != 1) {
          System.out.println("GOT: " + count + " for traceId=" + line.trim());
        }
      }
    }
    r.close();
    dir.close();
  }
}

And while some of the traceIds had > 1 count, none of them had count 0. So as far as I can tell, the Lucene side is working correctly.

Also, most traceIds are length 19, but some are length 15-18 as well. Is that expected?

Do you have an example traceId that fails to come back for this index?

Mike McCandless


(Kenny Wu) #6

High cardinality and doc count are red herrings. After probing a bit more today, it looks like the JSON parser used by Elasticsearch rounds a JSON long value when converting it to a string.

A screenshot illustrated the problem: the traceId in _source is already rounded. This is why the attempt to filter by a traceId copied from _source always failed. However, the terms aggregation still shows the correct key.

I also tested defining traceId as long in Elasticsearch. In that scenario, both the source and the aggregation key lost precision.
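
The arithmetic backs this up: an IEEE 754 double has a 53-bit significand, so only integers up to 2^53 = 9,007,199,254,740,992 (roughly 16 decimal digits) are guaranteed to survive a round trip. A 19-digit traceId therefore gets rounded to the nearest representable double, while the keyword field (and hence the terms aggregation key) keeps the original string untouched. This would also explain why the prefix query in my first post stopped matching beyond roughly 15 characters: that is where the rounded value starts to diverge from the indexed one.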

Kenny Wu


(Michael McCandless) #7

Hmm, I just tested ES 5.1.2 using your first example, but the value is correct (not truncated).

Can you test ES 5.1.2? Maybe there was a bug that's been fixed.

Mike McCandless


(Kenny Wu) #8

I tested ES 5.1.2 and the problem remains. Please be aware that traceId in the source JSON is not a string but a number, i.e. { "traceId": 1026314602185330712 }. It looks like the precision is lost when this long number is parsed.

GET /

{
  "name": "cAewfKa",
  "cluster_name": "kenny_test",
  "cluster_uuid": "JUY_mFLJSOmwCfxCZRPdsA",
  "version": {
    "number": "5.1.2",
    "build_hash": "c8c4c16",
    "build_date": "2017-01-11T20:18:39.146Z",
    "build_snapshot": false,
    "lucene_version": "6.3.0"
  },
  "tagline": "You Know, for Search"
}

GET /test_me/_mappings

{
  "test_me": {
    "mappings": {
      "logs": {
        "properties": {
          "traceId": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

POST /test_me/logs
{
  "traceId": 1026314602185330712
}

POST /test_me/_search

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "test_me",
        "_type": "logs",
        "_id": "AVnOGVpfDdPNLqd8e68H",
        "_score": 1,
        "_source": {
          "traceId": 1026314602185330700
        }
      }
    ]
  }
}

(Kenny Wu) #9

I changed the title of this post as the original one was misleading.


(Michael McCandless) #10

Hmm, how are you submitting your JSON requests? It works for me with ES 5.1.2 when I use curl:

curl -XPUT 'localhost:9200/test_me?pretty' -d'
{
  "mappings": {
    "logs": {
      "properties": {
        "traceId": {
	  "type": "keyword"
	}
      }
    }
  }
}
'
-->
{
  "acknowledged" : true,
  "shards_acknowledged" : true
}




curl -XPOST 'localhost:9200/test_me/logs?pretty' -d'
{
  "traceId": 1026314602185332762
}
'
-->
{
  "_index" : "test_me",
  "_type" : "logs",
  "_id" : "AVnOUV3bXuDudHHhdSuV",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "created" : true
}




curl -XPOST 'localhost:9200/test_me/logs/_search?pretty' -d'
{
  "aggs": {
    "test": {
      "terms": {
        "field": "traceId",
        "size": 10
      }
    }
  }
}
'
-->
{
  "took" : 27,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test_me",
        "_type" : "logs",
        "_id" : "AVnOUV3bXuDudHHhdSuV",
        "_score" : 1.0,
        "_source" : {
          "traceId" : 1026314602185332762
        }
      }
    ]
  },
  "aggregations" : {
    "test" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "1026314602185332762",
          "doc_count" : 1
        }
      ]
    }
  }
}

Mike McCandless


(Kenny Wu) #11

I just used curl for the same request and can confirm the result is good.

The previous tests were done in the Dev console (Sense) within Kibana. So this now looks to be a problem with Kibana (or the JS library it uses for parsing JSON long numbers).
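
For the record, this is easy to reproduce in any JavaScript runtime; here is a minimal sketch using nothing but the standard JSON.parse (no Kibana code involved):

// Paste into a browser console, or run with ts-node:
const body = '{"traceId": 1026314602185330712}';
const parsed = JSON.parse(body);       // every JSON number becomes an IEEE 754 double
console.log(parsed.traceId);           // prints 1026314602185330700 -- precision already lost
console.log(Number.MAX_SAFE_INTEGER);  // 9007199254740991 (2^53 - 1), the largest safe integer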


(Michael McCandless) #12

Ahh, I see ... so something in Kibana's dev console is maybe truncating the long value. I assume a workaround here is for you to make this long value a string instead?
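
For example (reusing the test_me index from above), quoting the value keeps it intact end to end:

POST /test_me/logs
{
  "traceId": "1026314602185332762"
}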

Can you open a Kibana issue, linking to this discussion? Thanks.

Mike McCandless


(Kenny Wu) #13

Yes, I've already made this traceId a string to get around the problem. But I think Kibana needs to fix this, as it affects not only the Dev console but also any other functionality where a long value is used for filtering and graphing. I'll raise the issue with the Kibana team on GitHub.

Thanks very much for helping me find the root cause!

Regards
Kenny


(Marcin Biegan) #14

I also spent some time trying to figure this out, but it's just the nature of JavaScript: all numbers are IEEE 754 double-precision floats, so integers beyond Number.MAX_SAFE_INTEGER (2^53 − 1 = 9007199254740991) cannot be represented exactly.


(system) #15

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.