Long JSON number lost precision


(Kenny Wu) #1

Elasticsearch Version:
v5.0.2

Index Info:

GET /_cat/indices/dblog-2017.01.19?v

health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open dblog-2017.01.19 Lis_GE3nTVutbUZMJYXDZw 10 1 1976053299 4 5.3tb 2.6tb

Problem Description:
When searching on the keyword field traceId with a TermQuery, for example:
POST /dblog-2017.01.19/_search
{
  "query": {
    "term": {
      "traceId": {
        "value": "6226230315557965000"
      }
    }
  },
  "_source": "traceId"
}

It returned no hits.

With a prefix query (or a wildcard query), the doc can be matched as long as the queried value is 15 characters or fewer, like below:

POST /dblog-2017.01.19/_search
{
  "query": {
    "prefix": {
      "traceId": {
        "value": "622623031555796"
      }
    }
  },
  "_source": "traceId"
}

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 10,
    "successful": 10,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "dblog-2017.01.19",
        "_type": "logs",
        "_id": "AVm0Cv1DARfp1s5Erl20",
        "_score": 1,
        "_source": {
          "traceId": 6226230315557965000
        }
      }
    ]
  }
}

If I index only a few docs into a test index, the same TermQuery matches the doc correctly.

I am not sure whether this is a bug or a constraint of Elasticsearch/Lucene. Since the traceId field has very high cardinality and the index holds almost 2 billion docs, those two factors looked like likely contributors to the problem.


(Michael McCandless) #2

ES/Lucene should support very high cardinality fields just fine, so this sounds like a possible bug.

Are many of your id values affected, or just a small subset?

Can you figure out which shard this document was routed to and zip up that entire Lucene index and post somewhere? I can pull it down and try to dig into it.

Mike McCandless


(Kenny Wu) #3

Hi, Mike

Thanks a lot for responding!

When it happens, all id values are affected. I built a test index with one shard and 21,000 sample docs, on which the problem can be reproduced. The index files can be downloaded from
https://drive.google.com/file/d/0B6UVCJccYns8VVpwYURTbTlSMGM/view?usp=sharing


(Michael McCandless) #4

Thanks, I was able to download the index ... I'll dig into it.

Mike McCandless


(Michael McCandless) #5

The index has 21,000 documents, no deletions.

It has 20,154 unique traceId values (i.e., some documents share a traceId).

Then I wrote a small Java app to count the number of docs for each traceId:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// javac -cp ../build/core/lucene-core-6.2.0-SNAPSHOT.jar TraceID.java; java -cp .:../build/core/lucene-core-6.2.0-SNAPSHOT.jar TraceID

public class TraceID {
  public static void main(String[] args) throws IOException {
    Path path = Paths.get("broken");
    Directory dir = FSDirectory.open(path);
    IndexReader r = DirectoryReader.open(dir);
    System.out.println("maxDoc=" + r.maxDoc() + " numDocs=" + r.numDocs());
    IndexSearcher s = new IndexSearcher(r);
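    // traceIds.txt lists one traceId per line; count the docs matching each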
    try (BufferedReader br = new BufferedReader(new FileReader(new File("/l/62/traceIds.txt")))) {
      String line;
      while ((line = br.readLine()) != null) {
        int count = s.count(new TermQuery(new Term("traceId", line.trim())));
        if (count != 1) {
          System.out.println("GOT: " + count + " for traceId=" + line.trim());
        }
      }
    }
    r.close();
    dir.close();
  }
}

And while some of the traceIds had > 1 count, none of them had count 0. So as far as I can tell, the Lucene side is working correctly.

Also, most traceIds are length 19, but some are length 15-18 as well. Is that expected?

Do you have an example traceId that fails to come back for this index?

Mike McCandless


(Kenny Wu) #6

High cardinality and doc count are red herrings. After probing a bit more today, it looks like the JSON parser used by Elasticsearch rounds a JSON long value when converting it to a string.

A screenshot illustrated the problem: the traceId in _source is already rounded. This is why the attempt to filter by a traceId copied from _source always failed. However, the terms aggregation still shows the correct key.

I also tested defining traceId as long in Elasticsearch. In that scenario, both the source and the aggregation key lost precision.
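
The arithmetic backs this up: an IEEE 754 double has a 53-bit significand, so only integers up to 2^53 = 9,007,199,254,740,992 (roughly 16 decimal digits) are guaranteed to survive a round trip. A 19-digit traceId therefore gets rounded to the nearest representable double, while the keyword field (and hence the terms aggregation key) keeps the original string untouched. This would also explain why the prefix query in my first post stopped matching beyond roughly 15 characters: that is where the rounded value starts to diverge from the indexed one.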

Kenny Wu


(Michael McCandless) #7

Hmm, I just tested ES 5.1.2 using your first example, but the value is correct (not truncated).

Can you test ES 5.1.2? Maybe there was a bug that's been fixed.

Mike McCandless


(Kenny Wu) #8

I tested ES 5.1.2 and the problem remains. Please be aware that traceId in the source JSON is not a string but a number, i.e. { "traceId": 1026314602185330712 }. It looks like the precision is lost when this long number is parsed.

GET /

{
  "name": "cAewfKa",
  "cluster_name": "kenny_test",
  "cluster_uuid": "JUY_mFLJSOmwCfxCZRPdsA",
  "version": {
    "number": "5.1.2",
    "build_hash": "c8c4c16",
    "build_date": "2017-01-11T20:18:39.146Z",
    "build_snapshot": false,
    "lucene_version": "6.3.0"
  },
  "tagline": "You Know, for Search"
}

GET /test_me/_mappings

{
  "test_me": {
    "mappings": {
      "logs": {
        "properties": {
          "traceId": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

POST /test_me/logs
{
  "traceId": 1026314602185330712
}

POST /test_me/_search

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "test_me",
        "_type": "logs",
        "_id": "AVnOGVpfDdPNLqd8e68H",
        "_score": 1,
        "_source": {
          "traceId": 1026314602185330700
        }
      }
    ]
  }
}

(Kenny Wu) #9

I changed the title of this post as the original one was misleading.


(Michael McCandless) #10

Hmm, how are you submitting your JSON requests? It works for me with ES 5.1.2 when I use curl:

curl -XPUT 'localhost:9200/test_me?pretty' -d'
{
  "mappings": {
    "logs": {
      "properties": {
        "traceId": {
	  "type": "keyword"
	}
      }
    }
  }
}
'
-->
{
  "acknowledged" : true,
  "shards_acknowledged" : true
}




curl -XPOST 'localhost:9200/test_me/logs?pretty' -d'
{
  "traceId": 1026314602185332762
}
'
-->
{
  "_index" : "test_me",
  "_type" : "logs",
  "_id" : "AVnOUV3bXuDudHHhdSuV",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "created" : true
}




curl -XPOST 'localhost:9200/test_me/logs/_search?pretty' -d'
{
  "aggs": {
    "test": {
      "terms": {
        "field": "traceId",
        "size": 10
      }
    }
  }
}
'
-->
{
  "took" : 27,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test_me",
        "_type" : "logs",
        "_id" : "AVnOUV3bXuDudHHhdSuV",
        "_score" : 1.0,
        "_source" : {
          "traceId" : 1026314602185332762
        }
      }
    ]
  },
  "aggregations" : {
    "test" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "1026314602185332762",
          "doc_count" : 1
        }
      ]
    }
  }
}

Mike McCandless


(Kenny Wu) #11

I just used curl for the same request and can confirm the result is good.

The previous tests were done in the Dev console (Sense) within Kibana. So this now looks to be a problem with Kibana (or the JS library it uses for parsing JSON long numbers).
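
For the record, this is easy to reproduce in any JavaScript runtime; here is a minimal sketch using nothing but the standard JSON.parse (no Kibana code involved):

// Paste into a browser console, or run with ts-node:
const body = '{"traceId": 1026314602185330712}';
const parsed = JSON.parse(body);       // every JSON number becomes an IEEE 754 double
console.log(parsed.traceId);           // prints 1026314602185330700 -- precision already lost
console.log(Number.MAX_SAFE_INTEGER);  // 9007199254740991 (2^53 - 1), the largest safe integer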


(Michael McCandless) #12

Ahh, I see ... so something in Kibana's dev console is maybe truncating the long value. I assume a workaround here is for you to make this long value a string instead?
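
For example (reusing the test_me index from above), quoting the value keeps it intact end to end:

POST /test_me/logs
{
  "traceId": "1026314602185332762"
}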

Can you open a Kibana issue, linking to this discussion? Thanks.

Mike McCandless


(Kenny Wu) #13

Yes, I've already made this traceId a string to get around the problem. But I think Kibana needs to fix this, as it affects not only the Dev console but also any other functionality where a long value is used for filtering and graphing. I'll raise the issue with the Kibana team on GitHub.

Thanks very much for helping me find the root cause!

Regards
Kenny


(Marcin Biegan) #14

I also spent some time trying to figure this out, but it's just the nature of JavaScript: all numbers are IEEE 754 double-precision floats, so integers beyond Number.MAX_SAFE_INTEGER (2^53 − 1 = 9007199254740991) cannot be represented exactly.


(system) #15

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.