Filter by comparing two number fields fails with number_format_exception

I would like to filter documents in Kibana Visualize where msg.count_passenger_out > msg.count_passenger_in. Both fields are number fields and are indexed.

Im running Kibana + Elasticsearch v7.6.0 at Elastic Cloud.

This query work perfectly msg.count_passenger_out > 10, however I get an ugly, unformatted exception In Kibana when I try executing msg.count_passenger_out > msg.count_passenger_in in Kibana:

{
  "error": {
    "root_cause": [
      {
        "type": "query_shard_exception",
        "reason": "failed to create query: {\n \"bool\" : {\n \"filter\" : [\n {\n \"bool\" : {\n \"should\" : [\n {\n \"range\" : {\n \"msg.count_passenger_out\" : {\n \"from\" : \"msg.count_passenger_in\",\n \"to\" : null,\n \"include_lower\" : false,\n \"include_upper\" : true,\n \"boost\" : 1.0\n }\n }\n }\n ],\n \"adjust_pure_negative\" : true,\n \"minimum_should_match\" : \"1\",\n \"boost\" : 1.0\n }\n },\n {\n \"bool\" : {\n \"should\" : [\n {\n \"range\" : {\n \"msg.count_passenger_out\" : {\n \"from\" : \"msg.count_passenger_in\",\n \"to\" : null,\n \"include_lower\" : false,\n \"include_upper\" : true,\n \"boost\" : 1.0\n }\n }\n }\n ],\n \"adjust_pure_negative\" : true,\n \"minimum_should_match\" : \"1\",\n \"boost\" : 1.0\n }\n },\n {\n \"range\" : {\n \"msg.@timestamp\" : {\n \"from\" : null,\n \"to\" : null,\n \"include_lower\" : true,\n \"include_upper\" : true,\n \"boost\" : 1.0\n }\n }\n }\n ],\n \"adjust_pure_negative\" : true,\n \"boost\" : 1.0\n }\n}",
        "index_uuid": "YFS2X_8JQMOVzygXTQPjMQ",
        "index": "logs-2020.03.05"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "logs-2020.03.05",
        "node": "gTASgYfOSg2I4NRpAe3fiw",
        "reason": {
          "type": "query_shard_exception",
          "reason": "failed to create query: {\n \"bool\" : {\n \"filter\" : [\n {\n \"bool\" : {\n \"should\" : [\n {\n \"range\" : {\n \"msg.count_passenger_out\" : {\n \"from\" : \"msg.count_passenger_in\",\n \"to\" : null,\n \"include_lower\" : false,\n \"include_upper\" : true,\n \"boost\" : 1.0\n }\n }\n }\n ],\n \"adjust_pure_negative\" : true,\n \"minimum_should_match\" : \"1\",\n \"boost\" : 1.0\n }\n },\n {\n \"bool\" : {\n \"should\" : [\n {\n \"range\" : {\n \"msg.count_passenger_out\" : {\n \"from\" : \"msg.count_passenger_in\",\n \"to\" : null,\n \"include_lower\" : false,\n \"include_upper\" : true,\n \"boost\" : 1.0\n }\n }\n }\n ],\n \"adjust_pure_negative\" : true,\n \"minimum_should_match\" : \"1\",\n \"boost\" : 1.0\n }\n },\n {\n \"range\" : {\n \"msg.@timestamp\" : {\n \"from\" : null,\n \"to\" : null,\n \"include_lower\" : true,\n \"include_upper\" : true,\n \"boost\" : 1.0\n }\n }\n }\n ],\n \"adjust_pure_negative\" : true,\n \"boost\" : 1.0\n }\n}",
          "index_uuid": "YFS2X_8JQMOVzygXTQPjMQ",
          "index": "logs-2020.03.05",
          "caused_by": {
            "type": "number_format_exception",
            "reason": "For input string: \"msg.count_passenger_in\""
          }
        }
      }
    ]
  },
  "status": 400
}

Filters in Kibana are not able to do that out of the box, they can only filter by constant values.

To work around this issue, you can create a scripted field in your index pattern doing the comparison and being either true or false and then filter on that field in your discover view.

The downside of this approach is that performance might be impacted because Elasticsearch has to look up each single document to check whether the condition matches and can't rely on pre-built inverted indices. To get maximum performance, it makes sense to do the comparison before ingesting the data to elasticsearch and saving it in a separate field.

Thanks for quick reply, @flash1293! Do you know if this limitation is by design (and the reason for it) or is it a missing feature? I have previously experience with Splunk and this was possible for many years.

You will get more context in the Elasticsearch forum if you are interested in details but AFAIK that’s by design. Queries are meant to be fast, even on vast amounts of data. If the thing you are searching for is a static value, Elasticsearch can pre-compute an inverted index for that value and searching will be super fast because Elasticsearch just has to check in a single place to see which documents match.

If you are specifying this comparison of two fields in the same document on the fly, Elasticsearch has to go through every single document and compare the values to figure out whether they match the condition - much slower on large amounts of data. It’s possible (by using scripted fields) but discouraged.

Thanks, that explanation was quite useful!

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.