Result from ES|QL differs from result of regular search

This regular query:

GET /twitter20230601-times/_search
{
  "_source": ["counts"]
}

results:

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "twitter20230601-times",
        "_id": "zomfv5gB5TGBsgBW5jx1",
        "_score": 1,
        "_source": {
          "counts": [
            5000,
            5000,
            5000,
            5000,
            5000,
            5000,
            5000,
            5000,
            4933
          ]
        }
      }
    ]
  }
}

This ES|QL query:

POST /_query
{
  "query": """
    FROM twitter20230601-times | KEEP counts
    """
}

results:

{
  "took": 5,
  "is_partial": false,
  "documents_found": 1,
  "values_loaded": 9,
  "columns": [
    {
      "name": "counts",
      "type": "long"
    }
  ],
  "values": [
    [
      [
        4933,
        5000,
        5000,
        5000,
        5000,
        5000,
        5000,
        5000,
        5000
      ]
    ]
  ]
}

The return values for the ES|QL query are in reverse order. This is wrong. The regular query returns the values in the correct order.

Is this a bug, or is there something about ES|QL I don’t understand?

Hi @peter9 Welcome to the community

What version are you using?

And it would seem there is only 1 doc?

Can you run the following and show the results

GET /twitter20230601-times/_search
{
  "_source": ["counts"],
  "fields" : ["counts"]
}

Can you also share the settings and mapping for this index

GET /twitter20230601-times

1 Like

Elasticsearch version 9.1.2

There is only one doc:

{
  "times": [
    3.8873046666666773,
    8.715069388888871,
    10.492110111111103,
    12.25434538888891,
    13.760250666666694,
    15.50071144444448,
    17.63239077777775,
    19.57214750000005,
    21.168840236953013
  ],
  "counts": [
    5000,
    5000,
    5000,
    5000,
    5000,
    5000,
    5000,
    5000,
    4933
  ]
}

query:

GET /twitter20230601-times/_search
{
  "_source": ["counts"],
  "fields" : ["counts"]
}

result:

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "twitter20230601-times",
        "_id": "zomfv5gB5TGBsgBW5jx1",
        "_score": 1,
        "_source": {
          "counts": [
            5000,
            5000,
            5000,
            5000,
            5000,
            5000,
            5000,
            5000,
            4933
          ]
        },
        "fields": {
          "counts": [
            5000,
            5000,
            5000,
            5000,
            5000,
            5000,
            5000,
            5000,
            4933
          ]
        }
      }
    ]
  }
}

query:

GET /twitter20230601-times

results:

{
  "twitter20230601-times": {
    "aliases": {},
    "mappings": {
      "properties": {
        "counts": {
          "type": "long"
        },
        "times": {
          "type": "float"
        }
      }
    },
    "settings": {
      "index": {
        "routing": {
          "allocation": {
            "include": {
              "_tier_preference": "data_content"
            }
          }
        },
        "number_of_shards": "1",
        "provided_name": "twitter20230601-times",
        "creation_date": "1755561194680",
        "number_of_replicas": "1",
        "uuid": "snbgIAsjSiqbjrizPYU8CQ",
        "version": {
          "created": "9033000"
        }
      }
    }
  }
}

Hi @peter9

Unfortunately not a bug...

see here
"The relative order of values in a multivalued field is undefined. They’ll frequently be in ascending order but don’t rely on that."

As far as I know, non-ESQL does not modify MV fields in any way. I tested it a few times by changing the field type and got the same results each time, so I believe that's true.

I suspect you are trying to keep the times and counts aligned...

If you want to use ESQL I think you may need to reconsider how you store your data...

Either something like below or as individuals documents...

DELETE twitter20230601-times
PUT twitter20230601-times
{
  "mappings": {
    "properties": {
      "combos": {
        "type": "keyword"
      }
    }
  }
}



POST twitter20230601-times/_doc
{
  "combos": [
    "3.887304666666677, 5000",
    "8.715069388888871, 5000",
    "10.49211011111110, 4999",
    "12.25434538888891, 5000",
    "13.76025066666669, 5000",
    "15.50071144444448, 5033",
    "17.63239077777775, 5000",
    "19.57214750000005, 5000",
    "2.168840236953013, 4944"
  ]
}


POST /_query?format=txt
{
  "query": """
    FROM twitter20230601-times 
    | MV_EXPAND combos
    | DISSECT combos "%{ts}, %{cnt}"
    | EVAL timestamp = TO_DOUBLE(ts), count = TO_INTEGER(cnt)
    | KEEP timestamp, count, combos
    | SORT timestamp ASC
       """
}


# Result
#! No limit defined, adding default limit of [1000]
    timestamp    |     count     |        combos         
-----------------+---------------+-----------------------
2.168840236953013|4944           |2.168840236953013, 4944
3.887304666666677|5000           |3.887304666666677, 5000
8.715069388888871|5000           |8.715069388888871, 5000
10.4921101111111 |4999           |10.49211011111110, 4999
12.25434538888891|5000           |12.25434538888891, 5000
13.76025066666669|5000           |13.76025066666669, 5000
15.50071144444448|5033           |15.50071144444448, 5033
17.63239077777775|5000           |17.63239077777775, 5000
19.57214750000005|5000           |19.57214750000005, 5000

It has nothing to do with alignment between the two lists. Order in both lists is important.

There is only one value for counts. That value is a list. It’s not a bag or a set or a bunch of records. Order in a list has meaning in json.

Just providing how ESQL works -

"The relative order of values in a multivalued field is undefined. They’ll frequently be in ascending order but don’t rely on that."

I confirmed that with engineering.

Apologies for the assumptions. I just ended up playing with the data

Perhaps ESQL is not a good fit for your use case.

Of course I can work around this, with multiple documents:

{ index: 0, count: 5000, time: 3.56 }
{ index: 1, count: 5000, time: 7.36 }

etc.

Using one document with two lists seemed more effecient to me, because I always need the whole lists.

Is this because ESQL works on the indexed data (where order of multivalued fields is not kept) and does not retrieve data from source (which can be very expensive)?

1 Like

:slight_smile: "Manual Ordered List"

DELETE twitter20230601-times
PUT twitter20230601-times
{
  "mappings": {
    "properties": {
      "ordered_list": {
        "type": "keyword"
      }
    }
  }
}

POST twitter20230601-times/_doc
{
  "ordered_list": [
    "0, 3.887304666666677, 5000",
    "1, 8.715069388888871, 5000",
    "2, 10.49211011111110, 4999",
    "3, 12.25434538888891, 5000",
    "4, 13.76025066666669, 5000",
    "5, 15.50071144444448, 5033",
    "6, 17.63239077777775, 5000",
    "7, 19.57214750000005, 5000",
    "8, 2.168840236953013, 4944"
  ]
}


POST /_query?format=txt
{
  "query": """
    FROM twitter20230601-times 
    | MV_EXPAND ordered_list
    | DISSECT ordered_list "%{index}, %{timestamp}, %{count}"
    | EVAL index = TO_LONG(index), timestamp = TO_DOUBLE(timestamp), count = TO_INTEGER(count)
    | SORT index ASC
    | KEEP index, timestamp, count
    """
}

POST /_query?format=txt
{
  "query": """
    FROM twitter20230601-times 
    | MV_EXPAND ordered_list
    | DISSECT ordered_list "%{index}, %{timestamp}, %{count}"
    | EVAL index = TO_LONG(index), timestamp = TO_DOUBLE(timestamp), count = TO_INTEGER(count)
    | SORT index ASC
    | KEEP index, timestamp, count
    """
}



POST /_query
{
  "query": """
    FROM twitter20230601-times 
    | MV_EXPAND ordered_list
    | DISSECT ordered_list "%{index}, %{timestamp}, %{count}"
    | EVAL index = TO_LONG(index), timestamp = TO_DOUBLE(timestamp), count = TO_INTEGER(count)
    | SORT index ASC
    | KEEP index, timestamp, count
    """
}


# 280: POST /_query?format=txt [200 OK]
#! No limit defined, adding default limit of [1000]
     index     |    timestamp    |     count     
---------------+-----------------+---------------
0              |3.887304666666677|5000           
1              |8.715069388888871|5000           
2              |10.4921101111111 |4999           
3              |12.25434538888891|5000           
4              |13.76025066666669|5000           
5              |15.50071144444448|5033           
6              |17.63239077777775|5000           
7              |19.57214750000005|5000           
8              |2.168840236953013|4944           

# 292: POST /_query [200 OK]
#! No limit defined, adding default limit of [1000]
{
  "took": 10,
  "is_partial": false,
  "documents_found": 1,
  "values_loaded": 9,
  "columns": [
    {
      "name": "index",
      "type": "long"
    },
    {
      "name": "timestamp",
      "type": "double"
    },
    {
      "name": "count",
      "type": "integer"
    }
  ],
  "values": [
    [
      0,
      3.887304666666677,
      5000
    ],
    [
      1,
      8.715069388888871,
      5000
    ],
    [
      2,
      10.4921101111111,
      4999
    ],
    [
      3,
      12.25434538888891,
      5000
    ],
    [
      4,
      13.76025066666669,
      5000
    ],
    [
      5,
      15.50071144444448,
      5033
    ],
    [
      6,
      17.63239077777775,
      5000
    ],
    [
      7,
      19.57214750000005,
      5000
    ],
    [
      8,
      2.168840236953013,
      4944
    ]
  ]
}