Querying data using script query with lang=expression for remainder operation

hi Folks,

We store a field of type "long" in elastic which has value of this nature = 9048716794795431
Need to retrieve these records based on modulus expression. So the query looks like this

query: {'query': {'bool': {'must': [{'script': {'script': {'source': "doc['hash'].value % 21 == 13", 'lang': 'expression'}}}]}}}

Now as you can see the actual mod of 9048716794795431 % 21 = 12, but when i run the above elastic query, this particular record is retrieved as though the (9048716794795431%21= 13).

I am not sure if elastic has a different way to storing the long type field and should this query be modified to specify long or not sure what ? This is python code. So does it need to be handled separately ?
Also, just to mention, the above problem is only seen for a few particular records and does not happen for all.

some sample values of long field are = [9475189089572885, 9584341227278131...etc] where it retrieves the correct records based on mod operation with 21resulting different numbers from [0-20]

looking for some insights here.
Thanks.

This might have to do with how the numbers are being parsed. If you take your example: 9048716794795431 % 21, and run it through Java:

import java.util.*;
public class println{
    public static void main (String [] args){
        System.out.println(9048716794795431L % 21);
    }
}

You will get the output of 12.

However, if either of the numbers above get represented as a float (9048716794795431.0 or 21.0), then the return becomes 13.0. Example:

import java.util.*;
public class println{
    public static void main (String [] args){
        System.out.println(9048716794795431.0 % 21);
    }
}

In your script, you might want to explicitly declare/cast your value as a long, and see if you're able to get the correct result of 12.

1 Like

how do you explicitly typecast to long in the query above ?
query: {'query': {'bool': {'must': [{'script': {'script': {'source': "doc['hash'].value % 21 == 13", 'lang': 'expression'}}}]}}}

Hi @Rachana_Maniyar Curious if you have tried a runtime field if you get the same results?

What version are you on?

I will try as a runtime field and see if a can make it work correctly

If you have mapped it as a long and indexed the document with the field set to such a large number, it is possible that you have hit the limit of numeric accuracy in JSON.. Try sending in a test document with "hash": "9048716794795431" (string in JSON that will be parsed to long in Elasticsearch) and see if this behaves differently.

1 Like

Ok WOW this is really interesting.

@Christian_Dahlqvist @Rachana_Maniyar Look at the input and output the numbers are not the same!

DELETE discuss-test

PUT /discuss-test
{
  "mappings": {
    "properties": {
      "name" : {"type" : "keyword"},
      "hash" : {"type" : "long"}
    }
  }
}


POST discuss-test/_doc
{
  "name" : "test1",
  "hash" : 9048716794795431
}

POST discuss-test/_doc
{
  "name" : "test2",
  "hash" : 9475189089572885
}


POST discuss-test/_doc
{
  "name" : "test3",
  "hash" : 9584341227278131
}

GET discuss-test/_search
{
  "fields": [ "*"]
}


# Results

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "discuss-test",
        "_id": "Z-0ta4YBNTELl2kaGbSs",
        "_score": 1,
        "_source": {
          "name": "test1",
          "hash": 9048716794795432
        },
        "fields": {
          "name": [
            "test1"
          ],
          "hash": [
            9048716794795432
          ]
        }
      },
      {
        "_index": "discuss-test",
        "_id": "aO0ta4YBNTELl2kaGbS4",
        "_score": 1,
        "_source": {
          "name": "test2",
          "hash": 9475189089572884
        },
        "fields": {
          "name": [
            "test2"
          ],
          "hash": [
            9475189089572884
          ]
        }
      },
      {
        "_index": "discuss-test",
        "_id": "ae0ta4YBNTELl2kaGbTE",
        "_score": 1,
        "_source": {
          "name": "test3",
          "hash": 9584341227278132
        },
        "fields": {
          "name": [
            "test3"
          ],
          "hash": [
            9584341227278132
          ]
        }
      }
    ]
  }
}
1 Like

And Now with Strings!

Note the string _source looks correct but the field has the incorrect value..

Those Numbers should fit in long

DELETE discuss-test

PUT /discuss-test
{
  "mappings": {
    "properties": {
      "name" : {"type" : "keyword"},
      "hash" : {"type" : "long"}
    }
  }
}


POST discuss-test/_doc
{
  "name" : "test1",
  "hash" : "9048716794795431"
}

POST discuss-test/_doc
{
  "name" : "test2",
  "hash" : "9475189089572885"
}


POST discuss-test/_doc
{
  "name" : "test3",
  "hash" : "9584341227278131"
}

GET discuss-test/_search
{
  "fields": [ "*"]
}

# Results
{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "discuss-test",
        "_id": "ce01a4YBNTELl2kaibQt",
        "_score": 1,
        "_source": {
          "name": "test1",
          "hash": "9048716794795431" <!---- Correct 
        },
        "fields": {
          "name": [
            "test1"
          ],
          "hash": [
            9048716794795432 <!--- Incorrect etc
          ]
        }
      },
      {
        "_index": "discuss-test",
        "_id": "cu01a4YBNTELl2kaibQ6",
        "_score": 1,
        "_source": {
          "name": "test2",
          "hash": "9475189089572885"
        },
        "fields": {
          "name": [
            "test2"
          ],
          "hash": [
            9475189089572884
          ]
        }
      },
      {
        "_index": "discuss-test",
        "_id": "c-01a4YBNTELl2kaibRH",
        "_score": 1,
        "_source": {
          "name": "test3",
          "hash": "9584341227278131"
        },
        "fields": {
          "name": [
            "test3"
          ],
          "hash": [
            9584341227278132
          ]
        }
      }
    ]
  }
}
1 Like

And even Weirder... Now the modulus seems to work correctly when I use the second version i.e. putting strings in for the hash it did not work correctly when I used the first version without the hash in quotes.

@DavidTurner Sorry to bother you but do have any insight on this?

By the way, I did this in version 8.6

we are on 7.x version.
with the experiements that you ran, what do we conclude ? ES cannot handle long fields or is there a bug with long fields for computation part ?
Also, how do you explicitly typecast to long in the script query (python language) and if that would be possible, please let me know.

I am asking for another opinion... BUT

My conclusion is that elasticsearch is in fact handling the data properly and this issue is related to the JSON / Javascript parsing issue of some sort like what @Christian_Dahlqvist mentioned earlier.

In short i would recommend wrapping your hashes in double quotes or make sure there is no JSON / Javascript parsing going on when you ingest your data or on the display side.

At the bottom, I show it can work without wrapping the hashes in double quotes but I would recommend it anyways so that the _source version of the hashes looks correct in browser-based apps.

Here are a couple of experiments to show you, see if it makes sense...

Basically, the data is not shown correctly in the Browser, Dev Tools, Discover etc. and if you are ingesting the longs that are too long for the JSON / Javascript they are getting rounded/truncated.

All The above code was done through the Kibana - Dev Tool

But now let's try something else

So now in Kibana Dev Tools

I created a mapping with a runtime field doing the modulus

DELETE discuss-test

PUT /discuss-test
{
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword"
      },
      "hash": {
        "type": "long"
      }
    },
    "runtime": {
      "modulus": {
        "type": "long",
        "script": {
          "source": "long modulus = (doc['hash'].value) % 21; emit(modulus);"
        }
      }
    }
  }
}

Now I added in the Kibana Dev Tools with the hashes double quoted and then run the query.
Note The modulus is correct but the field value still looks wrong as it is getting processed by the JSON / JScript in the browser... Kibvana Dev-Tool

POST discuss-test/_doc
{
  "name" : "test1",
  "hash" : "9048716794795431"
}

POST discuss-test/_doc
{
  "name" : "test2",
  "hash" : "9475189089572885"
}


POST discuss-test/_doc
{
  "name" : "test3",
  "hash" : "9584341227278131"
}

GET discuss-test/_search
{
  "fields": [ "*"]
}
   
# Result Notice the Correct Moduls BUT the field still looks wrong because the JSON parser is truncating 

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "discuss-test",
        "_id": "fO36bIYBNTELl2kaPLS-",
        "_score": 1,
        "_source": {
          "name": "test1",
          "hash": "9048716794795431"
        },
        "fields": {
          "name": [
            "test1"
          ],
          "modulus": [
            12 <!---- CORRECT 
          ],
          "hash": [
            9048716794795432 <!- Looks wrong but this is the JSON / Javascript Parser. 
          ]
        }
      },
      {
        "_index": "discuss-test",
        "_id": "fe36bIYBNTELl2kaPLTS",
        "_score": 1,
        "_source": {
          "name": "test2",
          "hash": "9475189089572885"
        },
        "fields": {
          "name": [
            "test2"
          ],
          "modulus": [
            14
          ],
          "hash": [
            9475189089572884
          ]
        }
      },
      {
        "_index": "discuss-test",
        "_id": "fu36bIYBNTELl2kaPLTi",
        "_score": 1,
        "_source": {
          "name": "test3",
          "hash": "9584341227278131"
        },
        "fields": {
          "name": [
            "test3"
          ],
          "modulus": [
            13
          ],
          "hash": [
            9584341227278132
          ]
        }
      }
    ]
  }
}

But if I run the exact same query directly from my command line all so no JSON / JavaScript Parsing At ALL the data ALL looks correct.

$ curl -H "Content-Type: application/json" localhost:9200/discuss-test/_search?pretty -d '{"fields" : ["*"]}'
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "discuss-test",
        "_id" : "fO36bIYBNTELl2kaPLS-",
        "_score" : 1.0,
        "_source" : {
          "name" : "test1",
          "hash" : "9048716794795431"
        },
        "fields" : {
          "name" : [
            "test1"
          ],
          "modulus" : [
            12  <!---- CORRECT
          ],
          "hash" : [
            9048716794795431  <!---- CORRECT
          ]
        }
      },
      {
        "_index" : "discuss-test",
        "_id" : "fe36bIYBNTELl2kaPLTS",
        "_score" : 1.0,
        "_source" : {
          "name" : "test2",
          "hash" : "9475189089572885"
        },
        "fields" : {
          "name" : [
            "test2"
          ],
          "modulus" : [
            14
          ],
          "hash" : [
            9475189089572885
          ]
        }
      },
      {
        "_index" : "discuss-test",
        "_id" : "fu36bIYBNTELl2kaPLTi",
        "_score" : 1.0,
        "_source" : {
          "name" : "test3",
          "hash" : "9584341227278131"
        },
        "fields" : {
          "name" : [
            "test3"
          ],
          "modulus" : [
            13
          ],
          "hash" : [
            9584341227278131
          ]
        }
      }
    ]
  }
}

What makes this even MORE confusing (but also confirms the elasticsearch is IN FACT handling the data correctly) if I POST the documents directly from the command line with NO double quotes that Also works... this is consistent as well as there is no JSON parsing just a direct post of data.

I cleaned up the index and re POSTED the mapping above.

see All This

Post a few docs without the hash double quotes ...

hyperion:~ sbrown$ curl -X POST -H "Content-Type: application/json" http://localhost:9200/discuss-test/_doc -d '{ "name" : "test1", "hash" : 9048716794795431 }'

hyperion:~ sbrown$ curl -X POST -H "Content-Type: application/json" http://localhost:9200/discuss-test/_doc -d '{ "name" : "test2", "hash" : 9475189089572885 }'

hyperion:~ sbrown$ curl -X POST -H "Content-Type: application/json" http://localhost:9200/discuss-test/_doc -d '{ "name" : "test3", "hash" : 9584341227278131 }'

Now run the search from the command line and everything looks as expected

hyperion:~ sbrown$ curl -H "Content-Type: application/json" localhost:9200/discuss-test/_search?pretty -d '{"fields" : ["*"]}'
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "discuss-test",
        "_id" : "f-0IbYYBNTELl2kaErS2",
        "_score" : 1.0,
        "_source" : {
          "name" : "test1",
          "hash" : 9048716794795431
        },
        "fields" : {
          "name" : [
            "test1"
          ],
          "modulus" : [
            12
          ],
          "hash" : [
            9048716794795431
          ]
        }
      },
      {
        "_index" : "discuss-test",
        "_id" : "gO0IbYYBNTELl2kaMLT3",
        "_score" : 1.0,
        "_source" : {
          "name" : "test2",
          "hash" : 9475189089572885
        },
        "fields" : {
          "name" : [
            "test2"
          ],
          "modulus" : [
            14
          ],
          "hash" : [
            9475189089572885
          ]
        }
      },
      {
        "_index" : "discuss-test",
        "_id" : "ge0IbYYBNTELl2kaS7Qj",
        "_score" : 1.0,
        "_source" : {
          "name" : "test3",
          "hash" : 9584341227278131
        },
        "fields" : {
          "name" : [
            "test3"
          ],
          "modulus" : [
            13
          ],
          "hash" : [
            9584341227278131
          ]
        }
      }
    ]
  }
}
hyperion:~ sbrown$ 

So this is all correct

But if I go into Dev Tools it still looks wrong... even though it is correct :frowning:

Wrap the hashes in double quotes that'll make the visual more correct.

And if you really wanted to, you couldn't use a multi-valued field with both a keyword and a long so that you could always see it correctly in the fields as well... If you want to do that let me know and I'll show you

I know this is long but I hope it make sense! :slight_smile:

1 Like

Thank you Stephen for the extremely detailed and well explained answer. Really appreciate it.
I know for a fact that elastic is storing the correct value in the long field. The more so problem that I am looking to get an answer for is I need to retrieve the results from elastic based on the resultant of (hash%x). so the workaround that you are suggesting, either make that field as string or store the modulus value as another field, both are not workable in my usecase. is it not possible to typecast as below ? I tried running both the version but no avail.

{'query': {'bool': {'must': [{'script': {'script': {'source': "long modulus = (doc['hash'].value) % 21; modulus % 21 == 12", 'lang': 'expression'}}}]}}}

or

{'query': {'bool': {'must': [{'script': {'script': {'source': "long modulus = (doc['hash'].value) % 21; emit(modulus); modulus % 21 == 12", 'lang': 'expression'}}}]}}}

I reread your reply, are you suggesting i use the runtime in my mapping file as described by you and when retrieving the results use {'query': {'bool': {'must': [{'script': {'script': {'source': "modulus == 13", 'lang': 'expression'}}}]}}}

what happens in future if 21 changes to some x number ? how would the above script added in mapping file work ?

Hi, @Rachana_Maniyar You are Welcome... I kinda get stuck on these weird things :slight_smile: until we figure them out.

I got this to work :slight_smile:
Note you should abstract parameters for better performance

GET discuss-test/_search
{
  "fields": [
    "*"
  ],
  "query": {
    "bool": {
      "must": [
        {
          "script": {
            "script": {
              "source": "long modulus = (doc['hash'].value) % params.mod_value; return modulus == params.mod_compare;",
              "params": {
                "mod_value": 21,
                "mod_compare": 13
              }
            }
          }
        }
      ]
    }
  }
}

# Results Correct YAY \o/

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "discuss-test",
        "_id": "hO2qb4YBNTELl2ka57TO",
        "_score": 1,
        "_source": {
          "name": "test3",
          "hash": "9584341227278131"
        },
        "fields": {
          "name": [
            "test3"
          ],
          "modulus": [
            13
          ],
          "hash": [
            9584341227278132
          ]
        }
      }
    ]
  }
}

I was not necessarily suggesting runtime fields, just that they are the new scripted fields but you are really doing a scripted search BUT that said runtime mappings can be changed on live indices so if you needed to change it you could.

Also here is the script with the new fields API

This yields the same correct results and is safe
If I add a doc with no hash field the first script will fail this will not.

GET discuss-test/_search
{
  "fields": [
    "*"
  ],
  "query": {
    "bool": {
      "must": [
        {
          "script": {
            "script": {
              "source": "long modulus = field('hash').get(0) % params.mod_value; return modulus == params.mod_compare;",
              "params": {
                "mod_value": 21,
                "mod_compare": 13
              }
            }
          }
        }
      ]
    }
  }
}

# Results

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "discuss-test",
        "_id": "hO2qb4YBNTELl2ka57TO",
        "_score": 1,
        "_source": {
          "name": "test3",
          "hash": "9584341227278131"
        },
        "fields": {
          "name": [
            "test3"
          ],
          "modulus": [
            13
          ],
          "hash": [
            9584341227278132
          ]
        }
      }
    ]
  }
}

Thanks again for the explanation.
can this script take OR clause as part of "source" ?

"script": {
            "script": {
              "source": "long modulus = field('hash').get(0) % params.mod_value; return modulus == params.mod_compare;",
              "params": {
                "mod_value": 21,
                "mod_compare": 13
              }
            }
          }

something of this nature:

{"query":{"bool":{"must":[{"script":{"script":{"source":"doc['hash'].value % 21 == 11 | doc['hash'].value % 21 == 12","lang":"expression"}}}]}}}

The below works ....

BTW I do not think this is correct
"lang":"expression"

That seems to break the query for me leave it out or set "lang": "painless"

And to be clear NOT using the params leads to very bad performance... and may even fail at scale see here

GET discuss-test/_search
{
  "fields": [
    "*"
  ],
  "query": {
    "bool": {
      "must": [
        {
          "script": {
            "script": {
              "source": "long modulus = field('hash').get(0) % params.mod_value; return (modulus == params.mod_compare_13 || modulus == params.mod_compare_12);",
              "params": {
                "mod_value": 21,
                "mod_compare_13": 13,
                "mod_compare_12": 12
                
              }
            }
          }
        }
      ]
    }
  }
}

# Result

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "discuss-test",
        "_id": "gu2qb4YBNTELl2ka57Sw",
        "_score": 1,
        "_source": {
          "name": "test1",
          "hash": "9048716794795431"
        },
        "fields": {
          "name": [
            "test1"
          ],
          "modulus": [
            12
          ],
          "hash": [
            9048716794795432
          ]
        }
      },
      {
        "_index": "discuss-test",
        "_id": "hO2qb4YBNTELl2ka57TO",
        "_score": 1,
        "_source": {
          "name": "test3",
          "hash": "9584341227278131"
        },
        "fields": {
          "name": [
            "test3"
          ],
          "modulus": [
            13
          ],
          "hash": [
            9584341227278132
          ]
        }
      }
    ]
  }
}

does this work with elastic 7.x ?

Everything but the new fields API will work so here is the 7.x version that works
It does not handle where the field does not exist you would need to add a exists clause in your query which I did below...

So give it a try and report back

GET discuss-test/_search
{
  "fields": [
    "*"
  ],
  "query": {
    "bool": {
      "must": [
        {
          "exists": {
            "field": "hash"
          }
        },
        {
          "script": {
            "script": {
              "source": "long modulus = doc['hash'].value % params.mod_value; return (modulus == params.mod_compare_13 || modulus == params.mod_compare_12);",
              "params": {
                "mod_value": 21,
                "mod_compare_13": 13,
                "mod_compare_12": 12
              },
              "lang": "painless"
            }
          }
        }
      ]
    }
  }
}

# Results

{
  "took" : 18,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "discuss-test",
        "_type" : "_doc",
        "_id" : "loSndIYBR0tNM9ukwrWK",
        "_score" : 1.0,
        "_source" : {
          "name" : "test1",
          "hash" : "9048716794795431"
        },
        "fields" : {
          "name" : [
            "test1"
          ],
          "hash" : [
            9048716794795431
          ],
          "modulus" : [
            12
          ]
        }
      },
      {
        "_index" : "discuss-test",
        "_type" : "_doc",
        "_id" : "mISndIYBR0tNM9ukwrWu",
        "_score" : 1.0,
        "_source" : {
          "name" : "test3",
          "hash" : "9584341227278131"
        },
        "fields" : {
          "name" : [
            "test3"
          ],
          "hash" : [
            9584341227278131
          ],
          "modulus" : [
            13
          ]
        }
      }
    ]
  }
}

@stephenb - thank you for this fix. It did work this time. I appreciate all the efforts you put in, in order to help me get past this issue. Thank you once again.

1 Like

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.