Help on script score query with filter script

Hi elastic community,

I need some guidance on the following query. I am using the Python Elasticsearch client and when executing the query it returns a 'search_phase_execution_exception' with no further guidance. Can you spot the error in the body?

{
    "size": size,
    "query": {
        "script_score": {
            "query": {
                "bool": {
                    "filter": {
                        "script": {
                            "script": {
                                "source": "doc['sentence_text'].size() > params.min_length",
                                "params": {
                                    "min_length": min_length
                                }
                            }
                        }
                    }
                }
            },
            "script": {
                "source": "cosineSimilarity(params.queryVector,'sentence_vector') + 1.0",
                "params": {
                    "queryVector": sentence_embedding
                }
            }
        }
    }
}

I managed to query Elasticsearch using curl and got some more info.

{
  "error" : {
    "root_cause" : [
      {
        "type" : "script_exception",
        "reason" : "runtime error",
        "script_stack" : [
          "org.elasticsearch.index.mapper.TextFieldMapper$TextFieldType.fielddataBuilder(TextFieldMapper.java:814)",
          "org.elasticsearch.index.fielddata.IndexFieldDataService.getForField(IndexFieldDataService.java:109)",
          "org.elasticsearch.index.query.SearchExecutionContext.lambda$lookup$2(SearchExecutionContext.java:503)",
          "org.elasticsearch.search.lookup.SearchLookup.getForField(SearchLookup.java:105)",
          "org.elasticsearch.search.lookup.LeafDocLookup$1.run(LeafDocLookup.java:72)",
          "org.elasticsearch.search.lookup.LeafDocLookup$1.run(LeafDocLookup.java:69)",
          "java.base/java.security.AccessController.doPrivileged(AccessController.java:318)",
          "org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:69)",
          "org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:27)",
          "doc['sentence_text'].length() > params.min_length",
          "    ^---- HERE"
        ],
        "script" : "doc['sentence_text'].length() > params.min_length",
        "lang" : "painless",
        "position" : {
          "offset" : 4,
          "start" : 0,
          "end" : 49
        }
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : 0,
        "index" : "conc",
        "node" : "UGH_0oVATf6s32PygGpEgg",
        "reason" : {
          "type" : "script_exception",
          "reason" : "runtime error",
          "script_stack" : [
            "org.elasticsearch.index.mapper.TextFieldMapper$TextFieldType.fielddataBuilder(TextFieldMapper.java:814)",
            "org.elasticsearch.index.fielddata.IndexFieldDataService.getForField(IndexFieldDataService.java:109)",
            "org.elasticsearch.index.query.SearchExecutionContext.lambda$lookup$2(SearchExecutionContext.java:503)",
            "org.elasticsearch.search.lookup.SearchLookup.getForField(SearchLookup.java:105)",
            "org.elasticsearch.search.lookup.LeafDocLookup$1.run(LeafDocLookup.java:72)",
            "org.elasticsearch.search.lookup.LeafDocLookup$1.run(LeafDocLookup.java:69)",
            "java.base/java.security.AccessController.doPrivileged(AccessController.java:318)",
            "org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:69)",
            "org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:27)",
            "doc['sentence_text'].length() > params.min_length",
            "    ^---- HERE"
          ],
          "script" : "doc['sentence_text'].length() > params.min_length",
          "lang" : "painless",
          "position" : {
            "offset" : 4,
            "start" : 0,
            "end" : 49
          },
          "caused_by" : {
            "type" : "illegal_argument_exception",
            "reason" : "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [sentence_text] in order to load field data by uninverting the inverted index. Note that this can use significant memory."
          }
        }
      }
    ]
  },
  "status" : 400
}

In filter context, _source is not accessible from params, which is not explicitly explained in the documentation.

I do not access _source in params.

Sorry, I skipped a step in my reasoning. A text field is not indexed into doc_values, so it is not accessible via the doc.field accessor. It also can't be accessed via params._source.field.

Np, thank you @Tomo_M. I solved my issue by enabling fielddata on the field 'sentence_text', which is a text field, as described here.
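
For anyone landing here later, this is roughly what that mapping change looks like from Python. It is only a sketch: the index name "conc" is taken from the error output above, the helper name `fielddata_mapping_body` is made up for illustration, and the exact client call depends on your client version, so it is left as a comment.

```python
import json


def fielddata_mapping_body(field_name="sentence_text"):
    """Build a mapping-update body that enables fielddata on a text field.

    The helper name is hypothetical; the body mirrors the JSON you would
    send to PUT <index>/_mapping.
    """
    return {
        "properties": {
            field_name: {
                "type": "text",
                # Loads the analyzed tokens into heap memory on first use,
                # which can be expensive on large indices.
                "fielddata": True,
            }
        }
    }


if __name__ == "__main__":
    body = fielddata_mapping_body("sentence_text")
    # Equivalent request body for: PUT conc/_mapping
    print(json.dumps(body, indent=2))
    # With the Python client this would be something like (depending on the
    # client version, and not run here):
    #   es.indices.put_mapping(index="conc", body=body)                      # 7.x style
    #   es.indices.put_mapping(index="conc", properties=body["properties"])  # 8.x style
```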

Yes, turning on fielddata is one solution. Note that fielddata contains the analyzed tokens, so the length you get is different from the exact length of the sentence.

If turning on fielddata causes problems, consider using an ingest pipeline to add a field that holds the length of the text field.
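
A rough sketch of the ingest pipeline idea, in Python since that is the client used in the question. The field name `sentence_length` and both helper names are assumptions made up for this example. The pipeline stores the character length of the text in a numeric field at index time, and the script filter in the original query is swapped for a plain range filter on that field.

```python
def length_pipeline_body(text_field="sentence_text", length_field="sentence_length"):
    """Ingest pipeline body with one script processor that stores the
    character length of `text_field` in `length_field` (hypothetical name)."""
    source = (
        "if (ctx[params.text_field] != null) {"
        " ctx[params.length_field] = ctx[params.text_field].length();"
        " }"
    )
    return {
        "description": "Store the character length of a text field",
        "processors": [
            {
                "script": {
                    "lang": "painless",
                    "source": source,
                    "params": {
                        "text_field": text_field,
                        "length_field": length_field,
                    },
                }
            }
        ],
    }


def script_score_body(query_vector, min_length, size=10, length_field="sentence_length"):
    """Same script_score query as in the question, but filtering with a
    range query on the precomputed length instead of a script filter."""
    return {
        "size": size,
        "query": {
            "script_score": {
                "query": {
                    "bool": {
                        "filter": {
                            "range": {length_field: {"gt": min_length}}
                        }
                    }
                },
                "script": {
                    "source": "cosineSimilarity(params.queryVector,'sentence_vector') + 1.0",
                    "params": {"queryVector": query_vector},
                },
            }
        },
    }


if __name__ == "__main__":
    pipeline = length_pipeline_body()
    query = script_score_body([0.1, 0.2, 0.3], min_length=20)
    print(pipeline["processors"][0]["script"]["source"])
    print(query["query"]["script_score"]["query"])
```

Documents that are already indexed would not have the new field until they are reindexed or updated through the pipeline (for example with an update by query that names the pipeline).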

Nice, you already answered the next question.

It is a research project for university, so efficiency is not an issue.
