Hello!
I have constructed an Elasticsearch query with a filter, and in the filter context I am writing a Painless script that filters documents based on the contents of the text field `body`. However, when I access this field in the script, I get a list of terms instead of the original text. I am looking for a way to access the original text in the Painless script rather than a list of terms. Alternatively, if access to the original text is not possible, I would like to access the document's term frequency vector in this context.
For instance, if I run this query:
```
GET twitter/_search
{
  "query": {
    "bool": {
      "must": {
        "term": { "body": "spark" }
      },
      "filter": [
        {
          "script": {
            "script": {
              "lang": "painless",
              "source": """
                String text = doc['body'].toString();
                Debug.explain(text);
                return true;
              """
            }
          }
        }
      ]
    }
  }
}
```
I get this response:
```
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 4,
    "skipped" : 0,
    "failed" : 1,
    "failures" : [
      {
        "shard" : 2,
        "index" : "twitter",
        "node" : "AClIunrSRUKb1gbhBz-JoQ",
        "reason" : {
          "type" : "script_exception",
          "reason" : "runtime error",
          "painless_class" : "java.lang.String",
          "to_string" : "[and, by, cutting, doug, hadoop, jack, jim, lucene, made, spark, the, was]",
          "java_class" : "java.lang.String",
          "script_stack" : [
            "Debug.explain(text);\n ",
            " ^---- HERE"
          ],
          "script" : """
            String text = doc['body'].toString();
            Debug.explain(text);
            return true;
          """,
          "lang" : "painless",
          "caused_by" : {
            "type" : "painless_explain_error",
            "reason" : null
          }
        }
      }
    ]
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}
```
As you can see, the debug output shows that `doc['body'].toString()` is in fact a list of terms: `[and, by, cutting, doug, hadoop, jack, jim, lucene, made, spark, the, was]`. What I would like is to access the original text, which in this example is `"body" : "The Lucene was made by Doug Cutting and the hadoop was made by Jim and Spark was made by jack"`.
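To illustrate why the script sees a list rather than the sentence: for an analyzed `text` field, fielddata holds the indexed terms, and for a single document these come back as a sorted set of unique tokens. The Python sketch below approximates that. It is an assumption for illustration only: the standard analyzer is stood in for by lowercasing and splitting on non-word characters (real Elasticsearch tokenization differs for punctuation, URLs and so on), and `approximate_fielddata_terms` is a hypothetical helper name.

```python
import re
from typing import List


def approximate_fielddata_terms(text: str) -> List[str]:
    """Rough stand-in for the terms fielddata exposes for an analyzed text field.

    Assumption: the standard analyzer is approximated by lowercasing and
    extracting runs of word characters; this is not Elasticsearch's tokenizer.
    """
    tokens = re.findall(r"\w+", text.lower())
    # String fielddata is a sorted set: duplicates collapse and the order
    # follows the sorted terms, not the original word order.
    return sorted(set(tokens))


body = ("The Lucene was made by Doug Cutting and the hadoop was made by "
        "Jim and Spark was made by jack")

print(approximate_fielddata_terms(body))
```

The printed list has the same 12 terms as the `to_string` value in the response above. Because duplicates (such as the three occurrences of "made") and the word order are gone, the original sentence cannot be rebuilt from these values.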
NOTE: I have set `"fielddata": true` and `"store": true` on this field, and I also indexed the text into a `body.exact` field so that its terms won't be analyzed. Nevertheless, I still can't access the original text from the script in the filter context; I always get the list of unique terms.
Many thanks for your help!