Hi all,
I have a problem with how the query string is tokenized when performing a query_string search. Would it be possible to use the whitespace tokenizer instead of the standard one? I'm searching for an exact phrase that contains hyphens '-' (a GUID), and it gets split into parts that are then searched for individually. So instead of a single result (the document with the exact GUID), I end up with multiple results: all the records whose field contains any of those parts.
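The splitting is easy to reproduce with the _analyze API, for example:

POST /_analyze
{
  "analyzer": "standard",
  "text": "d6220c50-9ec1-ea11-9b05-501ac5532e5e"
}

This returns five separate tokens ('d6220c50', '9ec1', 'ea11', '9b05', '501ac5532e5e'), while the same request with "analyzer": "whitespace" keeps the whole GUID as a single token.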
To give the whole picture: I have a watcher that searches for output transactions with large amounts (the 'outputs' part of the chained input), and if any are found (a transform is done in the 'orders_lookup' part of the chain), it searches for the corresponding inputs (the 'inputs' part of the chain). Here is the relevant part of that watcher:
"input": {
"chain": {
"inputs": [
{
"outputs": {
"search": {
"request": {
"search_type": "query_then_fetch",
"indices": [
"store"
],
"rest_total_hits_as_int": true,
"body": {
"query": {
"bool": {
"filter": [
{
"term": {
"status": 0
}
},
{
"term": {
"transactionType": 4
}
},
{
"range": {
"amount": {
"gte": "{{ctx.metadata.threshold}}"
}
}
},
{
"range": {
"eventTime": {
"gte": "now-{{ctx.metadata.window_period_outputs}}m"
}
}
}
]
}
}
}
}
}
}
},
{
"orders_lookup": {
"transform": {
"script": {
"source": """HashSet orders = new HashSet();
for (output in ctx.payload.outputs.hits.hits) orders.add(output._source.OrderId);
return ['ordersA' : orders];""",
"lang": "painless"
}
}
}
},
{
"inputs": {
"search": {
"request": {
"search_type": "query_then_fetch",
"indices": [
"store"
],
"rest_total_hits_as_int": true,
"body": {
"query": {
"bool": {
"filter": [
{
"term": {
"status": 0
}
},
{
"term": {
"transactionType": 3
}
},
{
"range": {
"eventTime": {
"gte": "now-{{ctx.metadata.window_period_inputs}}m"
}
}
},
{
"query_string": {
"default_field": "OrderId.txt",
"query": "{{#ctx.payload.orders_lookup.ordersA}}'{{.}}' {{/ctx.payload.orders_lookup.ordersA}}"
}
}
]
}
}
}
}
}
}
}
]
}
}
Given that there could be more than one large transaction, I'm using query_string to search the inputs for all of the collected order ids at once.
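So if, for example, the lookup returned two order ids (made-up values here, just to show the shape), the query_string above would render to something like:

"query_string": {
  "default_field": "OrderId.txt",
  "query": "'11111111-aaaa-bbbb-cccc-222222222222' '33333333-dddd-eeee-ffff-444444444444' "
}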
Field mapping is:
"OrderId": { "type": "keyword", "fields": { "txt": { "type": "text" } } }
And here is an example:
{ "query_string": { "default_field": "OrderId.txt", "query": "d6220c50-9ec1-ea11-9b05-501ac5532e5e" } }
This will return any document whose OrderId field contains any of the tokens 'd6220c50', '9ec1', 'ea11', '9b05' or '501ac5532e5e'.
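If I understand the analysis correctly, the query text is run through the standard analyzer of OrderId.txt, so the query above ends up behaving roughly like:

{
  "bool": {
    "should": [
      { "term": { "OrderId.txt": "d6220c50" } },
      { "term": { "OrderId.txt": "9ec1" } },
      { "term": { "OrderId.txt": "ea11" } },
      { "term": { "OrderId.txt": "9b05" } },
      { "term": { "OrderId.txt": "501ac5532e5e" } }
    ]
  }
}

i.e. a document only needs to contain one of the parts to match.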
If I add the analyzer:
{ "query_string": { "default_field": "OrderId.txt", "query": "d6220c50-9ec1-ea11-9b05-501ac5532e5e", "analyzer": "whitespace" } }
I get 0 hits. Why is that?
Thanks in advance