Setting the whitespace analyzer for query_string search

Hi all,
I have a problem regarding how the query string is tokenized when performing a query_string search. Would it be possible to opt for the whitespace tokenizer instead of the standard one? I'm searching for an exact phrase that contains hyphens '-' (like a GUID), which gets split into parts that are then searched for individually. What I end up with, instead of one document result (with the exact GUID), is multiple results: all the records whose field contains any of those parts.
To give the whole picture: I have a watcher that searches for output transactions with large amounts (the 'outputs' part of the chained input) and, if it finds any (a transform is done in the 'orders_lookup' part of the chain), searches for the corresponding inputs (the 'inputs' part of the chain). Here is the relevant part of that watcher:

  "input": {
    "chain": {
      "inputs": [
        {
          "outputs": {
            "search": {
              "request": {
                "search_type": "query_then_fetch",
                "indices": [
                  "store"
                ],
                "rest_total_hits_as_int": true,
                "body": {
                  "query": {
                    "bool": {
                      "filter": [
                        {
                          "term": {
                            "status": 0
                          }
                        },
                        {
                          "term": {
                            "transactionType": 4
                          }
                        },
                        {
                          "range": {
                            "amount": {
                              "gte": "{{ctx.metadata.threshold}}"
                            }
                          }
                        },
                        {
                          "range": {
                            "eventTime": {
                              "gte": "now-{{ctx.metadata.window_period_outputs}}m"
                            }
                          }
                        }
                      ]
                    }
                  }
                }
              }
            }
          }
        },
        {
          "orders_lookup": {
            "transform": {
              "script": {
                "source": """HashSet orders = new HashSet();
                  for (output in ctx.payload.outputs.hits.hits) orders.add(output._source.OrderId);
                  return ['ordersA' : orders];""",
                "lang": "painless"
              }
            }
          }
        },
        {
          "inputs": {
            "search": {
              "request": {
                "search_type": "query_then_fetch",
                "indices": [
                  "store"
                ],
                "rest_total_hits_as_int": true,
                "body": {
                  "query": {
                    "bool": {
                      "filter": [
                        {
                          "term": {
                            "status": 0
                          }
                        },
                        {
                          "term": {
                            "transactionType": 3
                          }
                        },
                        {
                          "range": {
                            "eventTime": {
                              "gte": "now-{{ctx.metadata.window_period_inputs}}m"
                            }
                          }
                        },
                        {
                          "query_string": {
                            "default_field": "OrderId.txt",
                            "query": "{{#ctx.payload.orders_lookup.ordersA}}'{{.}}' {{/ctx.payload.orders_lookup.ordersA}}"
                          }
                        }
                      ]
                    }
                  }
                }
              }
            }
          }
        }
      ]
    }
  }

Given that there could be more than one large transaction, I'm using query_string for the inputs search, where all the collected order ids are queried at once.
Field mapping is:

"OrderId": {
  "type": "keyword",
  "fields": {
	"txt": {
	  "type": "text"
	}
  }
}

And here is an example:

{
  "query_string": {
    "default_field": "OrderId.txt",
    "query": "d6220c50-9ec1-ea11-9b05-501ac5532e5e"
  }
}

This will return any document whose OrderId field contains any of the tokens 'd6220c50', '9ec1', 'ea11', '9b05' or '501ac5532e5e'.
If I add the analyzer:

{
  "query_string": {
    "default_field": "OrderId.txt",
    "query": "d6220c50-9ec1-ea11-9b05-501ac5532e5e",
    "analyzer": "whitespace"
  }
}

I get 0 hits. Why is that?
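
For reference, running the _analyze API (a quick check outside the watcher) shows how differently the two analyzers tokenize the value:

GET _analyze
{
  "analyzer": "standard",
  "text": "d6220c50-9ec1-ea11-9b05-501ac5532e5e"
}

This returns the tokens 'd6220c50', '9ec1', 'ea11', '9b05' and '501ac5532e5e', while the whitespace analyzer returns the whole GUID as a single token. My guess is that this explains the 0 hits: the field was indexed with the standard analyzer, so the single whole-GUID token produced at query time never matches anything in the index.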

Thanks in advance

Why not use a term query on the OrderId field?

If OrderId is always a random GUID, I would not define the txt sub-field at all.
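
Something like this (just a sketch, matching the exact keyword value):

{
  "term": {
    "OrderId": "d6220c50-9ec1-ea11-9b05-501ac5532e5e"
  }
}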

First, let me thank you for your reply.
In my watcher there can be more than one output transaction that matches the criteria. I then extract the order ids into a hash set. Next, I search for the input transactions belonging to those orders.
My setup is that I have two transactions for each order, an input and an output (there should be one input transaction for each output transaction). The order ids are out of my control; some of them are GUIDs and some are strings or even numbers (they are all unique).

Hi Josip,
You are welcome.

My point is that irrespective of what value OrderId contains, it's never searched for by a partial string (token), so there is no point analyzing it. For example, would you ever need to find the order "d6220c50-9ec1-ea11-9b05-501ac5532e5e" by searching "9ec1", or find all orders that contain "9ec1"?

I am not that familiar with Watcher. I am only suggesting that instead of the query_string query you can use a terms query,
like

{
  "terms": {
    "OrderId": [ "d6220c50-9ec1-ea11-9b05-501ac5532e5e", "2nd OrderId", "3rd OrderId", ... ]
  }
}

Once again, thanks for your participation in this. Now that you mention it, I remember why I needed to use query_string. It's because of how I'm constructing the query (a mustache loop):

"query": "{{#ctx.payload.orders_lookup.ordersA}}'{{.}}' {{/ctx.payload.orders_lookup.ordersA}}"

I wanted to use the terms query, but there was an issue creating the input for it (the string is wrapped in triple quotes due to Kibana's handling of strings, and the array isn't parsed correctly).
So I ended up updating the mapping for the OrderId field to create a sub-field of type 'text' (initially it was only a keyword field) to use with the query_string query.
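
For illustration, with two hypothetical ids in ordersA, that loop renders the query to something like:

"query": "'first-order-id' 'second-order-id' "

so query_string receives all the ids in one string, whereas the terms query expects a real JSON array, which I couldn't produce from the transform.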

@Josip_Cagalj

Avoiding the txt field will reduce your index size and speed up ingestion.

I don't have an X-Pack license to try it out, but if you post a new question asking how to pass/assign an array of strings in a watcher, I am sure someone will help you.

If you haven't already, can you try this?

  1. In the transform script:
    return ['ordersA' : orders.toArray(new String[orders.size()]) ];

  2. Replace the query_string query with:

{
  "terms": {
    "OrderId": "{{ctx.payload.orders_lookup.ordersA}}"
  }
}

I've tried it, but I'm getting an error:

        "type" : "script_exception",
        "reason" : "compile error",
        "script_stack" : [
          "... ordersA' : orders.toArray(new String[orders.size()] ...",
          "                              ^---- HERE"
        ],
		
		...
		
	  "caused_by" : {
      "type" : "class_cast_exception",
      "reason" : "Cannot cast from [java.lang.String[]] to [def[]]."

@Josip_Cagalj
Can you try this?

  1. In the transform script:
    return ['ordersA' : orders.join("\",\"")];

  2. Replace the query_string query with:

{
  "terms": {
    "OrderId": ["{{{ctx.payload.orders_lookup.ordersA}}}"]
  }
}

Note the 3 curly braces in the OrderId value.
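
To see why this should work (with two hypothetical order ids): the join in the transform produces the single string first-order-id","second-order-id, and the triple mustache braces keep the embedded double quotes from being HTML-escaped, so the template renders to a proper JSON array:

{
  "terms": {
    "OrderId": ["first-order-id","second-order-id"]
  }
}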

Thank you very much!!! It works now with the solution you provided!
Now I can get rid of the multi-field I'd introduced into my mappings because of the query_string search.
You are the man, thank you!
