Searching on keywords that contain spaces fails using simple_query_string

I have a tags keyword field defined simply as:

'tags' => [
    'type' => 'keyword',
],

The tags field stores:

"tags" : [
      "523",
      "523 az"
    ],

I have a simple search setup as:

$params['body']['query']['bool']['must']['simple_query_string'] = [
    'query' => $string,
    'fields' => ['tags'],
];

Searching for 523 works like a charm.
Searching for 523 az fails.

Anything with a space in the keyword field fails.
Is there a way to get this to work with simple_query_string?


It works in Kibana; you can try it:

PUT index1
{
  "mappings": {
    "properties": {
      "tags": {
        "type": "keyword"
      }
    }
  }
}
POST index1/_doc
{
  "tags": [
    "523",
    "523 az"
  ]
}
POST index1/_search
{
  "query": {
    "simple_query_string": {
      "query": "523",
      "fields": [
        "tags"
      ]
    }
  }
}
POST index1/_search
{
  "query": {
    "simple_query_string": {
      "query": "523 az",
      "fields": [
        "tags"
      ]
    }
  }
}

I'm finding the same thing: I can't target keyword fields whose values contain spaces using simple_query_string (ES version 7.8.0).

My issue here: Discrepancy (bug or misunderstanding) in how simple_query_string handles queries?

I tried this without the extra '523' tag, i.e.:

PUT index1
{
  "mappings": {
    "properties": {
      "tags": {
        "type": "keyword"
      }
    }
  }
}

POST index1/_doc
{
  "tags": [
    "L120   2007:09:11 98C57U495891"
  ]
}
POST index1/_search
{
  "query": {
    "simple_query_string": {
      "query": "L120   2007:09:11 98C57U495891",
      "fields": [
        "tags"
      ]
    }
  }
}

This returns 0 results:

{
  "took" : 972,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}

I also tried this query:

POST index1/_search
{
  "query": {
    "simple_query_string": {
      "query": "L120",
      "fields": [
        "tags"
      ]
    }
  }
}

and also got 0 results.

@JJK It might be worth mentioning that multi_match is your friend here, but only up to a point. For example:

POST index1/_search
{
  "query": {
    "multi_match": {
      "query": "L120   2007:09:11 98C57U495891",
      "fields": [
        "tags^1.0"
      ],
      "operator": "and"
    }
  }
}

works fine, but you can't do partial-value or wildcard searches with it. For example:

POST index1/_search
{
  "query": {
    "multi_match": {
      "query": "98C57U495891",
      "fields": [
        "tags^1.0"
      ],
      "operator": "and"
    }
  }
}

returns 0 results and so does:

POST index1/_search
{
  "query": {
    "multi_match": {
      "query": "*98C57U495891",
      "fields": [
        "tags^1.0"
      ],
      "operator": "and"
    }
  }
}
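
For partial matches against a keyword field, one option that may be worth trying is the term-level wildcard query, since multi_match does not treat * as a wildcard. The following is a minimal Python sketch that only builds the request body; the helper name wildcard_body is hypothetical, and the body would be sent to index1/_search with whichever client you use.

```python
import json


def wildcard_body(field: str, pattern: str) -> dict:
    """Build a search body that uses the term-level wildcard query.

    The pattern is compared against the whole stored keyword value, so a
    leading * is needed to match a suffix such as a serial number.
    """
    return {"query": {"wildcard": {field: {"value": pattern}}}}


body = wildcard_body("tags", "*98C57U495891")
print(json.dumps(body, indent=2))
```

Patterns that begin with a wildcard can be slow on large indices, so this is probably best kept for cases where an exact match is not possible.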

Further to this, this works:

POST index1/_search
{
  "query": {
    "query_string": {
      "query": "(L120) and (2007\\:09\\:11) and (*98C57U495891)",
      "fields": [
        "tags^1.0"
      ]
    }
  }
}

yet this does not:

POST index1/_search
{
  "query": {
    "query_string": {
      "query": "(L120) and (2007\\:09\\:11) and (98C57U495891)",
      "fields": [
        "tags^1.0"
      ]
    }
  }
}

I am confounded.
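
One possible explanation, offered as a hypothesis rather than a confirmed diagnosis: query_string only recognises boolean operators written in upper case (AND, OR, NOT), so a lower-case and is parsed as just another search term and the clauses are combined with the default OR operator. On that reading, the first query matches only because the *98C57U495891 wildcard clause matches the whole keyword value, while the second has no clause equal to the full stored value. A sketch under that assumption, in Python, that joins wildcard clauses with an upper-case AND; escape_reserved and all_fragments_query are hypothetical helper names, and the fragments are assumed to contain no whitespace:

```python
import json
from typing import List

# Characters that query_string treats as syntax. The wildcard characters
# * and ? are escaped too, because each fragment is meant as literal text.
RESERVED = set('+-=&|!(){}[]^"~*?:\\/')


def escape_reserved(fragment: str) -> str:
    """Backslash-escape reserved query_string characters in a fragment."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in fragment)


def all_fragments_query(field: str, fragments: List[str]) -> dict:
    """Require every fragment to appear somewhere in the keyword value."""
    # Wrap each fragment in wildcards so it can match part of the value.
    clauses = ["*" + escape_reserved(f) + "*" for f in fragments]
    return {
        "query": {
            "query_string": {
                # Upper-case AND is an operator; lower-case "and" is not.
                "query": " AND ".join(clauses),
                "fields": [field],
            }
        }
    }


body = all_fragments_query("tags", ["L120", "2007:09:11", "98C57U495891"])
print(json.dumps(body, indent=2))
```

This has not been run against the 7.8.0 cluster from this thread, so treat it as a starting point for experimentation.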

multi_match is not something I can use for my use case.

I have found that escaping the space will bring back the correct results.
I'm still looking for why that might be. What setting is causing the required escape on a space?
Can't find anything yet.
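
Given that escaping the space brings the results back, the escaping can be applied programmatically before the string is passed to simple_query_string. A minimal sketch in Python (the same idea should carry over to the PHP client); escape_whitespace and simple_query_body are hypothetical names, and the sketch assumes the whole input is meant to match a single keyword value:

```python
import json
import re


def escape_whitespace(text: str) -> str:
    """Put a backslash before each whitespace character so the input is
    kept together as one term instead of being split on spaces."""
    return re.sub(r"(\s)", r"\\\1", text)


def simple_query_body(field: str, text: str) -> dict:
    """Build a simple_query_string body with whitespace escaped."""
    return {
        "query": {
            "simple_query_string": {
                "query": escape_whitespace(text),
                "fields": [field],
            }
        }
    }


body = simple_query_body("tags", "523 az")
print(json.dumps(body))
```

In the serialized JSON the backslash appears doubled, because JSON itself escapes backslashes. Other simple_query_string operators such as +, |, -, * and quotes are not handled here.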


Interesting...is it keyword (term level) vs text (match) again? If you don't escape the space, is it attempting to tokenize the query and compare constituent parts against the fields, and because they're terms (and it's not doing a wildcard) then it doesn't match?
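
The hypothesis above can be illustrated with a toy model in Python. This is not Elasticsearch's actual parser; it only assumes that unescaped whitespace splits the query into terms, that the terms are combined with OR, and that a keyword field matches only on the complete stored value. The names split_unescaped and matches_any are hypothetical.

```python
import re
from typing import List, Set


def split_unescaped(query: str) -> List[str]:
    """Split on whitespace that is not preceded by a backslash, then
    remove the backslashes from escaped whitespace."""
    parts = re.split(r"(?<!\\)\s+", query.strip())
    return [re.sub(r"\\(\s)", r"\1", p) for p in parts if p]


def matches_any(query: str, stored: Set[str]) -> bool:
    """Toy OR semantics: match if any term equals a stored keyword value."""
    return any(term in stored for term in split_unescaped(query))


both_tags = {"523", "523 az"}
one_tag = {"523 az"}

print(split_unescaped("523 az"))
print(matches_any("523 az", both_tags))
print(matches_any("523 az", one_tag))
print(matches_any("523\\ az", one_tag))
```

In this model the unescaped query only appears to work when the document also carries the separate 523 tag, which would be consistent with the differing results reported earlier in the thread.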

Even using a wildcard won't return it, unless I'm escaping the space.
