Wildcard query not working as expected

I'm trying to search Apache access logs using a wildcard on the useragent.name field (as added by the useragent plugin).

I'm searching on the Discover tab in Kibana 4.6.1.

I can search for useragent.name:"Chrome Mobile" and get lots of hits.

I can also search for useragent.name:"Chrome" and get lots of hits.

However, if I search for any of these, I get 0 hits:

useragent.name:Chrom*
useragent.name:Chrome\ *
useragent.name:"Chrom*"
useragent.name:"Chrome *"
useragent.name:"Chrome Mobil*"

Excerpt from the mapping:

...
"useragent": {
  "properties": {
    ...
    "name": {
      "type": "string",
      "index": "not_analyzed"
    },
    ...

Can anyone explain why these don't work? All the docs relating to wildcards suggest these should work.

More info...

Wildcard searches work fine in other fields such as referrer:

request:*.jpg*
request:*bot.htm*

which has this mapping:

...
"referrer": {
  "type": "string",
  "index": "not_analyzed"
},
...

This works for me locally as expected. I wonder if it's a case matching issue.

What are the exact values you get back for useragent.name when you search for useragent.name:"Chrome"?

I recommend this great article:

To quote one part of it:

Attention: you cannot use wildcards inside of phrases. If you search for author:"Do?glas Adams" the questionmark won't be used as a wildcard, but must be part of the indexed value (which it isn't in our case). Even more attention: since Elasticsearch applies the analyzers on your query, it might look like wildcards are working inside phrases if you place them at the beginning/end of words — e.g. author:"Douglas Adams*" will still return both documents on analyzed data, but not because the wildcard worked as expected, just because the analyzer stripped that asterisk when analyzing the query. That query wouldn't find the value "Douglas Adamsxxx".

After now showing what doesn't work (wildcards in phrases), let's look a bit on how they DO work. Let's say we want to search for all books by authors with "doug" in the beginning of their name. If we search for author:doug* on analyzed data we will get both documents. In contrast searching for author:doug wouldn't return anything, since there is no entry in the inverted index for "doug". When entering that query, Elasticsearch will look in the inverted index and search for an entry that matches "doug*" (with the asterisk being an arbitrary amount of characters). There is an entry in the inverted index (namely "douglas"), which links to both documents so both documents will be returned.
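The quoted behaviour can be imitated in a few lines of Python. This is only a rough sketch, not how Elasticsearch is implemented: the term lists are hypothetical stand-ins for the inverted index, and Python's fnmatchcase stands in for wildcard matching against each index entry.

```python
from fnmatch import fnmatchcase

def matching_terms(pattern, index_terms):
    """Return the index terms that a wildcard pattern matches, case-sensitively."""
    return [term for term in index_terms if fnmatchcase(term, pattern)]

# Hypothetical inverted-index entries for the author "Douglas Adams".
analyzed_terms = ["douglas", "adams"]   # analyzed: lowercased and split into words
not_analyzed_terms = ["Douglas Adams"]  # not_analyzed: one exact, case-preserved term

print(matching_terms("doug*", analyzed_terms))      # wildcard finds the "douglas" entry
print(matching_terms("doug", analyzed_terms))       # no entry is exactly "doug"
print(matching_terms("doug*", not_analyzed_terms))  # the only entry starts with a capital D
```

The point is that the pattern is compared with entries in the index, not with the original field value, so what matters is how the field was indexed.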

So try removing the quotes from your Chrom* expression.

And for regex:

Elasticsearch also supports searching for regular expressions by wrapping the search string in forward slashes, e.g. author:/[Dd]ouglas.*/. Like the other queries this regex will be searched for in the inverted index, i.e. the regex must match to an entry in the inverted index and not the actual field value.

But I strongly recommend reading the whole thing. I found it extremely interesting and learned a lot about Elasticsearch and queries in Kibana.

That's a really fantastic article, thanks! I now understand the issue and I've learned a lot more as well.

To summarise here for anyone else, these are the key points relating to this issue...

  • You can't use wildcards inside a phrase (i.e. inside quotes)
  • Whenever you use wildcards, the wildcard term in your query is converted to lowercase
  • Searching not_analyzed fields is always case-sensitive

Therefore if you search a not_analyzed field and use wildcards, you MUST NOT include any capital letters in your query, and you MUST NOT wrap it in quotes.

If you need a space in the query, escape it with a backslash (\).

These searches DO work correctly:

useragent.name:*hrom*
useragent.name:?hrom*
useragent.name:*hrome*
useragent.name:?hrome*
useragent.name:?hrome\ *
useragent.name:*?obile*

So the main takeaway is...

If you're searching a not_analyzed field and using wildcards, replace all capital letters in the query with question marks.
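If you build these queries in a script, the rule is easy to automate. Here is a minimal sketch; wildcard_safe is a hypothetical helper name, not part of Kibana or Elasticsearch.

```python
def wildcard_safe(term):
    """Rewrite a wildcard term so it survives lowercasing on a not_analyzed field."""
    out = []
    for ch in term:
        if ch.isupper():
            out.append("?")    # a capital would be lowercased, so match any one character
        elif ch == " ":
            out.append("\\ ")  # escape spaces so the term is not split or quoted
        else:
            out.append(ch)
    return "".join(out)

print("useragent.name:" + wildcard_safe("Chrome Mobil*"))
```

Bear in mind that ? matches any single character, so the rewritten query is slightly looser than the original: ?hrome* would also match a value such as "chrome" or "Xhrome".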

Regex doesn't seem to work as described in that article.

The advice is to avoid regex if possible because it's expensive; however, I'd still like to understand it.

From the article:

For example if we search for author:/[Dd]ouglas.*[Aa]dams/ in the unanalyzed data, it will yield the two documents, since there was an entry for "Douglas Adams" in the inverted index.

So these should work but they all return zero hits:

useragent.name:/Chrome/
useragent.name:/[Cc]hrome/
useragent.name:/[Cc]hrome.*/

These do work:

useragent.name:/.hrome/
useragent.name:/.hrome.*/
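One explanation that fits these results is that the regex is lowercased in the same way as a wildcard term, which would turn [Cc] into [cc]. That is an assumption based only on the behaviour seen here, not something I have confirmed. The Python sketch below models it, using re.fullmatch because the regex has to match the whole index entry; simulate_regex is a made-up helper, and Python's regex syntax is only a stand-in for Lucene's on these simple patterns.

```python
import re

def simulate_regex(query_regex, indexed_value):
    """Imitate a /regex/ search against one not_analyzed value."""
    # Assumption: the regex text is lowercased before it is matched.
    pattern = query_regex.lower()
    # The regex must match the entire index entry, not just part of it.
    return re.fullmatch(pattern, indexed_value) is not None

for regex in ["Chrome", "[Cc]hrome", "[Cc]hrome.*", ".hrome", ".hrome.*"]:
    print(regex, simulate_regex(regex, "Chrome"), simulate_regex(regex, "Chrome Mobile"))
```

Under that assumption the first three patterns can never match a value with a capital C, while . sidesteps the problem the same way ? does for wildcards.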
