Uppercase letters in request cause no results to be found?

Elasticsearch v5 Alpha2

I'm noticing something bizarre while searching our cluster via the API. It seems that if I include an uppercase character in my wildcard or regexp string, no results come back. For example:

This works:

{
    "query" : {
        "bool" : {
            "must" : {
                "wildcard" : {
                    "message" : "*dapter*"
                }
            },
            "filter" : {
                "range" : {"@timestamp" : {"gte" : "now-15m"}}
            }
        }
    }
}

This query yields results:

"hits": {
    "total": 58,
    "max_score": 1,

And I can see one of the results contains the message string:

<log realm=\"org.jpos.security.hsm.thales.ThalesAdapter\" at=\"Thu May 26 00:14:45.962 GMT+00:00 2016\" lifespan=\"23ms\">\n  <trace>\n    <elapsed-time>22ms</elapsed-time>\n  </trace>\n</log>

However, if I search for *Adapter* instead of *dapter* in my query:

{
    "query" : {
        "bool" : {
            "must" : {
                "wildcard" : {
                    "message" : "*Adapter*"
                }
            },
            "filter" : {
                "range" : {"@timestamp" : {"gte" : "now-15m"}}
            }
        }
    }
}

I get nothing:

{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 9,
    "successful": 9,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

What am I missing?

Wildcard queries are not analyzed.

https://www.elastic.co/guide/en/elasticsearch/reference/2.3/query-dsl-wildcard-query.html
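To make "not analyzed" concrete, here is a small local simulation, not Elasticsearch itself. It assumes the message field uses lowercasing text analysis (the default for text fields), and `analyze` is a hypothetical, simplified tokenizer rather than the real standard tokenizer's Unicode rules. The index stores lowercased terms, while the wildcard or regexp pattern is compared against them exactly as typed. Python's `re` syntax also differs from Lucene's regexp syntax; only simple patterns like the ones in this thread are meant.

```python
import fnmatch
import re


def analyze(text):
    """Rough stand-in for default text analysis: split into tokens (keeping
    dots inside a token) and lowercase them. The real standard tokenizer
    follows Unicode word-boundary rules; this is a simplification."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9_.]+", text)]


def wildcard_matches(terms, pattern):
    """Mimic a wildcard query: the pattern is NOT analyzed, so it is compared
    case-sensitively against the already-lowercased terms. Only * and ? are
    intended here (fnmatchcase also accepts [...] classes; ES wildcards do not)."""
    return [term for term in terms if fnmatch.fnmatchcase(term, pattern)]


def regexp_matches(terms, pattern):
    """Mimic a regexp query: the whole term must match, with no lowercasing."""
    return [term for term in terms if re.fullmatch(pattern, term)]


message = 'realm="org.jpos.security.hsm.thales.ThalesAdapter" lifespan="23ms"'
terms = analyze(message)

print(wildcard_matches(terms, "*dapter*"))   # one hit: the ...thalesadapter term
print(wildcard_matches(terms, "*Adapter*"))  # [] (no hits)
print(regexp_matches(terms, ".*Adapter.*"))  # [] (no hits)
print(regexp_matches(terms, ".*adapter.*"))  # one hit: the ...thalesadapter term
```

The uppercase "A" in the pattern can never match because every indexed term was lowercased at index time.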

BTW, avoid wildcards at the beginning of a pattern in particular.

It will be super slow.
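One way to see why a leading wildcard hurts: the term dictionary is sorted, so a pattern with a literal prefix can jump straight to the terms starting with that prefix, while a pattern starting with `*` has no prefix and must examine every term. The sketch below models the term dictionary as a plain sorted Python list (an assumption; Lucene uses more compact structures and automata), and `scan_wildcard` is a hypothetical helper that reports how many terms it had to look at.

```python
import bisect
import fnmatch


def scan_wildcard(sorted_terms, pattern):
    """Return (terms examined, matching terms) for a wildcard pattern.

    Seeks to the pattern's literal prefix (the characters before the first
    * or ?) and scans only while terms still start with that prefix."""
    cut = min((i for i, c in enumerate(pattern) if c in "*?"), default=len(pattern))
    prefix = pattern[:cut]
    start = bisect.bisect_left(sorted_terms, prefix)
    examined, matches = 0, []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # past the prefix range in sorted order, nothing more can match
        examined += 1
        if fnmatch.fnmatchcase(term, pattern):
            matches.append(term)
    return examined, matches


terms = sorted(["adapter", "adaptive", "host1", "myserver01",
                "myserver02", "thalesadapter", "zeta"])

print(scan_wildcard(terms, "myserver*"))    # examines 2 terms
print(scan_wildcard(terms, "*myserver*"))   # examines all 7 terms
```

With seven terms the difference is trivial, but the leading-wildcard case grows with the size of the whole term dictionary rather than with the number of matches.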

I've noticed the same behavior with regexp queries. Why would it match some strings but not others in the same field?

"regexp" : {
    "message" : ".*dapter.*"
}

finds results, while

"regexp" : {
    "message" : ".*Adapter.*"
}

finds none.

It's down to how analysis works. By default (unless you change the analyzer settings), Adapter is indexed as adapter.

If you compare Adapter and adapter, they won't match.

That's what is happening behind the scenes: the regex or wildcard is applied as is, without any analysis, so an uppercase letter in the pattern can never match the lowercased indexed terms.

Try with ".*adapter.*" for example.
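If the patterns come from user input, one option is to lowercase them on the client before building the request. This is a minimal sketch, assuming the target field is analyzed with a lowercasing analyzer (true for default text analysis, not for fields you have mapped differently); `lowercase_wildcard_query` is a hypothetical helper that only builds the same request body used in the question.

```python
import json


def lowercase_wildcard_query(field, pattern, since="now-15m"):
    """Build the wildcard request body from the question, lowercasing the
    pattern client-side because wildcard patterns are not analyzed.
    Assumes `field` is indexed with lowercasing analysis."""
    return {
        "query": {
            "bool": {
                "must": {"wildcard": {field: pattern.lower()}},
                "filter": {"range": {"@timestamp": {"gte": since}}},
            }
        }
    }


body = lowercase_wildcard_query("message", "*Adapter*")
print(json.dumps(body, indent=4))  # "message": "*adapter*"
```

The same idea applies to regexp patterns: write the letters in lowercase, as in ".*adapter.*".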

You can also read this https://www.elastic.co/guide/en/elasticsearch/reference/2.3/query-dsl-query-string-query.html#_wildcards

Using the query_string clause works much better for me:

{
    "query" : {
        "bool" : {
            "must" : {
                "query_string" : {
                    "default_field" : "message",
                    "query" : "host:*myserver* message:\"ERROR\""
                }
            },
            "filter" : {
                "range" : {"@timestamp" : {"gte" : "now-15m"}}
            }
        }
    }
}

Thanks.

Note that by default this will give you documents where host contains myserver OR message matches ERROR.

And again, host:*myserver* could be terribly slow.
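If both conditions are meant to hold, the query_string query accepts a `default_operator` parameter (`OR` by default). The sketch below rebuilds the request from the post above with `default_operator` set to `AND`; `host_and_message_query` is a hypothetical helper name. Writing the operator explicitly in the query string (`host:*myserver* AND message:"ERROR"`) would be an alternative. It does nothing about the cost of the leading wildcard.

```python
import json


def host_and_message_query(host_pattern, message_text, since="now-15m"):
    """Same query_string request as in the post above, but with
    default_operator set to AND so both clauses must match."""
    return {
        "query": {
            "bool": {
                "must": {
                    "query_string": {
                        "default_field": "message",
                        "query": f'host:{host_pattern} message:"{message_text}"',
                        "default_operator": "AND",
                    }
                },
                "filter": {"range": {"@timestamp": {"gte": since}}},
            }
        }
    }


print(json.dumps(host_and_message_query("*myserver*", "ERROR"), indent=4))
```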

What's the best way to filter by partial host string?

Generate subtokens at index time using an ngram strategy, though this will take more disk space.

Read: https://www.elastic.co/guide/en/elasticsearch/reference/2.3/analysis-ngram-tokenfilter.html
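The ngram idea can be sketched locally: every substring of the host name between `min_gram` and `max_gram` characters becomes its own indexed term, so a partial-host search turns into a single exact-term lookup instead of a wildcard scan. This is a simulation, not Elasticsearch configuration. The gram sizes (3 to 8) and host names are hypothetical, each host is treated as one keyword-like value, and search strings longer than `max_gram` are not handled here.

```python
from collections import defaultdict


def ngrams(text, min_gram=3, max_gram=8):
    """All lowercased substrings of length min_gram..max_gram."""
    text = text.lower()
    return {
        text[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(text) - n + 1)
    }


def build_index(docs, **gram_opts):
    """Inverted index built once at index time: gram -> set of doc ids."""
    index = defaultdict(set)
    for doc_id, host in docs.items():
        for gram in ngrams(host, **gram_opts):
            index[gram].add(doc_id)
    return index


docs = {1: "web-myserver01", 2: "db-myserver02", 3: "cache01"}
index = build_index(docs)

# A partial-host search is now an exact lookup of one term.
print(sorted(index.get("myserver", set())))  # [1, 2]
# The trade-off: many more distinct terms to store than the 3 host names.
print(len(index))
```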

Hi @Mike_Wurtz, did you find a proper solution for this issue?
I'm facing the same issue.