I've spent a number of hours trying to get a simple Regexp query to work.
I'm using Elasticsearch 1.0 with the defaults. Here's the data I've posted
to ES:
$ curl -XPOST 'elasticsearch:9200/regex_test/useragent' -d '
{
  "@message": "\"userAgent\": \"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)\""
}'
Note the escaped double-quotes. Now I'm trying to match this document with
the following regexp filter:
I get 0 hits, although I expected it to match the sequence "Mozilla/5...".
I also tried ".Mozilla.", which doesn't work either. However, when I
match against a blank regexp wildcard I do get the result back (showing
that Regexp is working):
I tried playing around with dynamic mapping templates and using the keyword
analyzer and no analyzer but that didn't seem to make a difference. How can
I go about optimizing the @message field across all my indexes for regexp
searches?
Assuming you have no prior mappings, your first example will put @message
through a standard analyzer - i.e. it will chop it up into pieces using
this analyzer:
So a query like this will not match (since the standard analyzer will make
it into multiple terms like: ["useragent", "mozilla", "5.0"], etc.):
"regexp": {
"@message": "Mozilla.5.*"
}
But something like this will (since it matches one of the terms: "mozilla"):
"regexp": {
"@message": "mozill."
}
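To make the tokenization point concrete, here is a rough Python sketch. It is only an approximation: rough_standard_tokens is a hypothetical helper that imitates the standard analyzer with a simple regex (the real analyzer follows Unicode word-boundary rules), and re.fullmatch stands in for Lucene's rule that a regexp has to match an entire term. Python's regex syntax also differs from Lucene's in places, so treat this as an illustration and not as what Elasticsearch runs.

```python
import re

def rough_standard_tokens(text):
    """Crude stand-in for the standard analyzer (an assumption, not the real
    implementation): lowercase, then keep runs of letters/digits, allowing
    dots between them so that "5.0" stays in one piece."""
    return re.findall(r"[a-z0-9]+(?:\.[a-z0-9]+)*", text.lower())

def regexp_hits(pattern, terms):
    """Return the terms the pattern matches in full, mimicking the way a
    Lucene regexp is tested against each indexed term as a whole."""
    return [t for t in terms if re.fullmatch(pattern, t)]

message = '"userAgent": "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"'
terms = rough_standard_tokens(message)

print(terms)
# No single term holds "Mozilla" followed by "5", and terms are lowercased.
print(regexp_hits("Mozilla.5.*", terms))
# "mozill." covers the whole term "mozilla".
print(regexp_hits("mozill.", terms))
```

To see the tokens Elasticsearch really produces, the _analyze API is the authoritative check.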
If instead you use something like a keyword analyzer (or not_analyzed),
then the whole string is a single token (["\"userAgent\": \"Mozilla/5.0
(compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)\""]).
In this case a query like this will still not match:
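One thing worth keeping in mind with a single-token field is that Lucene regular expressions are anchored: the pattern has to match the entire term, not just a piece of it. The sketch below is a Python approximation of that behaviour, using re.fullmatch as a stand-in for Lucene's matching. The helper name lucene_like_match and the second pattern are my own illustrations, not something taken from this thread.

```python
import re

# With a keyword analyzer or not_analyzed, the whole field value is one term.
term = '"userAgent": "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"'

def lucene_like_match(pattern, value):
    """Approximate an anchored regexp: the pattern must cover the whole value."""
    return re.fullmatch(pattern, value) is not None

# The term starts with "userAgent", so a pattern starting at "Mozilla" fails.
starts_at_mozilla = lucene_like_match("Mozilla.5.*", term)

# A leading ".*" lets the pattern absorb everything before "Mozilla".
with_leading_wildcard = lucene_like_match(".*Mozilla.5.*", term)

print(starts_at_mozilla, with_leading_wildcard)
```

If this carries over to your setup, a pattern against a not_analyzed @message would need a leading ".*" (and usually a trailing one) to match in the middle of the string. Leading wildcards can be slow on large indexes, so it is worth testing on real data.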
Ahh ok. I'll have to give the keyword analyzer a try then!
Thanks,
Jamil
On Friday, February 21, 2014 2:23:06 PM UTC-8, Binh Ly wrote: