Strange query result: "from=<>" matches "from=<foobar>"

Hi all,

This is probably an easy one. My search query is this:

$ curl -XGET 'http://localhost:9200/logs/_search?pretty' -d '{
  "query": {
    "query_string": {
      "default_operator": "OR",
      "query": "message:\"from=<>\"",
      "default_field": "_all"
    }
  }
}'

The above query also matches log documents that have something like
"from=<foobar>" in the message field. I am not sure why!
Full output of the query can be seen here: http://sprunge.us/PLJC

Basically, I want to search for messages that have "from=<>" in the
"@message" field of the logs I'm indexing with logstash. I'm using
Kibana to search the text, but I don't think that is relevant here.

So, what's really going on here? I also looked through the Lucene query
syntax and didn't find any special meaning for "<>".

Regards,
shadyabhi

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

The extra characters are parsed out by the analyzer. You can see the
behavior yourself by using the analysis API:
http://localhost:9200/_analyze?text="from%3D<>"
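
To make the effect concrete without a running cluster, here is a minimal Python sketch. It does not call Elasticsearch: `standard_like` and `whitespace_like` are hypothetical stand-ins that only approximate the standard and whitespace analyzers (the real ones follow more detailed rules), so treat the output as an illustration rather than what `_analyze` returns.

```python
import re

def standard_like(text):
    """Rough approximation of the standard analyzer (an assumption):
    keep runs of ASCII letters/digits and lowercase them."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

def whitespace_like(text):
    """Rough approximation of the whitespace analyzer: split on whitespace only."""
    return text.split()

for sample in ("from=<>", "from=<foobar>"):
    print(repr(sample), standard_like(sample), whitespace_like(sample))
```

Under this approximation the query text "from=<>" is reduced to the single token `from`, which is also one of the tokens produced for "from=<foobar>". That would explain why both documents match.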

Switching to a term query will preserve the text, but the match would need
to be exact. Try playing around with the analysis or the type of query
(prefix, term).

--
Ivan

On Mon, Mar 11, 2013 at 9:49 PM, Ivan Brusic ivan@brusic.com wrote:

The extra characters are parsed out by the analyzer. You can see the
behavior yourself by using the analysis API:
http://localhost:9200/_analyze?text="from%3D<>"

Thanks for pointing me in the right direction, Ivan. I read up on analyzers
and got the hang of what you meant. I am actually storing logs in ES, so I
plan to set the analyzer to whitespace. I think that will eliminate the
issue (I still need to test it).

Switching to a term query will preserve the text, but the match would need
to be exact. Try playing around with the analysis or the type of query
(prefix, term).

Can you elaborate on this? What do you mean by "match would need to be
exact"? Can you explain a bit about term and prefix queries? I am a newbie
with ES, so please bear with me.

--
Regards,
Abhijeet Rastogi (shadyabhi)

A term query does not analyze the query text and only does exact matches
against the indexed terms. A term query for the term "from" will not match
"from=". Without knowing too much about your content, a whitespace analyzer
should work.
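
As a rough illustration of "exact match", the following Python sketch builds a toy inverted index and looks up a term without analyzing it. Nothing here is Elasticsearch code: `build_index`, `term_lookup`, `standard_like` and the two sample log lines are all hypothetical, and the analyzers are simplified approximations.

```python
import re

def standard_like(text):
    """Simplified stand-in for the standard analyzer (an assumption)."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

def build_index(docs, analyze):
    """Map each token produced by `analyze` to the ids of the docs containing it."""
    index = {}
    for doc_id, text in docs.items():
        for token in analyze(text):
            index.setdefault(token, set()).add(doc_id)
    return index

def term_lookup(index, term):
    """Look the term up exactly as given; the query text is not analyzed."""
    return sorted(index.get(term, set()))

# Hypothetical log lines, not taken from the real index.
docs = {1: "status=sent from=<>", 2: "status=sent from=<foobar>"}

ws_index = build_index(docs, str.split)       # whitespace-style tokens
std_index = build_index(docs, standard_like)  # standard-style tokens

print(term_lookup(ws_index, "from=<>"))   # only doc 1 has this exact token
print(term_lookup(ws_index, "from"))      # no token is exactly "from"
print(term_lookup(std_index, "from"))     # both docs contain the token "from"
print(term_lookup(std_index, "from=<>"))  # punctuation was never indexed
```

The point of the sketch is that the lookup succeeds only when the query term is character-for-character identical to a token that the index-time analyzer produced.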

A whitespace analyzer behaves like a keyword analyzer when your content
contains no whitespace.
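
A small Python sketch of that comparison, using hypothetical helpers (`keyword_like`, `whitespace_like`) that only mimic the tokenization step and are not the real analyzers:

```python
def keyword_like(text):
    """Keyword-style analysis: the whole input becomes one token."""
    return [text]

def whitespace_like(text):
    """Whitespace-style analysis: split on whitespace only."""
    return text.split()

for sample in ("from=<>", "Message contains from=<>"):
    same = keyword_like(sample) == whitespace_like(sample)
    print(repr(sample), "same tokens:", same)
```

The first sample has no whitespace, so both helpers return the same single token; the second sample is split into three tokens by the whitespace-style helper only.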

--
Ivan

The problem with the whitespace analyzer is this. Suppose I set it and have
log lines like:

"Message contains from=<>" and
"Message contains from=myemail@com"

If I search for myemail@com, the second line does not show up in the search
results. I'm using Kibana to search my logs in ES.
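
The behaviour described above can be reproduced with a short Python sketch. It is only a simulation under the assumption that the whitespace analyzer splits on whitespace and nothing else; `whitespace_like` and `matching_lines` are hypothetical helpers, not part of Elasticsearch or Kibana.

```python
def whitespace_like(text):
    """Whitespace-style analysis: split on whitespace only."""
    return text.split()

# The two log lines from the message above.
lines = [
    "Message contains from=<>",
    "Message contains from=myemail@com",
]

def matching_lines(term):
    """Return the lines that contain `term` as a whole token."""
    return [line for line in lines if term in whitespace_like(line)]

print(matching_lines("myemail@com"))       # no line has this as a whole token
print(matching_lines("from=myemail@com"))  # the second line matches
```

Because "from=myemail@com" is kept as one token, searching for just the address finds nothing, while searching for the full token finds the second line.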

--
Regards,
Abhijeet Rastogi (shadyabhi)
