Query_string query that contains < and > symbols


(Tihomir Lichev) #1

Hello,
I recently discovered a very interesting problem.

The analyzer is whitespace:
{
  "mappings": {
    "test": {
      "properties": {
        "title": {
          "analyzer": "whitespace",
          "type": "string"
        },
        "description": {
          "analyzer": "whitespace",
          "type": "string"
        }
      }
    }
  }
}

I have 3 documents with the following content:
{
  "id": 1,
  "title": "testing123",
  "description": "a description"
}
{
  "id": 2,
  "title": "testing >end",
  "description": "another description"
}
{
  "id": 3,
  "title": "testing <end",
  "description": "another description"
}

Then I did the following search queries:
{
  "query": {
    "bool": {
      "should": [
        {
          "query_string": {
            "query": ">end",
            "fields": [
              "title"
            ]
          }
        }
      ]
    }
  }
}

RESULT: 3 hits (all docs) - wrong
EXPLAIN description: "ConstantScore(title:{end TO *]), product of:"

{
  "query": {
    "bool": {
      "should": [
        {
          "query_string": {
            "query": "<end",
            "fields": [
              "title"
            ]
          }
        }
      ]
    }
  }
}

RESULT: 3 hits (all docs) - wrong
EXPLAIN description: "ConstantScore(title:[* TO end}), product of:"

{
  "query": {
    "bool": {
      "should": [
        {
          "query_string": {
            "query": "end",
            "fields": [
              "title"
            ]
          }
        }
      ]
    }
  }
}

RESULT: 0 hits (correct!)
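
The explain output for `>end` hints at why every document comes back: the leading `>` is read as a range operator, so the query becomes "any title term greater than end". A minimal Python sketch of that reading, assuming whitespace tokenization and plain string ordering as a stand-in for Lucene's term comparison (the `docs` dict and the `range_gt` helper are illustrative names, not Elasticsearch APIs):

```python
# Toy model of the range query title:{end TO *] over whitespace-tokenized titles.
# Assumption: ordinary string comparison approximates Lucene's term ordering
# for these ASCII terms.
docs = {
    1: "testing123",
    2: "testing >end",
    3: "testing <end",
}

def range_gt(title, lower):
    """Return True if any whitespace token sorts strictly after `lower`."""
    return any(token > lower for token in title.split())

hits = [doc_id for doc_id, title in docs.items() if range_gt(title, "end")]
print(hits)  # prints [1, 2, 3]
```

Every title contains a token starting with `t`, which sorts after `end`, so all three documents qualify even though only one of them contains the literal token `>end`.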

I tried to escape "<" and ">" in the query, but without success.

I searched for < and > in the Elasticsearch and Lucene docs, also with no luck.
Obviously they are recognized as range operators, but I also want to use them
as normal text symbols.
Is there any way to escape them properly?
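
On the escaping question: the current Elasticsearch `query_string` documentation says that `<` and `>` cannot be escaped at all, and that the only way to stop them from creating a range query is to remove them from the query string. A minimal sketch of a sanitizer built on that statement follows. The function name `sanitize_query_string` is hypothetical, and removing the symbols also means a literal token such as `>end` can no longer be matched this way:

```python
import re

# Reserved characters that can be escaped with a backslash.
# "<" and ">" are handled separately because they cannot be escaped.
_ESCAPABLE = re.compile(r'([+\-=!(){}\[\]^"~*?:\\/&|])')

def sanitize_query_string(text):
    """Drop < and >, then backslash-escape the other reserved characters."""
    without_ranges = text.replace("<", "").replace(">", "")
    return _ESCAPABLE.sub(r"\\\1", without_ranges)

print(sanitize_query_string(">end AND (foo:bar)"))  # prints end AND \(foo\:bar\)
```

This sketch escapes `+` and `-` as well, so it is only suitable where users should not be able to use those operators; it also leaves the words AND, OR and NOT untouched.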

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/6388d7d4-d14a-4e6e-b09b-67860ccbb574%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Tihomir Lichev) #2

Does anyone have an idea how to escape < and > in a query?



(Jörg Prante) #3

This is not an escaping problem.

Always use the "match" query. Never use "query_string".

Jörg
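
As a sketch of what that suggestion looks like for the `title` field from this thread: a `match` query analyzes its text with the field's analyzer and does not parse query operators, so `>end` stays a single whitespace token. The helper name `match_body` is illustrative, and the snippet only builds the request body; sending it to the search endpoint is left out.

```python
import json

def match_body(field, text):
    """Build a search body with a match query; the text is analyzed, not parsed for operators."""
    return {"query": {"match": {field: text}}}

# Body for searching the literal token ">end" in the title field.
print(json.dumps(match_body("title", ">end"), indent=2))
```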



(Tihomir Lichev) #4

I don't think I agree...
I'm using query_string because I want to give users the ability to use
AND, OR, +, -, etc. out of the box.
I'm able to escape all other symbols except < and >, and I can use them as
part of the field content and as part of the query like any regular
letter.
I don't understand why there is no way to escape just those two symbols. How
are they different from the others?

Tihomir



(Jörg Prante) #5

Once again, it is not related to "escaping".

Why don't you use simple_query_string?

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-simple-query-string-query.html

It wraps the crappy query_string into a "match" query that is correctly
analyzed and parsed according to the Elasticsearch settings, which
query_string is not always able to do. It also provides boolean clauses.

Jörg

Examples:

POST /test/_search
{
  "query" : {
    "simple_query_string" : {
      "query" : ">end",
      "fields" : ["title"]
    }
  }
}

POST /test/_search
{
  "query" : {
    "simple_query_string" : {
      "query" : "testing and >end",
      "fields" : ["title"]
    }
  }
}
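
For reference, the documented shape of `simple_query_string` takes the query text under `query` and the target fields under `fields`. A small Python sketch that builds such a body follows; `simple_query_body` is a hypothetical helper, and the choice of `default_operator` here is only an assumption about what the application wants:

```python
import json

def simple_query_body(text, fields, default_operator="OR"):
    """Build a simple_query_string search body for the given fields."""
    return {
        "query": {
            "simple_query_string": {
                "query": text,
                "fields": list(fields),
                "default_operator": default_operator,
            }
        }
    }

# Require both tokens by switching the default operator to AND.
body = simple_query_body("testing >end", ["title"], default_operator="AND")
print(json.dumps(body, indent=2))
```

Setting the default operator avoids relying on operator symbols inside the user's text, which keeps `>end` as ordinary query text.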



(Tihomir Lichev) #6

That makes much more sense :)
Thanks, it works now!
The query_string docs should mention somewhere that
simple_query_string should be preferred where applicable :P


