(Newbie) Differences between text and field/query_string, and matching words vs phrases


(Nick Dunn) #1

Hello all. I'm very new to ES and Lucene, but I have spent a lot of
time devouring the documentation, this group and Stack Overflow.
However I've yet to find a definitive answer to my, hopefully simple,
question.

I'm indexing several types (books, authors, publishers and a few
others) which in the first instance I would like searchable from a
single text input. I understand from the docs that the text query
(http://www.elasticsearch.org/guide/reference/query-dsl/text-query.html)
is designed for this out-of-the-box "one size fits all" requirement.
However, I would also like to support a basic search syntax of:

  • have each keyword as optional (OR), which the default operator for a
    text query uses (tick!)
  • allow a keyword to be prefixed with + or - to include it in or exclude
    it from the search (which is supported by the query_string query
    type... tick!)
  • allow for phrases by wrapping them in double quotes, so that they
    become required phrases (e.g. "lorem ipsum" is essentially
    +"lorem ipsum")

In all instances I want to search "_all" for the time being.

After some playing with the various query types, it seems that the
text type offers good out-of-the-box functionality, but the
field/query_string type offers the +/- syntax. But am I correct in
saying neither offers the phrase/double-quotes syntax?

Having used elasticsearch-head to visually compile some queries, I am
wondering whether the above can only be implemented with some
pre-processing on my end before sending the query to ES. I could use a
boolean query and then process the incoming query string to parse out
the +/- and phrases to build the must/should/must_not parts of the
boolean query myself.
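
For illustration, the pre-processing described above could be sketched roughly as follows in Python. This is only a sketch under assumptions: the function name `build_bool_query` and the regular expression are hypothetical, and the clause names `text` and `text_phrase` are assumed from the text query documentation of this era (later Elasticsearch releases use `match` and `match_phrase` instead). It only prints the JSON body and does not talk to a cluster.

```python
import json
import re

# One token is either a quoted phrase or a bare word, with an optional +/- prefix.
TOKEN = re.compile(r'([+-]?)(?:"([^"]+)"|(\S+))')


def build_bool_query(user_input, field="_all"):
    """Translate the simple search syntax into a bool query body (a sketch)."""
    must, should, must_not = [], [], []
    for sign, phrase, word in TOKEN.findall(user_input):
        if phrase:
            # Assumed clause name for phrase matching in this era of ES.
            clause = {"text_phrase": {field: phrase}}
            # Quoted phrases are treated as required unless explicitly negated.
            target = must_not if sign == "-" else must
        else:
            clause = {"text": {field: word}}
            # "+" means required, "-" means excluded, no prefix means optional.
            target = {"+": must, "-": must_not}.get(sign, should)
        target.append(clause)
    # Only emit the occurrence types that are actually used.
    occurrences = (("must", must), ("should", should), ("must_not", must_not))
    bool_body = {name: clauses for name, clauses in occurrences if clauses}
    return {"query": {"bool": bool_body}}


print(json.dumps(build_bool_query('"lorem ipsum" +dolor -sit amet'), indent=2))
```

Here the quoted phrase and `+dolor` end up under `must`, `amet` under `should`, and `sit` under `must_not`. Input with an unbalanced quote is simply treated as a bare word, which may or may not be what a real search box wants.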

I had hoped that I could simply throw this at ES verbatim, but will I
need to add my own layer of string parsing first?

Any direction greatly appreciated.


(Nick Dunn) #2

To elaborate a little more. Let's say my query input was:

"lord of the rings" tolkien -hobbit

I would hope it to be interpreted something like this... but without
me having to send the boolean query explicitly.

https://gist.github.com/1864764

I hope I'm explaining myself properly.
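
The linked gist is not reproduced in this thread, so the following is only a guess at the kind of boolean query meant, written out by hand for the example input. The clause names `text` and `text_phrase` are assumptions based on the text query of this era, and the variable name `hoped_for` is hypothetical.

```python
import json

# Hand-written interpretation of: "lord of the rings" tolkien -hobbit
# (a guess at the intent, not the contents of the gist)
hoped_for = {
    "query": {
        "bool": {
            # The quoted phrase is required.
            "must": [{"text_phrase": {"_all": "lord of the rings"}}],
            # A bare keyword is optional and only influences scoring.
            "should": [{"text": {"_all": "tolkien"}}],
            # A keyword prefixed with "-" is excluded.
            "must_not": [{"text": {"_all": "hobbit"}}],
        }
    }
}

print(json.dumps(hoped_for, indent=2))
```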


(Nick Dunn) #3

And please, if you don't think this level of complexity is necessary,
by all means just tell me to use a query_string search and nothing
more :wink:


(Lukáš Vlček) #4

Hi,

I would use query_string
(http://www.elasticsearch.org/guide/reference/query-dsl/query-string-query.html);
it supports all three requirements stated in your first mail.

Regards,
Lukas
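
As a rough illustration of the suggestion above, the raw user input can be passed through unchanged as the `query` of a query_string query. This is a minimal sketch: the helper name `query_string_body` is hypothetical, and `default_field` and `default_operator` are set explicitly here only to mirror the requirements from the first mail (search `_all`, keywords optional).

```python
import json


def query_string_body(user_input, field="_all"):
    """Wrap raw user input in a query_string query (a sketch)."""
    return {
        "query": {
            "query_string": {
                "query": user_input,         # passed through verbatim
                "default_field": field,      # search _all for the time being
                "default_operator": "OR",    # bare keywords stay optional
            }
        }
    }


body = query_string_body('"lord of the rings" tolkien -hobbit')
print(json.dumps(body, indent=2))
```

One caveat worth checking against your own data: with the OR default, a quoted phrase is parsed as a phrase but is probably still optional unless the user prefixes it with +, as in `+"lord of the rings"`.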


(Nick Dunn) #5

Thanks, Lukas. Digging around some more, you are quite right: my
requirements are met. The ES docs aren't explicit about the phrase
(double quotes) matching, but this works nicely. Cheers!


(Shay Banon) #6

The field/query_string option provides the query syntax that you are after; text just analyzes the text you provide, without understanding any specific syntax. field/query_string does support phrase searches by wrapping text in double quotes (").
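
To make the distinction concrete, here is a small sketch that builds the three request bodies side by side for the same input. The variable names are hypothetical, and the exact shapes of the `text` and `field` queries are assumed from the documentation of this era; they may not exist under those names in later releases.

```python
import json

user_input = '"lord of the rings" tolkien -hobbit'

# text: the string is only analyzed into terms, so the quotes and the
# leading "-" carry no special meaning.
text_query = {"text": {"_all": user_input}}

# field: query syntax (+, -, "phrases") applied to a single named field.
field_query = {"field": {"_all": user_input}}

# query_string: the same syntax, with the field given as default_field.
query_string_query = {
    "query_string": {"default_field": "_all", "query": user_input}
}

for query in (text_query, field_query, query_string_query):
    print(json.dumps({"query": query}))
```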

