Which query is the best for standard searching?

Hello!

First of all, I would suggest creating a separate thread for your
question, Shaun. It's easier for others to find answers when a new
thread is created for each question.

So, let's stick to the original question - an example query. I won't
give you a general example, but let's say that your user entered the
text 'daily notice' into the search box and you want to get results
with phrase queries and so on. You would like at least 50% of
the terms entered into the search box to be found. You would also like
the 'name' field to be more valuable than the 'description' field (these
are the only fields you are searching). In addition to that, you want the
query to return only results from a 'category' named 'books' (so you
have a category field in your index). The example query may look like
this:

{
  "query": {
    "query_string": {
      "query": "daily notice",
      "fields": ["description", "name^100"],
      "minimum_should_match": "50%",
      "use_dis_max": true,
      "tie_breaker": 0.9,
      "auto_generate_phrase_queries": true
    }
  },
  "filter": {
    "and": {
      "filters": [
        { "term": { "category": "books" } }
      ]
    }
  }
}

Of course, as you may have already noticed, the same thing can be done in
multiple ways in Elasticsearch. The above is just an example you can
start with, then expand and change to match your needs.
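
For anyone who wants to build this request from code, here is a minimal Python sketch of the example above. It only assembles the request body as a dictionary and does not talk to a cluster. The helper names (build_search_body, required_matches) are hypothetical, and the sketch assumes the category filter sits at the top level of the request, next to "query". Several of these options have been changed or removed in newer Elasticsearch releases, so check the documentation for your version before relying on it.

```python
import json


def required_matches(term_count: int, percent: int) -> int:
    """Number of optional terms that must match for a positive percentage.

    Assumption: the percentage is applied to the number of terms and the
    result is rounded down, as documented for minimum_should_match.
    """
    return (term_count * percent) // 100


def build_search_body(user_text: str, category: str) -> dict:
    """Build the request body from the example for arbitrary input.

    Hypothetical helper: the keys and values are copied from the example
    query; only the search text and the category are parameters.
    """
    return {
        "query": {
            "query_string": {
                "query": user_text,
                # 'name' is weighted 100 times more than 'description'.
                "fields": ["description", "name^100"],
                "minimum_should_match": "50%",
                "use_dis_max": True,
                "tie_breaker": 0.9,
                "auto_generate_phrase_queries": True,
            }
        },
        # Assumption: the filter is a top-level sibling of "query".
        "filter": {
            "and": {"filters": [{"term": {"category": category}}]}
        },
    }


body = build_search_body("daily notice", "books")
print(json.dumps(body, indent=2))

# 'daily notice' has two terms, so 50% means one of them must match.
print(required_matches(2, 50))
```

Keeping the body in one function means the search text and category can come straight from the search box without editing the JSON by hand.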

Hope that helps, at least a bit.

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

Hi Marcin

You're not alone; I've found the query DSL quite difficult to understand.

This is what I have for my search so far. It's basically a wildcard text search which is then filtered to restrict access based on some
of the document properties.

Perhaps this is even more basic than you're looking for. I'm only just starting out with ES and have never used Lucene before.
I imagine what I have done would be considered absolutely terrible by those "in the know".

https://gist.github.com/3053914

I'd be very grateful if someone who has a good understanding of the ES query DSL could take a look at that gist, give some feedback on the way I've done it, and possibly show me how to improve it or suggest some alternative implementations.

A tutorial about the DSL structure would be good (for totally new users), but I have not seen anything very helpful to me yet.
It's just been lots of head-scratching and trial and error so far. :)

cheers
- shaun

--
shaun etherton

On Thursday, 5 July 2012 at 9:52 PM, Marcin Oleszkiewicz wrote:
At the moment I know (thanks to Rafał) how I should build such a query:

  1. BOOLEAN query with OR that will find all docs with at least one term
  2. PHRASE_QUERY to boost documents with words close to each other (using slop)
  3. BOOSTING QUERY to boost documents that have all terms in the field
  4. use some language plugin

How to combine these into one query is kind of hard for a newbie.

Rafał, could you show me how I should build the query you described to me
before?
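
One possible way to combine points 1 to 3 of the list above is a single bool query, sketched below in Python as a plain dictionary. This is an illustration under assumptions, not the query Rafał described: it uses the match and match_phrase queries of current Elasticsearch, it boosts with "should" clauses rather than the dedicated boosting query, and the function name, the default field 'description', the slop of 2 and the boost values are all hypothetical. Point 4 (a language plugin) is an analysis and mapping matter and is not covered here.

```python
import json


def build_combined_query(user_text: str, field: str = "description", slop: int = 2) -> dict:
    """Combine an OR match with two optional boosting clauses.

    Hypothetical helper: field, slop and boost values are illustrative.
    """
    return {
        "query": {
            "bool": {
                # 1. Required: documents containing at least one term.
                "must": [
                    {"match": {field: {"query": user_text, "operator": "or"}}}
                ],
                "should": [
                    # 2. Optional: score higher when the terms appear
                    #    close to each other (phrase match with slop).
                    {
                        "match_phrase": {
                            field: {"query": user_text, "slop": slop, "boost": 2.0}
                        }
                    },
                    # 3. Optional: score higher when all terms are present.
                    {
                        "match": {
                            field: {"query": user_text, "operator": "and", "boost": 1.5}
                        }
                    },
                ],
            }
        }
    }


combined = build_combined_query("daily notice")
print(json.dumps(combined, indent=2))
```

Because the "should" clauses are optional, they never exclude a document; they only raise the score of documents that also satisfy them.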

Hi, You're right.

Sorry, I wasn't trying to hijack the thread, just trying to help the OP with an example that's working for my simple use-case. I'll create a new thread and ask for some tips there. :)

Cheers --
Shaun

For me, this example & description of the requirement is very very helpful - thank you!

Hello,

I hope this will help some of you.
I'm quite new to ES and also find it difficult.

You'll find my Java implementation + mapping + unit test of a complex
query.

This query makes it possible to get highlights. The unit test works, but
I'm not even sure it's the right way to do it, or whether it scales well.
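
For readers without access to the linked code, here is a rough Python sketch of what a highlighting request can look like. It is not the Java implementation mentioned above. It assumes a simple match query on a hypothetical 'description' field, and the sample response is made up purely to show the shape of the "highlight" section returned with each hit.

```python
def build_highlight_body(user_text: str, field: str = "description") -> dict:
    """Request body asking for highlighted fragments of one field.

    Hypothetical helper: the field name is illustrative.
    """
    return {
        "query": {"match": {field: user_text}},
        # An empty object requests the default highlighting settings.
        "highlight": {"fields": {field: {}}},
    }


def collect_highlights(response: dict) -> dict:
    """Map each hit's _id to its highlighted fragments (may be empty)."""
    result = {}
    for hit in response.get("hits", {}).get("hits", []):
        result[hit["_id"]] = hit.get("highlight", {})
    return result


# Made-up response, trimmed to the parts used here.
sample_response = {
    "hits": {
        "hits": [
            {"_id": "1", "highlight": {"description": ["<em>daily</em> notice"]}},
            {"_id": "2"},
        ]
    }
}

highlight_body = build_highlight_body("daily notice")
highlights = collect_highlights(sample_response)
print(highlights)
```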

Oops, I forgot the link!
