Migrating Lucene drill sideways query to elasticsearch

Hi,

I am trying to migrate a project from Lucene to elasticsearch, and for the
most part it is a pleasure :)
However, I cannot wrap my head around how to recreate the drill sideways
queries we currently use in Lucene.

The scenario is a basic search page with a free text search and a bunch of
drill down/sideways facets. In Lucene, the count we get for each facet value
correctly represents how many results we would get if that value were
applied as a limit, but I am unable to reproduce this in elasticsearch.

As an example (full gist available here:
https://gist.github.com/bogundersen/e9bac02779e1c4a089dc)

I have three items:

Item 1:
  language: en_GB
  year: 2013
  author: [ John, Paul ]
Item 2:
  language: en_GB
  year: 2012
  author: [ John, George ]
Item 3:
  language: da_DK
  year: 2012
  author: [ Ringo ]

Now let's imagine that the user limits the search to year 2012. If I just
include the facet limit in the query ("Search 2" in the gist), I get the
following facets:

year
  2012: 2
author
  George: 1
  John: 1
  Ringo: 1
language
  da_DK: 1
  en_GB: 1

The author and language facets show the correct numbers, but the year facet
only shows 2012, so the user cannot select another year without first
deselecting 2012.

A way around this is to use post filters ("Search 3" in the gist). Using
those, I get the following facet results:

year
  2012: 2
  2013: 1
author
  John: 2
  George: 1
  Paul: 1
  Ringo: 1
language
  en_GB: 2
  da_DK: 1

Here the user is still presented with other years, but the numbers for
author and language are not correct (e.g. the John facet shows 2, but
selecting "John" will give only 1 result).

The only way I can think of to make this work is to run a separate query
for each facet, but that seems counterintuitive and not very
performance-friendly. Any ideas on how to do this in elasticsearch?

--
Bo Madsen

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/2e97801f-a091-4f1d-8e31-1ffb777f287c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
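
The two behaviours described in the post above can be reproduced without a cluster. The following is a minimal Python sketch, not the actual queries from the gist: the three items are taken from the example, while `facet_counts` is a hypothetical helper standing in for a terms facet.

```python
from collections import Counter

# The three example items from the post.
ITEMS = [
    {"language": "en_GB", "year": 2013, "author": ["John", "Paul"]},
    {"language": "en_GB", "year": 2012, "author": ["John", "George"]},
    {"language": "da_DK", "year": 2012, "author": ["Ringo"]},
]

FACETS = ("year", "author", "language")


def facet_counts(docs, field):
    """Count documents per value of `field` (hypothetical stand-in for a terms facet)."""
    counts = Counter()
    for doc in docs:
        stored = doc[field]
        # Multi-valued fields contribute one count per value.
        counts.update(stored if isinstance(stored, list) else [stored])
    return dict(counts)


# "Search 2": the year limit is part of the query, so every facet
# only sees the items from 2012.
hits = [doc for doc in ITEMS if doc["year"] == 2012]
in_query = {facet: facet_counts(hits, facet) for facet in FACETS}

# "Search 3": the year limit is a post filter, so every facet sees all items.
post_filter = {facet: facet_counts(ITEMS, facet) for facet in FACETS}

for name, result in (("limit in query", in_query), ("post filter", post_filter)):
    print(name)
    for facet in FACETS:
        print("  ", facet, result[facet])
```

Neither variant gives the wanted combination: the year facet should be counted over all items, while author and language should be counted over the 2012 items only.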

Have you considered using filters / filter buckets, as described in the
guide?

Jörg
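
One possible reading of this suggestion: wrap each facet's terms aggregation in a filter bucket that applies every selected limit except the one on the facet's own field, and restrict the hits themselves with a post filter. The sketch below only builds the request body as a Python dict. `build_request` and the sub-aggregation name `values` are hypothetical, the field names come from the example, and the exact filter syntax may differ between elasticsearch versions. It is not necessarily what "Search 4" in the gist does.

```python
import json


def build_request(text_query, selections, facet_fields):
    """Build a search body where each facet ignores the limit on its own field.

    `selections` maps a field name to the single value the user picked
    (an assumption made to keep the sketch short).
    """
    def limits(exclude=None):
        clauses = [
            {"term": {field: value}}
            for field, value in selections.items()
            if field != exclude
        ]
        # With no limits left to apply, match everything.
        return {"bool": {"must": clauses}} if clauses else {"match_all": {}}

    aggs = {}
    for field in facet_fields:
        aggs[field] = {
            # Filter bucket: all selected limits except the one on this facet.
            "filter": limits(exclude=field),
            "aggs": {"values": {"terms": {"field": field}}},
        }

    return {
        "query": text_query,
        # The hits are restricted by every selected limit.
        "post_filter": limits(),
        "aggs": aggs,
    }


body = build_request(
    {"match_all": {}},
    {"year": 2012},
    ["year", "author", "language"],
)
print(json.dumps(body, indent=2))
```

With year 2012 selected, the year facet is wrapped in a match-all filter and so still lists 2013, while the author and language facets are counted only over the 2012 documents.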

On Fri, Jan 16, 2015 at 4:15 PM, Bo Finnerup Madsen bo.gundersen@gmail.com
wrote:

[...]

I think you must do separate filters to compute the sideways facet counts.

Mike McCandless

http://blog.mikemccandless.com
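
To illustrate what "separate filters" means for the counts, here is a small Python sketch over the three example items. `drill_sideways` and `matches` are hypothetical helpers, not Lucene's `DrillSideways` API: for each facet, every selected limit except the one on that facet is applied, and the remaining documents are counted per value.

```python
from collections import Counter

# The three example items from the original post.
ITEMS = [
    {"language": "en_GB", "year": 2013, "author": ["John", "Paul"]},
    {"language": "en_GB", "year": 2012, "author": ["John", "George"]},
    {"language": "da_DK", "year": 2012, "author": ["Ringo"]},
]


def matches(doc, field, value):
    """True if `doc` has `value` in `field` (single- or multi-valued)."""
    stored = doc[field]
    return value in stored if isinstance(stored, list) else stored == value


def drill_sideways(docs, selections, facet_fields):
    """Per facet, count values over docs matching all *other* selected limits."""
    result = {}
    for facet in facet_fields:
        # Separate filter per facet: drop the limit on the facet's own field.
        others = {f: v for f, v in selections.items() if f != facet}
        subset = [
            doc for doc in docs
            if all(matches(doc, f, v) for f, v in others.items())
        ]
        counts = Counter()
        for doc in subset:
            stored = doc[facet]
            counts.update(stored if isinstance(stored, list) else [stored])
        result[facet] = dict(counts)
    return result


counts = drill_sideways(ITEMS, {"year": 2012}, ["year", "author", "language"])
for facet, values in counts.items():
    print(facet, values)
```

For the year 2012 selection this keeps both years in the year facet and counts John once, which is the combination the original post asks for.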

On Fri, Jan 16, 2015 at 10:15 AM, Bo Finnerup Madsen <bo.gundersen@gmail.com> wrote:

[...]

Hi Jörg,

That might actually do the trick. I have updated the gist
(https://gist.github.com/bogundersen/e9bac02779e1c4a089dc) with a "Search 4"
which uses this method. It gives the expected results, so that is good :)
How about the cost of this? We will be doing this for 4-5 facets, and with
this method each of them will be computed using its own set of filters...

On Friday, January 16, 2015 at 16:46:46 UTC+1, Jörg Prante wrote:

[...]

Hi Mike,

Thanks, that is in line with what Jörg suggested. I have updated the gist
with a search using this approach, and it gives the correct result. However,
I am a bit concerned about the cost of this, as we will be running 4-5
facets, each of which will require its own set of filters.
But if it is the recommended way, I will try to implement it and run a
performance test :)

On Friday, January 16, 2015 at 18:13:34 UTC+1, Michael McCandless wrote:

[...]

I do not think you have to worry. I successfully use a dozen aggregations
with filters on 50m docs with 8G RAM and 3 nodes. But if your tests
show a massive slowdown, you should come back with your findings, including
performance numbers, so the ES core team can have a look at it.

Jörg

On Fri, Jan 16, 2015 at 7:35 PM, Bo Finnerup Madsen bo.gundersen@gmail.com
wrote:

[...]

I think you are right :) I tried to implement it, and performance seems to
be good.
Thank you (and Michael) very much for the help.

On Friday, January 16, 2015 at 22:18:17 UTC+1, Jörg Prante wrote:

[...]