Top_hits and post_filter

Hi,

I'm using ES 1.4.2 to search across four slightly different types in a single
index. The following example is a simplified version with only two types,
"profile" and "publication". Profiles can have related publications, and
publications can have related profiles.

I want to run a full-text search (single search field) across all content
in the index, ranking the most relevant documents first, of course, but
boosting the score of a profile that has a lot of publications (a very
active person is an important one and should be preferred in search results).
I want to display the search results grouped by _type (see the by_type
aggregation), so I'm using a terms aggregation with a top_hits sub-aggregation
and optional pagination (from, size). I also want to show how many results
there are in every department (people and publications always belong to a
single department).
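
A minimal sketch of the setup described above, written as a Python dict so the assumptions can be commented. The field name "department", the sub-aggregation name "top" and the match query on _all are placeholders; the real mapping and query are in the linked gist and are not reproduced here.

```python
import json


def build_grouped_search(text, page=0, per_page=10):
    """Build a request body: one query, hits grouped by _type, counts per department."""
    return {
        # Top-level hits are skipped; the top_hits sub-aggregation carries the results.
        "size": 0,
        # Placeholder query (assumption): the real scoring query is elided in the post.
        "query": {"match": {"_all": text}},
        "aggs": {
            "by_type": {
                "terms": {"field": "_type"},
                "aggs": {
                    # Per-type pagination via from/size on top_hits.
                    "top": {"top_hits": {"from": page * per_page, "size": per_page}}
                },
            },
            # "department" is an assumed field name.
            "by_department": {"terms": {"field": "department"}},
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_grouped_search("graph theory", page=1), indent=2))
```

Each by_type bucket then carries its own doc_count plus its own page of hits, which is what allows 10 profiles and 10 publications to be shown side by side.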

This works quite well. I can display an aggregation by department, for
example Department 1 (100 results), Department 2 (50 results), and I can
group results by type, Profiles (60) and Publications (90), and display, for
example, the 10 top hits for every type. That's perfect.

Now it comes to filtering. In the simple case, I want to show only results
from Department 1. I can do that quite easily using the filtered query, but
when I do that, the result is:

Department 1 (100 results), Department 2 (0 results)

OK, that's not what I want: the other departments should keep their counts.
Let's use post_filter... The problem is that a post filter written this way:
{
  "size": 10,
  "query": {
    ....
  },
  "post_filter": {
    "and": [
      {"match_all": {}}
    ]
  },
  "aggs": {
    ...
  }
}

is not applied to the top_hits results. It filters only the "global"
(top-level) hits, because aggregations are computed on the query results
before post_filter runs. post_filter is not supported by the top_hits
aggregation.

My use case is a bit more complicated. For example, I want to filter by
publication_type (in which case only publications will be in the result set,
of course), but I still want to display the number of results in the
unfiltered (but queried) set, as in: "If you tick this checkbox, you will
get another 50 results for the same query".
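
To make the counting requirement concrete, here is a small sketch that turns the buckets of a terms aggregation into the labels shown next to each checkbox. The aggregation name "by_department" is an assumption, and the response fragment is written by hand from the counts quoted earlier in this post; it is not real Elasticsearch output.

```python
def facet_labels(response, agg_name="by_department"):
    """Turn terms-aggregation buckets into 'key (N results)' labels."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return ["%s (%d results)" % (b["key"], b["doc_count"]) for b in buckets]


# Hand-written fragment (assumption), shaped like a terms aggregation response.
sample_response = {
    "aggregations": {
        "by_department": {
            "buckets": [
                {"key": "Department 1", "doc_count": 100},
                {"key": "Department 2", "doc_count": 50},
            ]
        }
    }
}

if __name__ == "__main__":
    for label in facet_labels(sample_response):
        print(label)
```

The labels stay useful only if the aggregation they are read from is computed on the queried but unfiltered set, which is exactly what the filtered query breaks.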

Hope my problem is clearly described.

Finally, my question:
Is there any way to solve this? How can I apply a post_filter to top_hits?
If I remove the top_hits aggregation and group on the client side, I could
easily end up with 1 profile and 9 publications displayed (if size were set
to 10). I want to display 10 profiles and 10 publications, with pagination.
Maybe I'm doing it wrong and there is a better way to achieve my
requirements.

Example index setup, example data and my query:
https://gist.github.com/naro/3ad9a1c85f03c631e02a

Thanks,
Radim

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/4f5e09b3-9008-4727-af1d-6f05541cd943%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Can you use a filter agg? http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-filter-aggregation.html
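
A minimal sketch of what that could look like, assuming a "department" field and the aggregation names used here (both are placeholders, as is the match query on _all). The filter aggregation narrows only the branch that holds the grouped top_hits, while the department counts are still computed over the whole queried set.

```python
import json


def build_filtered_groups(text, department, page=0, per_page=10):
    """Request body where only the grouped hits are restricted to one department."""
    grouped = {
        "terms": {"field": "_type"},
        "aggs": {"top": {"top_hits": {"from": page * per_page, "size": per_page}}},
    }
    return {
        "size": 0,  # results come from top_hits, not from the top-level hits
        "query": {"match": {"_all": text}},  # placeholder query (assumption)
        "aggs": {
            # Sibling aggregation: counts for every department, unaffected by the filter.
            "by_department": {"terms": {"field": "department"}},
            # Filter aggregation: only documents of the chosen department reach by_type.
            "selected": {
                "filter": {"term": {"department": department}},
                "aggs": {"by_type": grouped},
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_filtered_groups("graph theory", "Department 1"), indent=2))
```

No post_filter is needed in this shape, since nothing is read from the top-level hits.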

David

On 26 Jan 2015, at 09:46, Radim Novotny <novotny.radim@gmail.com> wrote:

[original message trimmed]


Thanks, that did the trick :)

Radim

On Monday, 26 January 2015 at 10:02:24 UTC+1, David Pilato wrote:

Can you use a filter agg?
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-filter-aggregation.html

David
