How can I facet based on fields with hits?


(Ellery Crane-2) #1

I'm brand new to ElasticSearch, and exploring the wonderful faceting
features. I have a business use case I'm trying to construct a query
for, and I was hoping someone could point me toward a more elegant
(and performant) solution.

I want to perform a query against some set of fields (usually all of
them) in my documents, and then set up facets defining results with
hits in specific fields.

For example, take the following data:

curl -XPUT 'http://localhost:9200/test/contact/1' -d '
{
  "name": {
    "first": "Brad",
    "last": "Smith"
  },
  "address": {
    "street": "123 Smithsonian Avenue",
    "city": "Albany",
    "state": "NY"
  }
}'

curl -XPUT 'http://localhost:9200/test/contact/2' -d '
{
  "name": {
    "first": "Jane",
    "last": "Doe"
  },
  "address": {
    "street": "1 Green Valley Blvd.",
    "city": "Smithville",
    "state": "NY"
  }
}'

curl -XPUT 'http://localhost:9200/test/contact/3' -d '
{
  "name": {
    "first": "Janet",
    "last": "Green"
  },
  "address": {
    "street": "456 Goldsmith Place",
    "city": "Bradley",
    "state": "MN"
  }
}'

I want to query all fields in all documents, and then facet based on
the field it found the term in. One way I've found to do this is as
follows:

curl -XPOST 'http://localhost:9200/test/contact/_search?pretty=true' -d '{
  "query": {
    "query_string": {
      "query": "smith"
    }
  },
  "facets": {
    "names_facet": {
      "query": {
        "query_string": {
          "query": "smith",
          "fields": ["name.first", "name.last"],
          "use_dis_max": true
        }
      }
    },
    "address_facet": {
      "query": {
        "query_string": {
          "query": "smith",
          "fields": ["address.street", "address.city", "address.state"],
          "use_dis_max": true
        }
      }
    }
  }
}'

This gives me the results I'm after; 3 hits total, with the following
facet counts:

"facets" : {
  "names_facet" : {
    "_type" : "query",
    "count" : 1
  },
  "address_facet" : {
    "_type" : "query",
    "count" : 3
  }
}

The problem with this method is that duplicating the query across each
facet seems both ugly and expensive. Is there some way I can facet
based solely on fields that have hits? I looked at using the terms
filter, but I'm not after a list of the common terms in each field. I
just want to know how many hits in the original query happened in each
set of fields. Is there something in the Query DSL I can use towards
this end?

Thanks!
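
One way to at least avoid writing the duplicated query by hand is to generate the request body from a single query string and a map of field groups. The following is a minimal Python sketch, not an Elasticsearch feature: the helper name `build_facet_request` and the `FIELD_GROUPS` mapping are hypothetical names introduced here for illustration. The server still executes one query facet per group, so this tidies the client side only and does not reduce the cost.

```python
import json


def build_facet_request(query_text, field_groups):
    """Build a search body with one query facet per named group of fields.

    The query text is supplied once and copied into the main query and
    into every facet, mirroring the hand-written request in this post.
    """
    facets = {}
    for facet_name, fields in field_groups.items():
        facets[facet_name] = {
            "query": {
                "query_string": {
                    "query": query_text,
                    "fields": list(fields),
                    "use_dis_max": True,
                }
            }
        }
    return {
        "query": {"query_string": {"query": query_text}},
        "facets": facets,
    }


# Hypothetical grouping of the fields used in the example documents.
FIELD_GROUPS = {
    "names_facet": ["name.first", "name.last"],
    "address_facet": ["address.street", "address.city", "address.state"],
}

body = build_facet_request("smith", FIELD_GROUPS)
print(json.dumps(body, indent=2))
```

The printed JSON can be passed as the `-d` payload of the search request shown earlier in this post.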


(Shay Banon) #2

There isn't an option to facet only on the fields that have hits, because that information is not available during search unless you execute a specific query facet for each group of fields, as you do. That is, as you said, not ideal performance-wise.
On Tuesday, May 3, 2011 at 11:00 PM, Ellery Crane wrote:

The problem with this method is that duplicating the query across each
facet seems both ugly and expensive. Is there some way I can facet
based solely on fields that have hits? I looked at using the terms
filter, but I'm not after a list of the common terms in each field. I
just want to know how many hits in the original query happened in each
set of fields. Is there something in the Query DSL I can use towards
this end?

Thanks!


(Ellery Crane-2) #3

Would there be some way to tap into the logic which is aggregating the
field highlighting in order to get the kind of facet counting I'm
after? In other words, facet on whether or not there is a highlight
found for a given field?
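
A client-side approximation of this idea, assuming highlighting is requested for every field of interest, is to tally which field groups appear in the `highlight` section of each returned hit. The following is a minimal Python sketch under that assumption: `count_highlighted_groups`, `FIELD_GROUPS`, and `SAMPLE_RESPONSE` are hypothetical names, and the sample response is hand-written for illustration rather than captured from a server. It counts only the hits actually returned in the response page, so it is not a true facet over the full result set.

```python
def count_highlighted_groups(response, field_groups):
    """Count, per field group, the returned hits highlighted in that group.

    `response` is a parsed search response (a dict). A hit counts toward a
    group when at least one of the group's fields has highlight fragments.
    """
    counts = {name: 0 for name in field_groups}
    for hit in response.get("hits", {}).get("hits", []):
        highlighted_fields = set(hit.get("highlight", {}))
        for name, fields in field_groups.items():
            if highlighted_fields.intersection(fields):
                counts[name] += 1
    return counts


# Hypothetical grouping of the fields used in the example documents.
FIELD_GROUPS = {
    "names_facet": ["name.first", "name.last"],
    "address_facet": ["address.street", "address.city", "address.state"],
}

# Hand-written illustration of a response shape; not real server output.
SAMPLE_RESPONSE = {
    "hits": {
        "total": 3,
        "hits": [
            {"_id": "1", "highlight": {
                "name.last": ["<em>Smith</em>"],
                "address.street": ["123 <em>Smithsonian</em> Avenue"],
            }},
            {"_id": "2", "highlight": {
                "address.city": ["<em>Smithville</em>"],
            }},
            {"_id": "3", "highlight": {
                "address.street": ["456 <em>Goldsmith</em> Place"],
            }},
        ],
    }
}

group_counts = count_highlighted_groups(SAMPLE_RESPONSE, FIELD_GROUPS)
print(group_counts)
```

For result sets larger than one page, the counts would have to be accumulated across pages, which is likely to cost more than the per-group query facets.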

On May 4, 4:10 am, Shay Banon shay.ba...@elasticsearch.com wrote:

There isn't an option to facet only on the fields that have hits, because that information is not available during search unless you execute a specific query facet for each group of fields, as you do. That is, as you said, not ideal performance-wise.


(Shay Banon) #4

No, and honestly, not sure how to implement something like that in a performant manner. Requires some thinking.
On Wednesday, May 4, 2011 at 5:57 PM, Ellery Crane wrote:

Would there be some way to tap into the logic which is aggregating the
field highlighting in order to get the kind of facet counting I'm
after? In other words, facet on whether or not there is a highlight
found for a given field?



(Ellery Crane-2) #5

On May 4, 12:41 pm, Shay Banon shay.ba...@elasticsearch.com wrote:

No, and honestly, not sure how to implement something like that in a performant manner. Requires some thinking.

Thank you for the prompt reply; it is appreciated.

I suppose I can deal with the performance hit to get the behavior I'm
after. Just being able to facet in the fashion I used in my initial
post is extraordinary! Still, being able to achieve this functionality in
a more elegant and performant manner would be amazing. If a way of
doing so ever occurs to you, be assured that at least one of your
users would be most happy for the feature :)

