I'm brand new to ElasticSearch and exploring its wonderful faceting
features. I have a business use case that I'm trying to construct a
query for, and I was hoping someone could point me toward a more
elegant (and more performant) solution.

I want to run a query against some set of fields (usually all of
them) in my documents, and then set up facets that count the results
with hits in specific groups of fields.

For example, take the following data:

curl -XPUT 'http://localhost:9200/test/contact/1' -d '
{
  "name": {
    "first": "Brad",
    "last": "Smith"
  },
  "address": {
    "street": "123 Smithsonian Avenue",
    "city": "Albany",
    "state": "NY"
  }
}'
curl -XPUT 'http://localhost:9200/test/contact/2' -d '
{
  "name": {
    "first": "Jane",
    "last": "Doe"
  },
  "address": {
    "street": "1 Green Valley Blvd.",
    "city": "Smithville",
    "state": "NY"
  }
}'
curl -XPUT 'http://localhost:9200/test/contact/3' -d '
{
  "name": {
    "first": "Janet",
    "last": "Green"
  },
  "address": {
    "street": "456 Goldsmith Place",
    "city": "Bradley",
    "state": "MN"
  }
}'

I want to query all fields in all documents, and then facet based on
the field it found the term in. One way I've found to do this is as
follows:

curl -XPOST 'http://localhost:9200/test/contact/_search?pretty=true' -d '{
  "query": {
    "query_string": {
      "query": "smith"
    }
  },
  "facets": {
    "names_facet": {
      "query": {
        "query_string": {
          "query": "smith",
          "fields": ["name.first", "name.last"],
          "use_dis_max": true
        }
      }
    },
    "address_facet": {
      "query": {
        "query_string": {
          "query": "smith",
          "fields": ["address.street", "address.city", "address.state"],
          "use_dis_max": true
        }
      }
    }
  }
}'
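
In the meantime, the hand-written duplication can at least be hidden on the client side. Here is a minimal Python sketch that generates the same request body from a map of facet names to field groups. The names `build_search_body` and `FACET_FIELDS` are my own, hypothetical helpers, not part of any ElasticSearch client, and the server still runs one query per facet, so this only addresses the "ugly" half of the problem, not the "expensive" half.

```python
import json

# Hypothetical mapping of facet name -> fields searched by that facet.
FACET_FIELDS = {
    "names_facet": ["name.first", "name.last"],
    "address_facet": ["address.street", "address.city", "address.state"],
}


def build_search_body(term, facet_fields):
    """Return a search body whose facets repeat `term` once per field group."""
    body = {
        "query": {"query_string": {"query": term}},
        "facets": {},
    }
    for facet_name, fields in facet_fields.items():
        # Each facet is a query facet restricted to one group of fields.
        body["facets"][facet_name] = {
            "query": {
                "query_string": {
                    "query": term,
                    "fields": list(fields),
                    "use_dis_max": True,
                }
            }
        }
    return body


body = build_search_body("smith", FACET_FIELDS)
# The printed JSON can be passed to curl with -d, as in the command above.
print(json.dumps(body, indent=2))
```

Adding another group of fields then means adding one entry to `FACET_FIELDS` rather than copying a whole facet block.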

This gives me the results I'm after: 3 hits in total, with the
following facet counts:

"facets" : {
  "names_facet" : {
    "_type" : "query",
    "count" : 1
  },
  "address_facet" : {
    "_type" : "query",
    "count" : 3
  }
}
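
To spell out what those counts mean, here is a plain-Python sketch (no ElasticSearch involved) that tallies, for each of the three sample contacts, which field groups contain the search term. It assumes case-insensitive substring matching, which is only a stand-in: what actually matches in the index depends on how the fields are analyzed. The names `CONTACTS`, `FIELD_GROUPS` and `count_hits_by_group` are hypothetical.

```python
# The three sample contacts from the PUT requests above.
CONTACTS = [
    {"name": {"first": "Brad", "last": "Smith"},
     "address": {"street": "123 Smithsonian Avenue", "city": "Albany", "state": "NY"}},
    {"name": {"first": "Jane", "last": "Doe"},
     "address": {"street": "1 Green Valley Blvd.", "city": "Smithville", "state": "NY"}},
    {"name": {"first": "Janet", "last": "Green"},
     "address": {"street": "456 Goldsmith Place", "city": "Bradley", "state": "MN"}},
]

# Hypothetical mapping of facet name -> top-level object holding its fields.
FIELD_GROUPS = {"names_facet": "name", "address_facet": "address"}


def count_hits_by_group(docs, term, groups):
    """Count matching documents overall and per field group.

    Assumption: a field "matches" when it contains `term` as a
    case-insensitive substring. This is not how ElasticSearch decides.
    """
    term = term.lower()
    counts = {facet: 0 for facet in groups}
    total = 0
    for doc in docs:
        matched_any = False
        for facet, key in groups.items():
            if any(term in str(value).lower() for value in doc[key].values()):
                counts[facet] += 1
                matched_any = True
        if matched_any:
            total += 1
    return total, counts


total, counts = count_hits_by_group(CONTACTS, "smith", FIELD_GROUPS)
print(total, counts)  # 3 {'names_facet': 1, 'address_facet': 3}
```

This is the tally I would like the server to produce from a single pass over the query's hits, rather than by re-running the query once per group.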
The problem with this method is that duplicating the query across each
facet seems both ugly and expensive. Is there some way I can facet
based solely on the fields that have hits? I looked at using the terms
facet, but I'm not after a list of the common terms in each field. I
just want to know how many of the original query's hits matched in each
set of fields. Is there something in the Query DSL I can use to this
end?
Thanks!