Extra filter on returned Facet values


(Bob Sandiford) #1

Hi,

We're currently using Solr, and one of our use cases requires us to
search for documents with a specific field containing (or having fuzzy
matches) the words in the search. We have a facet field based on the
same original value. However, rather than return ALL facets for that
field for the documents that match (the usual situation), we further
filter the returned facets to only those that match the search terms.

So, if we have a source document with an 'Author' source field (for
example), then we have two fields in our index document based on
Author: one is analyzed and searchable, and the other is not analyzed,
so we use it only for facets.

If our original document had these values for Author:
Smith, Joe
Smith, Fred

and we did a search for 'joe smyth' (so that although it's not an exact
match, there's a fuzzy match), then what we want back in the
Author-based facet is just the "Smith, Joe" value (along with the count
of documents in which that facet value appears).

We did this in Solr by modifying the Solr code that puts the facets
together, with a special facet parameter when we want this behaviour.
The code uses a Lucene class called 'MemoryIndex' - a very fast
means for us to create an in-memory index, add one small document (one
of the Facet values), and run a search against it to see if there's a
match.
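
The per-value check described above can be sketched roughly as follows. This is a plain Python illustration, not the modified Solr code and not Lucene's MemoryIndex API: the function names are hypothetical, the tokenizer is a stand-in for a real analyzer, and difflib's similarity ratio (with an assumed 0.75 threshold) stands in for a real fuzzy query.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase and split on non-word characters (stand-in for an analyzer)."""
    return re.findall(r"\w+", text.lower())


def fuzzy_equal(a, b, threshold=0.75):
    """Assumed similarity test; a real fuzzy query would use edit distance."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def value_matches(facet_value, query, threshold=0.75):
    """True if every query term fuzzily matches some token of the facet value."""
    value_tokens = tokenize(facet_value)
    return all(
        any(fuzzy_equal(term, token, threshold) for token in value_tokens)
        for term in tokenize(query)
    )


def filter_facets(facet_counts, query, threshold=0.75):
    """Keep only the facet values (with their counts) that match the query."""
    return {
        value: count
        for value, count in facet_counts.items()
        if value_matches(value, query, threshold)
    }


# Hypothetical facet counts for the two Author values in the example.
facets = {"Smith, Joe": 3, "Smith, Fred": 2}
print(filter_facets(facets, "joe smyth"))  # {'Smith, Joe': 3}
```

Each facet value is treated as its own tiny document and tested against the search terms, so the counts come straight from the normal facet computation and only the list of values is trimmed.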

Anyway - in order to move to ElasticSearch, I need to know whether
there is some mechanism for achieving the same end result -
i.e. returning only the Facet Values that match the original search.

Ideas, anyone?

Thanks!


(Matt Weber) #2

Take a look at facet filter. I imagine you would do your fuzzy search, then apply a facet filter that does a fuzzy search against just the author field (actually a tokenized version of it) to filter out authors that don't match.

http://www.elasticsearch.org/guide/reference/api/search/facets/index.html
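
A request along these lines might look like the sketch below, built as a Python dict and printed as JSON. The field names `author` (tokenized) and `author_facet` (not analyzed), the facet name `authors`, and the choice to require every search term are all assumptions made for illustration; the exact query types available depend on the ElasticSearch version in use.

```python
import json


def build_facet_request(search_text, searchable_field="author",
                        facet_field="author_facet"):
    """Build a facets-only search body; all field names are hypothetical."""
    # One fuzzy clause per search term, all required (assumed AND semantics).
    fuzzy_clauses = [
        {"fuzzy": {searchable_field: term}}
        for term in search_text.lower().split()
    ]
    author_query = {"bool": {"must": fuzzy_clauses}}
    return {
        "size": 0,  # ask for no hits, only the facets
        "query": author_query,
        "facets": {
            "authors": {
                # Facet on the non-analyzed copy of the field.
                "terms": {"field": facet_field},
                # Restrict the facet to documents matching the fuzzy query.
                "facet_filter": {"query": author_query},
            }
        },
    }


body = build_facet_request("joe smyth")
print(json.dumps(body, indent=2))
```

Note that a facet filter restricts which documents are counted, not which values are returned, so a document holding both "Smith, Joe" and "Smith, Fred" would still contribute both values to the facet.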

--
Matt Weber

On Wednesday, May 16, 2012 at 7:43 AM, Bob Sandiford wrote:

[...]


(rpsandiford) #3

Thanks, Matt.

Hmmm. I'll have to think about that. One thing I forgot to mention is that in this case we don't actually need or want any search results - only the facets - so we don't have to parse any results. If I'm understanding your suggestion correctly, I'd need to get back search results with the Author field values that matched the query (i.e. the tokenized version of the author field), and then manually filter the Facet values by those returned values to find the ones I want. But that would require ensuring that I get enough results that all the possible author field matches are returned in the result set, or I'll end up removing too many values from the returned facets...

Unless I'm just overthinking this, and the facet filter can actually determine whether or not to return a (non-tokenized) facet value based on a match in a separate (tokenized) field?

Bob Sandiford | Lead Software Engineer | SirsiDynix

From: Matt Weber [via ElasticSearch Users]
Sent: Wednesday, May 16, 2012 12:04 PM
To: Bob Sandiford
Subject: Re: Extra filter on returned Facet values

[...]


(system) #4