Filter the facet terms?


(Mark Waddle) #1

I have a field called abstract that has numerous words per document, as you
can imagine. I would like to use this field as a dictionary of sorts for
autocompletion of individual search terms. I have done some research on the
autocomplete solutions out there and it seems that they all work for fields
that have single or few terms per document, but not for my case where the
number of terms per document can be in the hundreds.

I tried a wildcard query on an edgeNGram field in combination with
highlighting, but that just returns the top X docs that match. So I will
get 1,000s of docs that match "wireless", then 1,000s of docs that match
"wired", etc. Not gonna work.

I also tried faceting on a tokenized field, but of course I get the most
popular terms in the field as opposed to the terms that match my query. I
tried the facet filter, but that only filters the docs that the facet is
computed over, so it still returns all of the most popular terms. I end up
with facets like "a", "the", "from", "includes".

So I am thinking what would be ideal for my case is to be able to filter
the facet terms using a wildcard, as opposed to the docs. So far I have
not discovered a way to do this. Is this possible with elasticsearch out of
the box? Is there another, better solution for my problem?
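
To make the distinction concrete, here is a minimal Python sketch of the two behaviours. It is not Elasticsearch code: the sample documents and the function names are made up for illustration, and the counts are simple per-document frequencies.

```python
from collections import Counter

# Hypothetical, already-tokenized abstracts (made-up sample data).
DOCS = [
    ["the", "wireless", "network", "includes", "a", "router"],
    ["a", "wired", "network", "from", "the", "vendor"],
    ["the", "wireless", "sensor", "from", "a", "lab"],
    ["the", "garden", "includes", "a", "pond"],
]


def top(counts, size):
    """Return the `size` most frequent terms, ties broken alphabetically."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:size]


def facet_with_doc_filter(docs, prefix, size=3):
    """Filter the docs first, then count every term in the surviving docs."""
    matching = [d for d in docs if any(t.startswith(prefix) for t in d)]
    counts = Counter(t for d in matching for t in set(d))
    return top(counts, size)


def facet_with_term_filter(docs, prefix, size=3):
    """Count only the terms that themselves start with the prefix."""
    counts = Counter(
        t for d in docs for t in set(d) if t.startswith(prefix)
    )
    return top(counts, size)


print(facet_with_doc_filter(DOCS, "wir"))   # dominated by "a", "the", ...
print(facet_with_term_filter(DOCS, "wir"))  # only "wireless" and "wired"
```

The first function mirrors what I am seeing with the facet filter; the second is the behaviour I am after.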

Thanks for your help!
Mark

--


(David Pilato) #2

Wildcard is not available as a filter.
Wildcard is slow.

I would define my own analyzer for that field with an edgeNGram tokenizer (http://www.elasticsearch.org/guide/reference/index-modules/analysis/edgengram-tokenizer.html) and an English analyzer.
Then a terms facet on that field could give you the first 10 terms.

My 2 cents.

--
David ;)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

In reply to Mark Waddle's message of 23 August 2012, 08:32.

--


(Mark Waddle) #3

Thanks David. Unfortunately I've already tried that. The first ten terms in
my abstract field end up being "the", "a", etc.

What I need are the top 10 terms in field Y that start with X, which is why
I was thinking that a filter on the terms themselves would be good.

Any other ideas?

Mark

--


(Clinton Gormley) #4

Hi Mark

So I am thinking what would be ideal for my case is to be able to
filter the facet terms using a wildcard, as opposed to the docs. So
far I have not discovered a way to do this. Is this possible with
elasticsearch out of the box? Is there another, better solution for my
problem?

A gist showing actual data demonstrating what you are currently doing
and what you would like to achieve would make this easier to follow.

But based on your last paragraph, you should be able to do that using
regex patterns in your terms facet.

See Regex Patterns on:
http://www.elasticsearch.org/guide/reference/api/search/facets/terms-facet.html
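
As a rough sketch of what such a request body could look like, the Python below builds a terms facet that uses the `regex` option described on that page. The facet name `suggestions`, the helper name, and the use of `re.escape` to quote the prefix are assumptions for illustration (the facet regex is evaluated as a Java pattern, so the escaping is only expected to be compatible for ordinary prefixes); check it against your own mapping before relying on it.

```python
import json
import re


def prefix_terms_facet(field, prefix, size=10):
    """Build a search body whose terms facet keeps only terms starting with `prefix`.

    `field` is the tokenized field to facet on (e.g. "abstract").
    The helper and the facet name "suggestions" are hypothetical.
    """
    return {
        "query": {"match_all": {}},
        "facets": {
            "suggestions": {
                "terms": {
                    "field": field,
                    # Quote the user input, then allow any suffix.
                    "regex": re.escape(prefix) + ".*",
                    "size": size,
                }
            }
        },
    }


# Print the JSON body that would be sent to the _search endpoint.
print(json.dumps(prefix_terms_facet("abstract", "wir"), indent=2))
```

The point of the pattern is that it is applied to the facet terms rather than to the documents, so stop words such as "the" and "a" drop out of the result unless they match the prefix.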

clint

--

