I have a field called abstract that has numerous words per document, as you
can imagine. I would like to use this field as a dictionary of sorts for
autocompletion of individual search terms. I have done some research on the
autocomplete solutions out there and it seems that they all work for fields
that have single or few terms per document, but not for my case where the
number of terms per document can be in the hundreds.
I tried a wildcard query on an edge n-gram field in combination with
highlighting, but that just returns the top X docs that match, not the
matching terms. So I will get 1,000s of docs that match "wireless", then
1,000s of docs that match "wired", etc. Not gonna work.
I also tried faceting on a tokenized field, but of course I get all of the
popular terms in the facet as opposed to the terms that match my query. I
tried the facet filter, but that only filters the docs that the facet
matches against, still returning all of the most popular terms. I end up
with facets like "a", "the", "from", "includes".
So I am thinking what would be ideal for my case is to be able to filter
the facet terms using a wildcard, as opposed to the docs. So far I have
not discovered a way to do this. Is this possible with Elasticsearch out of
the box? Is there another, better solution for my problem?
--
David
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
On 23 August 2012 at 08:32, Mark Waddle <mark@markwaddle.com> wrote:
So I am thinking what would be ideal for my case is to be able to
filter the facet terms using a wildcard, as opposed to the docs. So
far I have not discovered a way to do this. Is this possible with
elasticsearch out of the box? Is there another, better solution for my
problem?
A gist showing actual data demonstrating what you are currently doing
and what you would like to achieve would make this easier to follow.
But based on your last paragraph, you should be able to do that using
regex patterns in your terms facet.
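As a rough illustration of that suggestion, here is a minimal Python sketch that builds a search body with a terms facet restricted by a `regex` option. The facet name `suggestions`, the helper name `build_prefix_facet_request`, and the choice to lowercase the prefix are assumptions made for this example (lowercasing presumes the field is analyzed with a lowercase filter). It also assumes the pattern has to match the whole term, hence the trailing `.*`. Treat it as a starting point to check against your own mapping and Elasticsearch version, not a confirmed solution.

```python
import json


def build_prefix_facet_request(prefix, field="abstract", size=10):
    """Build a search body whose terms facet keeps only terms starting with `prefix`.

    Hypothetical helper for illustration; it only builds the JSON body and
    does not talk to a cluster.
    """
    # Accept plain letters/digits only, so no regex metacharacters typed by
    # the user can end up in the pattern.
    if not prefix.isalnum():
        raise ValueError("prefix must be alphanumeric")
    return {
        "query": {"match_all": {}},
        "facets": {
            # "suggestions" is an arbitrary facet name chosen for this sketch.
            "suggestions": {
                "terms": {
                    "field": field,
                    # Assumption: indexed terms are lowercased by the analyzer.
                    "regex": prefix.lower() + ".*",
                    "size": size,
                }
            }
        },
    }


if __name__ == "__main__":
    # Print the body that would be sent to the _search endpoint.
    print(json.dumps(build_prefix_facet_request("wire"), indent=2))
```

With a prefix such as "wire", the facet should then come back with terms like "wireless" and "wired" and their counts, instead of the most popular terms overall. Adding stopwords to the analyzer or to the facet's exclude list may still be worthwhile for very short prefixes.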