Hello all, I was wondering if anyone could offer some feedback on whether
there is a way to determine how a document matched in real time. I
currently use custom analyzers at index time to allow a broad array of
matches for a given text field. I try to match based on phrases, synonyms,
substrings, stemming, etc. of a given phrase, and I would like to be able to
figure out at search time which analyzer was responsible for the
match.
Currently, I've gotten around this by creating child documents where the
fields are fanned out to their respective analyzer types. So I have a child
document where the field only applies stemming, another that uses only
synonyms, etc. However, due to the growing number of fields that require
analysis and the growth of my data set, I'd much prefer to have fewer (and
less complex) documents. I was hoping there would be a way to tag
tokens at the analysis phase that could be used at the search phase to
quickly determine my match level, but I was not able to find anything like
this.
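For anyone following along, here is a minimal sketch of one possible alternative to child documents. It assumes the text field is mapped with one sub-field per analyzer (the names `title.exact`, `title.stemmed`, and `title.synonym` are hypothetical) and relies on named queries: each clause carries a `_name`, and Elasticsearch lists the names of the clauses that matched in each hit's `matched_queries`. This only reports which sub-field (and so which analyzer chain) matched, not which token filter inside a chain.

```python
import json

# Hypothetical sub-field names: one multi-field per analyzer on a "title" field.
SUBFIELDS = {
    "exact": "title.exact",
    "stemmed": "title.stemmed",
    "synonym": "title.synonym",
}


def build_named_query(text):
    """Build a bool/should search body with one named match clause per sub-field."""
    should = [
        {"match": {field: {"query": text, "_name": label}}}
        for label, field in SUBFIELDS.items()
    ]
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


def match_levels(hit):
    """Return the labels of the named clauses that matched this hit, sorted."""
    return sorted(hit.get("matched_queries", []))


if __name__ == "__main__":
    # Print the request body that would be sent to the _search endpoint.
    print(json.dumps(build_named_query("running shoes"), indent=2))
    # A hand-written hit, standing in for a real search response.
    sample_hit = {"_id": "1", "matched_queries": ["synonym", "stemmed"]}
    print(match_levels(sample_hit))
```

The labels come back with the normal search response, so no second request is needed, but the fields are still fanned out (on one document rather than several).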
Having said that, has anyone else ever tried to figure this out, or have any
thoughts on how to leverage ES at a lower level to determine the match?
Just a friendly bump to see if anyone has any feedback.
On Saturday, January 10, 2015 at 10:38:34 PM UTC-8, Ed Kim wrote:
I was able to identify which field matched via explain, but couldn't see
any information on which token filter was the reason for the match. I've
tried both specifying the name of the analyzer that the field uses and
leaving it unspecified. If explain is supposed to provide this data, I will
give it another go and set up a test index with simpler analyzer setups.
Also, in order to do this, I will need to run explain separately from the
search itself. My ultimate goal is to be able to do this in under 10
milliseconds. Is this feasible with explain?
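As a rough sketch of the explain route: setting `"explain": true` in the search body returns an `_explanation` tree (nodes with `value`, `description`, and `details`) on every hit, so a separate explain call is not strictly required. The helper below walks that tree and collects field names from leaf descriptions. The description format it parses (`weight(field:term in doc)`) is an assumption based on typical Lucene output and may differ between versions, and the sample explanation is hand-written for illustration, not real output.

```python
import re

# Assumed format of scoring leaves, e.g.
# "weight(title.stemmed:run in 0) [PerFieldSimilarity], result of:"
WEIGHT_RE = re.compile(r"weight\(([^:()\s]+):")


def with_explain(body):
    """Return a copy of a search body with per-hit explanations enabled."""
    out = dict(body)
    out["explain"] = True
    return out


def matched_fields(explanation):
    """Collect the field names of all positively scoring weight(...) nodes."""
    fields = set()
    stack = [explanation]
    while stack:
        node = stack.pop()
        found = WEIGHT_RE.search(node.get("description", ""))
        if found and node.get("value", 0) > 0:
            fields.add(found.group(1))
        stack.extend(node.get("details") or [])
    return fields


if __name__ == "__main__":
    # Hand-written stand-in for hit["_explanation"] from a search response.
    sample = {
        "value": 0.5,
        "description": "sum of:",
        "details": [
            {
                "value": 0.5,
                "description": "weight(title.stemmed:run in 0) "
                               "[PerFieldSimilarity], result of:",
                "details": [],
            }
        ],
    }
    print(with_explain({"query": {"match_all": {}}}))
    print(matched_fields(sample))
```

Like the explain output itself, this stops at the field level and says nothing about token filters. Explain also adds per-hit work, so whether it fits a 10 ms budget would have to be measured on the real index.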
On Wednesday, January 14, 2015 at 12:51:15 PM UTC-8, Nikolas Everett wrote:
What about explain?
On Wed, Jan 14, 2015 at 3:24 PM, Ed Kim <edk...@gmail.com> wrote:
Just a friendly bump to see if anyone has any feedback.