Analyzer plugin needs access to multiple fields


(Andrew Wooster) #1

I have an Analyzer plugin that uses a TokenFilter to annotate tokens. Each document therefore has a) the original text and b) the corresponding annotation map. I am finding it very difficult to bring these two things together in an Analyzer.

Here's what I have tried:

  1. two fields
    I store the text in one field (e.g. text) and the annotation map in another field (e.g. annot).
    THE PROBLEM: An Analyzer only sees the field name and that field's own text. So when analyzing the text field, I have not found a way to access the contents of the annot field. Is there a way to access other fields from an Analyzer?

  2. single field with delimiter
    I store the text and the annotation map in a single field, separated by a delimiter. My token filter uses the synonym map while analyzing the text and makes sure that the annotation map after the delimiter is filtered out.
    THE PROBLEM: Although the appended annotation map is not indexed as terms, it still appears in _source, so it can pollute the output of the highlighter.
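To make approach 2 concrete, here is a minimal, library-free sketch of the idea (not the actual plugin code). It assumes a hypothetical delimiter `\u001f` and a JSON-serialized annotation map; neither detail is given in the post. The "analysis" step only annotates tokens from the text before the delimiter, yet the stored field value, which is what ends up in _source, still carries the appended map.

```python
import json

# Hypothetical delimiter; the post does not say which one is used.
DELIMITER = "\u001f"


def build_field_value(text, annotations):
    """Pack the text and its annotation map into one field value (approach 2)."""
    return text + DELIMITER + json.dumps(annotations)


def analyze(field_value):
    """Mimic the token filter: annotate tokens from the text part, drop the tail."""
    text, _, tail = field_value.partition(DELIMITER)
    annotations = json.loads(tail) if tail else {}
    # Only terms from the text part are emitted; the map itself is never a term.
    return [(term, annotations.get(term)) for term in text.lower().split()]
```

For example, `analyze(build_field_value("Quick brown fox", {"fox": "ANIMAL"}))` yields only the three text terms, with `"ANIMAL"` attached to `"fox"`, while the raw value passed to the index still contains `{"fox": "ANIMAL"}`.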

Any other ideas?


(Ivan Brusic) #2

Have you looked into using Lucene payloads? Elasticsearch has a payload filter as well.
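As a rough illustration of the payload idea, the sketch below mimics what a delimited payload token filter does: each token written as `term|annotation` is split at the first delimiter, the term is kept, and the rest becomes raw payload bytes attached to that position (identity-style encoding). This is plain Python under those assumptions, not the Lucene or Elasticsearch API; the function name is hypothetical.

```python
def delimited_payloads(text, delimiter="|"):
    """Split each whitespace token into (term, payload) at the first delimiter.

    The part after the delimiter becomes raw UTF-8 payload bytes and is not
    part of the indexed term; tokens without a delimiter get no payload.
    """
    result = []
    for token in text.split():
        term, sep, payload = token.partition(delimiter)
        result.append((term, payload.encode("utf-8") if sep else None))
    return result
```

With this approach the annotation travels with each token position inside the same field, so the analyzer never needs to look at a second field: `delimited_payloads("quick brown fox|ANIMAL")` gives `fox` the payload `b"ANIMAL"` and leaves the other terms without one.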

Ivan

