I have an Analyzer plugin that uses a TokenFilter to annotate tokens. My fields therefore have a) the original text and b) the corresponding annotation map. I am finding it very difficult to bring these two things together in an Analyzer.
Here's what I have tried:
- Two fields
I store the text in one field (e.g. text) and the annotation map in another field (e.g. annot).
THE PROBLEM: An Analyzer only sees the fieldName and that field's token stream. So when analyzing the text field, I have not found a way to access the contents of the annot field. Is there a way to access other fields from within an Analyzer?
- Single field with delimiter
I store the text and the annotation map in a single field, separated by a delimiter. My token filter uses the synonym map during the analysis of the text and ensures that the post-delimiter annotation map is filtered out.
THE PROBLEM: While the appended annotation map is not indexed as terms, it still appears in the _source, which means it can pollute the results from the query highlighter.
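To make the single-field idea concrete, here is a minimal sketch in plain Java, deliberately not using Lucene's TokenFilter API. The `|||` delimiter, the `token=annotation;...` encoding, and every class and method name are assumptions made up for illustration, not the actual plugin code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: plain Java, no Lucene or Elasticsearch classes.
final class DelimitedAnnotationSketch {
    // Assumed delimiter; any string that cannot occur in the text would do.
    static final String DELIMITER = "|||";

    // Everything before the delimiter is the text that should be indexed.
    static String textPart(String fieldValue) {
        int i = fieldValue.indexOf(DELIMITER);
        return i < 0 ? fieldValue : fieldValue.substring(0, i);
    }

    // Everything after the delimiter is parsed as "token=annotation" pairs
    // separated by ';' (an assumed encoding of the annotation map).
    static Map<String, String> annotationPart(String fieldValue) {
        Map<String, String> annotations = new LinkedHashMap<>();
        int i = fieldValue.indexOf(DELIMITER);
        if (i < 0) {
            return annotations;
        }
        String tail = fieldValue.substring(i + DELIMITER.length());
        for (String pair : tail.split(";")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                annotations.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
        return annotations;
    }

    // Emits one entry per whitespace-separated token of the text part,
    // suffixed with "/annotation" when the map has an entry for that token.
    // The annotation map itself never produces a token.
    static List<String> annotate(String fieldValue) {
        Map<String, String> annotations = annotationPart(fieldValue);
        List<String> out = new ArrayList<>();
        for (String token : textPart(fieldValue).trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            String annotation = annotations.get(token);
            out.add(annotation == null ? token : token + "/" + annotation);
        }
        return out;
    }
}
```

Calling `DelimitedAnnotationSketch.annotate("Paris is nice|||Paris=LOC")` yields the tokens `Paris/LOC`, `is`, `nice`. The stored field value is still the full delimited string, which is exactly why the annotation map shows up in the _source and in highlighting.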
Any other ideas?