Multi-Field and Highlighting

Hello,

one more specific question in my quest to let ElasticSearch do what I want
it to :wink:

I have some text fields which have been thoroughly analyzed before any
indexing happens. Thus, I already know the text to be stored as well as the
tokens the text should produce. I don't want to use a Lucene/ElasticSearch
analyzer in that case because it would not be able to produce the tokens I
want.
Essentially I'm looking for the PreAnalyzedField feature available in Solr.
If there is such a feature and I just missed it, you can just tell me and
skip the rest of this post :wink:

For ElasticSearch I thought I would exploit the multi_field feature by
doing the following:

  "properties": {
    "text_stored": {
      "type": "multi_field",
      "path": "just_name",
      "fields": {
        "text": {"type": "string", "index": "no", "store": "yes"}
      }
    },
    "text_analyzed": {
      "type": "multi_field",
      "path": "just_name",
      "fields": {
        "text": {"type": "string", "index": "analyzed", "term_vector": "with_positions_offsets"}
      }
    }
  }

The whole example can be found here and be copy&pasted into the terminal
after starting a fresh copy of ElasticSearch:

"text_stored" should contain the original text and "text_analyzed" my
pre-analyzed terms (which I hope I could supply by using an appropriate
tokenizer plugin).

If I now have a document like this

{
"text_stored": "Sebastien Lorber is awesome. Yes, old Lorber.",
"text_analyzed": "Lorber has a farm."
}

I am able to find the document by searching "text:farm" for example.
Searching for "text:awesome" would not work here, of course, because the
"text_stored" field is not analyzed.

The only thing lacking for me now is that the "text_stored" field value
should be highlighted corresponding to the analyzed tokens in
"text_analyzed".
Thus, when searching for "lorber" I would like this highlighting:
"Sebastien Lorber is awesome. Yes, old Lorber."
Instead I get
"Sebastien Lorber is awesome. Yes, old Lorber."

When searching for "farm" I'd like the highlighting to be

"Sebastien Lorber is awesome. Yes, old Lorber."
Instead I don't get any highlighting because "farm" is not found in the text.
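
For reference, the searches above with highlighting requested could be sent as a request body roughly like the one built below. This is only a sketch: the helper name build_search_body is made up for illustration, and I am assuming a plain query_string query plus a highlight section on the shared field name "text".

```python
import json


def build_search_body(query_text, field="text"):
    """Build a search body that queries one field and asks for highlighting on it.

    `build_search_body` is a hypothetical helper, not part of any client library.
    """
    return {
        # query_string syntax "field:term", as in the "text:farm" search above
        "query": {"query_string": {"query": f"{field}:{query_text}"}},
        # request highlighting on the same field name
        "highlight": {"fields": {field: {}}},
    }


body = build_search_body("farm")
print(json.dumps(body, indent=2))
```

The printed JSON could then be posted to the index's _search endpoint.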

I know that this behaviour makes sense for the default use case.
I had hoped that by specifying "term_vector" : "with_positions_offsets",
highlighting would be based only on the stored offsets, ignoring the actual
text contents.
My question is whether there is a way to get the behaviour I'd like to see.
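
To make the behaviour I am after concrete, here is a small standalone sketch of offset-only highlighting. It is not how ElasticSearch works internally; highlight_by_offsets and the <em> tags are my own illustrative choices. The offsets are taken from the token "farm" in the analyzed text and applied blindly to the stored text.

```python
def highlight_by_offsets(stored_text, offsets, pre="<em>", post="</em>"):
    """Wrap the character ranges given by `offsets` in pre/post tags.

    Only the (start, end) offsets are used; the stored text is never
    compared against the token text. Hypothetical helper for illustration.
    """
    parts = []
    last = 0
    for start, end in sorted(offsets):
        start = max(start, last)           # skip over overlapping ranges
        end = min(end, len(stored_text))   # clamp to the stored text length
        if start >= end:
            continue
        parts.append(stored_text[last:start])
        parts.append(pre + stored_text[start:end] + post)
        last = end
    parts.append(stored_text[last:])
    return "".join(parts)


stored = "Sebastien Lorber is awesome. Yes, old Lorber."
analyzed = "Lorber has a farm."

# Offsets of the token "farm" inside the analyzed text
start = analyzed.find("farm")
farm_offsets = [(start, start + len("farm"))]

print(highlight_by_offsets(stored, farm_offsets))
```

With real pre-analyzed tokens, the offsets would of course point at the intended span of the stored text rather than at positions in a different string.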

Thank you!

Erik
