I'm currently trying to get highlighting features to work in my custom query handler. As far as I can tell, the ES Java API allows highlighting terms of a given query:
My index contains a field (say "tag") holding an identifier that represents a set of related concepts, which is assigned by an external document classification engine, and another field (say "content") holding the classified text. A query consists of such a tag and the set of labels it represents; both are supplied client-side. What I want is to query the index on the "tag" field and highlight the supplied labels in the "content" field. Note that simple full-text (FT) queries are not an option: the classification engine provides information about the classified documents that is beyond the scope of FT queries.
For example, a document is tagged with "car". An external client sends a request with the tag "pet" and the labels "cat", "dog" and "turtle". What I want is to highlight these labels in the documents tagged with "pet". (This particular case could be handled with FT queries; the actual cases are more complex.)
Solr allows me to do that: its Highlighter can be applied to a single piece of text, for example:
I'm honestly not sure what the middle paragraph means, but if you want behavior like this and you are writing a highlighting plugin, you can certainly implement it. Elasticsearch's highlighters are mostly just REST wrappers around the ones built into Lucene. Mostly.
If you are asking about what you can do with the REST API, Elasticsearch supports a highlight_query option that you can specify at the field level; that query is fed to the highlighter instead of the search query. You can't pass arbitrary text to highlight, though; it's normal to store that text in _source. You could build it on the fly if you were willing to be creative and write a plugin.
There isn't an API that just highlights stuff the way the analyze API just analyzes stuff.
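As a rough sketch of the field-level highlight_query option applied to the example in this thread, the standard-library-only Java snippet below builds a search body that selects hits with a term query on "tag" and hands the client-supplied labels to the highlighter through a highlight_query on "content". The class and method names are hypothetical, and joining the labels into a single match query is an assumption (it makes them go through the content field's analyzer instead of being matched verbatim). Check the exact request shape against the highlighting documentation for your Elasticsearch version.

```java
import java.util.List;

// Hypothetical helper: hits are selected by the "tag" field, while the
// "content" field is highlighted against client-supplied labels via a
// field-level highlight_query. Field names follow the thread's example.
final class HighlightQueryBody {

    private HighlightQueryBody() {}

    // Minimal JSON string escaping: quotes, backslashes and control characters.
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // The term query decides which documents match; the labels never
    // affect the hit set, they only drive the highlighter.
    static String build(String tag, List<String> labels) {
        String labelText = String.join(" ", labels);
        return "{"
            + "\"query\":{\"term\":{\"tag\":" + jsonString(tag) + "}},"
            + "\"highlight\":{\"fields\":{\"content\":{"
            + "\"highlight_query\":{\"match\":{\"content\":" + jsonString(labelText) + "}}"
            + "}}}"
            + "}";
    }
}
```

For the tag "pet" and the labels cat, dog and turtle, `build` produces `{"query":{"term":{"tag":"pet"}},"highlight":{"fields":{"content":{"highlight_query":{"match":{"content":"cat dog turtle"}}}}}}`, which can then be sent to the _search endpoint.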
By FT queries I mean the various full-text query options over the "content" field in my example.
The point is that I want to run a query over the unanalyzed string field "tag", get my results, and have my labels highlighted in the "content" field without a separate query against the index. In other words, I want to highlight only the contents of the documents found by the query over the "tag" field.
> There isn't an API that just highlights stuff the way the analyze API just analyzes stuff.
I take this to mean that I can't highlight a specific document or set of documents: the current APIs only work in combination with a query. Is that correct?
They only work with _search. You could certainly add an id filter and search for just the document you want, but even then the document would have to be indexed.
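A sketch of the id-filter idea, assuming the ids of the wanted documents are already known: an ids query restricts the search to exactly those documents, and a field-level highlight_query supplies the terms to highlight. The class name is hypothetical, and only quotes and backslashes are escaped, so ids and labels are assumed not to contain control characters.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: restrict a search to known document ids and
// highlight the "content" field against client-supplied labels.
// Only quotes and backslashes are escaped (no control characters expected).
final class IdsHighlightBody {

    private IdsHighlightBody() {}

    private static String quote(String s) {
        // Escape backslashes first so the quote escapes are not doubled.
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    static String build(List<String> ids, List<String> labels) {
        String idArray = ids.stream()
            .map(IdsHighlightBody::quote)
            .collect(Collectors.joining(",", "[", "]"));
        return "{\"query\":{\"ids\":{\"values\":" + idArray + "}},"
            + "\"highlight\":{\"fields\":{\"content\":{\"highlight_query\":"
            + "{\"match\":{\"content\":" + quote(String.join(" ", labels)) + "}}}}}}";
    }
}
```

As noted in the reply, this still relies on the documents being indexed; it only narrows the search so the highlighter runs on the documents you already picked.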
Rephrasing to make sure I understand: you want to search the tag field, and for each hit you want to highlight the content field, but only against some terms that you look up from the tag field. So the source of one document looks like:
{
  "tag": {
    "pet": ["cats", "dogs", "turtles"]
  },
  "content": "This pet store has cats, dogs, and turtles"
}
And you want a search for the term pet to highlight the terms cat, dog, and turtle.
There isn't anything built in for that. You'd have to write a plugin or modify an existing one. You can totally do it; I'd have the easiest time doing it by modifying the experimental highlighter, but that's because I maintain that plugin and know where the right hooks are. It's built from pluggable pieces, one of which is the terms extraction process.
If you wanted to write a plugin from scratch, the approach you described in your first post should work: that highlighter is still on the classpath, and Elasticsearch exposes it as the plain highlighter.
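To illustrate what swapping out the terms extraction step amounts to, here is a toy, standard-library-only highlighter that takes its terms from the request instead of extracting them from a query. It is not the Elasticsearch plugin API and not the experimental highlighter's code; the class is hypothetical, and `normalize()` is a crude stand-in for a real analyzer (lowercasing plus stripping one trailing "s").

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy highlighter whose terms come from the caller, not from a query.
// Not the Elasticsearch plugin API; normalize() only approximates an analyzer,
// and the content is not HTML-escaped.
final class ExternalTermsHighlighter {

    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}]+");

    private ExternalTermsHighlighter() {}

    // Lowercase and strip one trailing 's' so "cats" matches the label "cat".
    static String normalize(String token) {
        String t = token.toLowerCase(Locale.ROOT);
        return t.length() > 3 && t.endsWith("s") ? t.substring(0, t.length() - 1) : t;
    }

    static String highlight(String content, List<String> labels) {
        Set<String> wanted = new HashSet<>();
        for (String label : labels) {
            wanted.add(normalize(label));
        }
        StringBuilder out = new StringBuilder();
        Matcher m = WORD.matcher(content);
        int last = 0;
        while (m.find()) {
            out.append(content, last, m.start()); // copy text between tokens as-is
            String token = m.group();
            if (wanted.contains(normalize(token))) {
                out.append("<em>").append(token).append("</em>");
            } else {
                out.append(token);
            }
            last = m.end();
        }
        out.append(content.substring(last));
        return out.toString();
    }
}
```

Run against the example content with the labels cat, dog and turtle, `highlight` returns `This pet store has <em>cats</em>, <em>dogs</em>, and <em>turtles</em>`. In a real plugin, the field's own analyzer would replace `normalize()`.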
> Rephrasing to make sure I understand: you want to search in the tag field and for each hit you want highlighting to highlight the content field but only against some terms that you look up from the tag field. So, the source of one document looks like...
An example document looks like this:
{
  "tag": ["pet"],
  "content": "This pet store has cats, dogs, and turtles"
}
And the query looks like this (the relevant part):
tag=pet&labels=cat,dog,turtle
The point is that the labels are sent by a client app, and the labels for the same tag can change depending on that app's configuration. Furthermore, the labels are not stored server-side. A query can contain, for example, the labels "cat,dog" or "turtle,horse" without any change being made to the index.
These labels are not search terms (the tag is); I just want to be able to highlight them in the search results.
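A minimal sketch of reading that request on the server side, assuming the labels arrive as a comma-separated list in a `labels` parameter exactly as in the example; the class name is hypothetical. The parsed tag would drive the query, while the labels would only feed highlighting.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical parser for a request like "tag=pet&labels=cat,dog,turtle".
// The tag selects documents; the labels are only used for highlighting.
final class LabelRequest {
    final String tag;
    final List<String> labels;

    private LabelRequest(String tag, List<String> labels) {
        this.tag = tag;
        this.labels = labels;
    }

    private static String decode(String s) {
        // The Charset overload of URLDecoder.decode requires Java 10+.
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    static LabelRequest parse(String queryString) {
        String tag = null;
        List<String> labels = new ArrayList<>();
        for (String pair : queryString.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                continue; // skip parameters without a value
            }
            String key = decode(pair.substring(0, eq));
            String rawValue = pair.substring(eq + 1);
            if (key.equals("tag")) {
                tag = decode(rawValue);
            } else if (key.equals("labels")) {
                // Split before decoding so an encoded comma (%2C) stays inside a label.
                for (String label : rawValue.split(",")) {
                    String decoded = decode(label).trim();
                    if (!decoded.isEmpty()) {
                        labels.add(decoded);
                    }
                }
            }
        }
        if (tag == null) {
            throw new IllegalArgumentException("missing tag parameter");
        }
        return new LabelRequest(tag, Collections.unmodifiableList(labels));
    }
}
```

Because nothing here touches the index, a different client configuration (say "turtle,horse") only changes what gets passed to the highlighter.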
> They only work with _search. You could certainly add an id filter and search for just the document you want, but even then the document would have to be indexed.
That might just work - the documents are stored and analyzed. Thanks for the idea!