Can I use ES highlight features on a specific document/text?


(Maciej Buczek) #1

I'm currently trying to get highlighting features to work in my custom query handler. As far as I can tell, the ES Java API allows highlighting terms of a given query:

   SearchResponse search = client().prepareSearch().setQuery(constantScoreQuery(matchQuery("text", "text"))).addHighlightedField(new Field("*").highlighterType(highlighter)).get();

My index has two relevant fields: "tag", an identifier that represents a set of related concepts and is assigned by an external document classification engine, and "content", the classified text. A query consists of such a tag and a set of labels that it represents; the labels are supplied client-side. What I want is to query the index over the "tag" field and highlight the supplied labels in the "content" field. Note that using simple FT queries is not an option: the classification engine provides information about the classified documents that is beyond the scope of FT queries.
An example: a document is tagged with "pet". An external client sends a request with tag "pet" and labels "cat", "dog", "turtle". What I want is to be able to highlight these labels in documents tagged with "pet". (This simple case can be done using FT queries; actual cases are more complex.)

Solr allows me to do that - its Highlighter can be used on a single text, for example:

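import java.io.IOException;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.WildcardQuery;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.InvalidTokenOffsetsException;
import org.apache.lucene.search.highlight.QueryScorer;

private String getHighlighting(List<String> labels,
        String content, Analyzer textAnalyzer) {
    // Build a query from the labels; it is used only for highlighting.
    BooleanQuery highQuery = new BooleanQuery();

    for (String label : labels) {
        WildcardQuery wq = new WildcardQuery(new Term("content",
                label.toLowerCase()));
        highQuery.add(wq, Occur.MUST);
    }

    Highlighter high = new Highlighter(
            new QueryScorer(highQuery, "content"));

    // getBestFragment returns null when nothing in the text matches.
    String highlights = null;
    try {
        highlights = high.getBestFragment(textAnalyzer, "content", content);
    } catch (InvalidTokenOffsetsException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return highlights;
}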

Can I do the same - or equivalent - in ES?


(Nik Everett) #2

You may want to define that term.

I'm honestly not sure what the middle paragraph means, but if you want behavior like this and you are writing a highlighting plugin, you can certainly implement it. Elasticsearch's highlighters are mostly just REST wrappers around the ones built into Lucene. Mostly.

If you are asking about what you can do with the REST API: Elasticsearch supports a highlight_query option that you can specify at the field level, and that query is fed to the highlighter instead of the search query. You can't pass arbitrary text to highlight, though. It's normal to store that text in the _source. You could build it on the fly if you were willing to be creative and write a plugin.
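
For the tag/labels case from the first post, a request body using highlight_query might look roughly like the one built below. This is only a sketch: the class and method names are made up for illustration, the field names "tag" and "content" come from the example in the first post, and it assumes "tag" is not analyzed (so a term query matches it) and that values need no JSON escaping beyond quotes and backslashes. It only builds the JSON string for the _search endpoint; sending it is left out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of any Elasticsearch client library.
class HighlightQueryBody {

    // Escape backslashes and double quotes so a value can sit inside a JSON string.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Search on "tag", but hand the highlighter a separate query built from the labels.
    static String build(String tag, List<String> labels) {
        String shouldClauses = labels.stream()
                .map(label -> "{\"match\":{\"content\":" + quote(label) + "}}")
                .collect(Collectors.joining(","));
        return "{"
                + "\"query\":{\"term\":{\"tag\":" + quote(tag) + "}},"
                + "\"highlight\":{\"fields\":{\"content\":{"
                + "\"highlight_query\":{\"bool\":{\"should\":[" + shouldClauses + "]}}"
                + "}}}"
                + "}";
    }

    public static void main(String[] args) {
        System.out.println(build("pet", Arrays.asList("cat", "dog", "turtle")));
    }
}
```

Whether a label such as "cat" also highlights "cats" depends on how the "content" field is analyzed, so the match clauses may need to be swapped for something else (wildcard queries, for instance) in a real setup.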

There isn't an API that just highlights stuff the way the analyze API just analyzes stuff.


(Maciej Buczek) #3

Thanks for the reply.

You may want to define that term.

I mean the various fulltext query options over the "content" field in my example.
The point is that I want to be able to run a query over an unanalyzed string field "tag", get my results, and get my labels highlighted in the contents of the "content" field without a separate index query, i.e. I want to highlight only the contents of the documents found by the query over the "tag" field.

There isn't an API that just highlights stuff the way the analyze API just analyzes stuff.

I take this to mean that I can't highlight a specific document or set of documents; the current APIs only work in combination with a query. Is that correct?


(Nik Everett) #4

They only work with _searching. You could certainly add an id filter and search for just the document that you wanted but even then you'd have to have the document indexed.
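
Restricting the search to a single indexed document could look something like the sketch below, which uses an ids query as the main query and passes the label query separately as highlight_query. The class name, the document id "42", and the "content" field are illustrative assumptions, and the code only assembles the JSON body; it does not talk to a cluster.

```java
// Hypothetical helper for illustration; it only assembles the request body.
class SingleDocHighlightBody {

    // Escape backslashes and double quotes so a value can sit inside a JSON string.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Match exactly one document by id and highlight "content" with a separate query.
    // highlightQueryJson is assumed to already be a valid JSON query object.
    static String build(String docId, String highlightQueryJson) {
        return "{"
                + "\"query\":{\"ids\":{\"values\":[" + quote(docId) + "]}},"
                + "\"highlight\":{\"fields\":{\"content\":{"
                + "\"highlight_query\":" + highlightQueryJson
                + "}}}"
                + "}";
    }

    public static void main(String[] args) {
        System.out.println(build("42", "{\"match\":{\"content\":\"cat dog turtle\"}}"));
    }
}
```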

Rephrasing to make sure I understand: you want to search in the tag field and for each hit you want highlighting to highlight the content field but only against some terms that you look up from the tag field. So, the source of one document looks like:

{
  "tag": {
    "pet": ["cats", "dogs", "turtles"]
  },
  "content": "This pet store has cats, dogs, and turtles"
}

And you want a search for the term pet to highlight the terms cat, dog, and turtle.

There isn't anything built in for that. You'd have to write a plugin or modify an existing one. You can totally do it. I'd have the easiest time doing it by modifying the experimental highlighter, but that is because I maintain that plugin and know where the right hooks are. It's also built of pluggable pieces, one of which is the term extraction process.

If you wanted to write a plugin from scratch, the approach you described in the first post should work: that highlighter is still on the classpath, and Elasticsearch exposes it as the plain highlighter.


(Maciej Buczek) #5

Rephrasing to make sure I understand: you want to search in the tag field and for each hit you want highlighting to highlight the content field but only against some terms that you look up from the tag field. So, the source of one document looks like..

An example document looks like this:

{
  "tag": ["pet"],
  "content": "This pet store has cats, dogs, and turtles"
}

And the query looks like (the relevant part):

tag=pet&labels=cat,dog,turtle

The point is that the labels are sent from a client app and labels for the same tag can change depending on that app's configuration. Furthermore, the labels are not stored server-side. A query can contain, for example, labels "cat,dog", or "turtle,horse" without any changes being made to the index.
These labels are not search terms (the tag is), I just want to be able to highlight them in search results.
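
To make the split between the search term and the highlight-only labels concrete, here is a minimal sketch of how a handler might pull them apart. The class and field names are hypothetical, and it assumes the parameters arrive exactly in the tag=...&labels=... form shown above, with no URL encoding.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical holder for one client request: the tag is searched, the labels are only highlighted.
class TagLabelRequest {
    final String tag;
    final List<String> labels;

    TagLabelRequest(String tag, List<String> labels) {
        this.tag = tag;
        this.labels = labels;
    }

    // Parse "tag=pet&labels=cat,dog,turtle"; assumes values are not URL-encoded.
    static TagLabelRequest parse(String queryString) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : queryString.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        String rawLabels = params.getOrDefault("labels", "");
        List<String> labels = rawLabels.isEmpty()
                ? Collections.<String>emptyList()
                : Arrays.asList(rawLabels.split(","));
        return new TagLabelRequest(params.get("tag"), labels);
    }

    public static void main(String[] args) {
        TagLabelRequest r = parse("tag=pet&labels=cat,dog,turtle");
        System.out.println(r.tag + " -> " + r.labels);
    }
}
```

Only the tag would go into the main query; the labels would feed whatever query is handed to the highlighter.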

They only work with _searching. You could certainly add an id filter and search for just the document that you wanted but even then you'd have to have the document indexed.

That might just work - the documents are stored and analyzed. Thanks for the idea!

