Highlighting is extremely slow on concurrent requests

We are running performance tests on our Elasticsearch queries. We use the simple_query_string query type and expect the results to be highlighted.

Our initial thought was that the transport client was throttling the requests somehow (discussed here: Elasticsearch Transport Client bottle neck with concurrent calls), but after playing around with the queries a bit more, I realized the bottleneck was not the client but the highlighting itself.

Here is the highlighter code that we turn on and off to get the performance numbers:

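// Highlight every field ("*"), including fields the query did not match on.
HighlightBuilder highlightBuilder = new HighlightBuilder().field("*").requireFieldMatch(false);
// Attach the highlighter to the search request; removing this call gives the "without highlighter" run.
builder = builder.setQuery(forcedFilterByCompanyIdsBuilder)
        .highlighter(highlightBuilder);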

Our dataset is approximately 100,000 documents, each no larger than 2K.
Here are our performance test results (1 ES node, 1 index):

with highlighter:

without highlighter:
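
The test harness itself is not shown in this post. As a rough sketch of how TPS under concurrent load could be measured, assuming a hypothetical `request` Runnable that stands in for one search call (the class name `ThroughputProbe` and everything in it are illustrative, not part of Elasticsearch or of our actual test code):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: runs the same request from several threads and reports throughput.
public class ThroughputProbe {

    public static final class Result {
        public final long completed;   // number of requests that finished
        public final double seconds;   // wall-clock time for the whole run

        Result(long completed, double seconds) {
            this.completed = completed;
            this.seconds = seconds;
        }

        // Transactions per second over the whole run.
        public double tps() {
            return seconds > 0 ? completed / seconds : 0.0;
        }
    }

    public static Result run(int threads, int requestsPerThread, Runnable request) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        AtomicLong completed = new AtomicLong();

        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                try {
                    start.await(); // hold all workers until the clock starts
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int i = 0; i < requestsPerThread; i++) {
                    request.run(); // assumption: one blocking search call
                    completed.incrementAndGet();
                }
            });
        }

        long begin = System.nanoTime();
        start.countDown(); // release all workers at once
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        double seconds = (System.nanoTime() - begin) / 1e9;
        return new Result(completed.get(), seconds);
    }

    public static void main(String[] args) {
        // Stand-in for a search request: sleeps 5 ms instead of calling a cluster.
        Runnable fakeSearch = () -> {
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        Result r = run(8, 50, fakeSearch);
        System.out.printf("completed=%d, tps=%.1f%n", r.completed, r.tps());
    }
}
```

Running the same probe twice, once with the highlighter attached to the request and once without, gives the kind of comparison described here.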

I followed up on a lot of different threads, almost all of which mention highlighters being slow overall. I also tried FVH (the Fast Vector Highlighter), but there wasn't much difference under high load.

So I was wondering:

  • Is there no way highlighters can be faster?
  • Are any performance metrics published for usage of highlighters?
  • We are expecting at least 200 TPS from each ES node. Given that with highlighters we can't get past 50, do we need to avoid highlighters altogether? (If so, is there a way we can retrieve the "_all" field and highlight it ourselves? I tried changing "_all" to be "Stored: true", but I still can't retrieve it through the Java API. Any thoughts on how that can be done?)
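
To make the "highlight it ourselves" option concrete, here is a minimal sketch of client-side highlighting using only the Java standard library. It assumes the field text has already been retrieved and that the query terms are known as plain strings; the class name `ClientSideHighlighter` is hypothetical. It matches whole words case-insensitively and does not reproduce Elasticsearch analysis (no stemming, synonyms, or simple_query_string operators), so its output can differ from the server-side highlighter.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper: wraps query terms found in a text with <em> tags.
public class ClientSideHighlighter {

    public static String highlight(String text, List<String> terms) {
        if (text == null || text.isEmpty() || terms == null || terms.isEmpty()) {
            return text;
        }
        // Build one alternation of literal terms; Pattern.quote escapes regex metacharacters.
        String alternation = terms.stream()
                .filter(t -> t != null && !t.isEmpty())
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        if (alternation.isEmpty()) {
            return text;
        }
        // \b restricts matches to whole words, so "slow" does not match inside "slowly".
        Pattern pattern = Pattern.compile("\\b(?:" + alternation + ")\\b",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

        Matcher matcher = pattern.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (matcher.find()) {
            // Copy the unmatched gap, then the match wrapped in tags (original casing kept).
            out.append(text, last, matcher.start())
               .append("<em>").append(matcher.group()).append("</em>");
            last = matcher.end();
        }
        out.append(text, last, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String result = highlight("Slow highlighting on Elasticsearch",
                Arrays.asList("slow", "elasticsearch"));
        System.out.println(result);
        // prints: <em>Slow</em> highlighting on <em>Elasticsearch</em>
    }
}
```

This moves the highlighting cost from the ES node to the client, at the price of shipping the full field text back with every hit.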