I am currently working on a program and I need to count the number of documents in an index that contain some specific fields.
I work in Java with a RestHighLevelClient, and I use PutStoredScriptRequest to add my script and SearchTemplateRequest to execute it. My query is in JSON and looks like this:
I get the results I want using response.getHits().getTotalHits(), but I wonder whether there is a more efficient way to achieve this. I am working with a very large index (over 50000 entries) and need the best possible performance.
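When only the total is needed, a common approach is the Count API (`GET /<index>/_count`), or a search with `"size": 0`, so that no documents are transferred. The sketch below assumes that "containing some specific fields" means every listed field must exist, and builds a `_count` body as a `bool` filter of `exists` queries. The class name and field names are hypothetical. With the 6.5 low-level client, such a body could be sent with `new Request("GET", "/my-index/_count")` followed by `request.setJsonEntity(body)`.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: builds a body for GET /<index>/_count that matches
// documents containing *every* field in the list (assumed interpretation).
final class CountBodies {
    private CountBodies() {}

    static String existsAll(List<String> fields) {
        // One "exists" clause per field; filter context skips scoring.
        String clauses = fields.stream()
                .map(f -> "{\"exists\":{\"field\":\"" + escape(f) + "\"}}")
                .collect(Collectors.joining(","));
        return "{\"query\":{\"bool\":{\"filter\":[" + clauses + "]}}}";
    }

    // Minimal escaping for quotes and backslashes; enough for field names,
    // not a general-purpose JSON encoder.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

The same query with `"size": 0` sent to `_search` would also work; in 6.x the total is then a plain number under `hits.total`.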
For later readers: I found a way to improve performance, simply by switching to the low-level client.
It does everything I need, while being much faster on very large indices.
Sorry for the late answer; I forgot to check the forum after I found my solution.
My new implementation is based on a simple method to perform queries in Java:
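The method itself was not preserved in the thread. As a dependency-free stand-in, the sketch below uses the JDK 11 `java.net.http.HttpClient` rather than the Elasticsearch `RestClient`: it posts a raw JSON body to an endpoint and returns the raw JSON response, which is the same idea the low-level client is built on. The class name and base URL are assumptions. With the actual 6.5 low-level client, the equivalent calls are `new Request(method, endpoint)`, `request.setJsonEntity(body)` and `restClient.performRequest(request)`.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical "perform a query" helper: raw JSON in, raw JSON out, no beans.
final class RawQuery {
    private final String baseUrl; // e.g. "http://localhost:9200" (assumption)

    RawQuery(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    // Builds a POST carrying the JSON body to an endpoint such as "/my-index/_search".
    HttpRequest build(String endpoint, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(baseUrl + endpoint))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    // Sends the request with a caller-supplied client and returns the unparsed body.
    String perform(HttpClient client, String endpoint, String jsonBody)
            throws IOException, InterruptedException {
        HttpResponse<String> response =
                client.send(build(endpoint, jsonBody), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() >= 300) {
            throw new IOException("HTTP " + response.statusCode() + ": " + response.body());
        }
        return response.body();
    }
}
```

Passing the `HttpClient` in lets one client (and its connection pool) be reused across queries, much as a single `RestClient` instance is normally shared.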
I found this information in the official documentation here (I'm using Elasticsearch version 6.5.0).
The previous version simply used a RestHighLevelClient and SearchTemplateRequest for querying. I thought it was required because it was already in some classes when I started modifying the existing code. Again, documentation about the high-level client methods can be found here.
It's a bit vague, but I have a lot of different methods and can't really post them in detail. I hope it helps; I really learned everything I know about Elasticsearch from scratch using the documentation. I can post more details or answer questions if anyone needs them.
Thanks for sharing your code. But I'm still concerned about this:
If a future reader comes to this page, they will think that the HLRestClient is slow. Period.
Which is not the case IMHO. It is obviously a bit slower than the LLRestClient because it has to parse the JSON response to create Java beans. If you don't use the HLRestClient, I guess you will probably parse the response yourself in your code, unless you just send it back to the interface as is.
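To illustrate what "parsing the response yourself" means with a low-level client: the `_count` endpoint returns JSON such as `{"count":42,"_shards":{...}}`, and something in your own code has to turn that into a number. The sketch below uses a regular expression only to stay dependency-free; real code would more likely use a JSON library. The class name is hypothetical.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts "count" from a raw _count response.
// Regex-based for brevity; it is not a JSON parser and would also match
// a nested "count" key if one appeared earlier in the text.
final class CountResponses {
    private static final Pattern COUNT = Pattern.compile("\"count\"\\s*:\\s*(\\d+)");

    private CountResponses() {}

    static OptionalLong parseCount(String json) {
        Matcher m = COUNT.matcher(json);
        return m.find() ? OptionalLong.of(Long.parseLong(m.group(1))) : OptionalLong.empty();
    }
}
```

This is the work the high-level client does for you when it builds its response objects, which is why the two clients cannot be compared on raw speed alone.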
So I'd like to understand why it was slow in your case. Would you mind sharing what you now have in body and what the HLRestClient code looked like?
I cannot show the algorithm; it is my company's property and must remain private.
It is a search engine that looks for documents containing specific keywords. Several queries identify the keywords given by a user and their synonyms; the body looks like this:
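The actual body is private, so the following only illustrates how a body with a varying number of keywords and synonyms could be assembled in code. Each keyword becomes a `bool`/`should` group over the keyword and its synonyms, and all groups are required through `must`. The field name, the use of `match` queries, and the helper names are all assumptions.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical builder: one required group per user keyword; inside a group,
// matching the keyword OR any one of its synonyms is enough.
final class KeywordQueries {
    private KeywordQueries() {}

    static String build(String field, Map<String, List<String>> synonymsByKeyword) {
        String groups = synonymsByKeyword.entrySet().stream()
                .map(e -> group(field, e.getKey(), e.getValue()))
                .collect(Collectors.joining(","));
        return "{\"query\":{\"bool\":{\"must\":[" + groups + "]}}}";
    }

    private static String group(String field, String keyword, List<String> synonyms) {
        String should = Stream.concat(Stream.of(keyword), synonyms.stream())
                .map(t -> "{\"match\":{\"" + esc(field) + "\":\"" + esc(t) + "\"}}")
                .collect(Collectors.joining(","));
        return "{\"bool\":{\"should\":[" + should + "],\"minimum_should_match\":1}}";
    }

    // Minimal escaping for quotes and backslashes only.
    private static String esc(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

Because the number of groups and synonyms is only known at run time, a body like this is easier to build in code than to express with a fixed set of template variables.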
As I mentioned, it is a big index with over 13000 documents, and since the user request itself can be fairly long, it requires several queries. The performance gain was approximately two seconds for the most complex tests I ran.
The previous requests were stored on the server before their execution, using two methods:
The first query used to be a template when only keywords were searched. Once synonyms were added this was no longer possible, because the number of variables varied from one query to another. It is still possible to execute the stored query without any variables (there is simply no parameter), but I didn't know this would affect performance (since there is no parameter).
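A sketch of the store-then-execute flow described above, using the mustache `{{#toJson}}` function that Elasticsearch search templates support for rendering a parameter as JSON. It is one possible way (not tried in this thread) to keep a stored template even when the number of keyword groups varies, since the whole clause list is passed as a single parameter. Because `{{#toJson}}` makes the template invalid JSON, the source is stored as a string. The template id, the parameter name `clauses`, and the helper names are assumptions; the bodies would go to `POST /_scripts/<id>` and `GET /<index>/_search/template` respectively.

```java
// Hypothetical helpers for the two calls of a stored search template:
//   1) POST /_scripts/<id>               body: storeBody(SOURCE)
//   2) GET  /<index>/_search/template    body: executeBody(id, clauses)
final class StoredTemplates {
    private StoredTemplates() {}

    // Mustache source kept as a string: {{#toJson}}clauses{{/toJson}} renders the
    // "clauses" parameter (a JSON array) in place, so the clause count can vary.
    static final String SOURCE =
            "{ \"query\": { \"bool\": { \"must\": {{#toJson}}clauses{{/toJson}} } } }";

    static String storeBody(String mustacheSource) {
        return "{\"script\":{\"lang\":\"mustache\",\"source\":\"" + esc(mustacheSource) + "\"}}";
    }

    // clausesJsonArray must already be a JSON array, e.g. one element per keyword group.
    static String executeBody(String templateId, String clausesJsonArray) {
        return "{\"id\":\"" + esc(templateId) + "\",\"params\":{\"clauses\":" + clausesJsonArray + "}}";
    }

    // Minimal escaping for quotes and backslashes only.
    private static String esc(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

Whether this is faster than sending the full body directly would need measuring; it mainly keeps the query shape on the server.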
I will give it a try, thank you for the clarification. I also find the low-level client nicer to use: I end up with less code overall (perhaps because I was using the previous methods the wrong way), and I am more familiar with writing queries in the mustache format.