The problem is that the searchContentHTML field can contain quite a long text (several MB) that we basically use for fulltext search only (it is hardly useful to return to clients).
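For context, here is a minimal sketch of what I assume the mapping looks like (only the field names come from our index; everything else is simplified):

    PUT quickassets
    {
      "mappings": {
        "properties": {
          "name":              { "type": "text" },
          "searchContentHTML": { "type": "text" }
        }
      }
    }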
When the text is about 14MB long, the getById query (with the _source_excludes=searchContentHTML parameter) takes about 100ms, and when the text is small it takes about 30ms.
That means it takes roughly three times longer to fetch a single small field just because another field is long!
Are there any good practices for handling such documents with Elasticsearch?
Currently I see only one option: move the searchContentHTML field to a separate index and handle fulltext search differently, by searching multiple indices.
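A rough sketch of what that split could look like (quickassets_content is a made-up name, and I'm assuming the text never needs to be returned, so _source can be disabled in the content index; note that disabling _source also rules out update and reindex on those documents):

    PUT quickassets_content
    {
      "mappings": {
        "_source": { "enabled": false },
        "properties": {
          "searchContentHTML": { "type": "text" }
        }
      }
    }

    GET quickassets,quickassets_content/_search
    {
      "query": {
        "multi_match": {
          "query": "some search words",
          "fields": [ "name", "searchContentHTML" ]
        }
      }
    }

Indexing the content document with the same _id as the main document would make it easy to join the hits from both indices by _id on the client side, and the slow getById would only ever touch the small documents in quickassets.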
I'm getting the same times. It looks like ES has trouble parsing such huge documents.
BTW, here is the response I can see in DevTools in Kibana (for the long document):
Well, in fact in production I'm using a getById query, something like this:

    GET quickassets/_doc/-YFy-n8BGruhv47CIXSL?_source_excludes=searchContentHTML
    GET quickassets/_doc/-YFy-n8BGruhv47CIXSL?_source_includes=name

But it takes approximately the same time either way.
It led me to the conclusion that the problem is not in the query itself, but somewhere else.
I tried the profiler, but there is nothing interesting there. Most of the time is spent in build_scorer (23.2µs, 48.5%), which makes no sense given that the query takes much longer overall.
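For reference, the profiled search was along these lines (a sketch; the exact query I profiled is an assumption):

    GET quickassets/_search
    {
      "profile": true,
      "query": {
        "ids": { "values": [ "-YFy-n8BGruhv47CIXSL" ] }
      }
    }

One thing worth noting: depending on the version, the profile output may only cover the query phase (build_scorer, next_doc, etc.) and not the fetch phase, which is where the _source is actually loaded and parsed. That would explain why the profiled 23.2µs accounts for almost none of the observed latency.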