For example, I tried indexing 31,000 documents in a Java test that sends Decanter logs to Elasticsearch in parallel. After the test finishes, Elasticsearch keeps indexing for about 20 minutes, instead of indexing in near real time as I expected. I don't know what the problem is: is it due to disk speed, or do I need more hardware?
That's not how it works. As soon as your client has received a 200 response from Elasticsearch, the document has been indexed. What's happening after that might be merging, but it's hard to know without more information on what you are seeing.
The first thing to check is that you have followed these instructions, particularly the advice on using bulk requests, which matters even more because you are using HDD storage. Elasticsearch is often limited by disk performance, as indexing and the associated merging can be very I/O intensive.
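To make the bulk advice concrete, here is a minimal sketch of how documents could be grouped into batches and turned into the newline-delimited body that the `_bulk` endpoint expects (one action line followed by one source line per document, each ending in a newline). The class name `BulkBodySketch`, the index name `logs`, the batch size of 1000, and the shape of the sample documents are all assumptions for illustration, not something taken from your setup. Actually sending the request is left out: you would POST each body to `/_bulk` with `Content-Type: application/x-ndjson` using whatever client you already have.

```java
import java.util.ArrayList;
import java.util.List;

public class BulkBodySketch {

    // Split the documents into fixed-size batches; each batch becomes one _bulk request.
    public static List<List<String>> partition(List<String> docs, int batchSize) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += batchSize) {
            batches.add(docs.subList(i, Math.min(i + batchSize, docs.size())));
        }
        return batches;
    }

    // Build the NDJSON body for one _bulk request:
    // an "index" action line, then the document source, each terminated by '\n'.
    public static String bulkBody(String index, List<String> jsonDocs) {
        StringBuilder sb = new StringBuilder();
        for (String doc : jsonDocs) {
            sb.append("{\"index\":{\"_index\":\"").append(index).append("\"}}\n");
            sb.append(doc).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample data: 31000 tiny log documents, already serialized as JSON.
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 31000; i++) {
            docs.add("{\"message\":\"log line " + i + "\"}");
        }

        // Batch size 1000 is an assumed starting point, not a recommendation.
        List<List<String>> batches = partition(docs, 1000);
        System.out.println(batches.size() + " bulk requests instead of "
                + docs.size() + " single-document requests");

        // Show the start of the first request body.
        String firstBody = bulkBody("logs", batches.get(0));
        System.out.println(firstBody.substring(0, firstBody.indexOf('\n', firstBody.indexOf('\n') + 1) + 1));
    }
}
```

The point of batching is to cut the number of round trips and let Elasticsearch handle many documents per request. The right batch size depends on document size and hardware, so it is worth benchmarking a few values rather than trusting the number used here.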