Hi guys. I am trying to understand Elasticsearch. My code launches 35 Python scripts in parallel, and each of them tries to create indices on the ES cluster at the same time. However, the index creation seems to happen serially, so I am not getting the speed-up I expected from running in parallel.
Is this because ES does not allow multiple scripts to index at once?
Indexing can be done in parallel by multiple threads, but creating an index has to update the cluster state, so index creation is serialised through the master node.
Creating indices is usually not done very frequently, so the fact that the cluster state needs to be updated is generally not a problem. What exactly is it you are trying to achieve?
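To illustrate the difference, here is a minimal sketch of several threads indexing into one index that already exists, so no cluster state update is needed per batch. It only builds the newline-delimited bodies that the `_bulk` endpoint expects and hands them to a thread pool. The names `bulk_body`, `index_in_parallel` and the `send` callable are made up for this example; `send` stands in for whatever actually posts a body to your cluster.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def bulk_body(index_name, docs):
    """Build an NDJSON body for _bulk: an action line, then the document, per doc."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


def index_in_parallel(index_name, docs, send, batch_size=1000, workers=4):
    """Split docs into batches and pass each bulk body to `send` from a thread pool.

    `send` is a hypothetical callable (an assumption of this sketch) that
    posts one body to the cluster and returns whatever you want to collect.
    """
    batches = [docs[i:i + batch_size] for i in range(0, len(docs), batch_size)]
    bodies = [bulk_body(index_name, batch) for batch in batches]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps the results in batch order.
        return list(pool.map(send, bodies))
```

All workers write to the same index name here, which is the case that parallelises well. Creating a new index per worker would bring back the serialisation through the master node.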
I do not need to store the data on the node. So I am creating indices of 10k records each, running the search on each one and then deleting it before I create the next index. Since a match query does not give me all the results of the search query, I create the index on 10k records and retrieve all the results before I move on to the next index.
For example, if the data size is 20k, I create 2 indices in a for-loop and delete the first one once its search is done.
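Roughly, the loop described above could be sketched like this. It is only an outline under assumptions: `create_index`, `search` and `delete_index` are placeholder callables for the real cluster calls, and the `tmp-batch-N` index names are invented for the example.

```python
def batches(records, size=10_000):
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def search_in_batches(records, create_index, search, delete_index, size=10_000):
    """Create a temporary index per batch, search it, then delete it."""
    results = []
    for n, batch in enumerate(batches(records, size)):
        name = f"tmp-batch-{n}"  # hypothetical naming scheme
        create_index(name, batch)
        try:
            results.extend(search(name))
        finally:
            # Delete the temporary index even if the search raised.
            delete_index(name)
    return results
```

With 20k records and the default size this creates and deletes two indices, one after the other.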