Implementing ES index creation and search in parallel on the same node

(Bhargavi Sri) #1

Hi guys. I am trying to understand Elasticsearch. The code I have written launches 35 parallel Python scripts that all try to create indices on the ES cluster simultaneously. But the index creation is happening serially, and I have not been able to get any parallelism to make my code run faster.

Is this because ES does not allow multiple scripts to index at once?

(Christian Dahlqvist) #2

Indexing can be done in parallel by multiple threads, but creating an index has to update the cluster state and therefore needs to be serialised through the master node.
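
A minimal sketch of that split, assuming the index is created once up front and only the document indexing is fanned out across threads. `index_chunk` is a hypothetical callable, not something from this thread: it would send one bulk request for its chunk (for example by wrapping the official Python client's bulk helper) and return how many documents it indexed.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(records, size):
    """Yield successive lists of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def index_in_parallel(records, index_chunk, chunk_size=1000, workers=4):
    """Index `records` into an already existing index using a thread pool.

    `index_chunk` is a hypothetical callable supplied by the caller. It is
    assumed to send one bulk request for the chunk it receives and to
    return the number of documents it indexed.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is handed to a worker thread; the per-chunk counts
        # are summed to get the total number of documents indexed.
        return sum(pool.map(index_chunk, chunked(records, chunk_size)))
```

The point of the design is that the cluster-state update (index creation) happens once, outside the pool, while the per-document work is what runs concurrently.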

(Bhargavi Sri) #3

But in my code, index creation and searching are linked together. So is there nothing I can do in this case?

(Christian Dahlqvist) #4

Creating indices is usually not done very frequently, so the fact that the cluster state needs to be updated is generally not a problem. What exactly are you trying to achieve?

(Bhargavi Sri) #5

I do not need to store the data on the node. So I am creating indices of 10k records each, running the search on each one, and then deleting it before I create the next. Since a match query does not return all the results for my search, I create an index on 10k records and retrieve all the results before moving on to the next index.

For example, if the data size is 20k records, I create 2 indices in a for-loop, deleting the first one after its search is done.
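
Roughly, the loop looks like the sketch below. This is only an outline of the workflow described above, not the actual script: `create_index`, `search_all` and `delete_index` are hypothetical callables standing in for the real Elasticsearch calls, and the `batch-N` index names are made up for illustration.

```python
def process_in_batches(records, create_index, search_all, delete_index,
                       batch_size=10_000):
    """Create one temporary index per batch, search it, then delete it.

    The three callables are hypothetical placeholders:
      create_index(name, batch) creates the index and loads the batch,
      search_all(name) returns every hit for the query on that index,
      delete_index(name) removes the index again.
    """
    results = []
    for n, start in enumerate(range(0, len(records), batch_size)):
        name = f"batch-{n}"  # illustrative index name
        batch = records[start:start + batch_size]
        create_index(name, batch)
        try:
            results.extend(search_all(name))
        finally:
            # Delete the temporary index even if the search fails.
            delete_index(name)
    return results
```

With 20k records and the default batch size this creates and deletes two indices, one after the other.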
