I have written an Elasticsearch plugin that analyzes the UserAgent string, and it works the way I have in mind when I put it in a pipeline (see Elastic Search | Yauaa - Yet Another UserAgent Analyzer).
Now on my laptop I want to put a rather large dataset (~100M records) in Elasticsearch via this plugin and check the results in Kibana.
I have put together some scripting (yauaa/devtools/analysis at main · nielsbasjes/yauaa · GitHub) that uses Docker on my Ubuntu machine to start Elasticsearch with the plugin installed.
I then define the pipeline and load the data.
Functionally this works.
The problem is that I have been unable to make this pipeline run in multiple threads, which would reduce the time I have to wait. At the moment it only uses ~2-3 CPU cores, while my laptop has 12 (6 cores plus hyperthreading).
I am sending 8 bulk updates at a time, with ~100,000 records in each batch.
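To make the loading side concrete, here is a minimal Python sketch of what sending 8 concurrent bulk requests through a pipeline could look like. It is not the actual script from the repository. The URL, the index name `useragents`, the pipeline name `yauaa-pipeline` and the document field `useragent` are all placeholders assumed for illustration. The only Elasticsearch features it relies on are the `_bulk` endpoint, its `pipeline` query parameter and the NDJSON request body.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

# All three values are assumptions for this sketch, not taken from the real setup.
ES_URL = "http://localhost:9200"
INDEX = "useragents"
PIPELINE = "yauaa-pipeline"


def batches(records, size):
    """Yield successive lists of at most `size` records."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def bulk_body(chunk):
    """Build the NDJSON body for _bulk: one action line, then one document line."""
    lines = []
    for doc in chunk:
        lines.append(json.dumps({"index": {}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return ("\n".join(lines) + "\n").encode("utf-8")


def send_bulk(chunk):
    """POST one batch through the pipeline; return the response's 'errors' flag."""
    req = urllib.request.Request(
        f"{ES_URL}/{INDEX}/_bulk?pipeline={PIPELINE}",
        data=bulk_body(chunk),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["errors"]


def load(records, batch_size=100_000, workers=8):
    """Send batches with `workers` requests in flight at the same time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(send_bulk, batches(records, batch_size)))
```

Calling `load(({"useragent": ua} for ua in user_agents))` would send the batches with at most 8 requests in flight. Note that `pool.map` consumes the whole batch generator up front, so for ~100M records a real loader would need to bound the number of queued batches.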
What config setting can I change to get ES to actually use multiple threads for this pipeline so that it uses ~10 CPU cores?