Hello, I have a plan to use ingest pipelines on my cluster.
The plan is:
90 applications will send data to the Elasticsearch cluster, so I'll create around 200 pipelines on my cluster.
How should I scale the ingest nodes on my cluster to make this possible?
More important than how many pipelines you have is how complex they are and what your throughput is in terms of events/min, etc.
The number and size of the ingest nodes will depend on that. Also, unless otherwise specified, your other nodes are probably ingest nodes by default; if you want to isolate ingest processing to dedicated ingest-only nodes, you will need to remove the ingest role from the other nodes.
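On a self-managed node that should no longer do ingest work, that is just a role change in elasticsearch.yml (a minimal sketch, assuming Elasticsearch 7.9+ where roles are declared with node.roles; older versions used node.ingest: false instead):

```
# elasticsearch.yml -- a data node with the ingest role removed
node.roles: [ data ]
```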
Ingest nodes do not need much disk, but CPU and RAM matter. Understanding your ingest volume will help determine what your needs are.
If you provide a little perspective, perhaps we could suggest a starting point.
Hello @stephenb, I'm on a self-managed cluster. To explain the complexity of a pipeline: each pipeline checks a document field to decide the output index.
For example, if field == A, then index name == A, and the event throughput is about 1500 events/min per pipeline.
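Roughly like this sketch (the pipeline, field, and index names here are just placeholders to illustrate; a set processor can rewrite the _index metadata field based on a condition):

```
PUT _ingest/pipeline/route-by-app
{
  "description": "Route each document to an index based on a field value",
  "processors": [
    {
      "set": {
        "if": "ctx.app == 'A'",
        "field": "_index",
        "value": "index-a"
      }
    }
  ]
}
```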
For additional information, I'm running my cluster on VMs.
My cluster has:
1 load-balance node (this node currently also acts as an ingest node)
1 load-balance node (does not act as an ingest node)
1 master node
4 data nodes
Yes, I guess it is a coordinating node.
So is it fine for 1 ingest node to handle 1500 events/min * 200 pipelines? Does this mean my node can handle around 30k events/min total?
First, I misunderstood: I thought you meant 1500 EPM total, but you really meant about 30K EPM total. That should still be fine, as it is really only about 500 events/sec. You should probably be able to scale a 4 CPU / 8 GB RAM ingest node to roughly 7-10x that, about 3500-5000 events/sec, for low-to-medium pipeline complexity. This is just an estimate.
Basically, if the node becomes CPU- or RAM-bound you can increase those resources, up to about the size mentioned above, and then start scaling horizontally.
Again, without close inspection these are just guesses: if your pipelines are not complex you might get more throughput, and if they are more complex you might get less.
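If you want to measure rather than guess, the node stats API reports per-node and per-pipeline ingest counts, failures, and time spent, which shows whether a node or a particular pipeline is becoming the bottleneck:

```
# Per-node and per-pipeline ingest counts, failures, and time spent
GET _nodes/stats/ingest?filter_path=nodes.*.ingest
```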