Hi,
We have developed a .NET Core Web API application that uses the NEST library to perform CRUD (create, read, update, delete) operations in Elasticsearch.
We have set up Elasticsearch with the ingest plug-in on a Kubernetes cluster (in the cloud) with a heap size of 2 GB.
Goal: add/push 100,000 documents (13 MB to 15 MB per document) to Elasticsearch in 10 to 15 minutes.
Could you please suggest the ideal Elasticsearch configuration for high performance given the above requirement?
Thanks in advance.
Indexing can be CPU intensive and even more so if you are using an ingest node. Given the size of your documents and the speed at which you want to index this, the cluster sounds quite small. How many CPU cores do you have? What type of storage?
What level of throughput are you seeing with the current setup? What is limiting performance?
Thanks for your reply,
How many CPU cores do you have?
We deployed on an IBM Kubernetes cluster with 3 worker nodes; each node has 8 cores and 32 GB RAM.
What type of storage?
We are using IBM storage: volume.beta.kubernetes.io/storage-class: ibmc-block-silver
What level of throughput are you seeing with the current setup?
We are currently adding about 1,000 documents (13 to 15 MB each) per hour.
What is limiting performance?
Adding documents is taking a long time, judging from the running process.
Please suggest a high-performance Elasticsearch configuration for Kubernetes.
Thanks for your reply.
My configuration is as below.
Please let us know the Elasticsearch on Kubernetes configuration for high performance.
I would recommend looking at the following resources:
https://www.elastic.co/guide/en/elasticsearch/reference/6.4/tune-for-indexing-speed.html#tune-for-indexing-speed
https://www.elastic.co/guide/en/elasticsearch/reference/6.4/tune-for-disk-usage.html
Then run tests and try to identify which system resource is limiting performance, e.g. CPU and/or disk I/O. I generally index much smaller documents, so I am not sure how best to tune for your particular use case.
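To make the first link a bit more concrete, here is a rough sketch in Python (not NEST) of two things that guide covers: relaxing `refresh_interval` and `number_of_replicas` during the initial load, and sending documents through the bulk API. The helper names `batch_by_bytes` and `bulk_body`, the `_doc` type and the 50 MB batch limit are my own assumptions for illustration. With documents this large, each bulk request can only hold a few of them, since `http.max_content_length` defaults to 100mb.

```python
import json
from typing import Dict, Iterable, Iterator, List

# Index settings the tuning guide suggests relaxing for a large initial load.
# Restore them afterwards (e.g. refresh_interval "1s", replicas >= 1).
BULK_LOAD_SETTINGS = {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}

# Assumed per-request budget; stays well below the 100mb HTTP default.
MAX_BULK_BYTES = 50 * 1024 * 1024


def batch_by_bytes(docs: Iterable[Dict], max_bytes: int = MAX_BULK_BYTES) -> Iterator[List[Dict]]:
    """Group documents so each batch stays under max_bytes of serialized JSON.

    A single document larger than max_bytes is still emitted, alone in its batch.
    """
    batch: List[Dict] = []
    size = 0
    for doc in docs:
        doc_bytes = len(json.dumps(doc).encode("utf-8"))
        if batch and size + doc_bytes > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append(doc)
        size += doc_bytes
    if batch:
        yield batch


def bulk_body(index: str, docs: List[Dict], doc_type: str = "_doc") -> str:
    """Build the newline-delimited body for a _bulk request.

    Each document is preceded by an action line; the body must end with a newline.
    The "_type" field is expected by 6.x clusters (hypothetical default "_doc").
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"
```

Each body would be POSTed to `_bulk` with content type `application/x-ndjson`, ideally from several threads or processes at once, while you watch CPU and disk I/O on the data nodes to see which one saturates first.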
If I am calculating correctly, that is about 1.33 TB of raw data. If that is the case you will most likely need a much larger cluster to be able to ingest that in 15 minutes...
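For reference, the back-of-the-envelope arithmetic can be sketched as below. It assumes an average document size of 14 MB (the midpoint of the stated 13 to 15 MB range) and uses the reported current rate of 1,000 documents per hour; the function name `required_rates` is just for illustration. Under those assumptions the 15-minute target works out to roughly 1.5 GB/s sustained, about 400 times the current rate.

```python
DOC_COUNT = 100_000            # documents to index (from the stated goal)
AVG_DOC_MB = 14                # assumed midpoint of the 13-15 MB range
CURRENT_DOCS_PER_HOUR = 1_000  # throughput reported for the current setup


def required_rates(window_minutes: float):
    """Return (docs/s, MB/s, speedup over the current rate) for a time window."""
    seconds = window_minutes * 60
    docs_per_sec = DOC_COUNT / seconds
    mb_per_sec = docs_per_sec * AVG_DOC_MB
    target_docs_per_hour = DOC_COUNT * 60 / window_minutes
    speedup = target_docs_per_hour / CURRENT_DOCS_PER_HOUR
    return docs_per_sec, mb_per_sec, speedup


# Total raw volume, converting MB to TiB with factors of 1024.
total_tib = DOC_COUNT * AVG_DOC_MB / 1024 / 1024

if __name__ == "__main__":
    print(f"total raw data: {total_tib:.2f} TiB")
    for minutes in (10, 15):
        docs_s, mb_s, speedup = required_rates(minutes)
        print(f"{minutes} min: {docs_s:.1f} docs/s, {mb_s:.0f} MB/s, {speedup:.0f}x current rate")
```

Even before ingest-pipeline CPU cost and replication are counted, that write rate has to be spread across enough nodes and disks to sustain it, which is why the cluster size matters more here than any single setting.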