I have an ES cluster (10 nodes total) set up as below:
2 - Coordinating-only nodes
3 - Dedicated master-only nodes
5 - Data-only nodes
We have multiple Logstash pipelines (running on a 1-second schedule) that perform ETL (extract-transform-load) on data events from 2 RDBMS sources, pulling frequent updates from these DBs to keep the ES indexes in sync in near real time. I am using the Elasticsearch output plugin to send bulk index requests to the ES indexes.
In the current setup, both the REST client (a data-driven micro-service) and Logstash point to the coordinating-only hosts (sketched below). This setup has been in place for a couple of years and has been working well without any issues.
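For context, each pipeline looks roughly like this (the connection details, hostnames, and index name below are placeholders, not the real values):

```
input {
  jdbc {
    # placeholder connection details -- the real pipelines pull from 2 RDBMS sources
    jdbc_connection_string => "jdbc:postgresql://db-host:5432/appdb"
    jdbc_driver_class => "org.postgresql.Driver"
    jdbc_user => "etl_user"
    statement => "SELECT * FROM events WHERE updated_at > :sql_last_value"
    schedule => "* * * * * *"   # 6-field cron, runs every second
  }
}

filter {
  # transform the DB rows into the document shape I intend to index
}

output {
  elasticsearch {
    # today: bulk requests go to the 2 coordinating-only nodes
    hosts => ["https://coord-node-1:9200", "https://coord-node-2:9200"]
    index => "my-sync-index"
  }
}
```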
Lately, I have been reviewing the current setup to optimize the cluster, as I plan to add more workloads: vector/semantic search and RAG in my app's domain space. I am thinking of separating search and indexing requests/workloads.
I have gone through many blogs and the ES docs on best practices for separating search and indexing requests. So far I have encountered mixed approaches and opinions on sending search queries to coordinating-only nodes and indexing requests to data-only nodes.
Is the ES cluster at risk of being overwhelmed by routing all search and indexing traffic through the coordinating-only nodes, especially during bursts of events (both search and indexing)?
To an extent, I feel it makes sense to route indexing requests to the data-only nodes. But would it be overkill for the data nodes to also handle the coordination work of routing each indexing request to the data nodes that hold the shards of the index being written to? (See the sketch below for what I have in mind.)
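Concretely, the change I am weighing is just repointing the Logstash output at the data tier (hostnames again are placeholders), while the search client keeps talking to the coordinating-only nodes:

```
output {
  elasticsearch {
    # proposed: send bulk indexing traffic straight to the 5 data-only nodes;
    # whichever data node receives a bulk request would then act as the
    # coordinator for it, forwarding sub-requests to the nodes holding the
    # target shards
    hosts => [
      "https://data-node-1:9200",
      "https://data-node-2:9200",
      "https://data-node-3:9200",
      "https://data-node-4:9200",
      "https://data-node-5:9200"
    ]
    index => "my-sync-index"
  }
}
```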
Please share your expert opinions.
PS: For now, I do not have any ingest pipeline pre-processing, as the Logstash filter plugins are doing the job of transforming the data records in each event into the doc I intend to index.