Regarding Cluster Sizing

If I deploy a topology that separates ingest nodes and data nodes, all things being equal, how many data nodes can be served by a single ingest node without the heap of the ingest node becoming a bottleneck?


This is not related. The heap on an ingest node is essentially used to hold the in-flight bulk requests (let's say 10,000 documents) plus the memory needed to process that data.
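As a rough illustration, the heap footprint scales with bulk size and how many bulks are in flight, not with how much data has passed through historically. This is only a sketch; the average document size, concurrency, and overhead factor below are assumptions, not measurements:

```python
# Back-of-envelope estimate of heap held by in-flight bulk requests
# on an ingest node. All inputs except the bulk size are assumptions.

docs_per_bulk = 10_000    # bulk size from the discussion above
avg_doc_bytes = 1_024     # assumed 1 KB average document
concurrent_bulks = 8      # assumed number of bulks in flight at once
overhead_factor = 2       # assumed per-document copy/object overhead

in_flight_bytes = docs_per_bulk * avg_doc_bytes * concurrent_bulks * overhead_factor
print(f"~{in_flight_bytes / 1024**2:.0f} MiB held for in-flight bulks")  # ~156 MiB
```

Under these assumptions the steady-state footprint is modest; it only grows if more bulks pile up concurrently.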

Even if you have processed petabytes of data in the past, that won't require more memory on the ingest node.

Hi @dadoonet, That's not what I meant. I'm referring to the processing of the bulk requests. We currently have up to 20 data nodes per cluster processing these requests. If we move ingestion off to a dedicated tier, all of those bulks will have to transit the heap of the ingest nodes...

I do not understand this reasoning at all.

If you create one or more dedicated ingest nodes that do not hold data, these will only hold the data while it is being processed. Once the pipeline has processed the data, it is sent on to the appropriate data nodes for indexing. The data nodes require a good amount of heap to hold the data, but a dedicated ingest node can often get away with a lot less heap than the data nodes. Exactly how much you will need depends on the indexing throughput and on the size and complexity of the data and pipelines.

It is worth noting that you are referring to multiple clusters. Each ingest node needs to be part of a single cluster and can only handle data going into that cluster. You will therefore need at least one ingest node per cluster, which may not be economical if your clusters are very small.

Well, I tried this before on a cluster ingesting 50k msg/sec and the dedicated ingest nodes were a significant bottleneck. I was advised on this forum not to do that and so now have the ingestion pipelines on the data nodes.

Note that these ingest nodes were repurposed coordinator nodes, three of them, which seems fair enough to me. What I saw was that the ingest nodes could not process and forward requests fast enough, and they were blowing their heap (4GB).

The heap size required for dedicated ingest nodes depends on the indexing throughput, document size and processing complexity. I would expect ingest nodes to have a very different resource usage profile compared to dedicated coordinating nodes, as they do very different work.
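To make that dependency concrete, here is a hedged sketch in the spirit of Little's law: data in flight is roughly throughput times the time each document spends queued and processed. The 50k msg/sec figure comes from earlier in the thread; the document size, latencies, and overhead factor are assumptions for illustration only:

```python
# In-flight data ≈ throughput × time spent in the pipeline.
# Throughput is from the thread; everything else is an assumption.

msgs_per_sec = 50_000     # throughput mentioned earlier in the thread
avg_doc_bytes = 2_048     # assumed 2 KB average document
overhead_factor = 3       # assumed JVM object/copy overhead per document

for latency_s in (0.5, 5, 10):  # assumed queueing + processing time per doc
    in_flight = msgs_per_sec * avg_doc_bytes * latency_s * overhead_factor
    print(f"latency {latency_s:>4}s -> ~{in_flight / 1024**3:.2f} GiB in flight")
```

Under these assumptions a healthy node holds well under 1 GiB, but if the node falls behind and effective latency climbs toward 10 seconds, the in-flight data alone approaches 3 GiB, which is consistent with a 4GB heap being exhausted once other JVM overhead is added.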

You need to test and see what works best for your particular use case.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.