Separation of read and write indices

Hi All,
We are using Elasticsearch 5.3.0 on a large machine (128 GB RAM, 40 cores) and index about 500 GB of documents every day. We are thinking about improving search and indexing performance by separating read and write indices. Is it possible to deploy multiple Elasticsearch nodes on a single machine and have these nodes point to the same data location? One node would be only for indexing and the other only for searching (with indexing and merging disabled on that node), so that searching and indexing work separately and do not interfere with each other.

Yes, you can run multiple nodes on a single machine.

What you cannot do is point those nodes at the same data location.

What sort of data are you storing in Elasticsearch?

Thank you for the quick reply. We are storing time-based network data like {'ts': 1510125233, 'pktlen': 1000, 'ipsrc': '', 'ipdest': ''}

Then you should index today's data onto one node, move the index to the other node when it rolls over, and then read from it there.

That is, use hot/cold architecture.
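As a rough sketch of what hot/cold could look like with daily indices: each node is tagged with a custom attribute in elasticsearch.yml (for example node.attr.box_type: hot on the indexing node and node.attr.box_type: cold on the other), and each index is pinned to a tier through shard allocation filtering. The index prefix "netflow-" and the attribute name "box_type" below are made-up names for illustration, not something from this thread.

```python
from datetime import datetime, timezone

# Hypothetical names: the "netflow-" prefix and the "box_type" node
# attribute are illustrative only.
INDEX_PREFIX = "netflow-"
NODE_ATTR = "box_type"


def daily_index_name(ts):
    """Return the daily index name for an epoch-seconds timestamp (UTC)."""
    day = datetime.fromtimestamp(ts, tz=timezone.utc)
    return INDEX_PREFIX + day.strftime("%Y.%m.%d")


def allocation_settings(tier):
    """Build the body for PUT /<index>/_settings that pins an index to a tier."""
    return {"index.routing.allocation.require." + NODE_ATTR: tier}


# Today's index is created on the hot node ...
today = daily_index_name(1510125233)
print(today, allocation_settings("hot"))

# ... and after rollover the same index is retagged so its shards
# relocate to the cold node.
print(today, allocation_settings("cold"))
```

Changing the setting on an existing index makes Elasticsearch relocate its shards to a matching node, so the move happens through the cluster rather than by copying files by hand.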

Is this move operation like a file move operation in the operating system? Will it affect indexing performance?

How long are you keeping data? What time periods do you usually query/aggregate across? Do you query/aggregate across the most recent data more frequently than older data?

We keep data for 1-2 weeks. Queries usually span hours; a few span 1-2 days. Yes, we query recent data more frequently than older data.

What type of storage do you have? SSDs?

No, not SSDs. Just normal hard disks.

In that case it sounds like disk I/O might become the bottleneck, and I am not sure how well it will cope with that amount of indexing and querying. As you will be querying the indices that you are actively indexing into more frequently than older data, I do not necessarily see any point in trying to split indexing and querying into separate nodes. Instead I would recommend trying to spread the indexing load, which is quite I/O intensive, out across all available disks. Having 2 nodes on the server may still be useful as it will give you access to more heap space.

Thank you for the reply. I am now testing performance with two nodes and hope it improves on our original single node with a 32 GB heap. What is the appropriate heap size for each node if I have 2 or 4 nodes on my machine (128 GB RAM, 40 cores)?

2 nodes with 31GB heap is probably ideal. I do not see any point in having more nodes than that on that kind of server.
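The arithmetic behind that number can be sketched as follows. This assumes the common rule of thumb of giving at most about half of the machine's RAM to heap (leaving the rest for the filesystem cache) and keeping each heap at or below the 31 GB suggested above; the half-of-RAM figure is an assumption added here, not something stated in the thread.

```python
# Per-node heap ceiling, taken from the 31 GB figure in the reply above.
HEAP_CAP_GB = 31


def heap_per_node_gb(total_ram_gb, node_count):
    """Rule-of-thumb heap per node: share half the RAM, capped at 31 GB."""
    heap_budget_gb = total_ram_gb // 2  # assumption: other half is left to the OS cache
    return min(HEAP_CAP_GB, heap_budget_gb // node_count)


for nodes in (1, 2, 4):
    per_node = heap_per_node_gb(128, nodes)
    print(nodes, "node(s):", per_node, "GB each,", per_node * nodes, "GB total")
```

On a 128 GB machine this gives 31 GB for one node, 31 GB each for two nodes (62 GB total), and 16 GB each for four nodes (64 GB total), so going from two to four nodes adds almost no usable heap.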
