Performance scaling for a Logstash and Elasticsearch-based solution

Hello everyone,

We have a Logstash and Elasticsearch-based solution to analyze network protocols, and I am trying to benchmark the whole solution. When I increase the number of nodes for the client/Logstash and Elasticsearch tiers, I am not seeing the expected performance scaling.

We have two protocols (protocol A and protocol B), with separate Logstash pipelines to process them. Each protocol is handled by two Logstash pipelines (pipeline 1 and pipeline 2), i.e. four Logstash pipelines in total; the layout is sketched below.
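If all four pipelines ran under a single Logstash instance, the layout would correspond to a pipelines.yml like this (pipeline IDs and config paths are made up for illustration):

```yaml
# pipelines.yml (sketch) -- IDs and paths are hypothetical
- pipeline.id: protocol-a-pipeline-1
  path.config: "/etc/logstash/conf.d/protocol_a_p1.conf"
- pipeline.id: protocol-a-pipeline-2
  path.config: "/etc/logstash/conf.d/protocol_a_p2.conf"
- pipeline.id: protocol-b-pipeline-1
  path.config: "/etc/logstash/conf.d/protocol_b_p1.conf"
- pipeline.id: protocol-b-pipeline-2
  path.config: "/etc/logstash/conf.d/protocol_b_p2.conf"
```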

Pipeline Details:
I decode the packets for each protocol and keep them in two separate folders. Logstash pipeline 1 monitors a folder for new files (ndjson; each file contains thousands of JSON objects). For every event in pipeline 1, I query an Elasticsearch index to fetch some metadata. I use the ruby filter and a few other filters to process the events, as shown below. The processed events are then ingested into an Elasticsearch index. Each event contains a session ID, and every event is tagged with its session ID before ingestion; the unique session IDs are indexed into a separate index.
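A minimal sketch of pipeline 1 for protocol A (paths, index names, and field names are placeholders, and the one-line ruby filter stands in for the much longer real one):

```
input {
  file {
    path => "/data/protocol_a/*.ndjson"   # decoded packets, one JSON object per line
    mode => "read"
    file_completed_action => "log"
    file_completed_log_path => "/var/log/logstash/protocol_a_done.log"
    codec => "json"
  }
}

filter {
  # enrich each event with metadata looked up from another index
  elasticsearch {
    hosts  => ["http://es-node-1:9200"]
    index  => "protocol_a_metadata"
    query  => "meta_key:%{[meta_key]}"
    fields => { "meta_value" => "meta_value" }
  }

  # stands in for ~200 lines of real processing; here it just tags the event
  ruby {
    code => "event.set('tag', event.get('session_id'))"
  }
}

output {
  # processed, session-tagged events
  elasticsearch {
    hosts => ["http://es-node-1:9200"]
    index => "protocol_a_events_1"
  }
  # unique session IDs; using the session ID as document_id deduplicates them
  elasticsearch {
    hosts       => ["http://es-node-1:9200"]
    index       => "protocol_a_sessions"
    document_id => "%{[session_id]}"
  }
}
```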

Logstash pipeline 2 reads from the index where the session IDs are stored. For each session ID/tag, it fetches all messages belonging to that session (searching by tag), stitches them together into a single session document, and ingests that into a final index. While forming each session, it queries an intermediate index for some metadata, and after the session is formed it updates the corresponding document in that index.
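A corresponding sketch of pipeline 2 (again with placeholder names; the real stitching logic is far more involved):

```
input {
  # one event per stored session ID
  elasticsearch {
    hosts  => ["http://es-node-1:9200"]
    index  => "protocol_a_sessions"
    query  => '{ "query": { "match_all": {} } }'
    size   => 1000
    scroll => "5m"
  }
}

filter {
  # fetch every message tagged with this session ID; result_size lifts the
  # filter's default limit of one hit
  elasticsearch {
    hosts       => ["http://es-node-1:9200"]
    index       => "protocol_a_events_1"
    query       => "tag:%{[session_id]}"
    fields      => { "payload" => "session_messages" }
    result_size => 1000
  }

  # (a further elasticsearch filter lookup against the intermediate metadata
  # index is omitted here)

  # stands in for the real stitching logic (~100-150 lines of Ruby)
  ruby {
    code => "event.set('message_count', Array(event.get('session_messages')).length)"
  }
}

output {
  # final stitched session document
  elasticsearch {
    hosts       => ["http://es-node-1:9200"]
    index       => "protocol_a_final"
    document_id => "%{[session_id]}"
  }
  # update the session's document in the intermediate metadata index
  elasticsearch {
    hosts         => ["http://es-node-1:9200"]
    index         => "protocol_a_metadata"
    action        => "update"
    document_id   => "%{[session_id]}"
    doc_as_upsert => true
  }
}
```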

Setup Details:
4 machines, each with:

  • 16 cores
  • 30 GB RAM
  • NVMe SSD


  • For all tests the Elasticsearch JVM heap is configured to 15 GB (50% of total system memory), and the Logstash JVM heap is configured to 11 GB per pipeline (a total of 22 GB per machine, as each machine runs two pipelines, i.e. pipeline 1 and pipeline 2).

  • We do a good amount of processing in the ruby filter in each pipeline: around 200 and 150 lines of Ruby code for pipelines 1 and 2 of protocol A, and around 500 and 100 lines for pipelines 1 and 2 of protocol B.

  • At any point in time, pipeline 1 writes to one index while pipeline 2 reads from the other. Once pipeline 2 has read all the data in its current index, it exits; I then clean up all data from that index and restart pipeline 2 against the index pipeline 1 had been writing to, while pipeline 1 starts writing to the index pipeline 2 just finished reading. The two indices are toggled this way every time pipeline 2 restarts (a sketch of this toggle loop follows this list).

  • I ran several benchmarks, fine-tuning the pipeline workers and batch size and varying the JVM heap sizes for both Elasticsearch and Logstash, but didn't see any improvement in performance (the settings I varied are sketched after this list).
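To make the toggling concrete, the driver loop amounts to roughly the following (index names, paths, and the environment-variable scheme are hypothetical; Logstash configs can resolve ${...} references from the environment):

```bash
# Toggle driver (sketch) -- all names and paths are hypothetical.
READ_INDEX="protocol_a_events_1"    # pipeline 2 reads from here
WRITE_INDEX="protocol_a_events_2"   # pipeline 1 writes here

while true; do
  # Run pipeline 2 against the current read index; it exits once the index is drained.
  READ_INDEX="$READ_INDEX" /usr/share/logstash/bin/logstash \
      -f /etc/logstash/conf.d/protocol_a_p2.conf

  # Drop everything pipeline 2 just processed.
  curl -s -X DELETE "http://es-node-1:9200/${READ_INDEX}"

  # Swap the two indices for the next round; pipeline 1 is repointed at the
  # other index at the same moment (not shown here).
  TMP="$READ_INDEX"; READ_INDEX="$WRITE_INDEX"; WRITE_INDEX="$TMP"
done
```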
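For reference, these are the knobs I varied; the values below are examples rather than what we settled on:

```yaml
# pipelines.yml (per pipeline) -- example values only
- pipeline.id: protocol-a-pipeline-1
  path.config: "/etc/logstash/conf.d/protocol_a_p1.conf"
  pipeline.workers: 16        # defaults to the number of CPU cores
  pipeline.batch.size: 1000   # events each worker collects before filtering
```

The heap sizes were changed via -Xms/-Xmx in the respective jvm.options files (e.g. -Xms11g/-Xmx11g for Logstash, -Xms15g/-Xmx15g for Elasticsearch).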

When I run the performance benchmark with 2 nodes (1 for Elasticsearch and 1 for the Logstash pipelines), I get 6k TPS for protocol A and 7k TPS for protocol B. But when I benchmark both protocols together with 4 nodes (a 2-node ES cluster, plus 1 node for protocol A's pipelines and 1 node for protocol B's), I get only 9.5k TPS combined (4.5k TPS for protocol A and 5k TPS for protocol B).

Am I doing anything wrong that could be hurting Logstash/Elasticsearch performance? Are there any other parameters to fine-tune?

Any suggestions or feedback are appreciated.

Best Regards,
Rakhesh Kumbi
