Hello everyone,
We have a Logstash- and Elasticsearch-based solution for analyzing network protocols, and I am benchmarking the whole solution. When I increase the number of nodes for the client/Logstash and Elasticsearch tiers, I am not seeing the expected scaling in performance.
We have two protocols (protocol A and protocol B), and each protocol is processed by its own set of Logstash pipelines. For each protocol I have two Logstash pipelines (pipeline 1 and pipeline 2), i.e. four Logstash pipelines in total.
Pipeline Details:
I am decoding the packets for each protocol and writing them to two separate folders. Logstash pipeline 1 monitors a folder for new files (NDJSON; each file contains thousands of JSON objects). For every event in pipeline 1, I query an Elasticsearch index to fetch some metadata, and I use the ruby filter plus a few other filters to process the events, as sketched below. The processed events are then ingested into an Elasticsearch index. Each event contains a session ID, every event is tagged with its session ID before ingestion, and the unique session IDs are indexed in a separate index.
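A stripped-down sketch of pipeline 1 (paths, index names, query fields, and the session-ID de-duplication shown below are placeholders, and the actual Ruby logic is omitted):

```
# pipeline 1 (simplified sketch - names and queries are placeholders)
input {
  file {
    path  => "/data/protocolA/decoded/*.ndjson"   # folder with decoded packets
    codec => "json"                                # one JSON object per line
    mode  => "read"
  }
}

filter {
  # per-event metadata lookup from an Elasticsearch index
  elasticsearch {
    hosts  => ["http://es-node1:9200"]
    index  => "metadata-index"                     # placeholder name
    query  => "lookup_key:%{[lookup_field]}"       # placeholder query
    fields => { "meta_field" => "meta_field" }
  }

  # ~200 lines of custom Ruby processing live here (omitted)
  ruby {
    path => "/etc/logstash/ruby/protocolA_pipeline1.rb"
  }
}

output {
  # processed events, tagged with their session ID
  elasticsearch {
    hosts => ["http://es-node1:9200"]
    index => "protocolA-events-1"                  # pipeline 1 toggles between two such indices
  }

  # unique session IDs go to a separate index
  elasticsearch {
    hosts       => ["http://es-node1:9200"]
    index       => "protocolA-session-ids"
    document_id => "%{session_id}"                 # placeholder de-dup mechanism
  }
}
```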
Logstash pipeline 2 reads from the index where the session IDs are stored. For each session ID/tag, it fetches all the messages belonging to that session (searching by tag), stitches them together into a single session document, and ingests it into a final index. While forming each session it also queries an intermediate index for some metadata, and after the session is formed it updates the corresponding document in that index.
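A similarly simplified sketch of pipeline 2 (again, index names and queries are placeholders; the per-session message lookup may also happen inside the Ruby code, it is shown here as an elasticsearch filter only for brevity):

```
# pipeline 2 (simplified sketch - names and queries are placeholders)
input {
  # read the session IDs written by pipeline 1
  elasticsearch {
    hosts => ["http://es-node1:9200"]
    index => "protocolA-session-ids"               # placeholder name
    query => '{ "query": { "match_all": {} } }'
  }
}

filter {
  # fetch the messages/metadata for this session from the index pipeline 1 wrote to
  elasticsearch {
    hosts => ["http://es-node1:9200"]
    index => "protocolA-events-1"                  # the index being toggled
    query => "tags:%{[session_id]}"                # placeholder: search by session tag
  }

  # ~150 lines of Ruby that stitch all messages of a session together (omitted)
  ruby {
    path => "/etc/logstash/ruby/protocolA_pipeline2.rb"
  }
}

output {
  # the stitched session document goes into the final index
  elasticsearch {
    hosts => ["http://es-node1:9200"]
    index => "protocolA-sessions-final"
  }

  # update the corresponding metadata document in the intermediate index
  elasticsearch {
    hosts         => ["http://es-node1:9200"]
    index         => "protocolA-intermediate"
    document_id   => "%{session_id}"
    action        => "update"
    doc_as_upsert => true
  }
}
```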
Setup Details:
4 machines, each with:
- 16 cores
- 30 GB RAM
- NVMe SSD
Note:
- For all tests the ES JVM heap is configured to 15 GB (50% of total system memory), and the Logstash JVM heap is configured to 11 GB per pipeline (22 GB in total per machine, since I run two pipelines, pipeline 1 and pipeline 2, on one machine).
- We do a fair amount of processing in the ruby filter in each pipeline: roughly 200 and 150 lines of Ruby for pipelines 1 and 2 of protocol A, and roughly 500 and 100 lines for pipelines 1 and 2 of protocol B.
- At any point in time, pipeline 1 is writing to an index other than the one pipeline 2 is reading from. Once pipeline 2 has read all the data in its current index, it exits; I then clean up all data from the index pipeline 2 just processed and restart pipeline 2, which now reads from the index pipeline 1 had been writing to, while pipeline 1 switches to writing to the index pipeline 2 just finished with. This index toggling happens every time pipeline 2 restarts.
- I ran several benchmarks, tuning the pipeline workers and batch size and varying the JVM heap size for both Elasticsearch and Logstash (the knobs sketched after this list), but saw no improvement in performance.
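For reference, these are the knobs I have been varying; the pipeline ID, path, and values below are illustrative placeholders, not the exact settings of any single run:

```
# pipelines.yml for one of the Logstash instances
# (each pipeline runs in its own Logstash instance with its own heap)
- pipeline.id: protocolA-pipeline1
  path.config: "/etc/logstash/conf.d/protocolA_pipeline1.conf"
  pipeline.workers: 16          # varied across runs
  pipeline.batch.size: 1000     # varied across runs

# jvm.options (per Logstash instance)
-Xms11g
-Xmx11g
```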
When I benchmark the solution with 2 nodes (1 for ES and 1 for the Logstash pipelines), I get 6k TPS for protocol A and 7k TPS for protocol B. But when I benchmark both protocols together with 4 nodes (a 2-node ES cluster plus 1 node for the protocol A pipelines and 1 node for the protocol B pipelines), I get only 9.5k combined TPS (4.5k TPS for protocol A and 5k for protocol B), rather than the roughly 13k I would expect if the two-node results scaled.
Am I doing anything wrong that is hurting Logstash/Elasticsearch performance? Are there any other parameters to fine-tune?
Any suggestions or feedback are appreciated.
Best Regards,
Rakhesh Kumbi