I am new to ELK (Elasticsearch, Logstash and Kibana).
I have three machines, each with 30 GB RAM and around 400 GB of disk space.
I have millions of files to parse (around 300 GB in total).
I have split the files and placed them across the three systems.
I installed Elasticsearch, Logstash and Kibana on one machine, then installed Logstash on the two other machines.
I pointed the Logstash output on each machine at the single Elasticsearch machine.
It took 6 days to filter out the searched items (I searched for a 10-digit number, a timestamp and the name of one API across all these logs) and display them in Kibana. Am I doing something wrong here? Is there any other way to speed up/tune this process?
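For reference, a minimal sketch of the kind of pipeline described above — the host name, file path, index name and grok pattern are all placeholders, not details from this thread:

```conf
input {
  file {
    path => "/var/log/app/*.log"          # placeholder path to the split files
    start_position => "beginning"
  }
}

filter {
  grok {
    # placeholder pattern: extract a timestamp, an API name and a 10-digit number
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} .* %{WORD:api_name} .* (?<account_id>\d{10})" }
  }
}

output {
  elasticsearch {
    hosts => ["http://es-main:9200"]      # the single Elasticsearch machine
  }
}
```

With this shape of config, each Logstash machine does its own grok parsing locally and only the indexing load is concentrated on the one Elasticsearch node.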
Have you identified what the bottleneck is? What is CPU and disk I/O looking like on your Elasticsearch node? How many indices/shards are you actively indexing into?
Not identified yet. CPU utilization was at its maximum for the corresponding Java process. Currently I have only one Elasticsearch instance, on one of my machines. Do I need to create 3 instances on 3 machines and start parsing?
Depending on what you are currently doing, you might be able to tune it and make it more efficient. Scaling up or out is otherwise a good way to increase performance.
It depends on what is limiting performance. I would suspect Elasticsearch to be the bottleneck as you only have one node, so giving this the full host by removing the Logstash instance that is colocated might help.
How many indices and shards are you actively indexing into? The reason I am asking is that this can affect performance as well.
To be honest, I have not specified the number of indices and shards being actively indexed into. Could you please point me to exactly where I need to configure the shards and indices? I installed everything and tested how much time it would take to process 300 GB of files. Since 6 days is too much, I am planning to tune the setup to get the data in 2 days, so that in future I need not wait for 6 days.
If you are using the defaults you are probably indexing into a single index, as that is the Logstash default. What is the specification of the server where Elasticsearch is running? What kind of storage do you have? What indexing rate are you seeing? Do you have X-Pack Monitoring installed?
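As background for the shard question: with defaults, Logstash writes everything to a single daily index, and each index gets Elasticsearch's default shard and replica counts. On a single node, replicas can never be assigned, so setting them to 0 is common. A sketch of controlling this with an index template, assuming the default `logstash-*` index naming (the template name and shard count are illustrative, not a recommendation from this thread):

```conf
PUT _template/logstash
{
  "template": "logstash-*",
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 0
  }
}
```

The template has to be in place before the matching indices are created; it does not change indices that already exist.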
Start by leaving one of the machines to Elasticsearch alone and see if that makes any difference. I would also recommend installing X-Pack on Elasticsearch, Logstash and Kibana, as monitoring gives good insight into what happens in both Elasticsearch and Logstash.
One server machine should have only Elasticsearch + X-Pack on it.
The remaining two machines should have Logstash + X-Pack on them, directing their output to the Elasticsearch on the main server.
Yes, that would be a good start. When you install X-Pack you get an evaluation license which by default enables security, which is why indexing from Logstash stops unless you configure username and password for the Elasticsearch output.
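As a concrete example of the credentials point: once X-Pack security is active, the Logstash `elasticsearch` output needs a user and password or indexing will fail. A sketch with placeholder values (host and credentials are assumptions, not from this thread):

```conf
output {
  elasticsearch {
    hosts => ["http://es-main:9200"]   # placeholder host of the Elasticsearch machine
    user => "logstash_writer"          # placeholder user with write privileges
    password => "changeme"             # placeholder password
  }
}
```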
If Elasticsearch is still the bottleneck once you have given it a host of its own, you may need to switch and have 2 Elasticsearch nodes running in a cluster and just one that runs Logstash.
My current setup:
Elasticsearch, Logstash and Kibana on one main server, and Logstash on the remaining two servers.
I believe the time was being spent parsing the logs in Logstash.
My initial plan was to install Elasticsearch, Logstash and Kibana on one main server and install Elasticsearch and Logstash on the remaining two servers. The output of those two Elasticsearch nodes would be directed to the main Kibana.
As per your input:
One server machine should have only Elasticsearch + X-Pack on it.
The remaining two machines should have Logstash + X-Pack on them, directing their output to the Elasticsearch on the main server.
If Elasticsearch is still the bottleneck once you have given it a host of its own, you may need to switch and have 2 Elasticsearch nodes running in a cluster and just one that runs Logstash.