I am new to ELK (Elasticsearch, Logstash and Kibana).
I have three machines, each with 30 GB RAM and around 400 GB of disk space.
I have millions of files to parse (around 300 GB in total).
I have split the files and placed them across the three systems.
I installed Elasticsearch, Logstash and Kibana on one machine, then installed Logstash on the two other machines.
I pointed the Logstash output on each machine at the single Elasticsearch machine.
It took 6 days to filter out the searched items (I searched for a 10-digit number, a timestamp and the name of one API across all these logs) and display them in Kibana. Am I doing something wrong here? Is there any other way to speed up/tune this process?
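For reference, a minimal sketch of the kind of pipeline described above — the host name, file path, index name and grok pattern are all placeholders, not details from this thread:

```conf
input {
  file {
    path => "/var/log/app/*.log"          # placeholder path to the split files
    start_position => "beginning"
  }
}

filter {
  grok {
    # placeholder pattern: extract a timestamp, an API name and a 10-digit number
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} .* %{WORD:api_name} .* (?<account_id>\d{10})" }
  }
}

output {
  elasticsearch {
    hosts => ["http://es-main:9200"]      # the single Elasticsearch machine
  }
}
```

With this shape of config, each Logstash machine does its own grok parsing locally and only the indexing load is concentrated on the one Elasticsearch node.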
Have you identified what the bottleneck is? What is CPU and disk I/O looking like on your Elasticsearch node? How many indices/shards are you actively indexing into?
Not identified yet. CPU utilization was at its maximum for the corresponding Java process. Currently I have only one Elasticsearch instance, on one of my machines. Do I need to create 3 instances on 3 machines and start parsing?
Depending on what you are currently doing, you might be able to tune it and make it more efficient. Scaling up or out is otherwise a good way to increase performance.
It depends on what is limiting performance. I would suspect Elasticsearch to be the bottleneck as you only have one node, so giving this the full host by removing the Logstash instance that is colocated might help.
How many indices and shards are you actively indexing into? The reason I am asking is that this can affect performance as well.
To be honest, I have not specified the number of indices and shards being actively indexed into. Could you please point me to exactly where I need to configure the shards and indices? I installed everything and tested how much time it would take to process 300 GB of files. Since 6 days is too much, I am planning to tune the setup to get the data in 2 days, so that in future I need not wait for 6 days.
If you are using the defaults you are probably indexing into a single index, as that is the Logstash default. What is the specification of the server where Elasticsearch is running? What kind of storage do you have? What indexing rate are you seeing? Do you have X-Pack Monitoring installed?
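As background for the shard question: with defaults, Logstash writes everything to a single daily index, and each index gets Elasticsearch's default shard and replica counts. On a single node, replicas can never be assigned, so setting them to 0 is common. A sketch of controlling this with an index template, assuming the default `logstash-*` index naming (the template name and shard count are illustrative, not a recommendation from this thread):

```conf
PUT _template/logstash
{
  "template": "logstash-*",
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 0
  }
}
```

The template has to be in place before the matching indices are created; it does not change indices that already exist.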
Start by leaving one of the machines to Elasticsearch alone and see if that makes any difference. I would also recommend installing X-Pack on Elasticsearch, Logstash and Kibana, as monitoring gives good insight into what happens in both Elasticsearch and Logstash.
One server machine should have only Elasticsearch + X-Pack on it.
The remaining two machines should have Logstash + X-Pack on them, directing their output to the Elasticsearch on the main server.
Yes, that would be a good start. When you install X-Pack you get an evaluation license which by default enables security, which is why indexing from Logstash stops unless you configure username and password for the Elasticsearch output.
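As a concrete example of the credentials point: once X-Pack security is active, the Logstash `elasticsearch` output needs a user and password or indexing will fail. A sketch with placeholder values (host and credentials are assumptions, not from this thread):

```conf
output {
  elasticsearch {
    hosts => ["http://es-main:9200"]   # placeholder host of the Elasticsearch machine
    user => "logstash_writer"          # placeholder user with write privileges
    password => "changeme"             # placeholder password
  }
}
```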
If Elasticsearch is still the bottleneck once you have given it a host of its own, you may need to switch and have 2 Elasticsearch nodes running in a cluster and just one that runs Logstash.
My current setup:
Elasticsearch, Logstash and Kibana on one main server, and Logstash on the remaining two servers.
I believe the time was being spent parsing the logs in Logstash.
My initial plan was to install Elasticsearch, Logstash and Kibana on one main server and install Elasticsearch and Logstash on the remaining two servers. The output of those two Elasticsearch nodes would be directed to the main Kibana.
As per your input:
One server machine should have only Elasticsearch + X-Pack on it.
The remaining two machines should have Logstash + X-Pack on them, directing their output to the Elasticsearch on the main server.
If Elasticsearch is still the bottleneck once you have given it a host of its own, you may need to switch and have 2 Elasticsearch nodes running in a cluster and just one that runs Logstash.