ELK - Time taken to parse


(Paul Achinth Paul) #1

I am new to ELK [Elasticsearch, Logstash and Kibana].

I have three machines, each with 30 GB RAM and around 400 GB of disk space.
I have millions of files to parse [total size is around 300 GB].
I have split the files and placed them on the three systems.
I installed Elasticsearch, Logstash and Kibana on one machine, then installed Logstash on the two other machines.
I redirected the output of Logstash on each machine to the one Elasticsearch machine.

It took 6 days to filter out the searched items [I searched for a 10-digit number, a timestamp and the name of one API across all these logs] and display them in Kibana. Am I doing something wrong here? Is there any other way to speed up or tune this process?

Thanks in Advance, Paul


(Christian Dahlqvist) #2

Have you identified what the bottleneck is? What is CPU and disk I/O looking like on your Elasticsearch node? How many indices/shards are you actively indexing into?


(Paul Achinth Paul) #3

Not yet identified. CPU utilization was at maximum for the corresponding Java instance. Currently I have only one Elasticsearch instance, on one of my machines. Do I need to create 3 instances on 3 machines and start parsing?


(Mark Walkom) #4

FYI we’ve renamed ELK to the Elastic Stack, otherwise Beats feels left out :wink:


(Christian Dahlqvist) #5

Depending on what you are currently doing, you might be able to tune it and make it more efficient. Scaling up or out is otherwise a good way to increase performance.


(Paul Achinth Paul) #6

Thanks for the reply. You mean that if I install Elasticsearch and Logstash on 3 machines and do the log parsing in parallel, it would speed things up?


(Christian Dahlqvist) #7

It depends on what is limiting performance. I would suspect Elasticsearch to be the bottleneck as you only have one node, so giving this the full host by removing the Logstash instance that is colocated might help.

How many indices and shards are you actively indexing into? The reason I am asking is that this can affect performance as well.


(Paul Achinth Paul) #8

To be honest, I have not specified the number of indices and shards to index into. Could you please guide me on exactly where I need to set the shards and indices? I installed everything and tested how long it would take to process 300 GB of files. Since 6 days is too long, I am planning to tune the setup so as to get the data in 2 days, so that in future I need not wait 6 days.
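
For reference, the 6-day and 2-day figures can be turned into an average throughput. This is only a rough sketch: it assumes 1 GB = 1024 MB and that the 300 GB was processed at an even rate, and the function name is made up for illustration.

```python
# Back-of-the-envelope throughput for the numbers quoted in this thread.
# Assumptions: 1 GB = 1024 MB, and the load was spread evenly over time.

SECONDS_PER_DAY = 24 * 60 * 60


def required_mb_per_sec(total_gb: float, days: float) -> float:
    """Average ingest rate (MB/s) needed to process total_gb in the given days."""
    return total_gb * 1024 / (days * SECONDS_PER_DAY)


current = required_mb_per_sec(300, 6)  # what the 6-day run averaged
target = required_mb_per_sec(300, 2)   # what a 2-day run would need

print(f"current: {current:.2f} MB/s, target: {target:.2f} MB/s")
print(f"speed-up needed: {target / current:.1f}x")
```

With these numbers the 6-day run averaged roughly 0.59 MB/s across all three machines, and the 2-day goal needs about 1.78 MB/s, i.e. a 3x speed-up.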


(Christian Dahlqvist) #9

If you are using the defaults you are probably indexing into a single index, as that is the Logstash default. What is the specification of the server where Elasticsearch is running? What kind of storage do you have? What indexing rate are you seeing? Do you have X-Pack Monitoring installed?
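
If monitoring is not available, one rough way to estimate the indexing rate is to sample the document count twice and divide by the elapsed time. The sketch below assumes Elasticsearch is reachable without authentication at a URL such as `http://localhost:9200` (an assumption, adjust to your host) and uses the `_count` API; the helper names are hypothetical. The count only reflects documents that are already visible to search, so treat the result as an approximation.

```python
import json
import time
import urllib.request


def fetch_doc_count(base_url: str) -> int:
    """Return the total document count reported by the _count API."""
    with urllib.request.urlopen(f"{base_url}/_count") as resp:
        return json.load(resp)["count"]


def docs_per_second(count_start: int, count_end: int, elapsed_seconds: float) -> float:
    """Average number of documents indexed per second between two samples."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (count_end - count_start) / elapsed_seconds


def sample_indexing_rate(base_url: str, interval_seconds: float = 60.0) -> float:
    """Take two count samples interval_seconds apart and return docs/second."""
    start_count = fetch_doc_count(base_url)
    start_time = time.monotonic()
    time.sleep(interval_seconds)
    end_count = fetch_doc_count(base_url)
    return docs_per_second(start_count, end_count, time.monotonic() - start_time)


# Example usage (requires a running, unsecured Elasticsearch node):
#   rate = sample_indexing_rate("http://localhost:9200", interval_seconds=60)
#   print(f"{rate:.0f} docs/s")
```

Running this while Logstash is indexing gives a number to compare before and after each change to the setup.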


(Paul Achinth Paul) #10

Okay. So I am using a single index [the Logstash default].

Specification of server:
RAM: 30 GB
Storage Space: HDD 300GB

I am not sure about the indexing rate. I tried installing X-Pack, but the parsing stopped, after which I removed X-Pack.


(Christian Dahlqvist) #11

How many cores do you have available?


(Paul Achinth Paul) #12

4 cores per machine. [Altogether I have 3 such machines]


(Christian Dahlqvist) #13

Start by dedicating one of the machines to Elasticsearch and see if that makes any difference. I would also recommend installing X-Pack on Elasticsearch, Logstash and Kibana, as monitoring gives good insight into what happens both in Elasticsearch and Logstash.


(Paul Achinth Paul) #14

Just to confirm..

One server machine should have only Elasticsearch + X-Pack on it.
The remaining two machines should have Logstash + X-Pack on them, directing their output to Elasticsearch on the main server.

Please correct me if I am wrong anywhere


(Christian Dahlqvist) #15

Yes, that would be a good start. When you install X-Pack you get an evaluation license which by default enables security, which is why indexing from Logstash stops unless you configure username and password for the Elasticsearch output.
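
Before adding credentials to the Elasticsearch output in Logstash (the output plugin has `user` and `password` options), it can help to confirm that the username and password are accepted by the cluster. The sketch below is one way to do that with HTTP basic authentication; the URL, username and password in the usage comment are placeholders, not real values, and the helper names are hypothetical.

```python
import base64
import json
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return f"Basic {token.decode('ascii')}"


def build_request(base_url: str, username: str, password: str) -> urllib.request.Request:
    """Create a GET request for the cluster root with credentials attached."""
    request = urllib.request.Request(base_url)
    request.add_header("Authorization", basic_auth_header(username, password))
    return request


def cluster_name(base_url: str, username: str, password: str) -> str:
    """Return the cluster name; urllib raises HTTPError (401) on bad credentials."""
    with urllib.request.urlopen(build_request(base_url, username, password)) as resp:
        return json.load(resp)["cluster_name"]


# Example usage (placeholder host and credentials):
#   print(cluster_name("http://es-host:9200", "my_user", "my_password"))
```

If this call succeeds, the same username and password should work in the Logstash output configuration.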


(Paul Achinth Paul) #16

Thanks a lot for the help. Will try and let you know.
One more doubt: does that mean I need not go for a master-slave architecture?


(Christian Dahlqvist) #17

If Elasticsearch is still the bottleneck once you have given it a host of its own, you may need to switch and have 2 Elasticsearch nodes running in a cluster and just one that runs Logstash.


(Paul Achinth Paul) #18

Okay.

Just summarising.

My current setup
Elasticsearch, Logstash and Kibana on one main server and Logstash on the remaining two servers.
I believe the time was being spent parsing logs in Logstash.

My initial plan was to install Elasticsearch, Logstash and Kibana on one main server and install Elasticsearch and Logstash on the remaining two servers. The output of these two Elasticsearch instances would be directed to the main Kibana.

As per your input,
One server machine should have only Elasticsearch + X-Pack on it.
The remaining two machines should have Logstash + X-Pack on them, directing their output to Elasticsearch on the main server.
If Elasticsearch is still the bottleneck once you have given it a host of its own, you may need to switch and have 2 Elasticsearch nodes running in a cluster and just one that runs Logstash.

