Data loading into ES index is very slow


(uday kiran) #1

Hi,

I am trying to load data from MySQL into Elasticsearch using Logstash, but it is loading very slowly: about 6,500 rows per minute.
The following is my Logstash config:

input {
  jdbc {
    type => "type"
    jdbc_driver_library => "/path/mysql-connector-java-5.1.28.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://IP:Port/DB"
    jdbc_user => "user"
    jdbc_password => "password"
    statement => "SELECT * FROM tablename WHERE id > :sql_last_value"
    schedule => "* * * * *"
    use_column_value => true
    tracking_column => "id"
    clean_run => true
    last_run_metadata_path => "/var/lib/logstash/.logstash_jdbc_last_run"
  }
}

filter {
  if [type] == "type" {
    mutate {
      gsub => [
        "message", "\n", "",
        "message", "\r", "",
        "message", "\t", "",
        "message", "=", ""
      ]
    }
    grok {
      break_on_match => false
      match => [
        "Id", "%{BASE10NUM:id}",
        "logdata", "%{GREEDYDATA:Record}"
      ]
    }
    if "type" in [Record] or "lb" in [Record] {
      grok {
        match => [
          "Record", "%{WORD:VM1} %{WORD:VM} %{TIMESTAMP_ISO8601:Timestamp} \[%{DATA:Thread_Id}\] %{WORD:Log_Level} %{DATA:Error_Source} - %{GREEDYDATA:Error_Description}"
        ]
      }
    }
    if "_grokparsefailure" in [tags] {
      grok {
        match => [
          "Record", "%{TIMESTAMP_ISO8601:Timestamp} \[%{DATA:Thread_Id}\] %{WORD:Log_Level} %{DATA:Error_Source} - %{GREEDYDATA:Error_Description}"
        ]
      }
    }
    mutate {
      rename => { "logdata" => "Record" }
    }
    mutate {
      add_field => {
        "Error_Source_Analyzed" => "%{Error_Source}"
        "Error_Description_Analyzed" => "%{Error_Description}"
      }
      remove_field => [ "VM1", "tags" ]
    }
    date {
      match => [ "Timestamp", "yyyy-MM-dd HH:mm:ss,SSS" ]
    }
  }
}

output {
  if [type] == "type" {
    stdout { codec => rubydebug }

    elasticsearch {
      hosts => "IP:port"
      index => "Index_Name"
      document_id => "%{id}"
    }
  }
}
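
One input-side setting that often helps with bulk pulls like this is JDBC result paging, which streams the table in chunks instead of buffering one huge result set in memory. A minimal sketch of just the added options (jdbc_paging_enabled, jdbc_page_size and jdbc_fetch_size are standard logstash-input-jdbc settings; the sizes below are placeholder values to tune):

input {
  jdbc {
    # ... same connection, statement and tracking settings as above ...
    jdbc_paging_enabled => true  # wraps the statement in LIMIT/OFFSET pages
    jdbc_page_size => 50000      # rows per page (placeholder value)
    jdbc_fetch_size => 1000      # JDBC driver fetch-size hint (placeholder value)
  }
}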
My system hardware config is:
16 GB RAM, 250 GB hard disk, 4 CPUs / 8 cores.

I have also tried changing LS_HEAP/ES_HEAP.

Can anybody help with this?

Thanks in advance
Uday.K


(Mark Walkom) #2

How are you measuring things?


(uday kiran) #3

Hi Mark,

Thanks for your reply,

I put a LIMIT of 10K to 50K in the SQL SELECT statement and manually measured the time taken to load the data into ES.
Even after removing the filter from the Logstash configuration, it only loads 15-16K rows per minute.

If I have a data inflow of around 250K rows per minute, what kind of setup/configuration is required?

One more piece of information: this is a single-node VM, with the full ELK stack and MySQL all installed on the same machine. Could the slowness be because many applications are running and the hardware configuration is not good enough?


(Magnus Bäck) #4

If I have a data inflow of around 250K rows per minute, what kind of setup/configuration is required?

That depends on a lot of factors, including the kinds of documents you want to index.

One more piece of information: this is a single-node VM, with the full ELK stack and MySQL all installed on the same machine. Could the slowness be because many applications are running and the hardware configuration is not good enough?

That's definitely a possibility. Are you saturating the CPUs? Are you seeing a lot of I/O wait? What's the current bottleneck? Measure!
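
For example, you can take Elasticsearch out of the loop and see how fast Logstash alone can drain MySQL by temporarily swapping the output for the dots codec, something like:

output {
  stdout { codec => dots }  # prints one dot per event so you can eyeball the raw rate
}

If the rate barely changes, the bottleneck is on the JDBC input side; if it jumps, look at Elasticsearch indexing instead.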


(uday kiran) #5

Hi Magnus,

Thanks for your reply. Yes, CPU usage goes above 100% (around 130-160%) when I run the Logstash config.
I/O wait fluctuates between values like 0.0, 0.1, 2.9 and 1.6; most of the time it is 0.0 or 0.1.

My current throughput tops out at about 15K/minute even without any filter in the Logstash config. How can I improve Logstash performance?

input {
  jdbc {
    type => "log"
    jdbc_driver_library => "/path/mysql-connector-java-5.1.28.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://IP:3306/DB"
    jdbc_user => "root"
    jdbc_password => "root"
    statement => "SELECT * FROM table LIMIT 15000"
    schedule => "* * * * *"
    use_column_value => true
    tracking_column => "id"
    clean_run => true
    last_run_metadata_path => "/var/lib/logstash/.logstash_jdbc_last_run"
  }
}

output {
  elasticsearch {
    hosts => "IP:9200"
    workers => 60
    document_id => "%{id}"
    index => "per_rca_qns"
    flush_size => 10000
  }
}
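
Note that a bare LIMIT combined with use_column_value re-reads the same 15,000 rows on every scheduled run; to walk through the table incrementally, the statement has to filter and order on the tracking column, along the lines of:

input {
  jdbc {
    # ... connection settings as above ...
    statement => "SELECT * FROM table WHERE id > :sql_last_value ORDER BY id ASC LIMIT 15000"
    use_column_value => true
    tracking_column => "id"
    clean_run => false  # clean_run => true resets sql_last_value on every restart
  }
}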

The following are CPU and memory stats (screenshots showing CPU and memory usage before and after Logstash starts).


(uday kiran) #6

Hi,

Any update regarding this?

Thanks,
Uday.K


(uday kiran) #7

Hi Magnus,

Based on the above screenshots, can you tell me what I can do to improve Logstash performance?

Thanks,
Uday.K


(Magnus Bäck) #8

I'm not sure where the bottleneck is. The CPUs are not quite saturated but still carry a sizable load, so increasing concurrency won't help much.


(uday kiran) #9

Thanks for reply,

Can you suggest some Logstash or Elasticsearch performance tuning tips? I will try them and figure out where the issue is.

Thanks,
Uday.K


(Magnus Bäck) #10

As I don't know where the bottleneck is, I have no concrete improvement suggestions.


(uday kiran) #11

If I should check any other system parameters, or anything else on the Elasticsearch and Logstash side, please let me know.


(Christian Dahlqvist) #12

Having all components on a single VM can make it difficult to troubleshoot performance issues as they all affect each other. If you can deploy the different components on separate VMs it may be easier to get assistance in pinpointing the bottleneck and optimising the pipeline.


(uday kiran) #13

Thanks for the reply, Christian.

I will check that.


(system) #14

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.