Data loading into ES index is very slow

Hi,

I am trying to load data from MySQL into Elasticsearch using Logstash, but it is loading very slowly: about 6,500 rows/minute into Elasticsearch.
The following is my Logstash config:

input {
  jdbc {
    type => "type"
    jdbc_driver_library => "/path/mysql-connector-java-5.1.28.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://IP:Port/DB"
    jdbc_user => "user"
    jdbc_password => "password"
    statement => "SELECT * from tablename WHERE id > :sql_last_value"
    schedule => "* * * * *"
    use_column_value => true
    tracking_column => "id"
    clean_run => true
    last_run_metadata_path => "/var/lib/logstash/.logstash_jdbc_last_run"
  }
}

filter {
  if [type] == "type" {
    mutate {
      gsub => [
        "message", "\n", "",
        "message", "\r", "",
        "message", "\t", "",
        "message", "=", ""
      ]
    }
    grok {
      break_on_match => false
      match => [ "Id", "%{BASE10NUM:id}" ]
      match => [ "logdata", "%{GREEDYDATA:Record}" ]
    }
    if [Record] =~ /type/ or [Record] =~ /lb/ {
      grok {
        match => [
          "Record", "%{WORD:VM1} %{WORD:VM} %{TIMESTAMP_ISO8601:Timestamp} \[%{DATA:Thread_Id}\] %{WORD:Log_Level} %{DATA:Error_Source} - %{GREEDYDATA:Error_Description}"
        ]
      }
    }
    if "_grokparsefailure" in [tags] {
      grok {
        match => [
          "Record", "%{TIMESTAMP_ISO8601:Timestamp} \[%{DATA:Thread_Id}\] %{WORD:Log_Level} %{DATA:Error_Source} - %{GREEDYDATA:Error_Description}"
        ]
      }
    }
    mutate {
      rename => { "logdata" => "Record" }
    }
    mutate {
      add_field => {
        "Error_Source_Analyzed" => "%{Error_Source}"
        "Error_Description_Analyzed" => "%{Error_Description}"
      }
      remove_field => ["VM1", "tags"]
    }
    date {
      match => [ "Timestamp", "yyyy-MM-dd HH:mm:ss,SSS" ]
    }
  }
}

output {
  if [type] == "type" {
    stdout { codec => rubydebug }

    elasticsearch {
      hosts => "IP:port"
      index => "Index_Name"
      document_id => "%{id}"
    }
  }
}
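One thing that often helps with large JDBC pulls is letting the input page through the result set instead of fetching it in one go. A sketch of the input block with the jdbc plugin's paging options enabled (the page and fetch sizes below are starting-point guesses, not tested values; paths and connection details are the same placeholders as above):

```
input {
  jdbc {
    jdbc_driver_library => "/path/mysql-connector-java-5.1.28.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://IP:Port/DB"
    jdbc_user => "user"
    jdbc_password => "password"
    # Let the plugin wrap the statement in LIMIT/OFFSET pages
    # instead of buffering the whole result set at once.
    jdbc_paging_enabled => true
    jdbc_page_size => 50000
    # Rows fetched per round trip from MySQL.
    jdbc_fetch_size => 10000
    statement => "SELECT * from tablename WHERE id > :sql_last_value"
    use_column_value => true
    tracking_column => "id"
  }
}
```

Whether paging helps depends on how MySQL executes the LIMIT/OFFSET queries, so it is worth timing a run with and without it.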
My system hardware config is:
16 GB RAM, 250 GB hard disk, 4 CPUs / 8 cores.

I tried changing LS_HEAP/ES_HEAP.

Can anybody help with this?

Thanks in advance,
Uday.K

How are you measuring things?

Hi Warklom,

Thanks for your reply,

I set a LIMIT of 10K to 50K in the SQL SELECT statement and manually timed how long the data takes to load into ES.
Even after removing the filter from the Logstash configuration, it loads only 15-16K per minute.

If I have a data inflow of around 250K per minute, what kind of setup/configuration is required?

One more piece of information: it is a single-node VM, and ELK and MySQL are all installed on the same machine. Is this because too many applications are running and the hardware configuration is not good enough?
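To see which side of the pipeline is slow, one common trick (a sketch, not something from this thread) is to keep the same jdbc input but temporarily replace the elasticsearch output with the `dots` codec, which prints one character per event and costs almost nothing:

```
# Benchmark output: one dot per event, negligible overhead.
# If throughput is still ~15K/minute with this output, the
# bottleneck is the jdbc input (or filters), not Elasticsearch.
output {
  stdout { codec => dots }
}
```

Comparing the event rate of this run against the normal run shows how much time the Elasticsearch output actually accounts for.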

If I have a data inflow of around 250K per minute, what kind of setup/configuration is required?

That depends on a lot of factors, including the kinds of documents you want to index.

One more piece of information: it is a single-node VM, and ELK and MySQL are all installed on the same machine. Is this because too many applications are running and the hardware configuration is not good enough?

That's definitely a possibility. Are you saturating the CPUs? Are you seeing a lot of I/O wait? What's the current bottleneck? Measure!

Hi Magnus,

Thanks for your reply. Yes, CPU usage is more than 100% (around 130-160%) when I run the Logstash config.
I/O wait fluctuates between values like 0.0, 0.1, 2.9, and 1.6; most of the time it is 0.0 or 0.1.

My current throughput tops out at 15K/minute without any filter in the Logstash config. How can I improve Logstash performance?

input {
  jdbc {
    type => "log"
    jdbc_driver_library => "/path/mysql-connector-java-5.1.28.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://IP:3306/DB"
    jdbc_user => "root"
    jdbc_password => "root"
    statement => "SELECT * from table limit 15000"
    schedule => "* * * * *"
    use_column_value => true
    tracking_column => "id"
    clean_run => true
    last_run_metadata_path => "/var/lib/logstash/.logstash_jdbc_last_run"
  }
}

output {
  elasticsearch {
    hosts => "IP:9200"
    workers => 60
    document_id => "%{id}"
    index => "per_rca_qns"
    flush_size => 10000
  }
}
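For comparison, a sketch of a more conservative output block. `workers => 60` on a 4-CPU box mostly adds contention rather than throughput; the numbers below are guesses to start from and tune, not tested values:

```
output {
  elasticsearch {
    hosts => "IP:9200"
    index => "per_rca_qns"
    document_id => "%{id}"
    # One output worker per core is a saner starting point than 60.
    workers => 4
    # Events per bulk request; raise or lower while watching for
    # bulk rejections on the Elasticsearch side.
    flush_size => 5000
  }
}
```

Also note that setting `document_id` turns every write into a potential overwrite, which is slower than letting Elasticsearch assign IDs; and on the index side, raising `refresh_interval` and dropping replicas to 0 during the bulk load are common ways to speed up indexing.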

The following are the CPU and memory stats.
CPU
Before Logstash starts: (screenshot)
After Logstash starts: (screenshot)
Memory
Before Logstash starts: (screenshot)
After Logstash starts: (screenshot)

Hi,

Any update regarding this?

Thanks,
Uday.K

Hi Magnus,

Based on the above screenshots, can you tell me what I can do to improve Logstash performance?

Thanks,
Uday.K

I'm not sure where the bottleneck is. The CPUs are not quite saturated but still have a sizable load so increasing concurrency won't help much.

Thanks for reply,

Can you suggest some Logstash or Elasticsearch performance tuning tips? I will try them out and see if I can figure out the issue.

Thanks,
Uday.K

As I don't know where the bottleneck is I have no concrete improvement suggestions.

If I need to check any other system parameters, or anything else on the Elasticsearch or Logstash side, please let me know.

Having all components on a single VM can make it difficult to troubleshoot performance issues as they all affect each other. If you can deploy the different components on separate VMs it may be easier to get assistance in pinpointing the bottleneck and optimising the pipeline.

Thanks for the reply, Christian.

I will check that.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.