Data loading into ES index is very slow


(uday kiran) #1

Hi,

I am trying to load data from MySQL into Elasticsearch using Logstash, but it is loading very slowly: about 6,500 rows per minute.
The following is my Logstash config:

input {
  jdbc {
    type => "type"
    jdbc_driver_library => "/path/mysql-connector-java-5.1.28.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://IP:Port/DB"
    jdbc_user => "user"
    jdbc_password => "password"
    statement => "SELECT * FROM tablename WHERE id > :sql_last_value"
    schedule => "* * * * *"
    use_column_value => true
    tracking_column => "id"
    clean_run => true
    last_run_metadata_path => "/var/lib/logstash/.logstash_jdbc_last_run"
  }
}

filter {
  if [type] == "type" {
    mutate {
      gsub => [
        "message", "\n", "",
        "message", "\r", "",
        "message", "\t", "",
        "message", "=", ""
      ]
    }
    grok {
      break_on_match => false
      match => [
        "Id", "%{BASE10NUM:id}",
        "logdata", "%{GREEDYDATA:Record}"
      ]
    }
    if "type" in [Record] or "lb" in [Record] {
      grok {
        match => [
          "Record", "%{WORD:VM1} %{WORD:VM} %{TIMESTAMP_ISO8601:Timestamp} \[%{DATA:Thread_Id}\] %{WORD:Log_Level} %{DATA:Error_Source} - %{GREEDYDATA:Error_Description}"
        ]
      }
    }
    if "_grokparsefailure" in [tags] {
      grok {
        match => [
          "Record", "%{TIMESTAMP_ISO8601:Timestamp} \[%{DATA:Thread_Id}\] %{WORD:Log_Level} %{DATA:Error_Source} - %{GREEDYDATA:Error_Description}"
        ]
      }
    }
    mutate {
      rename => { "logdata" => "Record" }
    }
    mutate {
      add_field => {
        "Error_Source_Analyzed" => "%{Error_Source}"
        "Error_Description_Analyzed" => "%{Error_Description}"
      }
      remove_field => [ "VM1", "tags" ]
    }
    date {
      match => [ "Timestamp", "yyyy-MM-dd HH:mm:ss,SSS" ]
    }
  }
}

output {
  if [type] == "type" {
    stdout { codec => rubydebug }

    elasticsearch {
      hosts => "IP:port"
      index => "Index_Name"
      document_id => "%{id}"
    }
  }
}
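
One input-side setting that often helps with bulk pulls like this is JDBC result paging, which streams the table in chunks instead of buffering one huge result set in memory. A minimal sketch of just the added options (jdbc_paging_enabled, jdbc_page_size and jdbc_fetch_size are standard logstash-input-jdbc settings; the sizes below are placeholder values to tune):

input {
  jdbc {
    # ... same connection, statement and tracking settings as above ...
    jdbc_paging_enabled => true  # wraps the statement in LIMIT/OFFSET pages
    jdbc_page_size => 50000      # rows per page (placeholder value)
    jdbc_fetch_size => 1000      # JDBC driver fetch-size hint (placeholder value)
  }
}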
My system hardware config is:
16 GB RAM, 250 GB hard disk, 4 CPUs / 8 cores.

I have also tried changing LS_HEAP/ES_HEAP.

Can anybody help with this?

Thanks in advance
Uday.K


(Mark Walkom) #2

How are you measuring things?


(uday kiran) #3

Hi Mark,

Thanks for your reply,

I put a LIMIT of 10K to 50K in the SQL SELECT statement and manually measured the time taken to load the data into ES.
Even after removing the filter from the Logstash configuration, it only loads 15-16K rows per minute.

If I have a data inflow of around 250K rows per minute, what kind of setup/configuration is required?

One more piece of information: this is a single-node VM, with the full ELK stack and MySQL all installed on the same machine. Could the slowness be because many applications are running and the hardware configuration is not good enough?


(Magnus Bäck) #4

If I have a data inflow of around 250K rows per minute, what kind of setup/configuration is required?

That depends on a lot of factors, including the kinds of documents you want to index.

One more piece of information: this is a single-node VM, with the full ELK stack and MySQL all installed on the same machine. Could the slowness be because many applications are running and the hardware configuration is not good enough?

That's definitely a possibility. Are you saturating the CPUs? Are you seeing a lot of I/O wait? What's the current bottleneck? Measure!
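
For example, you can take Elasticsearch out of the loop and see how fast Logstash alone can drain MySQL by temporarily swapping the output for the dots codec, something like:

output {
  stdout { codec => dots }  # prints one dot per event so you can eyeball the raw rate
}

If the rate barely changes, the bottleneck is on the JDBC input side; if it jumps, look at Elasticsearch indexing instead.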


(uday kiran) #5

Hi Magnus,

Thanks for your reply. Yes, CPU usage goes above 100% (around 130-160%) when I run the Logstash config.
I/O wait fluctuates between values like 0.0, 0.1, 2.9 and 1.6; most of the time it is 0.0 or 0.1.

My current throughput tops out at about 15K/minute even without any filter in the Logstash config. How can I improve Logstash performance?

input {
  jdbc {
    type => "log"
    jdbc_driver_library => "/path/mysql-connector-java-5.1.28.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://IP:3306/DB"
    jdbc_user => "root"
    jdbc_password => "root"
    statement => "SELECT * FROM table LIMIT 15000"
    schedule => "* * * * *"
    use_column_value => true
    tracking_column => "id"
    clean_run => true
    last_run_metadata_path => "/var/lib/logstash/.logstash_jdbc_last_run"
  }
}

output {
  elasticsearch {
    hosts => "IP:9200"
    workers => 60
    document_id => "%{id}"
    index => "per_rca_qns"
    flush_size => 10000
  }
}
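
Note that a bare LIMIT combined with use_column_value re-reads the same 15,000 rows on every scheduled run; to walk through the table incrementally, the statement has to filter and order on the tracking column, along the lines of:

input {
  jdbc {
    # ... connection settings as above ...
    statement => "SELECT * FROM table WHERE id > :sql_last_value ORDER BY id ASC LIMIT 15000"
    use_column_value => true
    tracking_column => "id"
    clean_run => false  # clean_run => true resets sql_last_value on every restart
  }
}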

The following are CPU and memory stats (screenshots showing CPU and memory usage before and after Logstash starts).


(uday kiran) #6

Hi,

Any update regarding this?

Thanks,
Uday.K


(uday kiran) #7

Hi Magnus,

Based on the above screenshots, can you tell me what I can do to improve Logstash performance?

Thanks,
Uday.K


(Magnus Bäck) #8

I'm not sure where the bottleneck is. The CPUs are not quite saturated but still carry a sizable load, so increasing concurrency won't help much.


(uday kiran) #9

Thanks for reply,

Can you suggest some Logstash or Elasticsearch performance tuning tips? I will try them and figure out where the issue is.

Thanks,
Uday.K


(Magnus Bäck) #10

As I don't know where the bottleneck is, I have no concrete improvement suggestions.


(uday kiran) #11

If I should check any other system parameters, or anything else on the Elasticsearch and Logstash side, please let me know.


(Christian Dahlqvist) #12

Having all components on a single VM can make it difficult to troubleshoot performance issues as they all affect each other. If you can deploy the different components on separate VMs it may be easier to get assistance in pinpointing the bottleneck and optimising the pipeline.


(uday kiran) #13

Thanks for the reply, Christian.

I will check that.


(system) #14

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.