Elasticsearch performance tuning

Hi All,

On a single-node Elasticsearch setup with Logstash, we tested parsing 20 MB and 200 MB log files into Elasticsearch on different AWS instance types, i.e. Medium, Large, and XLarge.

Environment details: Medium instance, 3.75 GB RAM, 1 core, 4 GB SSD storage, 64-bit, network performance: Moderate
Instance running: Logstash, Elasticsearch

Scenario 1

**With default settings**
Result:
20 MB logfile: 23 min, 175 events/sec
200 MB logfile: 3 hr 3 min, 175 events/sec


Added the following settings:
Java heap size: 2 GB
bootstrap.mlockall: true
indices.fielddata.cache.size: "30%"
indices.cache.filter.size: "30%"
index.translog.flush_threshold_ops: 50000
indices.memory.index_buffer_size: 50%

# Search thread pool
threadpool.search.type: fixed
threadpool.search.size: 20
threadpool.search.queue_size: 100
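
(As a sanity check that the heap actually took effect, the nodes info API can be queried; the endpoint below assumes Elasticsearch listening on localhost:9200.)

curl -s 'localhost:9200/_nodes/jvm?pretty'
# jvm.mem.heap_max_in_bytes should reflect the configured 2 GB heap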

**With added settings**
Result:
20 MB logfile: 22 min, 180 events/sec
200 MB logfile: 3 hr 7 min, 180 events/sec

Scenario 2

Environment details: R3 Large, 15.25 GB RAM, 2 cores, 32 GB SSD storage, 64-bit, network performance: Moderate
Instance running: Logstash, Elasticsearch

**With default settings**
Result:
20 MB logfile: 7 min, 750 events/sec
200 MB logfile: 65 min, 800 events/sec

Added the following settings:
Java heap size: 7 GB
Other parameters same as above.

**With added settings**
Result:
20 MB logfile: 7 min, 800 events/sec
200 MB logfile: 55 min, 800 events/sec

Scenario 3

Environment details: R3 High-Memory Extra Large (r3.xlarge), 30.5 GB RAM, 4 cores, 32 GB SSD storage, 64-bit, network performance: Moderate
Instance running: Logstash, Elasticsearch

**With default settings**
Result:
20 MB logfile: 7 min, 1200 events/sec
200 MB logfile: 34 min, 1200 events/sec

Added the following settings:
Java heap size: 15 GB
Other parameters same as above.

**With added settings**
Result:
20 MB logfile: 7 min, 1200 events/sec
200 MB logfile: 34 min, 1200 events/sec

I wanted to know:

  1. What is the benchmark for this performance?
  2. Does the performance meet the benchmark, or is it below it?
  3. Why do I not see any difference even after increasing the Elasticsearch JVM heap?
  4. How do I monitor Logstash and improve its performance?

I would appreciate any help on this, as I am new to Logstash and Elasticsearch.


  1. It depends
  2. It depends
  3. It depends
  4. It also depends.

The performance of ES depends on you: your data, your use, your queries, your hardware, your configuration. If those are the results you got, then they are indicative of your setup and thus are your benchmark; from there you can tweak and try to improve performance.

Monitoring LS is a little harder, as there are no APIs for it (yet). Most of its performance will come down to your filters (especially grok).
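
One crude stopgap is the metrics filter that ships with Logstash: it emits periodic rate events you can print to stdout. A minimal sketch (the meter name and output format here are just placeholders):

filter {
  metrics {
    meter => "events"
    add_tag => "metric"
  }
}
output {
  # print the 1-minute event rate; the tag keeps metric events out of the normal output
  if "metric" in [tags] {
    stdout { codec => line { format => "1m rate: %{[events][rate_1m]}" } }
  }
}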


Hi Mark Walkom,

Thanks, Mark. Am I missing anything for tuning Elasticsearch performance?

I added the following Elasticsearch settings:
Java heap size: half of physical memory
bootstrap.mlockall: true
indices.fielddata.cache.size: "30%"
indices.cache.filter.size: "30%"
index.translog.flush_threshold_ops: 50000
indices.memory.index_buffer_size: 50%
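
(Whether mlockall actually took effect can be verified through the nodes info API; the endpoint assumes a local node.)

curl -s 'localhost:9200/_nodes/process?pretty'
# each node's process section should report "mlockall" : true;
# false usually means the memlock ulimit for the ES user needs raising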


Hi Mark Walkom,

I have given my Logstash conf file below.

Logstash conf:

input {
  file {
  }
}

filter {
  mutate {
    gsub => ["message", "\n", " "]
  }
  mutate {
    gsub => ["message", "\t", " "]
  }
  multiline {
    pattern => "^ "
    what => "previous"
  }

  grok {
    # assuming the log fields are pipe-delimited, the literal pipes are
    # escaped here (an unescaped | is regex alternation)
    match => [ "message", "%{TIME:log_time}\|%{WORD:Message_type}\|%{GREEDYDATA:Component}\|%{NUMBER:line_number}\| %{GREEDYDATA:log_message}" ]
    match => [ "path", "%{GREEDYDATA}/%{GREEDYDATA:loccode}/%{GREEDYDATA:_machine}:%{DATE:logdate}.log" ]
    break_on_match => false
  }

  # Check whether the location code is S or L
  if [loccode] == "S" or [loccode] == "L" {
    ruby {
      # keep only the first character of the machine name (grok captured it as _machine)
      code => "temp = event['_machine'].to_s.split('')
               if !temp.empty?
                 event['_machine'] = temp[0]
               end"
    }
  }

  mutate {
    add_field => ["event_timestamp", "%{@timestamp}"]
    replace => [ "log_time", "%{logdate} %{log_time}" ]
    lowercase => ["loccode"]
    # Remove the 'logdate' field since we don't need it anymore.
    remove_field => ["logdate"]
  }

  # To get all site details (site name, city and coordinates)
  sitelocator {
    sitename => "loccode"
    datafile => "vendor/sitelocator/SiteDetails.csv"
  }

  date {
    locale => "en"
    match => [ "log_time", "yyyy-MM-dd HH:mm:ss", "MM-dd-yyyy HH:mm:ss.SSS", "ISO8601" ]
  }
}

output {
  elasticsearch {
  }
}

I have checked step by step to find the bottleneck filter. The filter below took the most time. Can you guide me on how to tune it to be faster?

date { locale => "en" match => [ "log_time", "yyyy-MM-dd HH:mm:ss", "MM-dd-yyyy HH:mm:ss.SSS", "ISO8601" ] }

http://serverfault.com/questions/669534/elasticsearch-performance-tuning#comment818613_669558
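
(For reference, this kind of per-filter check can be reproduced with a throwaway pipeline: swap the file input for stdin, pipe a sample file through, and time the run with one filter disabled at a time. File names below are hypothetical.)

time bin/logstash -f pipeline-under-test.conf < sample.log > /dev/null
# re-run with one filter commented out at a time; the change in
# wall time is roughly that filter's cost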

Thanks
Devaraj


Don't change cache and buffer sizes unless you know what is happening; the defaults are going to be fine.
How much heap did you give ES?

I'm not sure you can do much about the date filter though; maybe someone else has pointers.
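
(One low-risk tweak: the date filter tries its match patterns in the order listed, so putting the format that matches most events first avoids wasted parse attempts. The ordering below is an assumption; reorder to whatever format dominates the actual logs.)

date {
  locale => "en"
  # most common timestamp format first (assumed here to be MM-dd-yyyy)
  match => [ "log_time", "MM-dd-yyyy HH:mm:ss.SSS", "yyyy-MM-dd HH:mm:ss", "ISO8601" ]
}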


I have listed the instances and their heap sizes below.

Medium instance: 3.75 GB RAM, 1 core, 4 GB SSD storage, 64-bit
Java heap size: 2 GB

R3 Large: 15.25 GB RAM, 2 cores, 32 GB SSD storage
Java heap size: 7 GB

R3 High-Memory Extra Large (r3.xlarge): 30.5 GB RAM, 4 cores
Java heap size: 15 GB
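
(These follow the common rule of thumb of giving Elasticsearch about half of physical RAM and leaving the rest for the filesystem cache. On the 1.x-era packages the heap is typically set through the ES_HEAP_SIZE environment variable; the file path below assumes the Debian/Ubuntu package.)

# /etc/default/elasticsearch
ES_HEAP_SIZE=15g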

Thanks
Devaraj


Are you running a single cluster with all of those nodes included?
Have you changed the roles these play, i.e. master, data, client, or are they the defaults?
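
(An easy way to check is the cat nodes API; the example assumes a node reachable on localhost:9200.)

curl -s 'localhost:9200/_cat/nodes?v'
# the node.role and master columns show each node's data/client role
# and which node is the elected master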
