Elasticsearch performance tuning

Hi All,

On a single-node Elasticsearch setup with Logstash, we tested parsing 20 MB and 200 MB log files into Elasticsearch on different AWS instance types, i.e. Medium, Large, and XLarge.

Environment details: Medium instance, 3.75 GB RAM, 1 core, 4 GB SSD storage, 64-bit, network performance: Moderate
Instance running: Logstash, Elasticsearch

Scenario 1

**With default settings**
Result:
20 MB logfile: 23 min, 175 events/sec
200 MB logfile: 3 hr 3 min, 175 events/sec


Added the following settings:
Java heap size: 2 GB
bootstrap.mlockall: true
indices.fielddata.cache.size: "30%"
indices.cache.filter.size: "30%"
index.translog.flush_threshold_ops: 50000
indices.memory.index_buffer_size: 50%

# Search thread pool
threadpool.search.type: fixed
threadpool.search.size: 20
threadpool.search.queue_size: 100
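
(As a sanity check that the heap actually took effect, the nodes info API can be queried; the endpoint below assumes Elasticsearch listening on localhost:9200.)

curl -s 'localhost:9200/_nodes/jvm?pretty'
# jvm.mem.heap_max_in_bytes should reflect the configured 2 GB heap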

**With added settings**
Result:
20 MB logfile: 22 min, 180 events/sec
200 MB logfile: 3 hr 7 min, 180 events/sec

Scenario 2

Environment details: R3 Large, 15.25 GB RAM, 2 cores, 32 GB SSD storage, 64-bit, network performance: Moderate
Instance running: Logstash, Elasticsearch

**With default settings**
Result:
20 MB logfile: 7 min, 750 events/sec
200 MB logfile: 65 min, 800 events/sec

Added the following settings:
Java heap size: 7 GB
Other parameters same as above.

**With added settings**
Result:
20 MB logfile: 7 min, 800 events/sec
200 MB logfile: 55 min, 800 events/sec

Scenario 3

Environment details: R3 High-Memory Extra Large (r3.xlarge), 30.5 GB RAM, 4 cores, 32 GB SSD storage, 64-bit, network performance: Moderate
Instance running: Logstash, Elasticsearch

**With default settings**
Result:
20 MB logfile: 7 min, 1200 events/sec
200 MB logfile: 34 min, 1200 events/sec

Added the following settings:
Java heap size: 15 GB
Other parameters same as above.

**With added settings**
Result:
20 MB logfile: 7 min, 1200 events/sec
200 MB logfile: 34 min, 1200 events/sec

I wanted to know:

  1. What is the benchmark for this performance?
  2. Does the performance meet the benchmark, or is it below it?
  3. Why do I not see any difference even after increasing the Elasticsearch JVM heap?
  4. How do I monitor Logstash and improve its performance?

I would appreciate any help on this, as I am new to Logstash and Elasticsearch.


  1. It depends
  2. It depends
  3. It depends
  4. It also depends.

The performance of ES depends on you: your data, your use, your queries, your hardware, your configuration. If those are the results you got, then they are indicative of your setup and thus are your benchmark; from there you can tweak and try to improve performance.

Monitoring LS is a little harder, as there are no APIs for it (yet). Most of its performance will come down to your filters (especially grok).
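
One crude stopgap is the metrics filter that ships with Logstash: it emits periodic rate events you can print to stdout. A minimal sketch (the meter name and output format here are just placeholders):

filter {
  metrics {
    meter => "events"
    add_tag => "metric"
  }
}
output {
  # print the 1-minute event rate; the tag keeps metric events out of the normal output
  if "metric" in [tags] {
    stdout { codec => line { format => "1m rate: %{[events][rate_1m]}" } }
  }
}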


Hi Mark Walkom,

Thanks, Mark. Am I missing anything for tuning Elasticsearch performance?

I added the following Elasticsearch settings:
Java heap size: half of physical memory
bootstrap.mlockall: true
indices.fielddata.cache.size: "30%"
indices.cache.filter.size: "30%"
index.translog.flush_threshold_ops: 50000
indices.memory.index_buffer_size: 50%
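
(Whether mlockall actually took effect can be verified through the nodes info API; the endpoint assumes a local node.)

curl -s 'localhost:9200/_nodes/process?pretty'
# each node's process section should report "mlockall" : true;
# false usually means the memlock ulimit for the ES user needs raising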


Hi Mark Walkom,

I have given my Logstash conf file below.

Logstash conf:

input {
  file {
  }
}

filter {
  mutate {
    gsub => ["message", "\n", " "]
  }
  mutate {
    gsub => ["message", "\t", " "]
  }
  multiline {
    pattern => "^ "
    what => "previous"
  }

  grok {
    # assuming the log fields are pipe-delimited, the literal pipes are
    # escaped here (an unescaped | is regex alternation)
    match => [ "message", "%{TIME:log_time}\|%{WORD:Message_type}\|%{GREEDYDATA:Component}\|%{NUMBER:line_number}\| %{GREEDYDATA:log_message}" ]
    match => [ "path", "%{GREEDYDATA}/%{GREEDYDATA:loccode}/%{GREEDYDATA:_machine}:%{DATE:logdate}.log" ]
    break_on_match => false
  }

  # Check whether the location code is S or L
  if [loccode] == "S" or [loccode] == "L" {
    ruby {
      # keep only the first character of the machine name (grok captured it as _machine)
      code => "temp = event['_machine'].to_s.split('')
               if !temp.empty?
                 event['_machine'] = temp[0]
               end"
    }
  }

  mutate {
    add_field => ["event_timestamp", "%{@timestamp}"]
    replace => [ "log_time", "%{logdate} %{log_time}" ]
    lowercase => ["loccode"]
    # Remove the 'logdate' field since we don't need it anymore.
    remove_field => ["logdate"]
  }

  # To get all site details (site name, city and coordinates)
  sitelocator {
    sitename => "loccode"
    datafile => "vendor/sitelocator/SiteDetails.csv"
  }

  date {
    locale => "en"
    match => [ "log_time", "yyyy-MM-dd HH:mm:ss", "MM-dd-yyyy HH:mm:ss.SSS", "ISO8601" ]
  }
}

output {
  elasticsearch {
  }
}

I have checked step by step to find the bottleneck filter. The filter below took the most time. Can you guide me on how to tune it to be faster?

date { locale => "en" match => [ "log_time", "yyyy-MM-dd HH:mm:ss", "MM-dd-yyyy HH:mm:ss.SSS", "ISO8601" ] }

http://serverfault.com/questions/669534/elasticsearch-performance-tuning#comment818613_669558
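
(For reference, this kind of per-filter check can be reproduced with a throwaway pipeline: swap the file input for stdin, pipe a sample file through, and time the run with one filter disabled at a time. File names below are hypothetical.)

time bin/logstash -f pipeline-under-test.conf < sample.log > /dev/null
# re-run with one filter commented out at a time; the change in
# wall time is roughly that filter's cost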

Thanks
Devaraj


Don't change cache and buffer sizes unless you know what is happening; the defaults are going to be fine.
How much heap did you give ES?

I'm not sure you can do much about the date filter though; maybe someone else has pointers.
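
(One low-risk tweak: the date filter tries its match patterns in the order listed, so putting the format that matches most events first avoids wasted parse attempts. The ordering below is an assumption; reorder to whatever format dominates the actual logs.)

date {
  locale => "en"
  # most common timestamp format first (assumed here to be MM-dd-yyyy)
  match => [ "log_time", "MM-dd-yyyy HH:mm:ss.SSS", "yyyy-MM-dd HH:mm:ss", "ISO8601" ]
}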


I have listed the instances and their heap sizes below.

Medium instance: 3.75 GB RAM, 1 core, 4 GB SSD storage, 64-bit
Java heap size: 2 GB

R3 Large: 15.25 GB RAM, 2 cores, 32 GB SSD storage
Java heap size: 7 GB

R3 High-Memory Extra Large (r3.xlarge): 30.5 GB RAM, 4 cores
Java heap size: 15 GB
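
(These follow the common rule of thumb of giving Elasticsearch about half of physical RAM and leaving the rest for the filesystem cache. On the 1.x-era packages the heap is typically set through the ES_HEAP_SIZE environment variable; the file path below assumes the Debian/Ubuntu package.)

# /etc/default/elasticsearch
ES_HEAP_SIZE=15g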

Thanks
Devaraj


Are you running a single cluster with all of those nodes included?
Have you changed the roles these play, i.e. master, data, client, or are they the defaults?
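
(An easy way to check is the cat nodes API; the example assumes a node reachable on localhost:9200.)

curl -s 'localhost:9200/_cat/nodes?v'
# the node.role and master columns show each node's data/client role
# and which node is the elected master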
