Help needed tracing an indexing slowdown

I am running Elasticsearch 0.19.11 on six AWS EC2 m2.4xlarge instances,
with 35GB allocated to ES and mlock.all=true.

I implemented a Kafka river to pull data from a Kafka cluster. Depending on
Elasticsearch indexing performance, there can be an index delay between the
current time and the timestamp of the pulled data. This measure is critical
because we use Elasticsearch for realtime log analytics. The index delay
increases significantly at some points, but I don't know how to trace the
problem.
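
To make the metric concrete, here is a minimal sketch of how the index delay described above could be computed and watched for sustained growth. It assumes each pulled message carries an epoch-millisecond timestamp; the function names and the window size are hypothetical and not part of the river.

```python
import time


def index_delay_seconds(event_ts_ms, now_ms=None):
    """Delay in seconds between now and the timestamp of the pulled message.

    Assumes event_ts_ms is epoch milliseconds; clamps negative values
    (clock skew) to zero.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return max(0.0, (now_ms - event_ts_ms) / 1000.0)


def is_falling_behind(delays, window=5):
    """True when the last `window` delay samples are strictly increasing."""
    recent = delays[-window:]
    if len(recent) < window:
        return False
    return all(a < b for a, b in zip(recent, recent[1:]))
```

Sampling the delay at a fixed interval and logging the wall-clock time whenever is_falling_behind turns true gives timestamps that can be lined up against the disk graphs.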

https://lh5.googleusercontent.com/-fAJKLSDRL9k/UKKXCcrZ2RI/AAAAAAAAADo/Gx1Iow31ECw/s1600/indexdelay.tiff

I suspect this is caused by merging, but the disk IO behavior looks odd. The
Elasticsearch data directory is on md0, yet the index delay looks
synchronized with activity on sda1, which is mounted at root (/), not with
md0.

https://lh5.googleusercontent.com/-apgN-Uvexi4/UKKXZ6WXY1I/AAAAAAAAADw/5iFJjOaecpo/s1600/disk_md0.tiff

https://lh6.googleusercontent.com/-NSGir6ksLLQ/UKKXcdTSc1I/AAAAAAAAAD4/431bqo-uHhM/s1600/disk_sda1.tiff

Can you guess what is writing to sda1? Also, how can I trace this slowdown?
I want to see indexing actions such as when merging, flushing, and
refreshing start, but I cannot see any INFO messages with the following
logging.yml setting.

rootLogger: INFO, file
logger:
  # log action execution errors for easier debugging

  # gateway
  index.gateway: INFO

  # peer shard recovery
  indices.recovery: INFO

  # discovery
  discovery: INFO

  index.search.slowlog: INFO, index_search_slow_log_file

additivity:
  index.search.slowlog: false

Thank you!
Best, Jae

--

Hello Jae,

I would start with a monitoring tool for ES to see if it brings up any
clues. There are quite a lot out there; we also provide a good one that's
still free [0].

If you want specific stats about merging, refreshing and flushing, check
out the Indices Stats API [1].
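
As a rough sketch of how those stats could be polled, the snippet below takes two snapshots of the cumulative counters and reports how many merges, refreshes and flushes happened in between, and how long they took. The URL, the query flags (merge, refresh, flush) and the response keys (_all.total, merges, total, total_time_in_millis) are assumptions to check against the stats documentation for your version; the function names are made up for illustration.

```python
import json
import urllib.request

# Sections of the stats response we care about (assumed key names).
SECTIONS = ("merges", "refresh", "flush")


def fetch_totals(base_url="http://localhost:9200"):
    """Fetch cluster-wide counters; URL, flags and shape are assumptions."""
    url = base_url + "/_stats?merge=true&refresh=true&flush=true"
    with urllib.request.urlopen(url, timeout=10) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["_all"]["total"]


def stat_deltas(prev, curr):
    """Change in operation count and time spent between two snapshots."""
    deltas = {}
    for section in SECTIONS:
        before = prev.get(section, {})
        after = curr.get(section, {})
        deltas[section] = {
            "ops": after.get("total", 0) - before.get("total", 0),
            "millis": after.get("total_time_in_millis", 0)
            - before.get("total_time_in_millis", 0),
        }
    return deltas
```

Calling fetch_totals once a minute and printing stat_deltas for consecutive snapshots should show whether a jump in merge time coincides with the index delay.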

And if you want to know what is writing to /dev/sda1 (I can't think of
anything else but the log file), a good starting point is the fuser
command. Something like:

fuser -m /dev/sda1
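
fuser only lists the processes that have files open on that filesystem. To see which of them is actually writing, one option on Linux is to sample the write_bytes counter in /proc/<pid>/io twice and compare. This is only a sketch: the helper names are made up, reading another user's /proc/<pid>/io usually needs root, and the counter covers all devices the process writes to, not just sda1.

```python
def parse_proc_io(text):
    """Parse the 'name: value' lines of /proc/<pid>/io into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[name.strip()] = int(value.strip())
    return counters


def write_bytes_by_pid(pids):
    """Read write_bytes for each PID, skipping ones we cannot read."""
    result = {}
    for pid in pids:
        try:
            with open("/proc/%d/io" % pid) as handle:
                result[pid] = parse_proc_io(handle.read()).get("write_bytes", 0)
        except OSError:
            continue  # process exited or permission denied
    return result


def top_writers(before, after, limit=5):
    """Rank PIDs present in both samples by growth in bytes written."""
    growth = {pid: after[pid] - before[pid] for pid in after if pid in before}
    return sorted(growth.items(), key=lambda item: item[1], reverse=True)[:limit]
```

Feed write_bytes_by_pid the PIDs printed by fuser, wait a few seconds, sample again, and pass both samples to top_writers.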

[0] http://sematext.com/spm/elasticsearch-performance-monitoring/index.html
[1] http://www.elasticsearch.org/guide/reference/api/admin-indices-stats.html

Best regards,
Radu

http://sematext.com/ -- ElasticSearch -- Solr -- Lucene

On Tue, Nov 13, 2012 at 8:56 PM, Jae metacret@gmail.com wrote:

[...]