Elasticsearch becomes unresponsive after updating to 0.90.x

I am trying to upgrade the Elasticsearch server to the latest version.

We are using Thrift to connect our application to Elasticsearch. After a
while, Elasticsearch stops responding to any request.

It seems that Elasticsearch ends up in a deadlock or something similar, but
I haven't found a way to reproduce this. The only thing I have is
an unresponsive instance. Can you recommend a Java tool, like jdb, that
would help me report this problem properly?

I should mention that there was no problem with the 0.19.x versions.
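In the meantime, the standard JDK tools may be more useful than jdb for capturing diagnostics from a hung instance. A sketch, assuming jps/jstack/jmap are on the PATH and Elasticsearch is the only org.elasticsearch process on the box:

```shell
# All three tools ship with the JDK. Find the Elasticsearch PID first.
ES_PID=$(jps -l 2>/dev/null | awk '/org\.elasticsearch/ {print $1}')

if [ -n "$ES_PID" ]; then
    # Thread dump: grep it for "Found one Java-level deadlock" and for
    # threads BLOCKED on the same monitor.
    jstack "$ES_PID" > es-threads.txt

    # Heap histogram plus a full binary dump to open in jhat or VisualVM.
    jmap -histo "$ES_PID" > es-histo.txt
    jmap -dump:format=b,file=es-heap.hprof "$ES_PID"
fi
```

A thread dump taken while the node is unresponsive is usually the fastest way to tell a deadlock apart from a GC stall.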

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

UPDATE:

I found out that the HTTP RESTful API responds properly, whereas Thrift
does not respond, and I can see around 60 connections attached to the
Thrift port in the CLOSE_WAIT state.
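For anyone wanting to check for the same symptom, a sketch that counts CLOSE_WAIT connections on the Thrift port (9500 is assumed here; adjust it to whatever thrift.port is set to in your config):

```shell
THRIFT_PORT=9500

count_close_wait() {
    # Reads netstat-style output on stdin and counts CLOSE_WAIT lines
    # whose address column ends in the given port.
    grep 'CLOSE_WAIT' | grep -c ":$1 "
}

# A steadily growing count suggests the server side never closes its
# half of the connection. (grep -c exits 1 on zero matches, hence || true.)
if command -v netstat >/dev/null 2>&1; then
    netstat -tan | count_close_wait "$THRIFT_PORT" || true
fi
```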


Heap memory on 0.90.1:
https://lh5.googleusercontent.com/-tUS2JxI0xtE/Ua3sSFpXfeI/AAAAAAAAAE0/ovY3GPsoD60/s1600/heap-memory-0-90-1.png

Heap memory on 0.19.12:
https://lh5.googleusercontent.com/-8RhVQ_L8Cg8/Ua3rvelmDfI/AAAAAAAAAEs/g9gGcbsupRk/s1600/heap-memory-0-19-12.png

I managed to reproduce the problem. It seems that successive index creation,
combined with deleting the contents of an index via delete-by-query with
match_all, makes the server use up the whole heap. After a while the server
becomes unresponsive.

This happens on both versions I have tested, 0.90.x and 0.19.12. The
difference is that on 0.19.x the garbage collector kicks in and the heap is
emptied; the server then starts running again without any problem.

On 0.90.1, although the garbage collector runs, the heap usage is not
reduced at all.

I don't know if this is related to this issue:
https://groups.google.com/forum/?fromgroups#!topic/elasticsearch/w370mdz3QDA

The script that I used to reproduce the problem is the following:

#!/bin/bash

while true; do

    echo "Deleting all indexes"
    curl -XDELETE 'http://localhost:9200/_all/?pretty' > /dev/null 2>&1

    echo "Creating index"
    curl -XPUT 'http://localhost:9200/test/?pretty' -d '{
        "index" : true,
        "settings" : {
            "number_of_shards" : 1
        },
        "default" : {
            "include_in_all" : true
        },
        "mappings" : {
            "blog_post" : {
                "type" : "object",
                "_all" : {"enabled" : true},
                "_source" : {"enabled" : true, "compress" : true}
            }
        }
    }' > /dev/null 2>&1

    for i in {1..10}; do
        echo "Truncating, pass $i"
        curl -XDELETE 'http://localhost:9200/test/blog_post/_query?pretty' -d '{
            "match_all" : {}
        }' > /dev/null 2>&1

        echo "Refreshing, pass $i"
        curl -XPOST 'http://localhost:9200/test/_refresh?pretty' > /dev/null 2>&1
    done

done
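To watch the heap grow while the script above runs, the nodes-stats API can be polled from the shell. A sketch; the stats URL may differ between these versions (/_cluster/nodes/stats is assumed here), and heap_used_in_bytes is pulled out with plain grep:

```shell
heap_used() {
    # Extracts the first heap_used_in_bytes value from a nodes-stats
    # JSON document on stdin.
    grep -o '"heap_used_in_bytes" *: *[0-9]*' | head -n 1 | grep -o '[0-9]*$'
}

# One sample; for continuous monitoring wrap it in a loop, e.g.:
#   while true; do curl -s '...stats?jvm=true' | heap_used; sleep 5; done
curl -s --max-time 2 'http://localhost:9200/_cluster/nodes/stats?jvm=true' | heap_used || true
```

On 0.19.x the sampled values should drop sharply after each GC; on 0.90.1 they should plateau, matching the graphs linked above.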


We are experiencing a similar problem. In our case, the nodes are
heavy-duty machines with 15GB of memory (8GB mlocked by ES) and are part of
a single cluster. Previously we would observe the heap usage drop
significantly (to maybe 2-3GB) once it got close to 8GB (probably because
the GC kicked in). After upgrading to 0.90, however, the heap usage keeps
increasing and plateaus around 7-8GB. When that happens, CPU utilization
suddenly spikes and GCs also start happening much more frequently. We've
tried restarting the cluster, but to no avail.

On Tuesday, June 4, 2013 6:37:28 AM UTC-7, Tasos Stathopoulos wrote:

