Hello,
I have already seen some topics about the performance of ES here, but none of
them answered my questions. I would like to use ES for a search
engine. It is really great how easy it is to create new nodes, connect
them into clusters, set up replicas, etc., compared to Solr or
Sphinx. I've created 3 nodes with 3 shards and 1 replica (+1 ElasticSearch
node for balancing only). Each node is run with the following JVM params:
-Xms5g -Xmx5g -Xss256k -Djava.awt.headless=true -XX:+UseParNewGC
-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError
-XX:+UseTLAB -XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled
The machine has 16 GB of RAM and a 4-core Intel i5 processor @ 3.40 GHz.
There is only one index, with one mapping, which looks like this:
{
  "files" : {
    "groups" : {
      "_source" : {
        "enabled" : false
      },
      "properties" : {
        "id" : {
          "type" : "string",
          "index" : "no",
          "store" : "yes"
        },
        "clicks" : {
          "type" : "integer"
        },
        "date_added" : {
          "type" : "long"
        },
        "desc" : {
          "type" : "string",
          "term_vector" : "with_positions_offsets"
        },
        "extension_attr" : {
          "type" : "short"
        },
        "group_id" : {
          "type" : "long",
          "store" : "yes"
        },
        "host" : {
          "type" : "string",
          "term_vector" : "with_positions_offsets"
        },
        "hosting_id" : {
          "type" : "integer"
        },
        "name" : {
          "type" : "string",
          "boost" : 3.0,
          "term_vector" : "with_positions_offsets"
        },
        "size" : {
          "type" : "integer"
        },
        "source_title" : {
          "type" : "string",
          "term_vector" : "with_positions_offsets"
        },
        "source_url" : {
          "type" : "string",
          "term_vector" : "with_positions_offsets"
        }
      }
    }
  }
}
In this index there are around 30M records for both ElasticSearch and
Sphinx Search. On the same machine, Sphinx running as a single node
responds to a query like MATCH('test') in around 0.14s, whereas
ElasticSearch takes 2,3s. I'm surprised by these results. I'm not sure
whether this is a normal time for a query that is not in the cache. I'm
wondering whether there is any way to tune ES.
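In case it helps to reproduce the numbers, here is a minimal sketch of how the ElasticSearch side could be timed. It is not the exact client code used for the figures above. It assumes a node listening on the default HTTP port on localhost, and it uses a query_string query against the default field as a rough counterpart of Sphinx's MATCH('test'). The function names and the URL constant are made up for illustration.

```python
import json
import time
import urllib.request

# Assumption: default ES HTTP port on localhost; index "files", type "groups"
# as in the mapping above.
ES_SEARCH_URL = "http://localhost:9200/files/groups/_search"


def build_query(text, size=10):
    """Build a query_string search body (rough counterpart of Sphinx MATCH)."""
    return {"query": {"query_string": {"query": text}}, "size": size}


def timed_search(text, url=ES_SEARCH_URL):
    """Run one search; return (server-side 'took' in ms, client wall time in ms)."""
    body = json.dumps(build_query(text)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    start = time.perf_counter()
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    wall_ms = (time.perf_counter() - start) * 1000.0
    # "took" is the search time reported by ElasticSearch itself, in milliseconds.
    return result["took"], wall_ms
```

Calling timed_search("test") several times in a row and comparing the first result with the later ones should show how much of the 2,3s is a cold-cache effect, and the gap between "took" and the wall-clock time shows how much is spent outside the search itself.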