Thanks for the answers so far. If I understand this correctly, the main
reason for the time-consuming queries is the large number of terms
targeted, not the actual hit count. Is that correct?
I understand ES 0.21 has some new features thanks to Lucene 4, but what can
be done now? Add more nodes, use more or fewer shards, split up the index,
or update the mappings?
Here are some queries and their execution times, followed by our mappings:
http://ourbox:9200/fruit/_search?q=ES_VALUE:201* - 14s
http://ourbox:9200/fruit/_search?q=ES_VALUE:201206* - 600ms
http://ourbox:9200/fruit/_search?q=ES_VALUE:20120625XY123456* - 100ms
{
"fruit" : {
"Banana" : {
"_all" : {
"enabled" : false
},
"_source" : {
"excludes" : [ "ES_VALUE" ]
},
"properties" : {
"ES_DISTRIBUTOR" : {
"type" : "string",
"index" : "not_analyzed",
"omit_norms" : true,
"omit_term_freq_and_positions" : true
},
"ES_OWNER" : {
"type" : "string",
"index" : "not_analyzed",
"omit_norms" : true,
"omit_term_freq_and_positions" : true
},
"ES_SUMMARY" : {
"type" : "string",
"index" : "no",
"store" : "yes"
},
"ES_VALUE" : {
"type" : "string",
"analyzer" : "fruitSimple"
}
}
},
"Apple" : {
"_all" : {
"enabled" : false
},
"_source" : {
"excludes" : [ "ES_VALUE" ]
},
"properties" : {
"ES_DISTRIBUTOR" : {
"type" : "string",
"index" : "not_analyzed",
"omit_norms" : true,
"omit_term_freq_and_positions" : true
},
"ES_OWNER" : {
"type" : "string",
"index" : "not_analyzed",
"omit_norms" : true,
"omit_term_freq_and_positions" : true
},
"ES_SUMMARY" : {
"type" : "string",
"index" : "no",
"store" : "yes"
},
"ES_VALUE" : {
"type" : "string",
"analyzer" : "fruitSimple"
}
}
},
"Pineapple" : {
"_all" : {
"enabled" : false
},
"_source" : {
"excludes" : [ "ES_VALUE" ]
},
"properties" : {
"ES_DISTRIBUTOR" : {
"type" : "string",
"index" : "not_analyzed",
"omit_norms" : true,
"omit_term_freq_and_positions" : true
},
"ES_OWNER" : {
"type" : "string",
"index" : "not_analyzed",
"omit_norms" : true,
"omit_term_freq_and_positions" : true
},
"ES_SUMMARY" : {
"type" : "string",
"index" : "no",
"store" : "yes"
},
"ES_VALUE" : {
"type" : "string",
"analyzer" : "fruitSimple"
}
}
},
"Melon" : {
"_all" : {
"enabled" : false
},
"_source" : {
"excludes" : [ "ES_VALUE" ]
},
"properties" : {
"ES_DISTRIBUTOR" : {
"type" : "string",
"index" : "not_analyzed",
"omit_norms" : true,
"omit_term_freq_and_positions" : true
},
"ES_OWNER" : {
"type" : "string",
"index" : "not_analyzed",
"omit_norms" : true,
"omit_term_freq_and_positions" : true
},
"ES_SUMMARY" : {
"type" : "string",
"index" : "no",
"store" : "yes"
},
"ES_VALUE" : {
"type" : "string",
"analyzer" : "fruitSimple"
}
}
},
"Grapefruit" : {
"_all" : {
"enabled" : false
},
"_source" : {
"excludes" : [ "ES_VALUE" ]
},
"properties" : {
"ES_DISTRIBUTOR" : {
"type" : "string",
"index" : "not_analyzed",
"omit_norms" : true,
"omit_term_freq_and_positions" : true
},
"ES_OWNER" : {
"type" : "string",
"index" : "not_analyzed",
"omit_norms" : true,
"omit_term_freq_and_positions" : true
},
"ES_SUMMARY" : {
"type" : "string",
"index" : "no",
"store" : "yes"
},
"ES_VALUE" : {
"type" : "string",
"analyzer" : "fruitSimple"
}
}
},
"Eggplant" : {
"_all" : {
"enabled" : false
},
"_source" : {
"excludes" : [ "ES_VALUE" ]
},
"properties" : {
"ES_DISTRIBUTOR" : {
"type" : "string",
"index" : "not_analyzed",
"omit_norms" : true,
"omit_term_freq_and_positions" : true
},
"ES_OWNER" : {
"type" : "string",
"index" : "not_analyzed",
"omit_norms" : true,
"omit_term_freq_and_positions" : true
},
"ES_SUMMARY" : {
"type" : "string",
"index" : "no",
"store" : "yes"
},
"ES_VALUE" : {
"type" : "string",
"analyzer" : "fruitSimple"
}
}
},
"Kiwi" : {
"_all" : {
"enabled" : false
},
"_source" : {
"excludes" : [ "ES_VALUE" ]
},
"properties" : {
"ES_DISTRIBUTOR" : {
"type" : "string",
"index" : "not_analyzed",
"omit_norms" : true,
"omit_term_freq_and_positions" : true
},
"ES_OWNER" : {
"type" : "string",
"index" : "not_analyzed",
"omit_norms" : true,
"omit_term_freq_and_positions" : true
},
"ES_SUMMARY" : {
"type" : "string",
"index" : "no",
"store" : "yes"
},
"ES_VALUE" : {
"type" : "string",
"analyzer" : "fruitSimple"
}
}
},
"Orange" : {
"_all" : {
"enabled" : false
},
"_source" : {
"excludes" : [ "ES_VALUE" ]
},
"properties" : {
"ES_DISTRIBUTOR" : {
"type" : "string",
"index" : "not_analyzed",
"omit_norms" : true,
"omit_term_freq_and_positions" : true
},
"ES_OWNER" : {
"type" : "string",
"index" : "not_analyzed",
"omit_norms" : true,
"omit_term_freq_and_positions" : true
},
"ES_SUMMARY" : {
"type" : "string",
"index" : "no",
"store" : "yes"
},
"ES_VALUE" : {
"type" : "string",
"analyzer" : "fruitSimple"
}
}
},
"Lemon" : {
"_all" : {
"enabled" : false
},
"_source" : {
"excludes" : [ "ES_VALUE" ]
},
"properties" : {
"ES_DISTRIBUTOR" : {
"type" : "string",
"index" : "not_analyzed",
"omit_norms" : true,
"omit_term_freq_and_positions" : true
},
"ES_OWNER" : {
"type" : "string",
"index" : "not_analyzed",
"omit_norms" : true,
"omit_term_freq_and_positions" : true
},
"ES_SUMMARY" : {
"type" : "string",
"index" : "no",
"store" : "yes"
},
"ES_VALUE" : {
"type" : "string",
"analyzer" : "fruitSimple"
}
}
},
"Lime" : {
"_all" : {
"enabled" : false
},
"_source" : {
"excludes" : [ "ES_VALUE" ]
},
"properties" : {
"ES_DISTRIBUTOR" : {
"type" : "string",
"index" : "not_analyzed",
"omit_norms" : true,
"omit_term_freq_and_positions" : true
},
"ES_OWNER" : {
"type" : "string",
"index" : "not_analyzed",
"omit_norms" : true,
"omit_term_freq_and_positions" : true
},
"ES_SUMMARY" : {
"type" : "string",
"index" : "no",
"store" : "yes"
},
"ES_VALUE" : {
"type" : "string",
"analyzer" : "fruitSimple"
}
}
}
}
}
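To make the "number of terms targeted" point concrete, here is a toy
sketch in Python. It is not how Lucene or ES is implemented internally; it
only assumes that a trailing-wildcard query first has to enumerate every
indexed term that starts with the given prefix. The term values, the
sorted-list "term dictionary" and the helper name count_prefix_terms are
all made up for illustration, and it ignores whatever our fruitSimple
analyzer does to the tokens.

```python
import bisect


def count_prefix_terms(sorted_terms, prefix):
    """Count the terms in a sorted list that start with `prefix`.

    Uses binary search to find the contiguous slice of matching terms.
    Assumes the terms contain no character >= U+FFFF.
    """
    lo = bisect.bisect_left(sorted_terms, prefix)
    hi = bisect.bisect_left(sorted_terms, prefix + "\uffff")
    return hi - lo


# Hypothetical term dictionary: date-prefixed values similar in shape to
# the ones in the queries above (3 years x 12 months x 28 days x 3 ids).
terms = sorted(
    f"{year}{month:02d}{day:02d}XY{ident:06d}"
    for year in (2010, 2011, 2012)
    for month in range(1, 13)
    for day in range(1, 29)
    for ident in (123456, 123457, 654321)
)

for prefix in ("201", "201206", "20120625XY123456"):
    print(prefix, count_prefix_terms(terms, prefix))
```

With this toy data the three prefixes expand to 3024, 84 and 1 terms
respectively, which is the same ordering as the timings I measured: the
shorter the prefix, the more terms have to be visited, regardless of the
fact that only 15 hits are returned.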
On Wednesday, November 14, 2012 11:23:05 PM UTC+1, lifo wrote:
We have a simple setup with 1 node, 1 index and 5 shards containing about
110 million documents in total. The total index size is about 30 GB. Every
document is indexed on one field named ES_VALUE and stores a value named
ES_SUMMARY; in addition we have some other ID fields indexed. We index
several fields from our entities, but they are all put into ES_VALUE when
indexing, i.e. there are many more distinct values for ES_VALUE than there
are documents in the index.
A simple query that uses only a wildcard query on ES_VALUE performs very
differently depending on how many documents it targets. When the result
has a small total hit count it returns in a few milliseconds, but when the
result has a large total hit count it takes about 10-15 seconds to return.
Size is set to 15, so only 15 documents are actually returned to the
client.
Any suggestions on how to get the queries to perform better with large
result sets?