We have a use case that requires fetching results from Elasticsearch in a custom sort order.
I am using Elasticsearch v5.1.2.
Mapping
client_obj.indices.create(
  index: 'test',
  body: {
    mappings: {
      texts: {
        properties: {
          number: { type: 'integer' },
          text:   { type: 'text', term_vector: 'with_positions_offsets_payloads' }
        }
      }
    }
  }
)
arr = [1, 3, 200, 100, 2, 10, ...] # 1 million entries
Given the array (arr), I expect Elasticsearch to return documents in the same order as their number appears in the array. I used the API below to get the results. It works for a small set of numbers, but when the array size exceeds 500k, the functions block in the request grows with it and my ES server goes down.
The from and size values vary based on the page number and page size.
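For context, I derive them roughly like this in Ruby (page and per_page are illustrative names, not my exact code):

# Hypothetical pagination helper: map a 1-based page number and a
# page size onto Elasticsearch from/size values.
page     = 1
per_page = 3
from     = (page - 1) * per_page  # offset of the first hit to return
size     = per_page               # number of hits per page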
GET /test/_search
{
  "query": {
    "function_score": {
      "boost_mode": "replace",
      "query": {
        "constant_score": {
          "filter": {
            "bool": {
              "must": [
                { "terms": { "number": [1, 3, 200, 100, 2] } },
                { "query_string": { "query": "#{keyword}", "default_field": "text" } }
              ]
            }
          }
        }
      },
      "functions": [
        { "filter": { "term": { "number": 1 } },   "weight": 4 },
        { "filter": { "term": { "number": 3 } },   "weight": 3 },
        { "filter": { "term": { "number": 200 } }, "weight": 2 },
        { "filter": { "term": { "number": 100 } }, "weight": 1 },
        { "filter": { "term": { "number": 2 } },   "weight": 0 }
      ]
    }
  },
  "_source": ["number"],
  "size": 3,
  "from": 0
}
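For completeness, this is roughly how I build the functions array from arr (a sketch, not my exact code); it shows why the request body grows linearly with the array size:

# Sketch: one function_score clause per array element, with the weight
# decreasing by position so earlier numbers rank higher (this produces
# the weights 4..0 shown in the query above for a 5-element array).
functions = arr.each_with_index.map do |num, idx|
  { filter: { term: { number: num } }, weight: arr.length - 1 - idx }
end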
I get the error below when the same API is called with 500k numbers:
[2017-03-30T09:01:08,898][INFO ][o.e.m.j.JvmGcMonitorService] [BYOiXkA] [gc][169] overhead, spent [269ms] collecting in the last [1s]
[2017-03-30T09:01:14,604][WARN ][o.e.m.j.JvmGcMonitorService] [BYOiXkA] [gc][172] overhead, spent [3s] collecting in the last [3.6s]
[2017-03-30T09:01:21,718][WARN ][o.e.m.j.JvmGcMonitorService] [BYOiXkA] [gc][176] overhead, spent [3.3s] collecting in the last [3.8s]
[2017-03-30T09:01:30,561][INFO ][o.e.m.j.JvmGcMonitorService] [BYOiXkA] [gc][old][179][6] duration [6.1s], collections [1]/[6.5s], total [6.1s]/[12.5s], memory [2.9gb]->[2.8gb]/[2.9gb], all_pools {[young] [266.2mb]->[214mb]/[266.2mb]}{[survivor] [21.2mb]->[0b]/[33.2mb]}{[old] [2.6gb]->[2.6gb]/[2.6gb]}
[2017-03-30T09:01:30,565][WARN ][o.e.m.j.JvmGcMonitorService] [BYOiXkA] [gc][179] overhead, spent [6.1s] collecting in the last [6.5s]
[2017-03-30T09:01:37,033][INFO ][o.e.m.j.JvmGcMonitorService] [BYOiXkA] [gc][old][180][7] duration [5s], collections [1]/[5.5s], total [5s]/[17.6s], memory [2.8gb]->[2.9gb]/[2.9gb], all_pools {[young] [214mb]->[266.2mb]/[266.2mb]}{[survivor] [0b]->[1.2mb]/[33.2mb]}{[old] [2.6gb]->[2.6gb]/[2.6gb]}
[2017-03-30T09:01:47,708][WARN ][o.e.m.j.JvmGcMonitorService] [BYOiXkA] [gc][183] overhead, spent [3s] collecting in the last [3s]
[2017-03-30T09:01:49,939][WARN ][o.e.m.j.JvmGcMonitorService] [BYOiXkA] [gc][184] overhead, spent [2.2s] collecting in the last [2.2s]
[2017-03-30T09:02:04,746][WARN ][o.e.m.j.JvmGcMonitorService] [BYOiXkA] [gc][185] overhead, spent [5.5s] collecting in the last [5.5s]
[2017-03-30T09:02:24,145][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [] fatal error in thread [elasticsearch[BYOiXkA][search][T#4]], exiting
java.lang.OutOfMemoryError: Java heap space
    at org.apache.lucene.util.FixedBitSet.<init>(FixedBitSet.java:115) ~[lucene-core-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 - shalin - 2016-11-02 19:47:11]
    at org.apache.lucene.util.DocIdSetBuilder.upgradeToBitSet(DocIdSetBuilder.java:235) ~[lucene-core-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 - shalin - 2016-11-02 19:47:11]
    at org.apache.lucene.util.DocIdSetBuilder.grow(DocIdSetBuilder.java:178) ~[lucene-core-6.3.0.jar:6.3.0 a66a44513ee8191e25b477372094bfa846450316 - shalin - 2016-11-02 19:47:11]
I have the following questions:
Is there any other way to solve this problem?
Can we use a script in the ES API to solve this problem? If yes, how?
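For illustration, this untested sketch (via the Ruby client, using a Painless script sort) is the kind of thing I had in mind; the five-element order param is just a stand-in, and I assume passing the full one-million-entry array as a script param would hit the same memory wall:

# Untested sketch: sort by a document's position in a params array.
# Documents whose number appears earlier in params.order sort first.
response = client_obj.search(
  index: 'test',
  body: {
    query: { terms: { number: [1, 3, 200, 100, 2] } },
    sort: {
      _script: {
        type: 'number',
        order: 'asc',
        script: {
          lang: 'painless',
          # rank = index of the document's number within params.order
          inline: "params.order.indexOf((int) doc['number'].value)",
          params: { order: [1, 3, 200, 100, 2] }
        }
      }
    },
    _source: ['number'],
    size: 3,
    from: 0
  }
)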
Please help me solve this problem.