Hi all,
I have searched the forum and found similar issues, but no solution to the problem I am having with a new Timelion ES search that I created. The search fails with one of 2 errors: either a stack trace, or a complaint that the search thread pool queue has hit its capacity of 1000 and is rejecting anything over that. The Elasticsearch cluster logs grow by about 200 MB each time the search is run. However, if I set the search to run, say, every 5 minutes, it sometimes completes and sometimes doesn't.
I have monitored vmstat on the ES nodes as well as on the Kibana node the search is run from. CPU does go below 95% idle. I don't think this is a search that ES should not be able to handle.
I increased the Kibana timeout for the search to 40000 ms, but it still gives me one of the 2 errors. Both errors are included below, along with the search. If you need the cluster logs, I can supply them.
Many thanks in advance,
Paul
Infrastructure is:
2 x vCPU (Xeon E5-2680 v4) using x2 cores
12 GB memory
20 GB system disk
200 GB data disk
10 Gb network (teamed)
ELK version 5.4
3 Elasticsearch Nodes
3 Logstash Nodes
2 Kibana Nodes
(1 archive 30 day curator)
~105 GB of data in the index, growing at about 3-4 GB per day
Search created:
`.es('message:("Model UKMO_EURO4 Run") AND (Machine:FGBW1APWTRB001 OR Machine:FGBW1APWTRB018)').label('Model EURO4 Released (0, 6, 12, 18 hrs)').bars(), .es('message:("Model UKMO_UKV Run") AND (Machine:FGBW1APWTRB001 OR Machine:FGBW1APWTRB018)').label('Model UKMO UKV Released (0, 3, 6, 9, 12, 15, 18, 21 hrs)').bars(), .es('message:("Model UKMO_global Run") AND (Machine:FGBW1APWTRB001 OR Machine:FGBW1APWTRB018)').label('Model UKMO global Released (0, 12 hrs)').bars(), .es('message:("Model NCEP_GFS Run") AND (Machine:FGBW1APWTRB001 OR Machine:FGBW1APWTRB018)').label('Model NCEP GFS Released (0, 6, 12, 18 hrs)').bars(), .es('message:("Model UKMO_global_0.234x0.156_UK Run") AND (Machine:FGBW1APWTRB001 OR Machine:FGBW1APWTRB018)').label('Model UKMO global 0.234x0.156 UK Released (0, 12 hrs)').bars(), .es('message:("Model ECMWF_0.125_UK Run") AND (Machine:FGBW1APWTRB001 OR Machine:FGBW1APWTRB018)').label('Model ECMWF 0.125 UK Released (0, 12 hrs)').bars(), .es('message:("Model ECMWF_0.2_UK Run") AND (Machine:FGBW1APWTRB001 OR Machine:FGBW1APWTRB018)').label('Model ECMWF 0.2 UK Released (0, 12 hrs)').bars(), .es('message:("Model ECMWF_HRES_SURF Run") AND (Machine:FGBW1APWTRB001 OR Machine:FGBW1APWTRB018)').label('Model ECMWF HRES SURF Released (0, 12 hrs)').bars(), .es('message:("Model ECMWF_HRES_HIGH Run") AND (Machine:FGBW1APWTRB001 OR Machine:FGBW1APWTRB018)').label('Model ECMWF HRES HIGH Released (0, 12 hrs)').bars()`
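Since the expression is long, here is a rough sketch of how it is put together: every series follows the same pattern (one `.es()` query per model, filtered to the same two machines, then `.label()` and `.bars()`), and the series are separated by commas. The Python below is only an illustration for readers. The names `MACHINES`, `MODELS`, `series` and `build_expression` are made up for this sketch and are not part of Timelion or Kibana; the model names and labels are taken from the search above.

```python
# Illustrative only: rebuilds the Timelion expression from its repeating pattern.
# MACHINES, MODELS, series() and build_expression() are hypothetical names.

MACHINES = ["FGBW1APWTRB001", "FGBW1APWTRB018"]

# (model name as it appears in the log message, label shown on the chart)
MODELS = [
    ("UKMO_EURO4", "Model EURO4 Released (0, 6, 12, 18 hrs)"),
    ("UKMO_UKV", "Model UKMO UKV Released (0, 3, 6, 9, 12, 15, 18, 21 hrs)"),
    ("UKMO_global", "Model UKMO global Released (0, 12 hrs)"),
    ("NCEP_GFS", "Model NCEP GFS Released (0, 6, 12, 18 hrs)"),
    ("UKMO_global_0.234x0.156_UK", "Model UKMO global 0.234x0.156 UK Released (0, 12 hrs)"),
    ("ECMWF_0.125_UK", "Model ECMWF 0.125 UK Released (0, 12 hrs)"),
    ("ECMWF_0.2_UK", "Model ECMWF 0.2 UK Released (0, 12 hrs)"),
    ("ECMWF_HRES_SURF", "Model ECMWF HRES SURF Released (0, 12 hrs)"),
    ("ECMWF_HRES_HIGH", "Model ECMWF HRES HIGH Released (0, 12 hrs)"),
]


def series(model, label):
    """Return one Timelion series: query for a model on either machine, as bars."""
    machine_clause = " OR ".join(f"Machine:{m}" for m in MACHINES)
    query = f'message:("Model {model} Run") AND ({machine_clause})'
    return f".es('{query}').label('{label}').bars()"


def build_expression(models=MODELS):
    """Join all series with commas, which is how Timelion separates them."""
    return ", ".join(series(model, label) for model, label in models)


if __name__ == "__main__":
    print(build_expression())
```

Each of the 9 series is a separate Elasticsearch query, so a single run of this search sends 9 queries to the cluster.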
Here are the 2 errors:
Error 1:
Timelion: Error: in cell #1: [search_phase_execution_exception]
Error: in cell #1: [search_phase_execution_exception]
at throwWithCell (/usr/share/kibana/src/core_plugins/timelion/server/handlers/chain_runner.js:45:11)
at /usr/share/kibana/src/core_plugins/timelion/server/handlers/chain_runner.js:175:13
at arrayEach (/usr/share/kibana/node_modules/lodash/index.js:1289:13)
at Function.<anonymous> (/usr/share/kibana/node_modules/lodash/index.js:3345:13)
at /usr/share/kibana/src/core_plugins/timelion/server/handlers/chain_runner.js:167:24
at bound (domain.js:280:14)
at runBound (domain.js:293:12)
at tryCatcher (/usr/share/kibana/node_modules/bluebird/js/main/util.js:26:23)
at Promise._settlePromiseFromHandler (/usr/share/kibana/node_modules/bluebird/js/main/promise.js:503:31)
at Promise._settlePromiseAt (/usr/share/kibana/node_modules/bluebird/js/main/promise.js:577:18)
at Promise._settlePromises (/usr/share/kibana/node_modules/bluebird/js/main/promise.js:693:14)
at Async._drainQueue (/usr/share/kibana/node_modules/bluebird/js/main/async.js:123:16)
at Async._drainQueues (/usr/share/kibana/node_modules/bluebird/js/main/async.js:133:10)
at Immediate.Async.drainQueues (/usr/share/kibana/node_modules/bluebird/js/main/async.js:15:14)
at runCallback (timers.js:666:20)
at tryOnImmediate (timers.js:639:5)
Error 2:
`Timelion: Error: in cell #1: [es_rejected_execution_exception] rejected execution of org.elasticsearch.transport.TransportService$7@19154b69 on EsThreadPoolExecutor[search, queue capacity = 1000, …`