Visualize:Bad Gateway error for long period searches on Dashboard when using Apache as a reverse proxy [SOLVED]

Here's a summary of my environment (ommiting some private info):

User -> Apache (port:443) -> Kibana (port:5601) -> Elasticsearch (port:9200)

Elasticsearch:

{
  "status" : 200,
  "version" : {
    "number" : "1.4.4",
    "build_hash" : "c88f77ffc81301dfa9dfd81ca2232f09588bd512",
    "build_timestamp" : "2015-02-19T13:05:36Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.3"
  },
  "tagline" : "You Know, for Search"
}

{
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 4,
  "number_of_data_nodes" : 4,
  "active_primary_shards" : 2703,
  "active_shards" : 5406,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0
}

Elasticsearch is ingesting ~120GB of logs/day.

Kibana:

Version	4.1.2
Build	7562
Commit SHA	34af16e

Apache:

Server version: Apache/2.4.6 (Red Hat)
Server built:   Dec  4 2014 03:30:10

In this setup, everything works fine, except when i try to select a time range longer than 24h on any dashboard. When i do so, selecting for exemple "this week" as the time range, i get the following error on kibana after ~ 2 minutes:

Error: Bad Gateway
    at respond (https://www-logs.prevnet/index.js?_b=7562:85289:15)
    at checkRespForFailure (https://www-logs.prevnet/index.js?_b=7562:85257:7)
    at https://www-logs.prevnet/index.js?_b=7562:83895:7
    at wrappedErrback (https://www-logs.prevnet/index.js?_b=7562:20902:78)
    at wrappedErrback (https://www-logs.prevnet/index.js?_b=7562:20902:78)
    at wrappedErrback (https://www-logs.prevnet/index.js?_b=7562:20902:78)
    at https://www-logs.prevnet/index.js?_b=7562:21035:76
    at Scope.$eval (https://www-logs.prevnet/index.js?_b=7562:22022:28)
    at Scope.$digest (https://www-logs.prevnet/index.js?_b=7562:21834:31)
    at Scope.$apply (https://www-logs.prevnet/index.js?_b=7562:22126:24)

On the apache log, i get the following message:

[Sat Nov 14 16:15:47.212271 2015] [proxy_http:error] [pid 1661:tid 140115036788480] (20014)Internal error: [client 10.10.10.10:38893] AH01102: error reading status line from remote server 127.0.0.1:5601, referer: https://www-logs.prevnet/
[Sat Nov 14 16:15:47.212385 2015] [proxy:error] [pid 1661:tid 140115036788480] [client 10.0.169.82:38893] AH00898: Error reading from remote server returned by /elasticsearch/_msearch, referer: https://www-logs.prevnet/

No errors are logged on the Elasticsearch servers.

Apache config:

<VirtualHost *:443>

    ServerName www-logs
    ServerAlias www-logs.prevnet

    DocumentRoot /var/www/www-logs
    ServerPath   /var/www/www-logs

    TimeOut 1200
    ProxyPreserveHost On
    ProxyRequests On
    ProxyVia On

    <Location "/">
      LogLevel warn
      ProxyPass           http://127.0.0.1:5601/  connectiontimeout=1200 timeout=1200 Keepalive=On
      ProxyPassReverse    http://127.0.0.1:5601/
      SetEnv proxy-initial-not-pooled
    </Location>

    LogLevel warn
    CustomLog /var/log/httpd/www-logs-ssl_access.log combined
    ErrorLog /var/log/httpd/www-logs-ssl_error.log

    SSLEngine On
    SSLProtocol all -SSLv2 -SSLv3
    SSLOptions +StdEnvVars +ExportCertData
    SSLProxyEngine On
    SSLCipherSuite ALL:!ADH:!EXPORT56:RC4+RSA:+HIGH:+MEDIUM:+LOW:+SSLv2:+EXP:+eNULL:!MD5
    SetEnvIf User-Agent ".*MSIE.*" nokeepalive ssl-unclean-shutdown
    ProxyPreserveHost On


    SSLVerifyDepth 10
    SSLCertificateFile      /path_to_my_certs/my_certfile.crt
    SSLCertificateKeyFile   /path_to_my_certs/my_keyfile.key
    SSLCertificateChainFile /path_to_my_certs/my_chainfile.crt
    SSLCACertificateFile    /path_to_my_certs/my_chainfile.crt
</VirtualHost>

SSLSessionCache         shmcb:/run/httpd/sslcache(512000)
SSLSessionCacheTimeout  5
SSLRandomSeed startup file:/dev/urandom  256
SSLRandomSeed connect builtin
SSLCryptoDevice builtin

BrowserMatch "MSIE [2-5]" \
  nokeepalive ssl-unclean-shutdown \
  downgrade-1.0 force-response-1.0

Kibana config:

port: 
host: ""
elasticsearch_url: "http://my_elasticsearch_server:9200/"
elasticsearch_preserve_host: false
kibana_index: ".kibana"
default_app_id: "dashboard/Default"
request_timeout: 300000
shard_timeout: 0
verify_ssl: true
pid_file: /var/run/kibana.pid
bundled_plugin_ids:
 - plugins/dashboard/index
 - plugins/discover/index
 - plugins/doc/index
 - plugins/kibana/index
 - plugins/markdown_vis/index
 - plugins/metric_vis/index
 - plugins/settings/index
 - plugins/table_vis/index
 - plugins/vis_types/index
 - plugins/visualize/index

As you can see on the above apache config, i've already tryed to increase the apache timeout without success.

Any ideas?

Thanks!

Obs: selecting a the same time period on the discovery tab works fine. The problem is only on the Dashboards tab

Does it happen without the proxy in place?

Hi Warkolm,

Accessing kibana directly, without apache, i get the following error message:

Error Error Connection Failure ErrorAbstract@http://127.0.0.1:5601/indexjs?_b=7562:83947:5
ConnectionFault@http://127.0.0.1:5601/indexjs?_b=7562:83961:3
checkRespForFailure@http://127.0.0.1:5601/index.js?_b=7562:85253:9
[24]</AngularConnectorprototypesequesUc@http://127.0.0.1:5601/index.js?_b=7562:83897:7
qFactory/defer/deferred.promise.then/wrappedErrback@http://127.0.0.1:5601/index.js?_b=7562:20902:15
qFactoiy/defer/deferred.promise.then/wrappedErrback@http://127.0.0.1:5601/index.js?_b=7562:20902:15
qFactoiy/defer/deferred.promise.then/wrappedErrback@http://127.0.0.1:5601/index.js?_b=7562:20902:15
qFactoiy/createlnternalRejectedPromisefr.thenfr@http://127.0.0.1:5601/index.js?_b=7562:21035:13
$RootScopeProvider/this3get</Scope.prototypeSeval@http://127.0.0.1:5601/index.js?_b=7562:22022:9
$RootScopeProvider/this3get</Scope.prototypeSdigest@http://127.0.0.1:5601/index.js?_b=7562:21834:15
$RootScopeProviderithis3get</Scope.prototype3apply@http://127.0.0.1:5601/index.js?_b=7562:22126:13
done@http://127.0.0.1:5601/indexjs?_b=7562:17661:34
completeRequest@http://127.0.0.1:5601/indexjs?_b=7562:17875:7
createHttpBackendfrixhionreadystatechange@http://127.0.0.1:5601/indexjs?_b=7562:17818:1

Is there anything in the ES logs?

Here's te ES logs from my 4 ES nodes during the operation:

node01:

[2015-11-17 13:47:54,739][WARN ][transport                ] [node01] Received response for a request that has timed out, sent [31228ms] ago, timed out [1218ms] ago, action [internal:discovery/zen/fd/master_ping], node [[node03][1Yiw5fU5SkSvx45UtiQsEw][node03][inet[/XX.XX.XXX.XX:9300]]], id [83895732]
[2015-11-17 13:48:55,444][INFO ][monitor.jvm              ] [node01] [gc][young][943454][242003] duration [879ms], collections [1]/[1.1s], total [879ms]/[3.9h], memory [23.5gb]->[23.5gb]/[29.2gb], all_pools {[young] [471.9mb]->[1.2mb]/[532.5mb]}{[survivor] [66.5mb]->[66.5mb]/[66.5mb]}{[old] [23gb]->[23.4gb]/[28.6gb]}
[2015-11-17 13:49:10,279][WARN ][transport                ] [node01] Received response for a request that has timed out, sent [34167ms] ago, timed out [4167ms] ago, action [internal:discovery/zen/fd/master_ping], node [[node03][1Yiw5fU5SkSvx45UtiQsEw][node03][inet[/XX.XX.XXX.XX:9300]]], id [83900626]
[2015-11-17 13:49:56,753][WARN ][monitor.jvm              ] [node01] [gc][old][943483][5675] duration [32.3s], collections [2]/[32.6s], total [32.3s]/[3h], memory [29gb]->[24.2gb]/[29.2gb], all_pools {[young] [500.7mb]->[53.8mb]/[532.5mb]}{[survivor] [66.5mb]->[0b]/[66.5mb]}{[old] [28.5gb]->[24.1gb]/[28.6gb]}
[2015-11-17 13:51:06,789][WARN ][monitor.jvm              ] [node01] [gc][old][943522][5677] duration [30.8s], collections [2]/[31.5s], total [30.8s]/[3h], memory [28.9gb]->[23.5gb]/[29.2gb], all_pools {[young] [312.8mb]->[225.5mb]/[532.5mb]}{[survivor] [66.5mb]->[0b]/[66.5mb]}{[old] [28.5gb]->[23.3gb]/[28.6gb]}

node02:

[2015-11-17 13:46:39,149][INFO ][monitor.jvm              ] [node02] [gc][young][389953][106940] duration [841ms], collections [1]/[1.8s], total [841ms]/[1.6h], memory [22.2gb]->[22.7gb]/[29.2gb], all_pools {[young] [4.8mb]->[229.2mb]/[532.5mb]}{[survivor] [66.5mb]->[66.5mb]/[66.5mb]}{[old] [22.1gb]->[22.4gb]/[28.6gb]}
[2015-11-17 13:46:42,171][WARN ][monitor.jvm              ] [node02] [gc][young][389954][106941] duration [2.1s], collections [1]/[3s], total [2.1s]/[1.6h], memory [22.7gb]->[23gb]/[29.2gb], all_pools {[young] [229.2mb]->[93.8mb]/[532.5mb]}{[survivor] [66.5mb]->[66.5mb]/[66.5mb]}{[old] [22.4gb]->[22.8gb]/[28.6gb]}
[2015-11-17 13:47:38,079][WARN ][monitor.jvm              ] [node02] [gc][old][389977][2510] duration [33.1s], collections [1]/[33.6s], total [33.1s]/[38.4m], memory [28.6gb]->[23.2gb]/[29.2gb], all_pools {[young] [251.6mb]->[30.2mb]/[532.5mb]}{[survivor] [66.5mb]->[0b]/[66.5mb]}{[old] [28.3gb]->[23.2gb]/[28.6gb]}
[2015-11-17 13:49:10,295][WARN ][transport                ] [node02] Received response for a request that has timed out, sent [34293ms] ago, timed out [4280ms] ago, action [internal:discovery/zen/fd/master_ping], node [[node03][1Yiw5fU5SkSvx45UtiQsEw][node03][inet[/XX.XX.XXX.XX:9300]]], id [35701756]
[2015-11-17 13:51:45,423][DEBUG][action.search.type       ] [node02] [93596] Failed to execute fetch phase
org.elasticsearch.search.SearchContextMissingException: No search context found for id [93596]
        at org.elasticsearch.search.SearchService.findContext(SearchService.java:508)
        at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:478)
        at org.elasticsearch.search.action.SearchServiceTransportAction$17.call(SearchServiceTransportAction.java:452)
        at org.elasticsearch.search.action.SearchServiceTransportAction$17.call(SearchServiceTransportAction.java:449)
        at org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:559)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
[2015-11-17 13:51:45,423][DEBUG][action.search.type       ] [node02] [93600] Failed to execute fetch phase
org.elasticsearch.search.SearchContextMissingException: No search context found for id [93600]
        at org.elasticsearch.search.SearchService.findContext(SearchService.java:508)
        at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:478)
        at org.elasticsearch.search.action.SearchServiceTransportAction$17.call(SearchServiceTransportAction.java:452)
        at org.elasticsearch.search.action.SearchServiceTransportAction$17.call(SearchServiceTransportAction.java:449)
        at org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:559)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)

...cont...

...cont...
node03:

[2015-11-17 13:49:10,354][WARN ][monitor.jvm              ] [node02] [gc][old][397024][2843] duration [34.6s], collections [2]/[35.5s], total [34.6s]/[53.4m], memory [28.6gb]->[23.2gb]/[29.2gb], all_pools {[young] [166.2mb]->[59.4mb]/[532.5mb]}{[survivor] [66.5mb]->[0b]/[66.5mb]}{[old] [28.4gb]->[23.1gb]/[28.6gb]}
[2015-11-17 13:49:55,485][DEBUG][action.admin.cluster.node.stats] [node02] failed to execute on node [_ycxwuOeT2qVjkAVBIG0aw]
org.elasticsearch.transport.ReceiveTimeoutTransportException: [node01.prevnet][inet[node01.prevnet/XX.XX.XXX.XX:9300]][cluster:monitor/nodes/stats[n]] request_id [117445099] timed out after [15000ms]
        at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:366)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
[2015-11-17 13:49:56,686][WARN ][transport                ] [node02] Received response for a request that has timed out, sent [16201ms] ago, timed out [1201ms] ago, action [cluster:monitor/nodes/stats[n]], node [[node01.prevnet][_ycxwuOeT2qVjkAVBIG0aw][node01][inet[node01.prevnet/XX.XX.XXX.XX:9300]]], id [117445099]
[2015-11-17 13:49:56,763][WARN ][transport                ] [node02] Received response for a request that has timed out, sent [31591ms] ago, timed out [1589ms] ago, action [internal:discovery/zen/fd/ping], node [[node01.prevnet][_ycxwuOeT2qVjkAVBIG0aw][node01][inet[node01.prevnet/XX.XX.XXX.XX:9300]]], id [117445019]
[2015-11-17 13:50:55,508][DEBUG][action.admin.cluster.node.stats] [node02] failed to execute on node [_ycxwuOeT2qVjkAVBIG0aw]
org.elasticsearch.transport.ReceiveTimeoutTransportException: [node01.prevnet][inet[node01.prevnet/XX.XX.XXX.XX:9300]][cluster:monitor/nodes/stats[n]] request_id [117461901] timed out after [15003ms]
        at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:366)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
[2015-11-17 13:51:06,526][WARN ][transport                ] [node02] Received response for a request that has timed out, sent [26021ms] ago, timed out [11018ms] ago, action [cluster:monitor/nodes/stats[n]], node [[node01.prevnet][_ycxwuOeT2qVjkAVBIG0aw][node01][inet[node01.prevnet/XX.XX.XXX.XX:9300]]], id [117461901]

...cont...

...cont...
node04:

[2015-11-17 13:45:03,868][INFO ][monitor.jvm              ] [node04] [gc][young][256053][71959] duration [739ms], collections [1]/[1.3s], total [739ms]/[1.1h], memory [22.5gb]->[22.8gb]/[29.2gb], all_pools {[young] [132.4mb]->[171.2mb]/[532.5mb]}{[survivor] [66.5mb]->[66.5mb]/[66.5mb]}{[old] [22.3gb]->[22.6gb]/[28.6gb]}
[2015-11-17 13:46:01,678][WARN ][monitor.jvm              ] [node04] [gc][old][256080][1269] duration [30.3s], collections [2]/[31.1s], total [30.3s]/[14.3m], memory [28.5gb]->[22.1gb]/[29.2gb], all_pools {[young] [106mb]->[36.3mb]/[532.5mb]}{[survivor] [66.5mb]->[0b]/[66.5mb]}{[old] [28.3gb]->[22gb]/[28.6gb]}
[2015-11-17 13:47:02,683][WARN ][transport                ] [node04] Received response for a request that has timed out, sent [30873ms] ago, timed out [857ms] ago, action [internal:discovery/zen/fd/master_ping], node [[node03][1Yiw5fU5SkSvx45UtiQsEw][node02][inet[/XX.XX.XXX.XX:9300]]], id [23560237]
[2015-11-17 13:48:18,348][WARN ][transport                ] [node04] Received response for a request that has timed out, sent [34695ms] ago, timed out [4692ms] ago, action [internal:discovery/zen/fd/master_ping], node [[node03][1Yiw5fU5SkSvx45UtiQsEw][node02][inet[/XX.XX.XXX.XX:9300]]], id [23566912]
[2015-11-17 13:50:53,637][DEBUG][action.search.type       ] [node04] [93668] Failed to execute fetch phase
org.elasticsearch.transport.RemoteTransportException: [node02][inet[/XX.XX.XXX.XX:9300]][indices:data/read/search[phase/fetch/id]]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [93668]
        at org.elasticsearch.search.SearchService.findContext(SearchService.java:508)
        at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:478)
        at org.elasticsearch.search.action.SearchServiceTransportAction$FetchByIdTransportHandler.messageReceived(SearchServiceTransportAction.java:868)
        at org.elasticsearch.search.action.SearchServiceTransportAction$FetchByIdTransportHandler.messageReceived(SearchServiceTransportAction.java:862)
        at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:275)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
[2015-11-17 13:52:03,881][DEBUG][action.search.type       ] [node04] [92767] Failed to execute fetch phase
org.elasticsearch.transport.RemoteTransportException: [node03][inet[/XX.XX.XXX.XX:9300]][indices:data/read/search[phase/fetch/id]]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [92767]
        at org.elasticsearch.search.SearchService.findContext(SearchService.java:508)
        at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:478)
        at org.elasticsearch.search.action.SearchServiceTransportAction$FetchByIdTransportHandler.messageReceived(SearchServiceTransportAction.java:868)
        at org.elasticsearch.search.action.SearchServiceTransportAction$FetchByIdTransportHandler.messageReceived(SearchServiceTransportAction.java:862)
        at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:275)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
[2015-11-17 13:52:07,756][DEBUG][action.search.type       ] [node04] [92774] Failed to execute fetch phase
org.elasticsearch.transport.RemoteTransportException: [node03][inet[/XX.XX.XXX.XX:9300]][indices:data/read/search[phase/fetch/id]]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [92774]
        at org.elasticsearch.search.SearchService.findContext(SearchService.java:508)
        at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:478)
        at org.elasticsearch.search.action.SearchServiceTransportAction$FetchByIdTransportHandler.messageReceived(SearchServiceTransportAction.java:868)
        at org.elasticsearch.search.action.SearchServiceTransportAction$FetchByIdTransportHandler.messageReceived(SearchServiceTransportAction.java:862)
        at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:275)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)

...cont...

...cont node 04...

[2015-11-17 13:52:07,758][DEBUG][action.search.type       ] [node04] [92791] Failed to execute fetch phase
org.elasticsearch.transport.RemoteTransportException: [node03][inet[/XX.XX.XXX.XX:9300]][indices:data/read/search[phase/fetch/id]]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [92791]
        at org.elasticsearch.search.SearchService.findContext(SearchService.java:508)
        at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:478)
        at org.elasticsearch.search.action.SearchServiceTransportAction$FetchByIdTransportHandler.messageReceived(SearchServiceTransportAction.java:868)
        at org.elasticsearch.search.action.SearchServiceTransportAction$FetchByIdTransportHandler.messageReceived(SearchServiceTransportAction.java:862)
        at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:275)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
[2015-11-17 13:54:52,049][DEBUG][action.search.type       ] [node04] [93765] Failed to execute fetch phase
org.elasticsearch.transport.RemoteTransportException: [node02][inet[/XX.XX.XXX.XX:9300]][indices:data/read/search[phase/fetch/id]]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [93765]
        at org.elasticsearch.search.SearchService.findContext(SearchService.java:508)
        at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:478)
        at org.elasticsearch.search.action.SearchServiceTransportAction$FetchByIdTransportHandler.messageReceived(SearchServiceTransportAction.java:868)
        at org.elasticsearch.search.action.SearchServiceTransportAction$FetchByIdTransportHandler.messageReceived(SearchServiceTransportAction.java:862)
        at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:275)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
[2015-11-17 13:55:41,267][DEBUG][action.search.type       ] [node04] [92905] Failed to execute fetch phase
org.elasticsearch.transport.RemoteTransportException: [node03][inet[/XX.XX.XXX.XX:9300]][indices:data/read/search[phase/fetch/id]]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [92905]
        at org.elasticsearch.search.SearchService.findContext(SearchService.java:508)
        at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:478)
        at org.elasticsearch.search.action.SearchServiceTransportAction$FetchByIdTransportHandler.messageReceived(SearchServiceTransportAction.java:868)
        at org.elasticsearch.search.action.SearchServiceTransportAction$FetchByIdTransportHandler.messageReceived(SearchServiceTransportAction.java:862)
        at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:275)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
[2015-11-17 13:55:41,267][DEBUG][action.search.type       ] [node04] [92884] Failed to execute fetch phase
org.elasticsearch.transport.RemoteTransportException: [node03][inet[/XX.XX.XXX.XX:9300]][indices:data/read/search[phase/fetch/id]]
Caused by: org.elasticsearch.search.SearchContextMissingException: No search context found for id [92884]
        at org.elasticsearch.search.SearchService.findContext(SearchService.java:508)
        at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:478)
        at org.elasticsearch.search.action.SearchServiceTransportAction$FetchByIdTransportHandler.messageReceived(SearchServiceTransportAction.java:868)
        at org.elasticsearch.search.action.SearchServiceTransportAction$FetchByIdTransportHandler.messageReceived(SearchServiceTransportAction.java:862)
        at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:275)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)

Looking at the timeouts and the GC periods, I'd say your nodes are overloaded.

Thanks! Working to put more nodes on my cluster to see if we can get better performance.

After add 6 more data nodes to the cluster the problem was solved! Thanks a lot for your help @warkolm !

1 Like