Request rejected (400) without root-cause details in 6.2.4

Hi, how do I troubleshoot a rejected request in Elasticsearch 6.2.4?

The request returned

Error: elastic: Error 400 (Bad Request): all shards failed [type=search_phase_execution_exception]

but there are no detailed logs anywhere that I can use to debug what's wrong with the request.

Here are my logs.

I know that in some other ES versions there is a field "root_cause" that contains details about why the request is invalid. Is that not available in 6.2.4?

Any help is appreciated. Thanks

Context of the problem: https://uber-cadence.slack.com/archives/CL22WDF70/p1611624844015500?thread_ts=1611505487.003500&cid=CL22WDF70

Welcome to our community! :smiley:

Can you please post the logs somewhere accessible, like gist/pastebin/etc.?

That's all of the log -- nothing interesting, it seems.

[2021-01-26T01:24:00,877][INFO ][o.e.n.Node               ] [elasticsearch-0] initializing ...
[2021-01-26T01:24:01,105][INFO ][o.e.e.NodeEnvironment    ] [elasticsearch-0] using [1] data paths, mounts [[/usr/share/elasticsearch/data (/dev/nvme1n1)]], net usable_space [9.7gb], net total_space [9.7gb], types [ext4]
[2021-01-26T01:24:01,106][INFO ][o.e.e.NodeEnvironment    ] [elasticsearch-0] heap size [3.9gb], compressed ordinary object pointers [true]
[2021-01-26T01:24:01,192][INFO ][o.e.n.Node               ] [elasticsearch-0] node name [elasticsearch-0], node ID [3dYTfC2ZSKOcyc2HZNSztg]
[2021-01-26T01:24:01,192][INFO ][o.e.n.Node               ] [elasticsearch-0] version[6.2.4], pid[1], build[ccec39f/2018-04-12T20:37:28.497551Z], OS[Linux/5.4.0-1029-aws/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_161/25.161-b14]
[2021-01-26T01:24:01,193][INFO ][o.e.n.Node               ] [elasticsearch-0] JVM arguments [-Xms1g, -Xmx1g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Djava.io.tmpdir=/tmp/elasticsearch.0B40p2jH, -XX:+HeapDumpOnOutOfMemoryError, -XX:+PrintGCDetails, -XX:+PrintGCDateStamps, -XX:+PrintTenuringDistribution, -XX:+PrintGCApplicationStoppedTime, -Xloggc:logs/gc.log, -XX:+UseGCLogFileRotation, -XX:NumberOfGCLogFiles=32, -XX:GCLogFileSize=64m, -Des.cgroups.hierarchy.override=/, -Xms4g, -Xmx4g, -Des.path.home=/usr/share/elasticsearch, -Des.path.conf=/usr/share/elasticsearch/config]
[2021-01-26T01:24:04,876][INFO ][o.e.p.PluginsService     ] [elasticsearch-0] loaded module [aggs-matrix-stats]
[2021-01-26T01:24:04,876][INFO ][o.e.p.PluginsService     ] [elasticsearch-0] loaded module [analysis-common]
[2021-01-26T01:24:04,876][INFO ][o.e.p.PluginsService     ] [elasticsearch-0] loaded module [ingest-common]
[2021-01-26T01:24:04,877][INFO ][o.e.p.PluginsService     ] [elasticsearch-0] loaded module [lang-expression]
[2021-01-26T01:24:04,877][INFO ][o.e.p.PluginsService     ] [elasticsearch-0] loaded module [lang-mustache]
[2021-01-26T01:24:04,877][INFO ][o.e.p.PluginsService     ] [elasticsearch-0] loaded module [lang-painless]
[2021-01-26T01:24:04,877][INFO ][o.e.p.PluginsService     ] [elasticsearch-0] loaded module [mapper-extras]
[2021-01-26T01:24:04,877][INFO ][o.e.p.PluginsService     ] [elasticsearch-0] loaded module [parent-join]
[2021-01-26T01:24:04,877][INFO ][o.e.p.PluginsService     ] [elasticsearch-0] loaded module [percolator]
[2021-01-26T01:24:04,877][INFO ][o.e.p.PluginsService     ] [elasticsearch-0] loaded module [rank-eval]
[2021-01-26T01:24:04,877][INFO ][o.e.p.PluginsService     ] [elasticsearch-0] loaded module [reindex]
[2021-01-26T01:24:04,877][INFO ][o.e.p.PluginsService     ] [elasticsearch-0] loaded module [repository-url]
[2021-01-26T01:24:04,877][INFO ][o.e.p.PluginsService     ] [elasticsearch-0] loaded module [transport-netty4]
[2021-01-26T01:24:04,877][INFO ][o.e.p.PluginsService     ] [elasticsearch-0] loaded module [tribe]
[2021-01-26T01:24:04,878][INFO ][o.e.p.PluginsService     ] [elasticsearch-0] loaded plugin [ingest-geoip]
[2021-01-26T01:24:04,878][INFO ][o.e.p.PluginsService     ] [elasticsearch-0] loaded plugin [ingest-user-agent]
[2021-01-26T01:24:19,600][INFO ][o.e.d.DiscoveryModule    ] [elasticsearch-0] using discovery type [zen]
[2021-01-26T01:24:19,678][DEBUG][o.e.d.z.UnicastZenPing   ] [elasticsearch-0] using initial hosts [elasticsearch-0.es.default.svc.cluster.local], with concurrent_connects [10], resolve_timeout [5s]
[2021-01-26T01:24:19,682][DEBUG][o.e.d.z.ElectMasterService] [elasticsearch-0] using minimum_master_nodes [1]
[2021-01-26T01:24:19,682][DEBUG][o.e.d.z.ZenDiscovery     ] [elasticsearch-0] using ping_timeout [3s], join.timeout [1m], master_election.ignore_non_master [false]
[2021-01-26T01:24:19,687][DEBUG][o.e.d.z.MasterFaultDetection] [elasticsearch-0] [master] uses ping_interval [1s], ping_timeout [30s], ping_retries [3]
[2021-01-26T01:24:19,690][DEBUG][o.e.d.z.NodesFaultDetection] [elasticsearch-0] [node  ] uses ping_interval [1s], ping_timeout [30s], ping_retries [3]
[2021-01-26T01:24:22,393][INFO ][o.e.n.Node               ] [elasticsearch-0] initialized
[2021-01-26T01:24:22,393][INFO ][o.e.n.Node               ] [elasticsearch-0] starting ...
[2021-01-26T01:24:23,397][INFO ][o.e.t.TransportService   ] [elasticsearch-0] publish_address {100.96.2.210:9300}, bound_addresses {0.0.0.0:9300}
[2021-01-26T01:24:23,575][INFO ][o.e.b.BootstrapChecks    ] [elasticsearch-0] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2021-01-26T01:24:24,397][INFO ][o.e.m.j.JvmGcMonitorService] [elasticsearch-0] [gc][2] overhead, spent [297ms] collecting in the last [1s]
[2021-01-26T01:24:26,985][DEBUG][o.e.d.z.ZenDiscovery     ] [elasticsearch-0] filtered ping responses: (ignore_non_masters [false])
	--> ping_response{node [{elasticsearch-0}{3dYTfC2ZSKOcyc2HZNSztg}{eToYT38_QAiMEE5lFDwLMA}{100.96.2.210}{100.96.2.210:9300}], id[7], master [null],cluster_state_version [-1], cluster_name[k8s-mission-control.mc.res0.net]}
[2021-01-26T01:24:26,986][DEBUG][o.e.d.z.ZenDiscovery     ] [elasticsearch-0] elected as master, waiting for incoming joins ([0] needed)
[2021-01-26T01:24:27,009][INFO ][o.e.c.s.MasterService    ] [elasticsearch-0] zen-disco-elected-as-master ([0] nodes joined), reason: new_master {elasticsearch-0}{3dYTfC2ZSKOcyc2HZNSztg}{eToYT38_QAiMEE5lFDwLMA}{100.96.2.210}{100.96.2.210:9300}
[2021-01-26T01:24:27,013][DEBUG][o.e.d.z.ZenDiscovery     ] [elasticsearch-0] got first state from fresh master [3dYTfC2ZSKOcyc2HZNSztg]
[2021-01-26T01:24:27,015][INFO ][o.e.c.s.ClusterApplierService] [elasticsearch-0] new_master {elasticsearch-0}{3dYTfC2ZSKOcyc2HZNSztg}{eToYT38_QAiMEE5lFDwLMA}{100.96.2.210}{100.96.2.210:9300}, reason: apply cluster state (from master [master {elasticsearch-0}{3dYTfC2ZSKOcyc2HZNSztg}{eToYT38_QAiMEE5lFDwLMA}{100.96.2.210}{100.96.2.210:9300} committed version [1] source [zen-disco-elected-as-master ([0] nodes joined)]])
[2021-01-26T01:24:27,185][INFO ][o.e.h.n.Netty4HttpServerTransport] [elasticsearch-0] publish_address {100.96.2.210:9200}, bound_addresses {0.0.0.0:9200}
[2021-01-26T01:24:27,185][INFO ][o.e.n.Node               ] [elasticsearch-0] started
[2021-01-26T01:24:28,002][INFO ][o.e.g.GatewayService     ] [elasticsearch-0] recovered [2] indices into cluster_state
[2021-01-26T01:24:29,606][INFO ][o.e.c.r.a.AllocationService] [elasticsearch-0] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[cadence-visibility-dev][4]] ...]).
[2021-01-26T01:26:10,476][INFO ][o.e.m.j.JvmGcMonitorService] [elasticsearch-0] [gc][108] overhead, spent [382ms] collecting in the last [1s]

FYI 6.2 is EOL, you should upgrade as a matter of urgency.

Where did this 400 come from exactly?

It's from the Cadence service (https://github.com/uber/cadence/blob/0.10.3/common/elasticsearch/client.go#L101), calling ES via the elastic Go library.
Cadence is an orchestration/workflow engine. It's using ES for searching workflows.

This happened when a Cadence user upgraded their Cadence service from 0.10.3 to 0.15.1. There should be no index/schema change, but somehow the requests started being rejected by ES. After deleting and recreating the index, the issue was resolved.

I was hoping to get some logs from ES to understand what's wrong with the request, since it's a 400, but there was nothing I could find. For the same error, I have seen responses from the ES server like this, with a root_cause field, from other users (on 6.8):

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [RunID] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "cadence-visibility-dev",
        "node": "wLpGV-fdRFWaEz2yzPBsSw",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [RunID] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
        }
      }
    ],
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [RunID] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [RunID] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
      }
    }
  },
  "status": 400
}
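For what it's worth, when you do get a body like the one above, the root_cause entries can be pulled out on the client side with plain encoding/json; a minimal sketch (the struct mirrors only the fields shown in that response, nothing library-specific):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// errBody mirrors just the parts of an Elasticsearch error response
// we care about; field names follow the JSON shown above.
type errBody struct {
	Error struct {
		Type      string `json:"type"`
		Reason    string `json:"reason"`
		RootCause []struct {
			Type   string `json:"type"`
			Reason string `json:"reason"`
		} `json:"root_cause"`
	} `json:"error"`
	Status int `json:"status"`
}

// rootCauseReasons extracts "type: reason" strings from a raw error body.
func rootCauseReasons(body []byte) ([]string, error) {
	var e errBody
	if err := json.Unmarshal(body, &e); err != nil {
		return nil, err
	}
	reasons := make([]string, 0, len(e.Error.RootCause))
	for _, rc := range e.Error.RootCause {
		reasons = append(reasons, rc.Type+": "+rc.Reason)
	}
	return reasons, nil
}

func main() {
	// Abbreviated example body in the shape shown above.
	body := []byte(`{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Fielddata is disabled on text fields by default."}],"type":"search_phase_execution_exception","reason":"all shards failed"},"status":400}`)
	reasons, err := rootCauseReasons(body)
	if err != nil {
		panic(err)
	}
	for _, r := range reasons {
		fmt.Println(r)
	}
}
```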

Not sure why ES 6.2 doesn't return it. Was this field introduced after 6.2?

And I agree that ES should be upgraded to at least 6.8. In fact, the minimum version we have tested with Cadence is 6.5.

Thanks for your reply!

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.