Request rejected(400) without details(root-cause) in 6.2.4

Hi, how do I troubleshoot rejected request ElasticSearch 6.2.4?

The request got

Error: elastic: Error 400 (Bad Request): all shards failed [type=search_phase_execution_exception]

As the returned but there is no detailed logs anywhere that I can debug what's wrong with the request.

Here is my logs

I know in some other ES version there are a field "root_cause" contains details about why request is invalid. Is that not available in 6.2.4?

Any help is appreciated. Thanks

Context of the problem: https://uber-cadence.slack.com/archives/CL22WDF70/p1611624844015500?thread_ts=1611505487.003500&cid=CL22WDF70

Welcome to our community! :smiley:

Can you please post the logs somewhere accessible, like gist/pastebin/etc.

That's all the log -- seems nothing interesting.

[2021-01-26T01:24:00,877][INFO ][o.e.n.Node               ] [elasticsearch-0] initializing ...
[2021-01-26T01:24:01,105][INFO ][o.e.e.NodeEnvironment    ] [elasticsearch-0] using [1] data paths, mounts [[/usr/share/elasticsearch/data (/dev/nvme1n1)]], net usable_space [9.7gb], net total_space [9.7gb], types [ext4]
[2021-01-26T01:24:01,106][INFO ][o.e.e.NodeEnvironment    ] [elasticsearch-0] heap size [3.9gb], compressed ordinary object pointers [true]
[2021-01-26T01:24:01,192][INFO ][o.e.n.Node               ] [elasticsearch-0] node name [elasticsearch-0], node ID [3dYTfC2ZSKOcyc2HZNSztg]
[2021-01-26T01:24:01,192][INFO ][o.e.n.Node               ] [elasticsearch-0] version[6.2.4], pid[1], build[ccec39f/2018-04-12T20:37:28.497551Z], OS[Linux/5.4.0-1029-aws/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_161/25.161-b14]
[2021-01-26T01:24:01,193][INFO ][o.e.n.Node               ] [elasticsearch-0] JVM arguments [-Xms1g, -Xmx1g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Djava.io.tmpdir=/tmp/elasticsearch.0B40p2jH, -XX:+HeapDumpOnOutOfMemoryError, -XX:+PrintGCDetails, -XX:+PrintGCDateStamps, -XX:+PrintTenuringDistribution, -XX:+PrintGCApplicationStoppedTime, -Xloggc:logs/gc.log, -XX:+UseGCLogFileRotation, -XX:NumberOfGCLogFiles=32, -XX:GCLogFileSize=64m, -Des.cgroups.hierarchy.override=/, -Xms4g, -Xmx4g, -Des.path.home=/usr/share/elasticsearch, -Des.path.conf=/usr/share/elasticsearch/config]
[2021-01-26T01:24:04,876][INFO ][o.e.p.PluginsService     ] [elasticsearch-0] loaded module [aggs-matrix-stats]
[2021-01-26T01:24:04,876][INFO ][o.e.p.PluginsService     ] [elasticsearch-0] loaded module [analysis-common]
[2021-01-26T01:24:04,876][INFO ][o.e.p.PluginsService     ] [elasticsearch-0] loaded module [ingest-common]
[2021-01-26T01:24:04,877][INFO ][o.e.p.PluginsService     ] [elasticsearch-0] loaded module [lang-expression]
[2021-01-26T01:24:04,877][INFO ][o.e.p.PluginsService     ] [elasticsearch-0] loaded module [lang-mustache]
[2021-01-26T01:24:04,877][INFO ][o.e.p.PluginsService     ] [elasticsearch-0] loaded module [lang-painless]
[2021-01-26T01:24:04,877][INFO ][o.e.p.PluginsService     ] [elasticsearch-0] loaded module [mapper-extras]
[2021-01-26T01:24:04,877][INFO ][o.e.p.PluginsService     ] [elasticsearch-0] loaded module [parent-join]
[2021-01-26T01:24:04,877][INFO ][o.e.p.PluginsService     ] [elasticsearch-0] loaded module [percolator]
[2021-01-26T01:24:04,877][INFO ][o.e.p.PluginsService     ] [elasticsearch-0] loaded module [rank-eval]
[2021-01-26T01:24:04,877][INFO ][o.e.p.PluginsService     ] [elasticsearch-0] loaded module [reindex]
[2021-01-26T01:24:04,877][INFO ][o.e.p.PluginsService     ] [elasticsearch-0] loaded module [repository-url]
[2021-01-26T01:24:04,877][INFO ][o.e.p.PluginsService     ] [elasticsearch-0] loaded module [transport-netty4]
[2021-01-26T01:24:04,877][INFO ][o.e.p.PluginsService     ] [elasticsearch-0] loaded module [tribe]
[2021-01-26T01:24:04,878][INFO ][o.e.p.PluginsService     ] [elasticsearch-0] loaded plugin [ingest-geoip]
[2021-01-26T01:24:04,878][INFO ][o.e.p.PluginsService     ] [elasticsearch-0] loaded plugin [ingest-user-agent]
[2021-01-26T01:24:19,600][INFO ][o.e.d.DiscoveryModule    ] [elasticsearch-0] using discovery type [zen]
[2021-01-26T01:24:19,678][DEBUG][o.e.d.z.UnicastZenPing   ] [elasticsearch-0] using initial hosts [elasticsearch-0.es.default.svc.cluster.local], with concurrent_connects [10], resolve_timeout [5s]
[2021-01-26T01:24:19,682][DEBUG][o.e.d.z.ElectMasterService] [elasticsearch-0] using minimum_master_nodes [1]
[2021-01-26T01:24:19,682][DEBUG][o.e.d.z.ZenDiscovery     ] [elasticsearch-0] using ping_timeout [3s], join.timeout [1m], master_election.ignore_non_master [false]
[2021-01-26T01:24:19,687][DEBUG][o.e.d.z.MasterFaultDetection] [elasticsearch-0] [master] uses ping_interval [1s], ping_timeout [30s], ping_retries [3]
[2021-01-26T01:24:19,690][DEBUG][o.e.d.z.NodesFaultDetection] [elasticsearch-0] [node  ] uses ping_interval [1s], ping_timeout [30s], ping_retries [3]
[2021-01-26T01:24:22,393][INFO ][o.e.n.Node               ] [elasticsearch-0] initialized
[2021-01-26T01:24:22,393][INFO ][o.e.n.Node               ] [elasticsearch-0] starting ...
[2021-01-26T01:24:23,397][INFO ][o.e.t.TransportService   ] [elasticsearch-0] publish_address {100.96.2.210:9300}, bound_addresses {0.0.0.0:9300}
[2021-01-26T01:24:23,575][INFO ][o.e.b.BootstrapChecks    ] [elasticsearch-0] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2021-01-26T01:24:24,397][INFO ][o.e.m.j.JvmGcMonitorService] [elasticsearch-0] [gc][2] overhead, spent [297ms] collecting in the last [1s]
[2021-01-26T01:24:26,985][DEBUG][o.e.d.z.ZenDiscovery     ] [elasticsearch-0] filtered ping responses: (ignore_non_masters [false])
	--> ping_response{node [{elasticsearch-0}{3dYTfC2ZSKOcyc2HZNSztg}{eToYT38_QAiMEE5lFDwLMA}{100.96.2.210}{100.96.2.210:9300}], id[7], master [null],cluster_state_version [-1], cluster_name[k8s-mission-control.mc.res0.net]}
[2021-01-26T01:24:26,986][DEBUG][o.e.d.z.ZenDiscovery     ] [elasticsearch-0] elected as master, waiting for incoming joins ([0] needed)
[2021-01-26T01:24:27,009][INFO ][o.e.c.s.MasterService    ] [elasticsearch-0] zen-disco-elected-as-master ([0] nodes joined), reason: new_master {elasticsearch-0}{3dYTfC2ZSKOcyc2HZNSztg}{eToYT38_QAiMEE5lFDwLMA}{100.96.2.210}{100.96.2.210:9300}
[2021-01-26T01:24:27,013][DEBUG][o.e.d.z.ZenDiscovery     ] [elasticsearch-0] got first state from fresh master [3dYTfC2ZSKOcyc2HZNSztg]
[2021-01-26T01:24:27,015][INFO ][o.e.c.s.ClusterApplierService] [elasticsearch-0] new_master {elasticsearch-0}{3dYTfC2ZSKOcyc2HZNSztg}{eToYT38_QAiMEE5lFDwLMA}{100.96.2.210}{100.96.2.210:9300}, reason: apply cluster state (from master [master {elasticsearch-0}{3dYTfC2ZSKOcyc2HZNSztg}{eToYT38_QAiMEE5lFDwLMA}{100.96.2.210}{100.96.2.210:9300} committed version [1] source [zen-disco-elected-as-master ([0] nodes joined)]])
[2021-01-26T01:24:27,185][INFO ][o.e.h.n.Netty4HttpServerTransport] [elasticsearch-0] publish_address {100.96.2.210:9200}, bound_addresses {0.0.0.0:9200}
[2021-01-26T01:24:27,185][INFO ][o.e.n.Node               ] [elasticsearch-0] started
[2021-01-26T01:24:28,002][INFO ][o.e.g.GatewayService     ] [elasticsearch-0] recovered [2] indices into cluster_state
[2021-01-26T01:24:29,606][INFO ][o.e.c.r.a.AllocationService] [elasticsearch-0] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[cadence-visibility-dev][4]] ...]).
[2021-01-26T01:26:10,476][INFO ][o.e.m.j.JvmGcMonitorService] [elasticsearch-0] [gc][108] overhead, spent [382ms] collecting in the last [1s]

FYI 6.2 is EOL, you should upgrade as a matter of urgency.

Where did this 400 come from exactly?

It's from Cadence service https://github.com/uber/cadence/blob/0.10.3/common/elasticsearch/client.go#L101 calling via the elastic go library.
Cadence is an orchestration/workflow engine. It's using ES for searching workflows.

This happened when a Cadence user upgraded their Cadence service from 0.10.3 to 0.15.1, there should be no index/schema change but somehow the request started to be rejected by ES. After deleting and recreate the index, the issue resolved.

I was hoping to get some logs from ES to understand what's wrong with the request since it's 400. But there was nothing I can get from. For the same error returned I have seen error logs in ES server like this with root_cause field from other users(when used 6.8):

{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Fielddata is disabled on text fields by default. Set fielddata=true on [RunID] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"cadence-visibility-dev","node":"wLpGV-fdRFWaEz2yzPBsSw","reason":{"type":"illegal_argument_exception","reason":"Fielddata is disabled on text fields by default. Set fielddata=true on [RunID] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."}}],"caused_by":{"type":"illegal_argument_exception","reason":"Fielddata is disabled on text fields by default. Set fielddata=true on [RunID] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.","caused_by":{"type":"illegal_argument_exception","reason":"Fielddata is disabled on text fields by default. Set fielddata=true on [RunID] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."}}},"status":400}

Not sure why ES 6.2 doesn't have it. Is this log/field introduced later after 6.2?

And I agreed that the ES should be upgraded to at least 6.8. In fact, the minimum version that we have tested is 6.5 with Cadence.

Thanks for your reply!