Hive select from ES on dynamic index failing: all nodes failed

Keith.Hinde · May 4, 2016, 9:07am

Hive: v1.2.1.2.3.2.0-2950
ES-Hadoop: elasticsearch-hadoop-2.3.0.jar
ES: v2.1.1

So, we have a three-node cluster with dynamically named indices (e.g. 001, 002, 003), each of which holds data related to a subset of customers. The CREATE EXTERNAL TABLE statement includes the following property:

TBLPROPERTIES('es.resource' = '{customer_partition}/customer_mapping');

Now, writing to this external table (INSERT/UPDATE) from Hive works well, but any attempt to read (SELECT) blows up with an "all nodes failed" exception. Strangely, attempts to read from any statically named index work fine.

Has anyone experienced this before and/or have any suggestions as to a resolution? I can't see anything specific to this in the various threads.

Thanks in advance!

Keith.Hinde · May 5, 2016, 11:28am

OK, so after a little more investigation on a test instance, it's definitely the {customer_partition} mapping that is throwing things out. If I replace {customer_partition} with either a specific index name OR a wildcard, the queries work fine (although presumably writing is then broken as there is no way to route data to the correct index).

So, do I need two table definitions - a write (partitioned) and a read (wildcarded) to enable read/write?

costin · May 10, 2016, 9:03am

Dynamic writing works because the target index is determined from the incoming data. Reading, however, is a catch-22: to determine which index to read from, one needs access to the data, and the data cannot be accessed until the index is known.
To work around this, try using aliases, which let you define a well-known, fixed name such as "customer" that can easily be updated to point at other indices (such as 001, 002, etc.).
For reads the alias works, since an alias can span multiple indices and so one can read from multiple partitions.
If you want to read from a certain partition, simply point to it directly or update the alias accordingly.
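
For anyone following along, here is a rough sketch of how the split into a write table and a read table might look in Hive. It assumes an alias named "customer" (the name from the suggestion above) has already been created in Elasticsearch over the 001, 002, ... indices. The table names and columns are made up for illustration and are not the original poster's schema.

```sql
-- Write-side table: the target index is resolved per row from the
-- customer_partition column, as in the original table definition.
-- (Table and column names are hypothetical.)
CREATE EXTERNAL TABLE customers_write (
  customer_partition STRING,
  customer_id        STRING,
  name               STRING
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = '{customer_partition}/customer_mapping');

-- Read-side table: points at the fixed alias instead of the pattern,
-- so the connector knows where to read before it has seen any data.
-- Assumes the "customer" alias already exists in Elasticsearch.
CREATE EXTERNAL TABLE customers_read (
  customer_partition STRING,
  customer_id        STRING,
  name               STRING
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'customer/customer_mapping');
```

INSERTs would then go to customers_write and SELECTs to customers_read. Reading a single partition would mean a read table whose es.resource names that index directly (e.g. '001/customer_mapping').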

Keith.Hinde · May 10, 2016, 9:21am

Thanks Costin - Using aliases was actually going to be my next plan of attack, so thanks for validating the approach. I'll give it a go and report back.

Keith.Hinde · May 17, 2016, 11:37am

OK, progress update:

Added an alias, and the "all nodes failed" error went away. Yay, progress... of a sort!
Queries were taking an age, so I added some query DSL (an aggregation query) to the external table definition. Things blew up instantly with an "org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: aggregations are not supported with search_type=scan" exception.

Now, I'm seeing mixed signals on whether this is something that can/will be fixed and in what version of ES-Hadoop. Any opinions/inside knowledge out there?

Thanks again!

costin · May 24, 2016, 9:30am

Aggregations are not supported yet but are a high priority item on the roadmap.
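
In the meantime, one possible approach is to push only a filtering query down to Elasticsearch and leave the aggregation to Hive. The sketch below assumes the query is supplied through the es.query table property and uses a hypothetical "status" field. The table name and columns are likewise invented for illustration.

```sql
-- Read table with a non-aggregating query pushed down to Elasticsearch.
-- 'status' is a hypothetical field; substitute a real one from the mapping.
CREATE EXTERNAL TABLE customers_active (
  customer_id STRING,
  name        STRING
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
  'es.resource' = 'customer/customer_mapping',
  'es.query'    = '?q=status:active'
);

-- The aggregation itself runs in Hive over the filtered rows.
SELECT COUNT(*) FROM customers_active;
```

This narrows what is pulled out of Elasticsearch, which may help with slow queries, but the counting still happens on the Hive side rather than in Elasticsearch.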

Keith.Hinde · May 24, 2016, 9:58am

Thanks Costin - I'll keep my ear to the ground.
