Hive select from ES on dynamic index failing: all nodes failed


(Keith Hinde) #1

Hive: v1.2.1.2.3.2.0-2950
ES-Hadoop: elasticsearch-hadoop-2.3.0.jar
ES: v2.1.1

So, we have a three-node cluster with dynamically named indices (e.g. 001, 002, 003), each of which holds data for a subset of customers. The CREATE EXTERNAL TABLE statement includes the following property:

TBLPROPERTIES('es.resource' = '{customer_partition}/customer_mapping');

Now, writing to this external table (INSERT/UPDATE) from Hive works well, but any attempt to read (SELECT) blows up with an "all nodes failed" exception. Strangely, reads from any statically named index work fine.

Has anyone experienced this before and/or have any suggestions as to a resolution? I can't see anything specific to this in the various threads.

Thanks in advance!


(Keith Hinde) #2

OK, so after a little more investigation on a test instance, it's definitely the {customer_partition} mapping that is throwing things out. If I replace {customer_partition} with either a specific index name OR a wildcard, the queries work fine (although presumably writing is then broken as there is no way to route data to the correct index).

So, do I need two table definitions - a write (partitioned) and a read (wildcarded) to enable read/write?
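To make the two-table idea concrete, here is a minimal sketch that renders both definitions side by side. The table names, column names, and the exact wildcard form are assumptions for illustration; only the storage handler class and the 'es.resource' property come from ES-Hadoop.

```python
# Hypothetical table and column names; adjust to the real schema.
def render_table_ddl(table_name, es_resource):
    """Return a Hive CREATE EXTERNAL TABLE statement for the given es.resource."""
    return (
        f"CREATE EXTERNAL TABLE {table_name} (\n"
        "  customer_partition STRING,\n"
        "  customer_name STRING\n"
        ")\n"
        "STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'\n"
        f"TBLPROPERTIES('es.resource' = '{es_resource}');"
    )

# Write table: the index is chosen per row from the customer_partition field.
write_ddl = render_table_ddl("customer_write", "{customer_partition}/customer_mapping")

# Read table: a wildcard (assumed form) instead of the per-row pattern.
read_ddl = render_table_ddl("customer_read", "*/customer_mapping")

print(write_ddl)
print(read_ddl)
```

INSERTs would go through customer_write and SELECTs through customer_read.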


(Costin Leau) #3

Dynamic writing works since the target (the index) is determined from the incoming data. Reading, however, fails because it's a catch-22: to determine which index to read from, one needs access to the data, and the data can't be read until the index is known.
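The catch-22 can be illustrated with a toy model. This is not ES-Hadoop's actual code, just a sketch of the idea; the function name and the sample row are hypothetical.

```python
RESOURCE_PATTERN = "{customer_partition}/customer_mapping"

def resolve_resource(pattern, record=None):
    """Fill the index pattern from a record's fields, as a write path can."""
    if record is None:
        raise ValueError("cannot resolve a dynamic index without a record")
    return pattern.format(**record)

# Write: each row carries the field that names its index.
row = {"customer_partition": "002", "name": "example"}
print(resolve_resource(RESOURCE_PATTERN, row))  # 002/customer_mapping

# Read: there is no row yet, so there is nothing to fill the pattern with.
try:
    resolve_resource(RESOURCE_PATTERN)
except ValueError as err:
    print("read fails:", err)
```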
To work around this, try using aliases, which let you define a well-known, fixed name such as "customer" that can easily be pointed at other indices (such as 001, 002, etc.).
For reads the alias works, since a single alias can cover multiple partitions.
If you want to read from a certain partition, then simply point to it or update the alias accordingly.
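As a rough sketch of the alias setup, the snippet below builds the request body for Elasticsearch's `_aliases` endpoint (POST /_aliases), adding each partition index to one alias. The alias name "customer" follows the suggestion above; the helper function is hypothetical and only prints the JSON rather than sending it.

```python
import json

def build_alias_actions(alias, indices):
    """Build the body for POST /_aliases that adds each index to the alias."""
    return {"actions": [{"add": {"index": index, "alias": alias}} for index in indices]}

body = build_alias_actions("customer", ["001", "002", "003"])
print(json.dumps(body, indent=2))
```

With the alias in place, the read table's property would presumably become 'es.resource' = 'customer/customer_mapping', while the write table keeps the {customer_partition} pattern.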


(Keith Hinde) #4

Thanks Costin - Using aliases was actually going to be my next plan of attack, so thanks for validating the approach. I'll give it a go and report back.


(Keith Hinde) #5

OK, progress update:

  • Added an alias, and the "all nodes failed" error went away. Yay, progress... of a sort!
  • Queries were taking an age, so I added some query DSL (an aggregation query) to the external table definition. Things blew up instantly with an "org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: aggregations are not supported with search_type=scan" exception
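One possible way to narrow the read without tripping that exception is to pass a plain filter through the 'es.query' table property instead of an aggregation. The sketch below only builds such a query string; the field name customer_id and its value are made up for illustration, and whether a filter is selective enough to help depends on the data.

```python
import json

def build_es_query(field, value):
    """Return a plain term query (no aggregations) as a compact JSON string."""
    return json.dumps({"query": {"term": {field: value}}}, separators=(",", ":"))

# Hypothetical field and value; replace with a real filter on the index.
es_query = build_es_query("customer_id", "12345")
print(f"'es.query' = '{es_query}'")
```

Any aggregation would then have to happen on the Hive side (GROUP BY etc.) over the filtered rows.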

Now, I'm seeing mixed signals on whether this is something that can/will be fixed and in what version of ES-Hadoop. Any opinions/inside knowledge out there?

Thanks again!


(Costin Leau) #6

Aggregations are not supported yet but are a high priority item on the roadmap.


(Keith Hinde) #7

Thanks Costin - I'll keep my ear to the ground.

