So, we have a three-node cluster with dynamically named indices (e.g. 001, 002, 003), each of which holds data for a subset of customers. The CREATE EXTERNAL TABLE statement includes a property that maps the target index to {customer_partition}.
Now, writing to this external table (INSERT/UPDATE) from Hive works well, but any attempt to read (SELECT) blows up with an "all nodes failed" exception. Strangely, reads from any statically named index work fine.
Has anyone experienced this before and/or have any suggestions as to a resolution? I can't see anything specific to this in the various threads.
OK, so after a little more investigation on a test instance, it's definitely the {customer_partition} mapping that is throwing things off. If I replace {customer_partition} with either a specific index name or a wildcard, the queries work fine (although presumably writing is then broken, as there is no way to route data to the correct index).
So, do I need two table definitions, one for writing (partitioned) and one for reading (wildcarded), to enable read/write?
Dynamic writing works because the target index is resolved from the incoming data. Reading, however, fails because it's a catch-22: to determine which index to read from, one needs access to the data, and the data cannot be fetched until the index is known.
To work around this, try using aliases, which let you define a well-known, fixed name such as "customer" that can easily be repointed to other indices (such as 001, 002, etc.).
For reads, the alias works because a single alias can cover multiple indices (partitions).
If you want to read from a certain partition, then simply point to it or update the alias accordingly.
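For anyone following along, here is a minimal sketch of the two-definition setup described above. It assumes an alias named "customer" has already been created in Elasticsearch and covers indices 001, 002 and 003. The column names, the "doc" type and the staging_customers table are hypothetical placeholders, not taken from the original table definition.

```sql
-- Sketch only: column names, the "doc" type and staging_customers are hypothetical.

-- Write-side table: the target index is resolved per row from customer_partition.
CREATE EXTERNAL TABLE customer_write (
  customer_partition STRING,
  customer_id        STRING,
  payload            STRING
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES ('es.resource' = '{customer_partition}/doc');

-- Read-side table: points at the fixed alias "customer", which is assumed to
-- exist already and to cover every partition index.
CREATE EXTERNAL TABLE customer_read (
  customer_partition STRING,
  customer_id        STRING,
  payload            STRING
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES ('es.resource' = 'customer/doc');

-- Writes go through the pattern-based table...
INSERT OVERWRITE TABLE customer_write
SELECT customer_partition, customer_id, payload
FROM staging_customers;

-- ...and reads go through the alias-backed table.
SELECT customer_id
FROM customer_read
WHERE customer_partition = '001';
```

To read from a single partition only, the read-side 'es.resource' could instead name that index directly (for example '001/doc'), or the alias could be updated to point at just that index.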
Thanks Costin - Using aliases was actually going to be my next plan of attack, so thanks for validating the approach. I'll give it a go and report back.
Added an alias, and the "all nodes failed" error went away. Yay, progress... of a sort!
Queries were taking an age, so I added some query DSL (an aggregation query) to the external table definition. Things blew up instantly with an "org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: aggregations are not supported with search_type=scan" exception.
Now, I'm seeing mixed signals on whether this is something that can/will be fixed and in what version of ES-Hadoop. Any opinions/inside knowledge out there?
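In the meantime, one possible way to sidestep the exception (a sketch under assumptions, not a confirmed fix) is to keep only a non-aggregating filter in 'es.query' and let Hive do the aggregation. The "status" field, its value, the "doc" type and the column names below are hypothetical.

```sql
-- Sketch only: the status field, the "doc" type and the columns are hypothetical.

-- 'es.query' carries a plain filter, not an aggregation, so the scan/scroll
-- read used by ES-Hadoop should not be handed an aggregation request.
CREATE EXTERNAL TABLE customer_active (
  customer_partition STRING,
  customer_id        STRING,
  status             STRING
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.resource' = 'customer/doc',
  'es.query'    = '{"query": {"term": {"status": "active"}}}'
);

-- The aggregation itself runs in Hive over the filtered rows.
SELECT customer_partition, COUNT(*) AS active_customers
FROM customer_active
GROUP BY customer_partition;
```

The trade-off is that the matching documents are still shipped to Hive before being counted, so this narrows the data read rather than pushing the aggregation down to Elasticsearch.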