ODBC + Hive + Elasticsearch doesn't seem to work

I'm running Elasticsearch 1.7.2 and Hortonworks HDP 2.3. I have an external table in Hive that points to an index in Elasticsearch. Within Hive, I can run counts and queries without any problems.

However, I'm not able to use MS Access to create a linked table via ODBC to the Hive/Elasticsearch table. This works perfectly fine for other Hive tables (HDFS external, ORC, etc). It only seems to be an issue with Elasticsearch based external tables. Is this a known issue? Is this there a work around?

We have analysts that use MS Access to run queries against multiple databases and we would love for them to use Elasticsearch as a source. There doesn't appear to be a means of connecting to ES via ODBC, so I'm trying to use Hive as a middle man. But, using MS Access directly against the existing data in a Hive ORC table is /slow/!

I'm open to ideas.

Are you able to use a different application, such as Excel MSQuery, to query the ES tables via Hive? And what do you mean when you say you can't query them, what is the issue?

We do not currently have the ability to use Excel MSQuery. I believe that requires Office 2010 or newer and we are limited to 2007 at the moment.

When I use Access to link to an external ODBC table, I can the list of my Hive tables. One of those tables is the Elasticsearch table. when I click on that table Access throws:

The Microsoft Access database engine could not find the object ''. Make sure the object exists and that you spelled is name and the path name correctly.

This table does exist and it works from within Hive itself.

I would think that perhaps the table is simply in another database or schema. You can use ODBCTest to see all available tables:

  1. First go to https://www.microsoft.com/en-ca/download/details.aspx?id=21995S downloading
    Microsoft Data Access Components (MDAC) 2.8 SDK and install it

  2. Go to start -> All programs -> Microsoft Data Access SDK 2.8 and choose the right version based on your operating system with Unicode encoding.

  3. In ODBCTest, go to Conn -> Full Connect, select your Data Source, leave ODBC Behavior and Cursor Library to Default, and enter your UserID and Password (if you have one) then click on OK (Please make sure you have configured your DSN in ODBC Administrator first)

  4. If the connection is built successfully, you will see a new window pops up with the info "Successfully connected to DSN 'Your DSN name'."

    4.1 Then you can go to Catalog -> SQLTables to fetch the table info, if you see "Return:SQL_SUCCESS=0", then you can try Ctrl + R to get the result.

    4.2 You can also try to write some simple SQL queries to test the connection (use Ctrl + E to execute and Ctrl + R to get the return data)

Thank you for the response.

It is in the default schema. I'm very comfortable with Hive, so I know
where the table is.

When I use Access to link to an external ODBC table (something I've done
quite a bit with Oracle), the elasticsearch hive table shows up in the list
along with all the other Hive tables. The problem is that when I click on
the table in question, normally you are presented with a dialog which asks
you to select the primary key. That process works for all other external
tables I'm currently using. When I try to do that with the Elasticsearch
table, I get the error I reported.

Unfortunately, I'm not able to install any software on the environment
where I'm experiencing the problem. So my options are somewhat limited in
that respect.

Interesting. Would you be able to enable driver logging in the driver (the install guide tells you how) and post a link to the log? This should give more details as to exactly what's going on. Additionally, if you could get a driver manager trace, that would also be helpful (see http://www.simba.com/blog/odbc-troubleshooting-tracing/)

Unfortunately, I'm unable to post my logs because of my environment. I have to manually copy relevant messages, which is painful.

I did enable ODBC trace logging and the relevant log line seems to be

EXIT SQLFetch with return code 100 (SQL_NO_DATA_FOUND)

As I said, the table works fine. I can query it within Hive without issue and I've confirmed the name of the table (default.mytablename) within Hive as well as within Access. Access is clearly able to get a list of tables from Hive when I make the initial connection. It just can't seem to access anything specific to the Hive+Elasticsearch tables.

That error code is expected, it simply means you've finished retrieving the results.

I have a suspicion that the Access error may be related to primary keys or statistics reported by that table, but cannot verify it as without logs or a separate application to troubleshoot. Can you access the Hive environment from a separate machine that you do control?

I resolved the issue. I enabled logging on the ODBC driver itself and noticed the driver showing the loading of jars from the classpath. It occurred to me that I'm currently using an "ADD JAR" to my hive scripts to allow me to query Hive+Elasticsearch tables.

I made sure the elasticsearch-hadoop jar was in the Hive classpath and restarted Hive. Now I can access the Hive+Elasticsearch tables without any issues.

Thank you for the debugging help!

Glad I was able to sort of help point the direction :slight_smile:

Indeed you did! Thanks again.

Thanks for helping out @kyleporter - glad to see this issue sorted out.

I've added a note in the hive docs to indicate to folks to put ES-Hadoop in the Hive path if they are using ODBC.

1 Like