Specify es.query condition in HIVE SQL query?

(Andrew Wooster) #1

Is it possible to pass the es.query parameter as part of a HIVE SQL query?

I currently specify the es.query query string as table property when I define an external table DDL:

    url   STRING,
    content_text ARRAY<STRING>
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
    'es.query' = '?q=desmoplastic',
    'es.port' = '11100', 
    'es.resource' = 'my_index/doc'

What I would really would like to do is be able to specify different es.query criteria for each SQL query, rather than having to create a new external table for each criteria. Something like this:

  select * 
  from doc
  where es.query = '?q=desmoplastic';

(Costin Leau) #2

No, it is not; Hive SQL is not extendable.

Creating other tables for your queries is not that difficult; I would even argue it's the right thing to do (tm) since SQL is all about references and by creating a different external table, you are basically creating a view/query alias.
Which is easily supported and understood.

(Andrew Wooster) #3

Let me describe my use case a little more. I am working on integration between visual analytics tools like Tableau and Spotfire with ES. In a typical scenario, a user would perform a keyword search (with all of the benefits of the es analyzer, such as stemming, etc) to narrow 25M documents down to 100k, and then use Tableau or Spotfire to explore the results via SQL queries. Creating new external tables (i.e. view/query alias) for each query does not seem feasible.

If Hive is not extendable, would Spark be a better option?

(Costin Leau) #4

I'm afraid Spark won't help as well. Both APIs provide SQL interfaces - specifying Lucene or custom queries goes beyond that. I'm not sure whether Tableau or Spotfire allow this either.

Note that one advantage Spark SQL has provides hooks so that ES-Hadoop can push down some operations. Thus one can start with a ES query, apply SQL on top of it and all of them would be assembled in the end.

(system) #5