Spark SQL and many long field names

I have an index with many long field names. When using Spark SQL via the Spark Thrift Server, if I create a temporary view over the index and then query it, I get an authorization error:

CREATE GLOBAL TEMPORARY VIEW view1 USING org.elasticsearch.spark.sql OPTIONS (resource 'es-index');
SELECT * FROM global_temp.view1 LIMIT 1;

failed; server[docker-prod.west.usermind.com:8200] returned [401|Unauthorized:]
at org.elasticsearch.hadoop.rest.RestClient.checkResponse(RestClient.java:488)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:446)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:436)
at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:363)
at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:92)
at org.elasticsearch.spark.rdd.AbstractEsRDDIterator.hasNext(AbstractEsRDDIterator.scala:61)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:231)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:225)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:826)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:826)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

I observe that the ES query is of the form es-index/_search?sort=_doc&scroll=5m&size=50&_source=field1,field2,field3,...&preference=_shards%3A0%7C_local
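For reference, the same view can be defined from a Scala job with the connector's DataFrame source; the _source list in the URL above is derived from the view's schema, i.e., one entry per mapped field. A minimal sketch, assuming the elasticsearch-spark connector jar is on the classpath and es.nodes points at the cluster:

import org.apache.spark.sql.SparkSession

// Minimal sketch: the connector infers the schema from the index mapping,
// so every mapped field ends up in the _source parameter of the scroll URL.
val spark = SparkSession.builder().appName("es-repro").getOrCreate()

val df = spark.read
  .format("org.elasticsearch.spark.sql")
  .option("resource", "es-index")   // index to read, as in the SQL view above
  .load()

df.createGlobalTempView("view1")
spark.sql("SELECT * FROM global_temp.view1 LIMIT 1").show()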

Through trial and error, I found that the query succeeds if I remove fields from the end of the _source list. This points to a limit on the length of the URL being sent to ES, which the many long field names cause the query to exceed.

My question: is there a workaround? For example, is there a setting that tells the connector to put the field list in the query DSL (request body) instead of the query string? Or is there a better way to formulate the query that avoids the problem?
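One mitigation I can think of (an assumption on my part, not a confirmed fix): project only the columns you actually need, so the connector's pruned schema yields a shorter _source list in the query string. On the Elasticsearch side, the http.max_initial_line_length setting (default 4kb) caps the accepted URL length and could be raised, though any proxy in between may impose its own limit. A sketch of the projection idea, with placeholder column names:

import org.apache.spark.sql.SparkSession

// Minimal sketch: an explicit projection lets Spark push column pruning
// down to the connector, shortening the _source list it puts in the URL.
// "field1" and "field2" are placeholders for real field names.
val spark = SparkSession.builder().appName("es-workaround").getOrCreate()

val slim = spark.read
  .format("org.elasticsearch.spark.sql")
  .option("resource", "es-index")
  .load()
  .select("field1", "field2")

slim.createGlobalTempView("view1_slim")
spark.sql("SELECT * FROM global_temp.view1_slim LIMIT 1").show()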

James Baiera (james.baiera, https://discuss.elastic.co/users/james.baiera), March 11:

Right now there is no option that puts the fields into the DSL body yet. I've seen this ticket (https://github.com/elastic/elasticsearch-hadoop/issues/942) float through GitHub recently; I'm assuming it was you who filed it (the user names are different, but similar). It's definitely an issue and we'll be looking into fixing it.

Yes, I filed it. We have indices in Elasticsearch that have this issue. Looking forward to the fix.

Chris Jones

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.