Spark SQL and many long field names

I have an index with many long field names. When using Spark SQL via the Spark Thrift Server, if I create a temporary view over the index and then query it, I get an authorization error:

CREATE GLOBAL TEMPORARY VIEW view1 USING org.elasticsearch.spark.sql OPTIONS (resource 'es-index');
SELECT * FROM global_temp.view1 LIMIT 1;

failed; server[docker-prod.west.usermind.com:8200] returned [401|Unauthorized:]
at org.elasticsearch.hadoop.rest.RestClient.checkResponse(RestClient.java:488)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:446)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:436)
at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:363)
at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:92)
at org.elasticsearch.spark.rdd.AbstractEsRDDIterator.hasNext(AbstractEsRDDIterator.scala:61)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:231)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:225)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:826)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:826)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

I observe that the ES query is of the form es-index/_search?sort=_doc&scroll=5m&size=50&_source=field1,field2,field3,...&preference=_shards%3A0%7C_local
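For reference, the same view can be defined from a Scala job with the connector's DataFrame source; the _source list in the URL above is derived from the view's schema, i.e., one entry per mapped field. A minimal sketch, assuming the elasticsearch-spark connector jar is on the classpath and es.nodes points at the cluster:

import org.apache.spark.sql.SparkSession

// Minimal sketch: the connector infers the schema from the index mapping,
// so every mapped field ends up in the _source parameter of the scroll URL.
val spark = SparkSession.builder().appName("es-repro").getOrCreate()

val df = spark.read
  .format("org.elasticsearch.spark.sql")
  .option("resource", "es-index")   // index to read, as in the SQL view above
  .load()

df.createGlobalTempView("view1")
spark.sql("SELECT * FROM global_temp.view1 LIMIT 1").show()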

Through trial and error, I found that the query succeeds if I remove fields from the end of the _source list. This points to a limit on the length of the URL being sent to ES, which the many long field names cause the query to exceed.

My question: is there a workaround? For example, is there a setting that tells the connector to put the field list in the query DSL (request body) instead of the query string? Or is there a better way to formulate the query that avoids the problem?
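One mitigation I can think of (an assumption on my part, not a confirmed fix): project only the columns you actually need, so the connector's pruned schema yields a shorter _source list in the query string. On the Elasticsearch side, the http.max_initial_line_length setting (default 4kb) caps the accepted URL length and could be raised, though any proxy in between may impose its own limit. A sketch of the projection idea, with placeholder column names:

import org.apache.spark.sql.SparkSession

// Minimal sketch: an explicit projection lets Spark push column pruning
// down to the connector, shortening the _source list it puts in the URL.
// "field1" and "field2" are placeholders for real field names.
val spark = SparkSession.builder().appName("es-workaround").getOrCreate()

val slim = spark.read
  .format("org.elasticsearch.spark.sql")
  .option("resource", "es-index")
  .load()
  .select("field1", "field2")

slim.createGlobalTempView("view1_slim")
spark.sql("SELECT * FROM global_temp.view1_slim LIMIT 1").show()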

James Baiera (james.baiera, https://discuss.elastic.co/users/james.baiera), March 11:

Right now there is no option that puts the fields into the DSL body yet. I've seen this ticket (https://github.com/elastic/elasticsearch-hadoop/issues/942) float through GitHub recently; I'm assuming it was you who filed it (the user names are different, but similar). It's definitely an issue and we'll be looking into fixing it.

Yes, I filed it. We have indices in Elasticsearch that have this issue. Looking forward to the fix.

Chris Jones

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.