Using an alias from Spark SQL


Hello, I'm using Spark SQL to extract data from Elasticsearch into CSV files.

Software versions used:

  • Elasticsearch 5.5.2
  • Spark SQL 2.1.1
  • elasticsearch-spark-20_2.11 5.5.2
  • Scala 2.11.8

I query Elasticsearch through an alias.
When the alias is updated (indices removed and new ones added) while the Spark job is running, the job fails.

Is it possible to keep reading from the same index from the beginning to the end of the Spark job execution (as an atomic process)?

My workaround is to look up the indices associated with the alias at the beginning of the Spark job and initialize the DataFrame with these indices.
Is there a better solution?

Thanks!

(James Baiera) #2

At the start of the job the connector should discover the active search shards for an alias, each of which should contain the underlying index name that they belong to. After this index name is pulled, we apply any alias metadata to the scroll request for that shard when the read task is started. While the scroll request should continue to operate against the index name only, the rest of the job will continue to make calls to Elasticsearch via the provided alias.

This sort of situation is tough to manage since aliases provide quite a bit of functionality that would otherwise be transparent to the client, so we try and take advantage of the given alias as much as possible. That said, if you can provide some steps for reproducing this issue, we can look into improving how the connector handles aliases.



My steps are:

  • t0: first call to Elasticsearch via the alias (index-0 is used)
  • t1: the alias is updated from index-0 to index-1
  • t2: second call to Elasticsearch via the alias (index-1 is used) ==> it fails

In my case, the data in Elasticsearch is schemaless. So if an attribute is no longer present after the alias is updated, the second query to Elasticsearch fails.

My workaround has been validated successfully. With it, the steps become:

  • init: get the indices associated with the alias (returns index-0)
  • t0: first call to Elasticsearch via index-0
  • t1: the alias is updated from index-0 to index-1
  • t2: second call to Elasticsearch via index-0 ==> it succeeds
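
For illustration, the "init" step above could look roughly like the following. This is a minimal sketch rather than the exact code from my job: it is written in Python with only the standard library, and the URL, alias name and document type passed in are hypothetical placeholders. It relies on the `GET /_alias/<alias>` endpoint, whose response is keyed by concrete index name. Joining several indices with commas in the resource string is an assumption that should be checked against the connector documentation; with a single index, as in my case, the resource is simply `index-0/<type>`.

```python
import json
from urllib.request import urlopen


def indices_for_alias(alias_response):
    """Return the sorted index names from a GET /_alias/<alias> response body."""
    # The response maps each concrete index name to its alias definitions,
    # e.g. {"index-0": {"aliases": {"my-alias": {}}}}.
    return sorted(alias_response.keys())


def resolve_alias(es_url, alias):
    """Ask Elasticsearch which indices currently back `alias` (one HTTP call)."""
    # es_url and alias are supplied by the caller, e.g. "http://localhost:9200"
    # and "my-alias" (both hypothetical values).
    with urlopen("{}/_alias/{}".format(es_url, alias)) as resp:
        return indices_for_alias(json.loads(resp.read().decode("utf-8")))


def pinned_resource(indices, doc_type):
    """Build a resource string that names the indices instead of the alias."""
    # Assumption: several indices can be given as a comma-separated list.
    return "{}/{}".format(",".join(indices), doc_type)
```

The resulting string is computed once, before the first read, and then passed to every read in the job in place of the alias (for example as the path given to `load(...)` on a reader using the `org.elasticsearch.spark.sql` format), so a later alias update no longer changes which index is queried.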


(James Baiera) #4

Could you open an issue on Github with this reproduction information? Thanks!
