Hello, I'm using Spark SQL to extract data from Elasticsearch into CSV files.
Software versions used:
Elasticsearch 5.5.2
Spark SQL 2.1.1
elasticsearch-spark-20_2.11 5.5.2
Scala 2.11.8
I query Elasticsearch through an alias.
When the alias is updated (indices removed and new ones added) while the Spark job is running, the job fails.
Is it possible to keep reading from the same indices from the beginning to the end of the Spark job execution (an atomic process)?
My workaround is to look up the indices associated with the alias at the beginning of the Spark job and initialize the DataFrame with those indices, as in the sketch below.
Is there a better solution?
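For reference, this is roughly what my workaround looks like. It is only a minimal sketch: the node address (`localhost:9200`), alias name (`my_alias`), type (`doc`), and output path are placeholders. It resolves the alias to its concrete indices once at job start and points the DataFrame at those indices instead of the alias:

```scala
import org.apache.spark.sql.SparkSession
import org.json4s._
import org.json4s.jackson.JsonMethods.parse

object AliasSnapshotRead {
  def main(args: Array[String]): Unit = {
    val esNodes = "localhost:9200"   // placeholder
    val alias   = "my_alias"         // placeholder
    val esType  = "doc"              // placeholder

    // Resolve the alias once at job start: GET /_alias/<alias> returns a
    // JSON object keyed by the concrete index names behind the alias.
    val body = scala.io.Source.fromURL(s"http://$esNodes/_alias/$alias").mkString
    val indices: Seq[String] = parse(body) match {
      case JObject(fields) => fields.map(_._1)
      case _               => Seq.empty
    }

    val spark = SparkSession.builder().appName("alias-snapshot-read").getOrCreate()

    // Read from the concrete indices instead of the alias, so a later alias
    // swap does not change what the running job reads.
    val df = spark.read
      .format("org.elasticsearch.spark.sql")
      .option("es.nodes", esNodes)
      .load(indices.mkString(",") + "/" + esType)

    df.write.option("header", "true").csv("/tmp/export") // placeholder output path
    spark.stop()
  }
}
```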
At the start of the job the connector should discover the active search shards for an alias, each of which reports the underlying index it belongs to. After this index name is pulled, we apply any alias metadata to the scroll request for that shard when the read task is started. While the scroll request should continue to operate against the index name only, the rest of the job will continue to make calls to Elasticsearch via the provided alias.
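To illustrate, that shard discovery is visible through the `_search_shards` API: querying it with the alias returns one entry per shard copy, and each entry carries the concrete index name it belongs to. A minimal sketch, usable from the Scala REPL or spark-shell (the host and alias name are placeholders):

```scala
import org.json4s._
import org.json4s.jackson.JsonMethods.parse

// GET /<alias>/_search_shards lists the active search shards behind the alias;
// every shard copy in the response reports the concrete index it belongs to.
val body = scala.io.Source.fromURL("http://localhost:9200/my_alias/_search_shards").mkString

val indexNames: Set[String] = (parse(body) \ "shards") match {
  case JArray(shardGroups) =>
    shardGroups.flatMap {
      case JArray(copies) => copies.map(c => (c \ "index").values.toString)
      case _              => Nil
    }.toSet
  case _ => Set.empty
}

println(s"Indices currently behind the alias: ${indexNames.mkString(", ")}")
```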
This sort of situation is tough to manage since aliases provide quite a bit of functionality that is otherwise transparent to the client, so we try to take advantage of the given alias as much as possible. That said, if you can provide some steps for reproducing this issue, we can look into improving how the connector handles aliases.