I'm self-hosting the Elastic connector between Elasticsearch and Azure Blob Storage on Azure Container Instances. While the connection works smoothly with just a few connectors, there seem to be several bugs when hosting 20 or more connectors in one container.
When my function calls the es_client.connector.update_configuration API, it works fine for the first few connectors but occasionally throws the following error:
BadRequestError(400, 'status_exception', 'Unknown [configuration] fields in the request payload: [concurrent_downloads, account_name, retry_count, account_key, blob_endpoint, containers]. Remove them from request or register their schema first.'
I think this is possibly due to some throttling or rate limit in the update_configuration call. Is there any documentation or evidence behind this?
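If the error is transient (an assumption; the message suggests the configuration schema simply has not been registered yet when the call is made), one way to test that is to wrap the call in a retry with exponential backoff. A minimal sketch, not an official recommendation: `call_with_backoff` is a hypothetical helper, and `connector_id` and `values` in the usage comment are placeholder names.

```python
import time


def call_with_backoff(fn, retry_on, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(); if it raises retry_on, wait and try again.

    The wait doubles after each failed attempt (base_delay, 2x, 4x, ...).
    The final failure is re-raised so the real error is not hidden.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))


# Usage sketch (not executed here; requires the elasticsearch package
# and an existing client, connector id, and values dict):
#
# from elasticsearch import BadRequestError
# call_with_backoff(
#     lambda: es_client.connector.update_configuration(
#         connector_id=connector_id, values=values
#     ),
#     retry_on=BadRequestError,
# )
```

If the call still fails after several attempts spaced this way, the cause is more likely that the container never registered the schema for that connector than a rate limit on the API itself.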
Also, when a new connector is added, I update the connector config and restart my container - but the container seems to be really slow at picking up and actually connecting to the newly generated connector to update the configuration. Is there a recommended number of connectors that each container should host?
PS: I think it might be helpful to add a connector status such as attempting_to_connect for hosted connectors, because the initial status seems to be needs_configuration - but it doesn't seem like the configuration can be updated before connecting to the hosted container.
Hey @michael-bv, can you share which connector and ES version you use?
Hey @Jedr_Blaszyk, I'm using:
Connector: azure_blob_storage
ES version: 9.1.0
(I had been using 9.0.4 but upgraded to see whether that fixed the issue; the issue persists)
Container version: same as ES version
Deployment location: both Azure and ES in East US
For further context, I ran a few experiments:
- Initially synced 2 connectors (success), subsequently synced 2 new connectors (success).
- Initially synced 4 connectors (success), subsequently synced 4 new connectors (really slow at picking up connection with new connectors, but did sync eventually after many retries).
- Initially synced 10 connectors (success), subsequently synced 2 new connectors (failed).
- Initially synced 30 connectors (success), subsequently synced 1 new connector (failed).
This is a snippet of my config.yml (truncated the ends)
I think my biggest query is, how many connectors should I host in a single container for Azure Blob Storage to Elasticsearch connection?
Hi @michael-bv!
I have found an issue where > 10 connectors can cause orphaned syncs in the UI (Connector UI may list syncs as "orphaned" when > 10 connectors exist · Issue #195127 · elastic/kibana · GitHub). Do you see orphaned syncs in the UI? It is filed as a UI bug, but I wanted to mention it in case it seems similar at all.
A few questions:
- Do you have any other configurations for these connectors like sync rules or schedules?
- Is each azure connector configured with the same azure storage account (account name, account key, blob endpoint)?
- Lastly, after your initial syncs, how long until you synced the new connectors and got failures?
Thanks!
Hi @Meghan_Murphy - yes, I did see a few orphaned syncs, but I think they are from multiple pending syncs that accumulated previously.
- I tried to keep it as basic as possible, just triggering syncs manually instead of scheduling them. I also tried tweaking all the variables from config.yml.example, but couldn't get it working smoothly with 20+ connectors under one config.yml file.
- Yes, the connectors all fall under the same storage account.
- I ran a few experiments with this as well, but whether I restarted the container after 5 minutes or an hour, it still couldn't connect to its 21st connector after I updated the config.yml with the new connector.
Hi @michael-bv !
There's no documented limit per se, but from what you've shown there definitely seems to be some sort of resource ceiling at the 21st connector.
I'm wondering if you are hitting some sort of rate limit per storage account. You can lower each connector's concurrent_downloads in your config.yml file (default is 10). Want to try that? Can you also share some more logs?