Only one of dozens of log indexes won't roll over: "source alias does not exist"

ES: 6.8.13
NEST 6.8.6

We have a number of indexes for which a programmatic process checks the rollover criteria (i.e. max age, max size, etc.) every night. They all have essentially identical template rules. Only one fails every night with a 400 error that says "source alias does not exist". Using _cat/aliases, I can see that's not true: both our write (i.e. "hot") and read aliases exist as expected in each cluster. We have the same indexing setup in multiple environments and this same index always fails, so I assume it's the configuration, but I can't see how it differs materially from any of our other log indexes.

If I run the rollover command in Kibana, I don't get that error. Could this be a NEST library problem?

Hi, @Thomas_Doman,

I think this is most likely something server-side, given that it works for all other targets. Have you tried the rollover using Kibana Dev Tools?

If it works in Kibana, can you provide the .NET code you are using? Also, what is the alias name (rollover target) that is failing?

Can you capture the request/response body for the failing request and provide those if possible? You could enable DebugMode in the client and capture the DebugInformation from the response.

Hi @stevejgordon,

Yes, as I mentioned, it works in Kibana. Currently, I'm only capturing OriginalException and ServerError in the logs. I'm going to add DebugInformation to the log message and I'll follow up w/ you when that's deployed in our staging environment.

In the meantime, if this helps, here's the specific error detail I get:

Elasticsearch.Net.ElasticsearchClientException: The remote server returned an error: (400) Bad Request.. Call: Status code 400 from: POST /datadog.metrics.success_write/_rollover?pretty=true&include_type_name=true&dry_run=false. ServerError: Type: illegal_argument_exception Reason: "source alias does not exist" ---> 

@stevejgordon Please find below the Debug Info I was able to extract. Note that the call stack is massive so I only included the very beginning. If the rest of the stack is important, let me know and I think we may want a different place from this convo to embed such a large thing.

Invalid NEST response built from a unsuccessful low level call on POST: /datadog.metrics.success_write/_rollover?pretty=true&error_trace=true&include_type_name=true&dry_run=false
# Audit trail of this API call:
 - [1] BadResponse: Node: https://elastic:redacted@es.XXXXXXXX/ Took: 00:00:00.0312483
# OriginalException: Elasticsearch.Net.ElasticsearchClientException: The remote server returned an error: (400) Bad Request.. Call: Status code 400 from: POST /datadog.metrics.success_write/_rollover?pretty=true&error_trace=true&include_type_name=true&dry_run=false. ServerError: Type: illegal_argument_exception Reason: "source alias does not exist" ---> System.Net.WebException: The remote server returned an error: (400) Bad Request.
   at System.Net.HttpWebRequest.GetResponse()
   at Elasticsearch.Net.HttpWebRequestConnection.Request[TResponse](RequestData requestData) in c:\Projects\elastic\net-6\src\Elasticsearch.Net\Connection\HttpWebRequestConnection.cs:line 57
   --- End of inner exception stack trace ---
# Request:
{
  "conditions": {
    "max_age": "183d",
    "max_size": "40gb"
  }
}
# Response:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "remote_transport_exception",
        "reason" : "[es-XXXXXXX][XXXXXXXX][indices:admin/rollover]",
        "stack_trace" : "[[es-XXXXXXX][XXXXXXXX][indices:admin/rollover]]; nested: RemoteTransportException[[es-XXXXXXXX][XXXXXXXX][indices:admin/rollover]]; nested: IllegalArgumentException[source alias does not exist];
\tat org.elasticsearch.ElasticsearchException.guessRootCauses(ElasticsearchException.java:657)
\tat org.elasticsearch.ElasticsearchException.generateFailureXContent(ElasticsearchException.java:585)

Thanks for the extra info @Thomas_Doman.

So from this, we can see that the client is sending the request and the server is receiving it. As far as I can see, the only way to get this exception is when the alias name in the URL path is not found in the cluster state metadata. If this is working from Kibana, with the exact same URL path, and the alias appears in the response from _cat/aliases, I'm at a loss why the result would be different when using the client.

It's unlikely, but could it be possible the alias name on the server includes some hidden whitespace characters? This would be visible in _cat/aliases using Kibana.
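As a rough way to check for hidden characters outside Kibana, here is a minimal Python sketch that lists any separator or control/format characters in a name. The function name `reveal_hidden` is hypothetical, and the sketch assumes you have copied the alias name out of the `_cat/aliases` output as a string.

```python
import unicodedata


def reveal_hidden(name):
    """Return (index, code point, Unicode name) for each suspicious character."""
    suspicious = []
    for index, ch in enumerate(name):
        category = unicodedata.category(ch)
        # Z* categories are separators (spaces); C* categories are control
        # and format characters (tabs, zero-width spaces, BOMs, ...).
        if category.startswith(("Z", "C")):
            label = unicodedata.name(ch, "UNNAMED")
            suspicious.append((index, f"U+{ord(ch):04X}", label))
    return suspicious


# A clean name yields an empty list; a trailing zero-width space is reported.
print(reveal_hidden("datadog.metrics.success_write"))
print(reveal_hidden("success_write\u200b"))
```

An empty result suggests the name contains only visible characters, so the mismatch would have to be somewhere else.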

What is the output from:

GET _alias/datadog.metrics.success_write

You're testing this in staging. Are you able to reproduce with a minimal client application in development, connecting to the same cluster and issuing the rollover?

At this point, the only thing I can really suggest is capturing the request bytes for the client request, either locally or in staging using a suitable capture tool, so you can inspect them fully and compare them with the URL sent by the request from Kibana.
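Once both request paths are captured, comparing them byte by byte is more reliable than comparing them by eye. A minimal Python sketch, assuming the two paths are available as bytes; the function name `first_difference` and the sample paths are hypothetical placeholders, not values from this cluster:

```python
def first_difference(a: bytes, b: bytes):
    """Return the index of the first differing byte, or None if identical."""
    for index, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return index
    # One input is a prefix of the other: they differ where the shorter ends.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


# Placeholder values; substitute the paths captured from the client and Kibana.
client_path = b"/logs_write/_rollover"
kibana_path = b"/logs-write/_rollover"

position = first_difference(client_path, kibana_path)
if position is None:
    print("paths are byte-for-byte identical")
else:
    print(f"paths differ at byte {position}:")
    print(f"  client: {client_path[position:position + 10]!r}")
    print(f"  kibana: {kibana_path[position:position + 10]!r}")
```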

@stevejgordon Thanks for all your analysis! It turns out that it was a typo in the alias that was hard to see: 'sucess' vs. 'success'. ES was telling us exactly what was wrong the whole time!

Thanks again for your help.
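For anyone hitting the same error, a fuzzy match of the requested alias against the names that actually exist can surface this kind of typo quickly. A minimal Python sketch using the standard library's `difflib`; the function name `suggest_aliases` and the list of existing aliases are hypothetical, and the thread does not say which side carried the misspelling:

```python
import difflib


def suggest_aliases(requested, existing, cutoff=0.8):
    """Return existing alias names that closely resemble `requested`, best first."""
    if requested in existing:
        return []  # exact match, so the name itself is not the problem
    return difflib.get_close_matches(requested, existing, n=3, cutoff=cutoff)


# Hypothetical alias list, e.g. names copied from the _cat/aliases output.
existing_aliases = [
    "datadog.metrics.success_write",
    "datadog.metrics.success_read",
]

print(suggest_aliases("datadog.metrics.sucess_write", existing_aliases))
```

A non-empty result for a name that "obviously exists" is a strong hint that the two spellings are not actually the same.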
