Remote reindex fails with "No route to host" even though the remote cluster's _cluster/health responds

Hi all!

We use Elasticsearch 6.8.0 as a search engine for our application logs.
Unfortunately, our cluster crashed and some indices were unrecoverable.
Since the cluster was in a red state, we spun up a new cluster so that log ingestion could resume.

We set up the new cluster with the reindex.remote.whitelist setting.
Initially we were able to migrate around 4-5k indices from the old cluster.

However, we still need to migrate a few more indices, and now we've started getting this error:
No route to host

Full error:

{
  "error": {
    "root_cause": [
      {
        "type": "no_route_to_host_exception",
        "reason": "No route to host"
      }
    ],
    "type": "no_route_to_host_exception",
    "reason": "No route to host"
  },
  "status": 500
}

Reindex request:

POST _reindex?wait_for_completion=true&refresh
{
  "source": {
    "remote": {
      "host": "http://old-cluster:80",
      "socket_timeout": "1m",
      "connect_timeout": "1m"
    },
    "index": "index"
  },
  "dest": {
    "index": "index"
  }
}
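With a few more indices still to migrate, it can help to send the same request per index from a small script, so that it is easy to see which indices fail. A minimal standard-library Python sketch, assuming the new cluster is reachable at a hypothetical http://localhost:9200; the function names are illustrative, and the body mirrors the request above field for field.

```python
import json
import urllib.error
import urllib.request

# Hypothetical address of the new cluster; adjust for your setup.
NEW_CLUSTER = "http://localhost:9200"
# Remote host exactly as written in the reindex request above.
OLD_CLUSTER = "http://old-cluster:80"


def build_reindex_body(remote_host: str, index: str) -> dict:
    """Reindex-from-remote body copying `index` into a same-named local index."""
    return {
        "source": {
            "remote": {
                "host": remote_host,
                "socket_timeout": "1m",
                "connect_timeout": "1m",
            },
            "index": index,
        },
        "dest": {"index": index},
    }


def reindex_one(index: str) -> dict:
    """POST one synchronous reindex to the new cluster and return the JSON reply."""
    req = urllib.request.Request(
        NEW_CLUSTER + "/_reindex?wait_for_completion=true&refresh",
        data=json.dumps(build_reindex_body(OLD_CLUSTER, index)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        # Failures such as the 500 above still carry a JSON error document.
        return json.load(err)


def migrate(indices: list) -> list:
    """Reindex indices one at a time; return the names whose reply contained an error."""
    failed = []
    for name in indices:
        if "error" in reindex_one(name):
            failed.append(name)
    return failed
```

For example, migrate(["index-a", "index-b"]) (hypothetical names) returns the subset that failed. Running the script from a different machine does not change where the remote connection originates: the log below suggests it is opened by the node that received the _reindex request.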

These are all the logs I could find:

[2024-08-01T05:40:18,597][WARN ][r.suppressed             ] [titanNew-01] path: /_reindex, params: {wait_for_completion=true}
java.net.NoRouteToHostException: No route to host
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716) ~[?:?]
        at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvent(DefaultConnectingIOReactor.java:171) ~[?:?]
        at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvents(DefaultConnectingIOReactor.java:145) ~[?:?]
        at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:348) ~[?:?]
        at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:192) ~[?:?]
        at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64) ~[?:?]
        at java.lang.Thread.run(Thread.java:750) [?:1.8.0_412]

Could someone please help with this?

I'm pretty sure "No route to host" is an error message coming directly from the OS, so you likely need to involve your local network experts. I can't be certain without looking at the code, though, and 6.8.0 is far too old for that. Does it reproduce on a version that isn't EOL?
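One way to check this outside Elasticsearch is a plain TCP connect from the node that logged the exception (titanNew-01 in the log above) to exactly the host and port in the reindex request (old-cluster:80). The JVM raises NoRouteToHostException when the OS fails the connect with EHOSTUNREACH, so if a probe like this Python sketch also reports EHOSTUNREACH, the problem lies in the network path (routing, or a firewall rule that rejects with an ICMP host-prohibited reply) rather than in Elasticsearch. The function name probe is illustrative, not part of any tool.

```python
import errno
import socket


def probe(host: str, port: int, timeout: float = 5.0) -> str:
    """Try a plain TCP connect; return 'ok' or the OS error name (e.g. EHOSTUNREACH)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except socket.gaierror as exc:
        # Name resolution failed before any connection was attempted.
        return "DNS: " + str(exc)
    except socket.timeout:
        # No answer at all within the timeout (e.g. packets silently dropped).
        return "timeout"
    except OSError as exc:
        # EHOSTUNREACH here corresponds to Java's NoRouteToHostException.
        return errno.errorcode.get(exc.errno, str(exc))
```

For example, print(probe("old-cluster", 80)) run on titanNew-01 should print ok if that node can reach the old cluster. A successful request from another machine only shows that that machine has a route to the host it queried.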

Hi David,

The weird thing is that if I curl the old cluster, the request succeeds.
This is from a VM in the new cluster:

root@ip-192-168-31-250:~# curl titan-old.search.com/_cluster/health?pretty
{
  "cluster_name" : "Titan",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 7,
  "number_of_data_nodes" : 6,
  "active_primary_shards" : 8474,
  "active_shards" : 8926,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 3879,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 69.70714564623194
}

Unfortunately, we can't test this on a newer version, since 7.x breaks a lot of things for us.

Let me know if you'd like me to test anything else.

Thanks!

I have no other suggestions, sorry.

@DavidTurner thanks!

Is there anyone else you could refer me to who might help with this?

No, I don't think anyone else would look into issues with such an old version either.


:scream:

Thanks anyway, @DavidTurner!

For anyone else struggling with something similar, a decent workaround is:
elasticdump

It's a Node.js-based tool that migrates data and mappings as needed.

I'm not sure whether this is approved by the Elastic Stack team, though.