Can FSCrawler access an Elasticsearch cluster behind a load balancer?

We have an Elasticsearch cluster running on Kubernetes in a cloud environment. We also have a Kubernetes ingress controller set up so the cluster can be reached through the load balancer's external IP,
e.g. http://LB_IP_ADDRESS//ES_SERVICE
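
For context, the ingress rule is roughly the sketch below (the resource and service names are made up, and depending on the controller a rewrite annotation may also be involved); it simply routes the /ES_SERVICE path prefix on the load balancer to the Elasticsearch service:

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: es-ingress                # hypothetical name
    spec:
      rules:
      - http:
          paths:
          - path: /ES_SERVICE         # path prefix exposed on the load balancer
            pathType: Prefix
            backend:
              service:
                name: elasticsearch   # hypothetical service name
                port:
                  number: 9200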

I have set up FSCrawler on another compute node with the Elasticsearch node URL set to http://LB_IP_ADDRESS//ES_SERVICE.
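The elasticsearch section of the job's _settings.yaml looks roughly like this (the job name and filesystem path are made up; the node URL is the ingress address placeholder from above):

    name: "lb_test"                   # hypothetical job name
    fs:
      url: "/tmp/es"                  # hypothetical folder to crawl
    elasticsearch:
      nodes:
      - url: "http://LB_IP_ADDRESS//ES_SERVICE"   # load balancer + ingress path

With that configuration, FSCrawler fails to start and throws the following error.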

java.io.IOException: LB_IP_ADDRESS//ES_SERVICE: Name or service not known
        at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:793) ~[elasticsearch-rest-client-7.3.0.jar:7.3.0]
        at org.elasticsearch.client.RestClient.performRequest(RestClient.java:218) ~[elasticsearch-rest-client-7.3.0.jar:7.3.0]
        at org.elasticsearch.client.RestClient.performRequest(RestClient.java:205) ~[elasticsearch-rest-client-7.3.0.jar:7.3.0]
        at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1454) ~[elasticsearch-rest-high-level-client-7.3.0.jar:7.3.0]
        at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1439) ~[elasticsearch-rest-high-level-client-7.3.0.jar:7.3.0]
        at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1406) ~[elasticsearch-rest-high-level-client-7.3.0.jar:7.3.0]
        at org.elasticsearch.client.RestHighLevelClient.info(RestHighLevelClient.java:702) ~[elasticsearch-rest-high-level-client-7.3.0.jar:7.3.0]
        at fr.pilato.elasticsearch.crawler.fs.client.v7.ElasticsearchClientV7.getVersion(ElasticsearchClientV7.java:169) ~[fscrawler-elasticsearch-client-v7-2.7-SNAPSHOT.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.client.ElasticsearchClient.checkVersion(ElasticsearchClient.java:199) ~[fscrawler-elasticsearch-client-base-2.7-SNAPSHOT.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.client.v7.ElasticsearchClientV7.start(ElasticsearchClientV7.java:142) ~[fscrawler-elasticsearch-client-v7-2.7-SNAPSHOT.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.cli.FsCrawlerCli.main(FsCrawlerCli.java:263) [fscrawler-cli-2.7-SNAPSHOT.jar:?]
Caused by: java.net.UnknownHostException: IP_ADDRESS/ES_CLUSTER: Name or service not known
        at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method) ~[?:1.8.0_221]
        at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929) ~[?:1.8.0_221]
        at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324) ~[?:1.8.0_221]
        at java.net.InetAddress.getAllByName0(InetAddress.java:1277) ~[?:1.8.0_221]
        at java.net.InetAddress.getAllByName(InetAddress.java:1193) ~[?:1.8.0_221]
        at java.net.InetAddress.getAllByName(InetAddress.java:1127) ~[?:1.8.0_221]
        at org.apache.http.impl.conn.SystemDefaultDnsResolver.resolve(SystemDefaultDnsResolver.java:45) ~[httpclient-4.5.9.jar:4.5.9]
        at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager$InternalAddressResolver.resolveRemoteAddress(PoolingNHttpClientConnectionManager.java:664) ~[httpasyncclient-4.1.4.jar:4.1.4]
        at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager$InternalAddressResolver.resolveRemoteAddress(PoolingNHttpClientConnectionManager.java:635) ~[httpasyncclient-4.1.4.jar:4.1.4]
        at org.apache.http.nio.pool.AbstractNIOConnPool.processPendingRequest(AbstractNIOConnPool.java:474) ~[httpcore-nio-4.4.11.jar:4.4.11]
        at org.apache.http.nio.pool.AbstractNIOConnPool.lease(AbstractNIOConnPool.java:280) ~[httpcore-nio-4.4.11.jar:4.4.11]
        at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.requestConnection(PoolingNHttpClientConnectionManager.java:295) ~[httpasyncclient-4.1.4.jar:4.1.4]
        at org.apache.http.impl.nio.client.AbstractClientExchangeHandler.requestConnection(AbstractClientExchangeHandler.java:377) ~[httpasyncclient-4.1.4.jar:4.1.4]
        at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.start(DefaultClientExchangeHandlerImpl.java:129) ~[httpasyncclient-4.1.4.jar:4.1.4]
        at org.apache.http.impl.nio.client.InternalHttpAsyncClient.execute(InternalHttpAsyncClient.java:141) ~[httpasyncclient-4.1.4.jar:4.1.4]
        at org.elasticsearch.client.RestClient.performRequest(RestClient.java:214) ~[elasticsearch-rest-client-7.3.0.jar:7.3.0]

Is there anything I am missing here? Or does FSCrawler only support URLs of the form http://cluster_ip:9200, without a path?

Interesting. FSCrawler does not support a prefix path.
I don't know yet whether it's doable, but it's worth opening an issue in the FSCrawler project, along with a way to reproduce the environment.
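
For what it's worth, the low-level Elasticsearch REST client that FSCrawler builds on does expose a path prefix option (RestClientBuilder#setPathPrefix), so the issue could point at something along these lines. This is only a sketch with placeholder host, port, and prefix, not how FSCrawler is wired today:

    import java.io.IOException;

    import org.apache.http.HttpHost;
    import org.elasticsearch.client.RestClient;
    import org.elasticsearch.client.RestHighLevelClient;

    public class PathPrefixSketch {
        public static void main(String[] args) throws IOException {
            // Placeholder load balancer address and scheme.
            HttpHost host = new HttpHost("LB_IP_ADDRESS", 80, "http");

            // setPathPrefix makes the client send every request under /ES_SERVICE,
            // which is what an ingress-style URL would need.
            RestHighLevelClient client = new RestHighLevelClient(
                    RestClient.builder(host).setPathPrefix("/ES_SERVICE"));

            // FSCrawler would then call client.info(...) at startup,
            // as in the stack trace above.
            client.close();
        }
    }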

Thanks

Thanks for the quick response and for clarifying. Let me create an issue for further review.

Regards,
Rajesh
