FSCrawler && SSL && SANs

Hello There!
I am trying to set up FSCrawler.
This is my _settings.yaml file:

---
name: "test-job"
fs:
  url: "/tmp/es"
  update_rate: "15m"
  excludes:
  - "*/~*"
  json_support: false
  filename_as_id: false
  add_filesize: true
  remove_deleted: true
  add_as_inner_object: false
  store_source: false
  index_content: true
  attributes_support: false
  raw_metadata: false
  xml_support: false
  index_folders: true
  lang_detect: false
  continue_on_error: false
  ocr:
    language: "eng"
    enabled: true
    pdf_strategy: "ocr_and_text"
  follow_symlinks: false
elasticsearch:
  nodes:
  - url: "https://{{ip_address}}:{{port_number}}"
  bulk_size: 100
  flush_interval: "5s"
  byte_size: "10mb"
  ssl_verification: false
  username: "elastic"
  password: "{{pass}}"

My job name is test-job.

I get an error about SANs:

10:18:34,131 INFO  [f.p.e.c.f.c.BootstrapChecks] Memory [Free/Total=Percent]: HEAP [94.3mb/1.9gb=4.74%], RAM [791.9mb/7.7gb=9.95%], Swap [0b/0b=0.0].
10:18:34,137 INFO  [f.console] No job specified. Here is the list of existing jobs:
10:18:34,140 INFO  [f.console] [1] - test-job
10:18:34,143 INFO  [f.console] Choose your job [1-1]...
1
10:18:40,168 INFO  [f.p.e.c.f.FsCrawlerImpl] Starting FS crawler
10:18:40,168 INFO  [f.p.e.c.f.FsCrawlerImpl] FS crawler started in watch mode. It will run unless you stop it with CTRL+C.
10:18:40,379 WARN  [f.p.e.c.f.c.ElasticsearchClient] We are not doing SSL verification. It's not recommended for production.
10:18:41,482 WARN  [f.p.e.c.f.c.ElasticsearchClient] Failed to create elasticsearch client on Elasticsearch{nodes=[https://{{ip_address}}:{{port_number}}], index='test-job', indexFolder='test-job_folder', bulkSize=100, flushInterval=5s, byteSize=10mb, username='elastic', pipeline='null', pathPrefix='null', sslVerification='false'}. Message: Can not execute GET https://{{ip_address}}:{{port_number}}/ : No subject alternative names present.
10:18:41,483 FATAL [f.p.e.c.f.c.FsCrawlerCli] We can not start Elasticsearch Client. Exiting.
fr.pilato.elasticsearch.crawler.fs.client.ElasticsearchClientException: Can not execute GET https://{{ip_address}}:{{port_number}}/ : No subject alternative names present
	at fr.pilato.elasticsearch.crawler.fs.client.ElasticsearchClient.httpCall(ElasticsearchClient.java:771) ~[fscrawler-elasticsearch-client-2.10-SNAPSHOT.jar:?]
	at fr.pilato.elasticsearch.crawler.fs.client.ElasticsearchClient.httpGet(ElasticsearchClient.java:724) ~[fscrawler-elasticsearch-client-2.10-SNAPSHOT.jar:?]
	at fr.pilato.elasticsearch.crawler.fs.client.ElasticsearchClient.getVersion(ElasticsearchClient.java:231) ~[fscrawler-elasticsearch-client-2.10-SNAPSHOT.jar:?]
	at fr.pilato.elasticsearch.crawler.fs.client.ElasticsearchClient.start(ElasticsearchClient.java:194) ~[fscrawler-elasticsearch-client-2.10-SNAPSHOT.jar:?]
	at fr.pilato.elasticsearch.crawler.fs.service.FsCrawlerManagementServiceElasticsearchImpl.start(FsCrawlerManagementServiceElasticsearchImpl.java:65) ~[fscrawler-core-2.10-SNAPSHOT.jar:?]
	at fr.pilato.elasticsearch.crawler.fs.FsCrawlerImpl.start(FsCrawlerImpl.java:116) ~[fscrawler-core-2.10-SNAPSHOT.jar:?]
	at fr.pilato.elasticsearch.crawler.fs.cli.FsCrawlerCli.startEsClient(FsCrawlerCli.java:322) ~[fscrawler-cli-2.10-SNAPSHOT.jar:?]
	at fr.pilato.elasticsearch.crawler.fs.cli.FsCrawlerCli.main(FsCrawlerCli.java:298) ~[fscrawler-cli-2.10-SNAPSHOT.jar:?]
Caused by: jakarta.ws.rs.ProcessingException: javax.net.ssl.SSLHandshakeException: No subject alternative names present
	at org.glassfish.jersey.client.internal.HttpUrlConnector.apply(HttpUrlConnector.java:270) ~[jersey-client-3.0.8.jar:?]
	at org.glassfish.jersey.client.ClientRuntime.invoke(ClientRuntime.java:297) ~[jersey-client-3.0.8.jar:?]
	at org.glassfish.jersey.client.JerseyInvocation.lambda$invoke$1(JerseyInvocation.java:675) ~[jersey-client-3.0.8.jar:?]
	at org.glassfish.jersey.client.JerseyInvocation.call(JerseyInvocation.java:697) ~[jersey-client-3.0.8.jar:?]
	at org.glassfish.jersey.client.JerseyInvocation.lambda$runInScope$3(JerseyInvocation.java:691) ~[jersey-client-3.0.8.jar:?]
	at org.glassfish.jersey.internal.Errors.process(Errors.java:292) ~[jersey-common-3.0.8.jar:?]
	at org.glassfish.jersey.internal.Errors.process(Errors.java:274) ~[jersey-common-3.0.8.jar:?]
	at org.glassfish.jersey.internal.Errors.process(Errors.java:205) ~[jersey-common-3.0.8.jar:?]
	at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:390) ~[jersey-common-3.0.8.jar:?]
	at org.glassfish.jersey.client.JerseyInvocation.runInScope(JerseyInvocation.java:691) ~[jersey-client-3.0.8.jar:?]
	at org.glassfish.jersey.client.JerseyInvocation.invoke(JerseyInvocation.java:674) ~[jersey-client-3.0.8.jar:?]
	at org.glassfish.jersey.client.JerseyInvocation$Builder.method(JerseyInvocation.java:422) ~[jersey-client-3.0.8.jar:?]
	at fr.pilato.elasticsearch.crawler.fs.client.ElasticsearchClient.httpCall(ElasticsearchClient.java:745) ~[fscrawler-elasticsearch-client-2.10-SNAPSHOT.jar:?]
	... 7 more
Caused by: javax.net.ssl.SSLHandshakeException: No subject alternative names present
	at sun.security.ssl.Alert.createSSLException(Alert.java:131) ~[?:?]
	at sun.security.ssl.TransportContext.fatal(TransportContext.java:353) ~[?:?]
	at sun.security.ssl.TransportContext.fatal(TransportContext.java:296) ~[?:?]
	at sun.security.ssl.TransportContext.fatal(TransportContext.java:291) ~[?:?]
	at sun.security.ssl.CertificateMessage$T13CertificateConsumer.checkServerCerts(CertificateMessage.java:1357) ~[?:?]
	at sun.security.ssl.CertificateMessage$T13CertificateConsumer.onConsumeCertificate(CertificateMessage.java:1232) ~[?:?]
	at sun.security.ssl.CertificateMessage$T13CertificateConsumer.consume(CertificateMessage.java:1175) ~[?:?]
	at sun.security.ssl.SSLHandshake.consume(SSLHandshake.java:392) ~[?:?]
	at sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:443) ~[?:?]
	at sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:421) ~[?:?]
	at sun.security.ssl.TransportContext.dispatch(TransportContext.java:183) ~[?:?]
	at sun.security.ssl.SSLTransport.decode(SSLTransport.java:172) ~[?:?]
	at sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1506) ~[?:?]
	at sun.security.ssl.SSLSocketImpl.readHandshakeRecord(SSLSocketImpl.java:1416) ~[?:?]
	at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:456) ~[?:?]
	at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:427) ~[?:?]
	at sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:572) ~[?:?]
	at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:201) ~[?:?]
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1592) ~[?:?]
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1520) ~[?:?]
	at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:527) ~[?:?]
	at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:334) ~[?:?]
	at org.glassfish.jersey.client.internal.HttpUrlConnector._apply(HttpUrlConnector.java:380) ~[jersey-client-3.0.8.jar:?]
	at org.glassfish.jersey.client.internal.HttpUrlConnector.apply(HttpUrlConnector.java:268) ~[jersey-client-3.0.8.jar:?]
	at org.glassfish.jersey.client.ClientRuntime.invoke(ClientRuntime.java:297) ~[jersey-client-3.0.8.jar:?]
	at org.glassfish.jersey.client.JerseyInvocation.lambda$invoke$1(JerseyInvocation.java:675) ~[jersey-client-3.0.8.jar:?]
	at org.glassfish.jersey.client.JerseyInvocation.call(JerseyInvocation.java:697) ~[jersey-client-3.0.8.jar:?]
	at org.glassfish.jersey.client.JerseyInvocation.lambda$runInScope$3(JerseyInvocation.java:691) ~[jersey-client-3.0.8.jar:?]
	at org.glassfish.jersey.internal.Errors.process(Errors.java:292) ~[jersey-common-3.0.8.jar:?]
	at org.glassfish.jersey.internal.Errors.process(Errors.java:274) ~[jersey-common-3.0.8.jar:?]
	at org.glassfish.jersey.internal.Errors.process(Errors.java:205) ~[jersey-common-3.0.8.jar:?]
	at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:390) ~[jersey-common-3.0.8.jar:?]
	at org.glassfish.jersey.client.JerseyInvocation.runInScope(JerseyInvocation.java:691) ~[jersey-client-3.0.8.jar:?]
	at org.glassfish.jersey.client.JerseyInvocation.invoke(JerseyInvocation.java:674) ~[jersey-client-3.0.8.jar:?]
	at org.glassfish.jersey.client.JerseyInvocation$Builder.method(JerseyInvocation.java:422) ~[jersey-client-3.0.8.jar:?]
	at fr.pilato.elasticsearch.crawler.fs.client.ElasticsearchClient.httpCall(ElasticsearchClient.java:745) ~[fscrawler-elasticsearch-client-2.10-SNAPSHOT.jar:?]
	... 7 more
Caused by: java.security.cert.CertificateException: No subject alternative names present
	at sun.security.util.HostnameChecker.matchIP(HostnameChecker.java:142) ~[?:?]
	at sun.security.util.HostnameChecker.match(HostnameChecker.java:101) ~[?:?]
	at sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:455) ~[?:?]
	at sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:429) ~[?:?]
	at sun.security.ssl.AbstractTrustManagerWrapper.checkAdditionalTrust(SSLContextImpl.java:1544) ~[?:?]
	at sun.security.ssl.AbstractTrustManagerWrapper.checkServerTrusted(SSLContextImpl.java:1511) ~[?:?]
	at sun.security.ssl.CertificateMessage$T13CertificateConsumer.checkServerCerts(CertificateMessage.java:1341) ~[?:?]
	at sun.security.ssl.CertificateMessage$T13CertificateConsumer.onConsumeCertificate(CertificateMessage.java:1232) ~[?:?]
	at sun.security.ssl.CertificateMessage$T13CertificateConsumer.consume(CertificateMessage.java:1175) ~[?:?]
	at sun.security.ssl.SSLHandshake.consume(SSLHandshake.java:392) ~[?:?]
	at sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:443) ~[?:?]
	at sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:421) ~[?:?]
	at sun.security.ssl.TransportContext.dispatch(TransportContext.java:183) ~[?:?]
	at sun.security.ssl.SSLTransport.decode(SSLTransport.java:172) ~[?:?]
	at sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1506) ~[?:?]
	at sun.security.ssl.SSLSocketImpl.readHandshakeRecord(SSLSocketImpl.java:1416) ~[?:?]
	at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:456) ~[?:?]
	at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:427) ~[?:?]
	at sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:572) ~[?:?]
	at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:201) ~[?:?]
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1592) ~[?:?]
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1520) ~[?:?]
	at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:527) ~[?:?]
	at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:334) ~[?:?]
	at org.glassfish.jersey.client.internal.HttpUrlConnector._apply(HttpUrlConnector.java:380) ~[jersey-client-3.0.8.jar:?]
	at org.glassfish.jersey.client.internal.HttpUrlConnector.apply(HttpUrlConnector.java:268) ~[jersey-client-3.0.8.jar:?]
	at org.glassfish.jersey.client.ClientRuntime.invoke(ClientRuntime.java:297) ~[jersey-client-3.0.8.jar:?]
	at org.glassfish.jersey.client.JerseyInvocation.lambda$invoke$1(JerseyInvocation.java:675) ~[jersey-client-3.0.8.jar:?]
	at org.glassfish.jersey.client.JerseyInvocation.call(JerseyInvocation.java:697) ~[jersey-client-3.0.8.jar:?]
	at org.glassfish.jersey.client.JerseyInvocation.lambda$runInScope$3(JerseyInvocation.java:691) ~[jersey-client-3.0.8.jar:?]
	at org.glassfish.jersey.internal.Errors.process(Errors.java:292) ~[jersey-common-3.0.8.jar:?]
	at org.glassfish.jersey.internal.Errors.process(Errors.java:274) ~[jersey-common-3.0.8.jar:?]
	at org.glassfish.jersey.internal.Errors.process(Errors.java:205) ~[jersey-common-3.0.8.jar:?]
	at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:390) ~[jersey-common-3.0.8.jar:?]
	at org.glassfish.jersey.client.JerseyInvocation.runInScope(JerseyInvocation.java:691) ~[jersey-client-3.0.8.jar:?]
	at org.glassfish.jersey.client.JerseyInvocation.invoke(JerseyInvocation.java:674) ~[jersey-client-3.0.8.jar:?]
	at org.glassfish.jersey.client.JerseyInvocation$Builder.method(JerseyInvocation.java:422) ~[jersey-client-3.0.8.jar:?]
	at fr.pilato.elasticsearch.crawler.fs.client.ElasticsearchClient.httpCall(ElasticsearchClient.java:745) ~[fscrawler-elasticsearch-client-2.10-SNAPSHOT.jar:?]
	... 7 more
10:18:41,509 INFO  [f.p.e.c.f.FsCrawlerImpl] FS crawler [test-job] stopped
10:18:41,511 INFO  [f.p.e.c.f.FsCrawlerImpl] FS crawler [test-job] stopped
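For context on the root cause at the bottom of that trace: the JDK's hostname verifier, when the URL host is an IP address, only consults IP-type subjectAltName entries, and a certificate with no SAN extension at all is rejected immediately. A simplified Python model of that logic (my own illustration with made-up helper names, not FSCrawler's or the JDK's actual code):

```python
import ipaddress

def check_ip_against_sans(cert_sans, host_ip):
    """Simplified model of the JDK's HostnameChecker.matchIP.

    cert_sans is a list of (type, value) pairs from the certificate's
    subjectAltName extension, e.g.
    [("DNS", "es01"), ("IP Address", "10.0.0.5")].
    """
    if not cert_sans:
        # The certificate carries no subjectAltName extension at all.
        raise ValueError("No subject alternative names present")
    ip_entries = [value for (kind, value) in cert_sans if kind == "IP Address"]
    if not any(ipaddress.ip_address(value) == ipaddress.ip_address(host_ip)
               for value in ip_entries):
        raise ValueError(
            f"No subject alternative names matching IP address {host_ip} found")
    return True
```

So "No subject alternative names present" means the server certificate has no SANs whatsoever; a cert with only DNS SANs would instead fail with "No subject alternative names matching IP address ... found". The usual server-side fix is to reissue the certificate with an IP SAN (elasticsearch-certutil has an --ip option for this), but that should not matter when SSL verification is genuinely disabled on the client.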

Could you share what you have exactly for:

elasticsearch:
  nodes:
  - url: "https://{{ip_address}}:{{port_number}}"

?

Those are my Elasticsearch IP and port, as set in elasticsearch.yml. All other components of this cluster, like Filebeat or Logstash, work fine with those settings.

Could you try with the host name instead?

I have tried with the hostname/FQDN (they are the same) and got a connection refused error, but I think that is because of the nginx reverse proxy. In my ufw rules I have allowed in/out traffic on lo.

So you have a proxy? That's something interesting.
I believe that's where the problem is.

The SSL certificate might not be valid in that case.

I'm not an expert on that field so I'm unsure I could help.

If it's possible to test this without the proxy, that would be great to confirm where you should look...

But the proxy is only for Kibana and its HTTP traffic.

You mentioned the proxy, right?
Is it possible to run this from the machine running FSCrawler?

curl https://{{ip_address}}:{{port_number}}/ -u elastic:{{pass}}

The reverse proxy is nginx, and as I wrote: Filebeat and Metricbeat work perfectly fine with a config like

{{ip_address}}:{{port}}

As I use self-signed certificates I had to use -k, and the result is:

{
  "name" : "node-1",
  "cluster_name" : "ks-es",
  "cluster_uuid" : "V88w5UHeSo-_KoCGSBxh3w",
  "version" : {
    "number" : "7.17.6",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "f65e9d338dc1d07b642e14a27f338990148ee5b6",
    "build_date" : "2022-08-23T11:08:48.893373482Z",
    "build_snapshot" : false,
    "lucene_version" : "8.11.1",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

I think the FSCrawler equivalent option is:

ssl_verification: false

Not sure why this does not work in your case.
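To spell out what ssl_verification: false is meant to achieve, it conceptually disables both the certificate-chain check and the hostname/SAN check. In Python terms (an analogy to reason about the behavior, not FSCrawler's Java implementation):

```python
import ssl

# Default client context: verifies the certificate chain AND that the
# hostname or IP matches the certificate's SANs.
strict = ssl.create_default_context()
assert strict.check_hostname and strict.verify_mode == ssl.CERT_REQUIRED

# What "ssl_verification: false" is meant to correspond to:
relaxed = ssl.create_default_context()
relaxed.check_hostname = False       # skip the SAN/hostname match
relaxed.verify_mode = ssl.CERT_NONE  # skip chain verification too
```

The "No subject alternative names present" failure comes from the hostname check; if it still fires with ssl_verification: false, the relaxed setting is evidently not reaching the underlying HTTP client.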

Funny thing... in my FSCrawler log I can see:

WARN  [f.p.e.c.f.c.ElasticsearchClient] Failed to create elasticsearch client on Elasticsearch{nodes=[https://{{ip_address}}:{{port_number}}], index='test-job', indexFolder='test-job_folder', bulkSize=100, flushInterval=5s, byteSize=10mb, username='elastic', pipeline='null', pathPrefix='null', sslVerification='false'}

so FSCrawler knows not to check it. I downloaded the *.zip file and unzipped it... maybe I should change the installation method? What can I do to install FSCrawler on Debian 11?

Dear @dadoonet, I think the solution is to download the 'standard' version of FSCrawler, not the SNAPSHOT version as I thought :stuck_out_tongue:

After downloading the 'standard' version (I only just saw it on the downloads page [am I blind??]) and setting up the config again, I got this in my logs:

13:43:32,150 INFO  [f.p.e.c.f.FsParserAbstract] FS crawler started for [test-job2] for [/mnt/fscdata/] every [1m]

My config for test-job2 is:

---
name: "test-job2"
fs:
  url: "/mnt/fscdata/"
  update_rate: "1m"
  excludes:
  - "*/~*"
  json_support: true
  filename_as_id: true
  add_filesize: true
  remove_deleted: true
  add_as_inner_object: false
  store_source: false
  index_content: true
  attributes_support: false
  raw_metadata: false
  xml_support: false
  index_folders: true
  lang_detect: false
  continue_on_error: true
  ocr:
    language: "eng"
    enabled: true
    pdf_strategy: "ocr_and_text"
  follow_symlinks: false
elasticsearch:
  nodes:
  - url: "https://{{ip_address}}:{{port_number}}"
  bulk_size: 100
  flush_interval: "5s"
  byte_size: "10mb"
  ssl_verification: false
  username: "elastic"
  password: "{{pass}}"
  ssl.certificate_authorities: "/etc/filebeat/elastic-stack-ca.pem"
  ssl.certificate: "/etc/kibana/es.crt.pem"
  ssl.key: "/etc/kibana/es.key.pem"
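Since that config now references several PEM files, a quick sanity check that each path is readable and actually PEM-encoded can rule out one class of silent misconfiguration. A hypothetical helper (the paths in the config above are placeholders from my setup):

```python
from pathlib import Path

def looks_like_pem(path):
    """Return True if the file exists and contains at least one
    PEM certificate block."""
    p = Path(path)
    return p.is_file() and "-----BEGIN CERTIFICATE-----" in p.read_text()

# Example:
# for f in ("/etc/filebeat/elastic-stack-ca.pem",
#           "/etc/kibana/es.crt.pem"):
#     print(f, looks_like_pem(f))
```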

I will test it because I can see a few WARNs in the log, like this one:

13:47:32,474 WARN  [o.e.c.RestClient] request [POST https://{{ip_address}}:{{port_number}}/test-job2_folder/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=true&expand_wildcards=open&allow_no_indices=true&ignore_throttled=false&search_type=query_then_fetch&batched_reduce_size=512] returned 1 warnings: [299 Elasticsearch-7.17.6-f65e9d338dc1d07b642e14a27f338990148ee5b6 "[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices."]

Are you saying that it works with the latest stable version 2.9 but not with the most recent 2.10-SNAPSHOT?

Yes, it looks like it. And as I mentioned: I have a few WARNs but will test it and try to find a solution.
Specs of my nodes (VMs) are:
Debian 11
4xCPU
8 GB RAM

Interesting. Which exact SNAPSHOT version did you download? The download page sorts by default in the "wrong" order. Did you get the latest SNAPSHOT? That should be the right one to use.

If you did, could you open an issue in the FSCrawler project?

The SNAPSHOT version I downloaded is: fscrawler-distribution-2.10-SNAPSHOT.

Which exact URL please? Where did you get it from?

A direct link to the file: fscrawler-distribution-2.10-20220907.182632-105.zip, and the path to the network resource is here.

The path to the network resource is linked from here:
https://fscrawler.readthedocs.io/en/latest/installation.html

Would it be possible to run FSCrawler directly against the Elasticsearch instance?

Could you DM me the exact configuration without redacting any value?

Thanks

OK, I will do it today.
But it's still not 100% perfect: I get a warning about a frozen index, and I assume it's not scanning my directory for files to ingest.
My warning is:

09:22:21,551 WARN  [o.e.c.RestClient] request [POST https://{{ip_address}}:{{port_number}}/test-job2/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=true&expand_wildcards=open&allow_no_indices=true&ignore_throttled=false&search_type=query_then_fetch&batched_reduce_size=512] returned 1 warnings: [299 Elasticsearch-7.17.6-f65e9d338dc1d07b642e14a27f338990148ee5b6 "[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices."]
09:22:21,557 WARN  [o.e.c.RestClient] request [POST https://{{ip_address}}:{{port_number}}/test-job2_folder/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=true&expand_wildcards=open&allow_no_indices=true&ignore_throttled=false&search_type=query_then_fetch&batched_reduce_size=512] returned 1 warnings: [299 Elasticsearch-7.17.6-f65e9d338dc1d07b642e14a27f338990148ee5b6 "[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices."]

My index is not frozen; I have cleared it and it's still not working.