Connecting FSCrawler to Workplace Enterprise Search

Hi,

I'm currently using FSCrawler with Workplace Search (Enterprise Search). FSCrawler was working and connecting to the Enterprise Search in my personal deployment. However, when I tried my company's deployment, I received an error saying that it could not connect to Elasticsearch and Workplace Search. The error is 401 Unauthorized when creating the Elasticsearch client (note that it works with my personal deployment, and also when not connecting to Workplace Search). My configuration is below.

---
name: "project"
fs:
  url: "\\.fscrawler\\project\\files"
  update_rate: "1m"
  excludes:
  - "*/~*"
  json_support: false
  filename_as_id: false
  add_filesize: true
  remove_deleted: true
  add_as_inner_object: false
  store_source: false
  index_content: true
  attributes_support: false
  raw_metadata: false
  xml_support: false
  index_folders: true
  lang_detect: false
  continue_on_error: false
  ocr:
    language: "eng"
    enabled: true
    pdf_strategy: "ocr_and_text"
  follow_symlinks: false
elasticsearch:
  username: "elastic"
  password: "****"
  nodes:
  - cloud_id: "****"
workplace_search:
  access_token: "****"
  id: "****"
  server: "****"
  bulk_size: 100
  flush_interval: "5s"
  byte_size: "10mb"
  ssl_verification: true
rest:
  url: "http://127.0.0.1:8080/fscrawler"

The **** values are sensitive information. Please let me know what I need to include in the configuration, and what settings or information are needed on the Workplace Search side for an account to be able to connect to the server.

Are you running the service on cloud.elastic.co?

What are the full logs?

Could you start FSCrawler with --debug so we can have more info?

My personal deployment is running on cloud.elastic.co, and I'm trying to connect to my company's Elastic Enterprise Search.
I have just reset the password for my deployment, which let me connect to Elasticsearch: the client connected to a node. However, it failed to create the Workplace Search client. I'm not sure whether the version matters, because my company's Elastic Enterprise Search is version 7.17 while my deployment is version 8.5.1.
I ran FSCrawler with --debug and this is what I got. The **** values are the server, which I have hidden because it is sensitive information.

10:09:19,492 INFO  [f.p.e.c.f.c.BootstrapChecks] Memory [Free/Total=Percent]: HEAP [239.6mb/3.9gb=5.91%], RAM [8.1gb/15.8gb=51.16%], Swap [14.2gb/27.7gb=51.54%].
10:09:19,493 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [6/_settings.json] already exists
10:09:19,494 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [6/_settings_folder.json] already exists
10:09:19,494 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [7/_settings.json] already exists
10:09:19,495 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [7/_settings_folder.json] already exists
10:09:19,495 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [7/_wpsearch_settings.json] already exists
10:09:19,499 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [8/_settings.json] already exists
10:09:19,499 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [8/_settings_folder.json] already exists
10:09:19,499 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [8/_wpsearch_settings.json] already exists
10:09:19,500 DEBUG [f.p.e.c.f.c.FsCrawlerCli] Cleaning existing status for job [project]...
10:09:19,501 DEBUG [f.p.e.c.f.c.FsCrawlerCli] Starting job [project]...
10:09:19,728 WARN  [f.p.e.c.f.c.FsCrawlerCli] Workplace Search integration does not support removing existing documents. but fs.remove_deleted is set to true. We are forcing it to false.
10:09:19,729 INFO  [f.p.e.c.f.c.FsCrawlerCli] Workplace Search integration is an experimental feature. As is it is not fully implemented and settings might change in the future.
10:09:19,746 INFO  [f.p.e.c.f.FsCrawlerImpl] Starting FS crawler
10:09:19,782 DEBUG [f.p.e.c.f.c.ElasticsearchClient] get version
10:09:20,630 DEBUG [f.p.e.c.f.c.ElasticsearchClient] get version returns 8.5.1 and 8 as the major version number
10:09:20,631 INFO  [f.p.e.c.f.c.ElasticsearchClient] Elasticsearch Client connected to a node running version 8.5.1
10:09:20,635 DEBUG [f.p.e.c.f.s.FsCrawlerManagementServiceElasticsearchImpl] Elasticsearch Management Service started
10:09:20,635 DEBUG [f.p.e.c.f.c.WorkplaceSearchClient] Starting Workplace Search client
10:09:20,636 DEBUG [f.p.e.c.f.t.w.WPSearchClient] Starting the WPSearchClient: WPSearchClient{, host='****'}
10:09:20,641 DEBUG [f.p.e.c.f.t.w.WPSearchClient] get version
10:09:20,641 DEBUG [f.p.e.c.f.t.w.WPSearchClient] Calling GET /api/ent/v1/internal/version
10:09:21,739 WARN  [f.p.e.c.f.t.w.WPSearchClient] failed to create workplace search client on ****, disabling crawler...
10:09:21,739 FATAL [f.p.e.c.f.c.FsCrawlerCli] We can not start Elasticsearch Client. Exiting.
com.jayway.jsonpath.PathNotFoundException: No results for path: $['number']
10:09:21,746 DEBUG [f.p.e.c.f.FsCrawlerImpl] Closing FS crawler [project]
10:09:21,746 DEBUG [f.p.e.c.f.c.ElasticsearchClient] Closing Elasticsearch client manager
10:09:21,747 DEBUG [f.p.e.c.f.f.b.FsCrawlerBulkProcessor] Closing BulkProcessor
10:09:21,747 DEBUG [f.p.e.c.f.f.b.FsCrawlerBulkProcessor] BulkProcessor is now closed
10:09:21,750 DEBUG [f.p.e.c.f.s.FsCrawlerManagementServiceElasticsearchImpl] Elasticsearch Management Service stopped
10:09:21,751 DEBUG [f.p.e.c.f.c.WorkplaceSearchClient] Closing Workplace Search client
10:09:21,751 DEBUG [f.p.e.c.f.t.w.WPSearchClient] Closing the WPSearchClient
10:09:21,751 DEBUG [f.p.e.c.f.f.b.FsCrawlerBulkProcessor] Closing BulkProcessor
10:09:21,751 DEBUG [f.p.e.c.f.f.b.FsCrawlerBulkProcessor] BulkProcessor is now closed
10:09:21,752 DEBUG [f.p.e.c.f.c.WorkplaceSearchClient] Workplace Search client closed
10:09:21,753 DEBUG [f.p.e.c.f.s.FsCrawlerDocumentServiceWorkplaceSearchImpl] Workplace Search Document Service stopped
10:09:21,753 DEBUG [f.p.e.c.f.FsCrawlerImpl] ES Client Manager stopped
10:09:21,753 INFO  [f.p.e.c.f.FsCrawlerImpl] FS crawler [project] stopped
10:09:21,754 DEBUG [f.p.e.c.f.FsCrawlerImpl] Closing FS crawler [project]
10:09:21,754 DEBUG [f.p.e.c.f.c.ElasticsearchClient] Closing Elasticsearch client manager
10:09:21,754 DEBUG [f.p.e.c.f.s.FsCrawlerManagementServiceElasticsearchImpl] Elasticsearch Management Service stopped
10:09:21,755 DEBUG [f.p.e.c.f.c.WorkplaceSearchClient] Closing Workplace Search client

You seem to be using a cloud id in the settings. Is that intended?

Is the workplace search server running in your company and available from where fscrawler is running?
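On the cloud id question, one way to check which deployment a given value points at is to decode it. This is only a minimal sketch: it assumes the commonly described Cloud ID layout (an optional label, a colon, then base64 of `domain$es_uuid$kibana_uuid`), and the function name `decode_cloud_id` and the sample values are hypothetical, not part of FSCrawler.

```python
import base64


def decode_cloud_id(cloud_id: str) -> dict:
    """Split a Cloud ID into its label and the hosts it appears to encode.

    Assumed layout: "<label>:<base64 of 'domain$es_uuid$kibana_uuid'>".
    """
    # The label is optional; everything after the last colon is base64.
    label, _, encoded = cloud_id.rpartition(":")
    # Restore any padding that was stripped from the base64 part.
    padded = encoded + "=" * (-len(encoded) % 4)
    decoded = base64.b64decode(padded).decode("utf-8")
    parts = decoded.split("$")
    domain = parts[0]
    es_uuid = parts[1] if len(parts) > 1 else ""
    kibana_uuid = parts[2] if len(parts) > 2 else ""
    return {
        "label": label,
        "domain": domain,
        "elasticsearch_url": f"https://{es_uuid}.{domain}" if es_uuid else "",
        "kibana_url": f"https://{kibana_uuid}.{domain}" if kibana_uuid else "",
    }


if __name__ == "__main__":
    # Hypothetical value built on the spot; substitute the real cloud_id.
    raw = "example.invalid$abc123$def456"
    sample = "my-deployment:" + base64.b64encode(raw.encode()).decode()
    print(decode_cloud_id(sample))
```

If the decoded Elasticsearch host belongs to a different deployment than the Workplace Search server in the settings, the two clients would be talking to different clusters.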

Hi, sorry for my late reply.

I intended to use the cloud id: when I take the cloud id out and run FSCrawler, it fails to connect to the Elasticsearch server because it tries to connect to a local node, as shown in the log below.

Also, I noticed that FSCrawler was trying to get the version of Workplace Search (in my 28th November reply) by calling "GET /api/ent/v1/internal/version", and it failed because it was not able to read the version.

I tried this API with Insomnia and received error 401: Unauthorized. I think the settings need some more information to get authorized to Workplace Search, or an account that is authorized to access my company's Workplace Search needs to be configured, since my company's Workplace Search is private.
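To reproduce the check outside FSCrawler, here is a minimal sketch of the same call. The path comes from the debug log; the use of HTTP Basic auth, the helper names `extract_version` and `probe_version`, and the idea that a successful response carries a top-level `number` field (inferred only from the `$['number']` path in the exception) are all assumptions. If the server answers 401, the body probably has no `number` field, which may be why the log shows PathNotFoundException rather than an authorization error.

```python
import base64
import json
import urllib.error
import urllib.request
from typing import Optional

# Path taken from the FSCrawler debug log.
VERSION_PATH = "/api/ent/v1/internal/version"


def extract_version(body: str) -> Optional[str]:
    """Return the top-level 'number' field of a JSON body, or None if absent."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None
    if isinstance(payload, dict):
        number = payload.get("number")
        return number if isinstance(number, str) else None
    return None


def probe_version(server: str, username: str, password: str) -> None:
    """Call the version endpoint and print the status and parsed version.

    Basic auth is an assumption here; the server may expect something else.
    """
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request = urllib.request.Request(
        server.rstrip("/") + VERSION_PATH,
        headers={"Authorization": f"Basic {token}"},
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            status = response.status
            body = response.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses land here; keep the body for inspection.
        status = err.code
        body = err.read().decode("utf-8", "replace")
    print(f"HTTP {status}, version: {extract_version(body)}")
```

Calling `probe_version` with the `server` value from the settings and the credentials of an account on the company deployment should show whether the 401 comes from the credentials or from the endpoint itself.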

09:47:08,468 INFO  [f.p.e.c.f.c.BootstrapChecks] Memory [Free/Total=Percent]: HEAP [240.5mb/3.9gb=5.94%], RAM [1.9gb/15.7gb=12.3%], Swap [12.6gb/23.2gb=54.2%].
09:47:08,471 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [6/_settings.json] already exists
09:47:08,472 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [6/_settings_folder.json] already exists
09:47:08,472 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [7/_settings.json] already exists
09:47:08,472 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [7/_settings_folder.json] already exists
09:47:08,473 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [7/_wpsearch_settings.json] already exists
09:47:08,474 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [8/_settings.json] already exists
09:47:08,474 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [8/_settings_folder.json] already exists
09:47:08,475 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [8/_wpsearch_settings.json] already exists
09:47:08,476 DEBUG [f.p.e.c.f.c.FsCrawlerCli] Cleaning existing status for job [engineeringtest]...
09:47:08,476 DEBUG [f.p.e.c.f.c.FsCrawlerCli] Starting job [engineeringtest]...
09:47:08,868 INFO  [f.p.e.c.f.c.FsCrawlerCli] Workplace Search integration is an experimental feature. As is it is not fully implemented and settings might change in the future.
09:47:08,896 INFO  [f.p.e.c.f.FsCrawlerImpl] Starting FS crawler
09:47:08,957 DEBUG [f.p.e.c.f.c.ElasticsearchClient] get version
09:47:09,472 WARN  [f.p.e.c.f.c.ElasticsearchClient] Failed to create elasticsearch client on Elasticsearch{nodes=[http://127.0.0.1:9200], index='engineeringtest', indexFolder='engineeringtest_folder', bulkSize=100, flushInterval=5s, byteSize=10mb, username='elastic', pipeline='null', pathPrefix='null', sslVerification='true'}. Message: Can not execute GET http://127.0.0.1:9200/ : Connection refused: connect.
09:47:09,472 FATAL [f.p.e.c.f.c.FsCrawlerCli] We can not start Elasticsearch Client. Exiting.
fr.pilato.elasticsearch.crawler.fs.client.ElasticsearchClientException: Can not execute GET http://127.0.0.1:9200/ : Connection refused: connect
        at fr.pilato.elasticsearch.crawler.fs.client.ElasticsearchClient.httpCall(ElasticsearchClient.java:779) ~[fscrawler-elasticsearch-client-2.10-SNAPSHOT.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.client.ElasticsearchClient.httpGet(ElasticsearchClient.java:732) ~[fscrawler-elasticsearch-client-2.10-SNAPSHOT.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.client.ElasticsearchClient.getVersion(ElasticsearchClient.java:234) ~[fscrawler-elasticsearch-client-2.10-SNAPSHOT.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.client.ElasticsearchClient.start(ElasticsearchClient.java:197) ~[fscrawler-elasticsearch-client-2.10-SNAPSHOT.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.service.FsCrawlerManagementServiceElasticsearchImpl.start(FsCrawlerManagementServiceElasticsearchImpl.java:65) ~[fscrawler-core-2.10-SNAPSHOT.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.FsCrawlerImpl.start(FsCrawlerImpl.java:116) ~[fscrawler-core-2.10-SNAPSHOT.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.cli.FsCrawlerCli.startEsClient(FsCrawlerCli.java:322) ~[fscrawler-cli-2.10-SNAPSHOT.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.cli.FsCrawlerCli.main(FsCrawlerCli.java:298) ~[fscrawler-cli-2.10-SNAPSHOT.jar:?]
Caused by: jakarta.ws.rs.ProcessingException: java.net.ConnectException: Connection refused: connect
        at org.glassfish.jersey.client.internal.HttpUrlConnector.apply(HttpUrlConnector.java:270) ~[jersey-client-3.0.8.jar:?]
        at org.glassfish.jersey.client.ClientRuntime.invoke(ClientRuntime.java:297) ~[jersey-client-3.0.8.jar:?]
        at org.glassfish.jersey.client.JerseyInvocation.lambda$invoke$1(JerseyInvocation.java:675) ~[jersey-client-3.0.8.jar:?]
        at org.glassfish.jersey.client.JerseyInvocation.call(JerseyInvocation.java:697) ~[jersey-client-3.0.8.jar:?]
        at org.glassfish.jersey.client.JerseyInvocation.lambda$runInScope$3(JerseyInvocation.java:691) ~[jersey-client-3.0.8.jar:?]
        at org.glassfish.jersey.internal.Errors.process(Errors.java:292) ~[jersey-common-3.0.8.jar:?]
        at org.glassfish.jersey.internal.Errors.process(Errors.java:274) ~[jersey-common-3.0.8.jar:?]
        at org.glassfish.jersey.internal.Errors.process(Errors.java:205) ~[jersey-common-3.0.8.jar:?]
        at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:390) ~[jersey-common-3.0.8.jar:?]
        at org.glassfish.jersey.client.JerseyInvocation.runInScope(JerseyInvocation.java:691) ~[jersey-client-3.0.8.jar:?]
        at org.glassfish.jersey.client.JerseyInvocation.invoke(JerseyInvocation.java:674) ~[jersey-client-3.0.8.jar:?]
        at org.glassfish.jersey.client.JerseyInvocation$Builder.method(JerseyInvocation.java:422) ~[jersey-client-3.0.8.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.client.ElasticsearchClient.httpCall(ElasticsearchClient.java:753) ~[fscrawler-elasticsearch-client-2.10-SNAPSHOT.jar:?]
        ... 7 more
Caused by: java.net.ConnectException: Connection refused: connect
        at sun.nio.ch.Net.connect0(Native Method) ~[?:?]
        at sun.nio.ch.Net.connect(Net.java:579) ~[?:?]
        at sun.nio.ch.Net.connect(Net.java:568) ~[?:?]
        at sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:576) ~[?:?]
        at java.net.Socket.connect(Socket.java:666) ~[?:?]
        at sun.net.NetworkClient.doConnect(NetworkClient.java:178) ~[?:?]
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:531) ~[?:?]
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:636) ~[?:?]
        at sun.net.www.http.HttpClient.<init>(HttpClient.java:279) ~[?:?]
        at sun.net.www.http.HttpClient.New(HttpClient.java:384) ~[?:?]
        at sun.net.www.http.HttpClient.New(HttpClient.java:406) ~[?:?]
        at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1308) ~[?:?]
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1241) ~[?:?]
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1127) ~[?:?]
        at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:1056) ~[?:?]
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1661) ~[?:?]
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1585) ~[?:?]
        at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:529) ~[?:?]
        at org.glassfish.jersey.client.internal.HttpUrlConnector._apply(HttpUrlConnector.java:380) ~[jersey-client-3.0.8.jar:?]
        at org.glassfish.jersey.client.internal.HttpUrlConnector.apply(HttpUrlConnector.java:268) ~[jersey-client-3.0.8.jar:?]
        at org.glassfish.jersey.client.ClientRuntime.invoke(ClientRuntime.java:297) ~[jersey-client-3.0.8.jar:?]
        at org.glassfish.jersey.client.JerseyInvocation.lambda$invoke$1(JerseyInvocation.java:675) ~[jersey-client-3.0.8.jar:?]
        at org.glassfish.jersey.client.JerseyInvocation.call(JerseyInvocation.java:697) ~[jersey-client-3.0.8.jar:?]
        at org.glassfish.jersey.client.JerseyInvocation.lambda$runInScope$3(JerseyInvocation.java:691) ~[jersey-client-3.0.8.jar:?]
        at org.glassfish.jersey.internal.Errors.process(Errors.java:292) ~[jersey-common-3.0.8.jar:?]
        at org.glassfish.jersey.internal.Errors.process(Errors.java:274) ~[jersey-common-3.0.8.jar:?]
        at org.glassfish.jersey.internal.Errors.process(Errors.java:205) ~[jersey-common-3.0.8.jar:?]
        at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:390) ~[jersey-common-3.0.8.jar:?]
        at org.glassfish.jersey.client.JerseyInvocation.runInScope(JerseyInvocation.java:691) ~[jersey-client-3.0.8.jar:?]
        at org.glassfish.jersey.client.JerseyInvocation.invoke(JerseyInvocation.java:674) ~[jersey-client-3.0.8.jar:?]
        at org.glassfish.jersey.client.JerseyInvocation$Builder.method(JerseyInvocation.java:422) ~[jersey-client-3.0.8.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.client.ElasticsearchClient.httpCall(ElasticsearchClient.java:753) ~[fscrawler-elasticsearch-client-2.10-SNAPSHOT.jar:?]
        ... 7 more
09:47:09,493 DEBUG [f.p.e.c.f.FsCrawlerImpl] Closing FS crawler [engineeringtest]
09:47:09,497 DEBUG [f.p.e.c.f.c.ElasticsearchClient] Closing Elasticsearch client manager
09:47:09,508 DEBUG [f.p.e.c.f.s.FsCrawlerManagementServiceElasticsearchImpl] Elasticsearch Management Service stopped
09:47:09,509 DEBUG [f.p.e.c.f.c.WorkplaceSearchClient] Closing Workplace Search client
09:47:09,509 DEBUG [f.p.e.c.f.c.WorkplaceSearchClient] Workplace Search client closed
09:47:09,510 DEBUG [f.p.e.c.f.s.FsCrawlerDocumentServiceWorkplaceSearchImpl] Workplace Search Document Service stopped
09:47:09,511 DEBUG [f.p.e.c.f.FsCrawlerImpl] ES Client Manager stopped
09:47:09,511 INFO  [f.p.e.c.f.FsCrawlerImpl] FS crawler [engineeringtest] stopped
09:47:09,515 DEBUG [f.p.e.c.f.FsCrawlerImpl] Closing FS crawler [engineeringtest]

Just as a note, we also provide a network drives connector package for Workplace Search, if you're interested in trying it out: Network drives connector package | Workplace Search Guide [8.5] | Elastic
