Get the id of the ES document created by FSCrawler

Francois_Saab · July 16, 2021, 6:57pm

When FSCrawler sends a document to ES, I need to associate the _id of the doc created by FSCrawler with an internal ID in our DB.

How can I get the _id of the doc created by FSCrawler each time a new document is indexed?

Is there a way for FSCrawler to call a method in my code after a document has been indexed, and pass that method information (e.g. _id, ...) about the newly indexed document?

Is it possible to specify to FSCrawler the _id of the doc that it will create in ES?

Thanks

dadoonet · July 22, 2021, 2:20pm

Welcome!

One thing you could do is use the REST service of FSCrawler.

> Is it possible to specify to FSCrawler the _id of the doc that it will create in ES?

Then you can manually set the id of the document (see REST service — FSCrawler 2.10-SNAPSHOT documentation).
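As a rough sketch of what setting the id manually could look like from client code: the snippet below builds a multipart form with an `id` field next to the `file` field and posts it to the REST service. The URL (`http://127.0.0.1:8080/fscrawler/_upload`) and the helper names `build_multipart` and `upload_with_id` are assumptions for illustration; the endpoint path depends on your FSCrawler version and REST settings, so check the linked documentation before relying on it.

```python
import json
import os
import uuid
import urllib.request

# Assumed endpoint: adjust host, port, and path to your FSCrawler REST settings.
FSCRAWLER_UPLOAD_URL = "http://127.0.0.1:8080/fscrawler/_upload"


def build_multipart(filename: str, content: bytes, doc_id: str, boundary: str) -> bytes:
    """Build a multipart/form-data body with an "id" field and a "file" field."""
    delimiter = b"--" + boundary.encode()
    parts = [
        delimiter,
        b'Content-Disposition: form-data; name="id"',
        b"",
        doc_id.encode(),
        delimiter,
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode(),
        b"Content-Type: application/octet-stream",
        b"",
        content,
        delimiter + b"--",
        b"",
    ]
    return b"\r\n".join(parts)


def upload_with_id(path: str, doc_id: str, url: str = FSCRAWLER_UPLOAD_URL) -> dict:
    """Post a local file to the REST service, asking it to use doc_id as the _id."""
    boundary = uuid.uuid4().hex
    with open(path, "rb") as f:
        body = build_multipart(os.path.basename(path), f.read(), doc_id, boundary)
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With something like this, the internal DB id itself (or a value derived from it) can be passed as `doc_id`, so no separate mapping has to be stored.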

> How can I get the _id of the doc created by FSCrawler each time a new document is indexed?

Or read the response object sent by FSCrawler, which looks like:

{
  "ok" : true,
  "filename" : "test.txt",
  "url" : "http://127.0.0.1:9200/fscrawler-rest-tests_doc/doc/dd18bf3a8ea2a3e53e2661c7fb53534"
}

Would that work for you?
Note that this means you will have to do the "crawl" part yourself, since in that case FSCrawler is "just" a gateway to Elasticsearch.
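For the second option, here is a minimal sketch of pulling the _id out of the response object shown above. It assumes the id is the last path segment of the `url` field, which matches the sample response but is not something the thread states explicitly. The helper name `extract_doc_id` and the dictionary standing in for the DB table are hypothetical.

```python
import json
from urllib.parse import urlparse


def extract_doc_id(response_body: str) -> str:
    """Return the document id from an FSCrawler REST response body.

    Assumption: the id is the last path segment of the "url" field.
    """
    response = json.loads(response_body)
    if not response.get("ok"):
        raise ValueError(f"upload was not acknowledged: {response!r}")
    path = urlparse(response["url"]).path
    return path.rstrip("/").rsplit("/", 1)[-1]


# Sample response copied from the reply above.
sample = """
{
  "ok" : true,
  "filename" : "test.txt",
  "url" : "http://127.0.0.1:9200/fscrawler-rest-tests_doc/doc/dd18bf3a8ea2a3e53e2661c7fb53534"
}
"""

# Hypothetical stand-in for the DB table mapping internal ids to ES ids.
es_id_by_internal_id = {}
es_id_by_internal_id[1234] = extract_doc_id(sample)
```

In a real setup the dictionary would be replaced by an insert or update against your own database, done right after each upload call returns.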

