Get the _id of the ES document created by FSCrawler

When FSCrawler sends a document to ES, I need to associate the _id of the document it creates with an internal ID in our database.

How can I get the _id of the document created by FSCrawler each time a new document is indexed?

Is there a way for FSCrawler to call a method in my code after a document has been indexed, and pass that method information about the newly indexed document (e.g. its _id)?

Is it possible to tell FSCrawler which _id to use for the document it creates in ES?

Thanks

Welcome!

One thing you could do is use the REST service of FSCrawler.

> Is it possible to tell FSCrawler which _id to use for the document it creates in ES?

Yes. With the REST service you can set the id of the document yourself (see the "REST service" page of the FSCrawler 2.10-SNAPSHOT documentation).
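For illustration, here is a minimal Python sketch that builds such an upload using only the standard library. It assumes the REST service runs at its default address `http://127.0.0.1:8080/fscrawler`, that the upload endpoint is `_document` (older FSCrawler versions used `_upload`), and that the id is sent as a multipart form field named `id`; please check the REST service documentation for your version before relying on any of these. The function name `build_upload_request` is hypothetical.

```python
import uuid
import urllib.request

# Assumption: FSCrawler was started with the REST service enabled and
# listens on this default URL; adjust it to match your settings.
FSCRAWLER_URL = "http://127.0.0.1:8080/fscrawler"


def build_upload_request(content: bytes, filename: str, doc_id: str,
                         base_url: str = FSCRAWLER_URL) -> urllib.request.Request:
    """Build a multipart/form-data POST carrying a file and the _id to use.

    Hypothetical helper. Assumes the `_document` endpoint and an `id` form
    field; verify both against the FSCrawler docs for your version.
    """
    boundary = uuid.uuid4().hex
    # Form field with the id we want Elasticsearch to use as the _id.
    id_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="id"\r\n\r\n'
        f"{doc_id}\r\n"
    ).encode("utf-8")
    # Form field with the file itself (filename is not escaped in this sketch).
    file_head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    closing = f"--{boundary}--\r\n".encode("utf-8")
    body = id_part + file_head + content + b"\r\n" + closing
    return urllib.request.Request(
        f"{base_url}/_document",
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

Sending the request with `urllib.request.urlopen(req)` would then return FSCrawler's JSON response. If you pass your internal database id as `doc_id`, the Elasticsearch _id and your own id are the same value, so no separate mapping is needed.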

> How can I get the _id of the document created by FSCrawler each time a new document is indexed?

Alternatively, read the response FSCrawler sends back for each upload, which looks like this:

{
  "ok" : true,
  "filename" : "test.txt",
  "url" : "http://127.0.0.1:9200/fscrawler-rest-tests_doc/doc/dd18bf3a8ea2a3e53e2661c7fb53534"
}
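In the sample response, the document id appears to be the last path segment of the `url` field. A minimal Python sketch to pull it out follows. It assumes that URL layout always holds, which is worth confirming on your own responses; `extract_doc_id` is a hypothetical name.

```python
import json
from urllib.parse import urlparse


def extract_doc_id(response_body: str) -> str:
    """Return the Elasticsearch _id from an FSCrawler REST response.

    Assumption: the id is the last path segment of the "url" field,
    e.g. http://host:9200/<index>/doc/<id>.
    """
    response = json.loads(response_body)
    if not response.get("ok"):
        # Surface failures instead of recording a bogus id.
        raise ValueError(f"FSCrawler reported a failure: {response}")
    path = urlparse(response["url"]).path
    return path.rstrip("/").rsplit("/", 1)[-1]
```

After each upload you could then store the pair (returned _id, internal id) in your database.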

Would that work for you?
Note that this means you will have to do the "crawl" part yourself, since in that case FSCrawler is "just" a gateway to Elasticsearch.
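Doing the crawl yourself can start as a simple directory walk. The Python sketch below is only an outline under stated assumptions: `upload(path, doc_id)` is a hypothetical helper that posts one file to the FSCrawler REST service and returns the _id, and `next_internal_id(path)` is a hypothetical hook that creates the row in your database and returns its id. Unlike FSCrawler's own crawler, it does not detect modified or deleted files.

```python
from pathlib import Path
from typing import Callable, Dict


def crawl(root: Path,
          upload: Callable[[Path, str], str],
          next_internal_id: Callable[[Path], str]) -> Dict[str, str]:
    """Send every file under `root` to FSCrawler; map internal id -> _id.

    `upload` and `next_internal_id` are hypothetical, caller-supplied hooks.
    """
    mapping: Dict[str, str] = {}
    # Sorted order makes the walk deterministic across runs.
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        internal_id = next_internal_id(path)
        # Reusing the internal id as the requested _id keeps both in sync;
        # we still record what FSCrawler actually returned.
        mapping[internal_id] = upload(path, internal_id)
    return mapping
```

Injecting `upload` as a parameter keeps the walk independent of how the HTTP call is made, so it can be tested with a fake uploader.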
