When FSCrawler sends a document to ES, I need to associate the _id of the document it creates with an internal ID in our DB.
How can I get the _id of the document FSCrawler creates each time a new document is indexed?
Is there a way for FSCrawler to call a method in my code after a document has been indexed, and pass that method information (e.g. the _id) about the newly indexed document?
Alternatively, is it possible to tell FSCrawler which _id to use for the document it creates in ES?
Would that work for you?
But that means you will have to do the "crawl" part yourself, since in that case FSCrawler is "just" a gateway to Elasticsearch.
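
To illustrate what doing the "crawl" part yourself could look like, here is a rough Python sketch. It assumes the suggestion refers to FSCrawler's REST service, that the service listens on the default `http://127.0.0.1:8080/fscrawler/_upload`, and that it accepts an `id` form field next to the `file` field; please check those details against the FSCrawler documentation for your version. The helper names (`crawl`, `build_multipart`, `upload_with_id`) are made up for this example.

```python
import uuid
from pathlib import Path
from urllib import request

# Assumption: FSCrawler was started with its REST service enabled and
# listens on the default address. Adjust to your own setup.
FSCRAWLER_UPLOAD_URL = "http://127.0.0.1:8080/fscrawler/_upload"


def crawl(root):
    """Return every regular file under root (the 'crawl' part, done by hand)."""
    return sorted(p for p in Path(root).rglob("*") if p.is_file())


def build_multipart(fields, file_name, file_bytes, boundary=None):
    """Encode plain form fields plus one file as a multipart/form-data body."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode()
        )
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode()
    )
    parts.append(file_bytes)
    parts.append(f'\r\n--{boundary}--\r\n'.encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def upload_with_id(path, doc_id, url=FSCRAWLER_UPLOAD_URL):
    """Send one file to FSCrawler, asking it to index it under doc_id.

    Assumption: the REST service honours the 'id' form field as the _id.
    """
    path = Path(path)
    body, content_type = build_multipart(
        {"id": doc_id}, path.name, path.read_bytes()
    )
    req = request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with request.urlopen(req) as resp:
        return resp.read().decode()
```

You would then loop over `crawl("/path/to/docs")`, look up (or create) the internal ID for each file in your DB, and call `upload_with_id(path, internal_id)`. Since you choose the _id yourself, there is nothing to read back from FSCrawler afterwards.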