I'm trying to ingest files with multiple extensions (.pdf, .doc, .docx, .csv) into Elasticsearch using FSCrawler. After ingesting, all the details inside the files are mapped under one key-value pair called content.
Is there any option, or any other tool, to ingest files with multiple extensions so that the content is mapped into multiple key-value pairs based on a delimiter (:, ;)?
Hi David, Thanks for the reply!!!
Here is my use case: resume analytics using full-text search queries in Elasticsearch.
I'm trying to import resumes (in different formats: .pdf, .doc, .docx) into Elasticsearch using FSCrawler.
I have successfully ingested all the resumes into Elasticsearch with FSCrawler.
Example: while checking the XXYY.pdf resume, I noticed that all the details (name, mail ID, mobile number, experience, summary) inside XXYY.pdf are mapped under one key, content.
I'm looking to parse the details into separate keys for name, mail ID, mobile, experience, and summary, instead of having all the details in one key (content).
Is it possible to parse the details that way using FSCrawler?
Please suggest: am I missing something, or do I need to parse the resumes before ingesting them with FSCrawler?
No. FSCrawler cannot recognize and extract entities from text.
That's a process you'd need to run on the content field.
Maybe you can use something like:
in an ingest pipeline, and configure this pipeline in FSCrawler.
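For illustration only, here is a minimal sketch of the ingest-pipeline idea, assuming Elasticsearch is reachable on localhost:9200 and that the extracted resume text contains lines shaped like `Name: ...` or `Mail-Id: ...`. The pipeline name `resume_fields`, the choice of the kv processor, and the Python `requests` call are my assumptions, not something taken from this thread.

```python
# Hedged sketch: create an ingest pipeline that splits "Key: value" lines
# out of the "content" field produced by FSCrawler. Host, pipeline name,
# and the "Key: value" line format are assumptions for illustration.
import requests

pipeline = {
    "description": "Split resume content into separate fields on ':' delimiters",
    "processors": [
        {
            # kv processor: treat each line as one key/value pair,
            # splitting key from value on the first ':'
            "kv": {
                "field": "content",
                "field_split": "\n",
                "value_split": ":",
                "target_field": "resume",
                "ignore_missing": True,
                "ignore_failure": True
            }
        }
    ]
}

resp = requests.put(
    "http://localhost:9200/_ingest/pipeline/resume_fields",
    json=pipeline,
)
resp.raise_for_status()
print(resp.json())
```

FSCrawler can then be told to send documents through that pipeline; in recent versions this is the `elasticsearch.pipeline` setting in the job's `_settings.yaml` (check the FSCrawler documentation for your version). Note that free-form resumes rarely follow a clean `key: value` layout, so extracting entities such as name, experience, or summary would still need an NLP step on the content field, as mentioned above.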