Is there any option to define multiple key-value pairs based on a delimiter while ingesting files through FSCrawler in Elasticsearch?

I'm trying to ingest files with multiple extensions (.pdf, .doc, .docx, .csv) into Elasticsearch using FSCrawler. After ingesting, all the details inside each file are mapped under a single key called content.

Is there any option, or any other tool, to ingest files with multiple extensions so that the details are mapped to multiple key-value pairs based on a delimiter (:, ;)?


What do you mean? Do you have a concrete example of a document indexed by FSCrawler and what it should look like to fit your use case?

Hi David, thanks for the reply!
Here is my use case: resume analytics using full-text search queries in Elasticsearch.

  1. I'm trying to import resumes (in different formats: .pdf, .doc, .docx) into Elasticsearch using FSCrawler.
  2. I successfully ingested all the resumes into Elasticsearch with FSCrawler.
  3. Example: when I checked the XXYY.pdf resume, I noticed that all the details (name, mail-id, mobile number, experience, summary) inside XXYY.pdf are mapped under one key, "content".
  4. I'd like to parse the details into separate keys for name, mail-id, mobile, experience and summary instead of having everything in one key (content).
  5. Is it possible to parse the details in FSCrawler based on a delimiter (:)?
  6. Please suggest: am I missing something, or do I need to parse the resume before ingesting it with FSCrawler?


Sorry for the late response.

I see.

No. FSCrawler cannot recognize and extract entities from text.
That's a process you'd need to run on the content field.
Maybe you can use something like:

In an ingest pipeline and configure this pipeline in FSCrawler.
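
To make the ingest pipeline idea concrete, here is a minimal sketch in Python. It assumes the extracted text contains one "Key: value" pair per line, which real resumes often do not, and it uses the Elasticsearch `kv` ingest processor as one possible choice (not necessarily the processor meant above). The `parsed` target field and the sample text are made up for illustration. `preview_kv` is only a rough local approximation of what the processor would produce, not its exact behavior.

```python
import json


def build_kv_pipeline(source_field="content", field_split="\n", value_split=":"):
    """Build an ingest pipeline body that uses the `kv` processor.

    Assumption: each line of `source_field` looks like "Key: value".
    The "parsed" target field is a hypothetical name chosen for this sketch.
    """
    return {
        "description": "Split 'key: value' lines of the extracted text into fields",
        "processors": [
            {
                "kv": {
                    "field": source_field,
                    "field_split": field_split,  # separator between pairs
                    "value_split": value_split,  # separator between key and value
                    "target_field": "parsed",
                    "trim_key": " ",
                    "trim_value": " ",
                    # Keep indexing the document even if the text cannot be split.
                    "ignore_failure": True,
                }
            }
        ],
    }


def preview_kv(content, field_split="\n", value_split=":"):
    """Roughly approximate the split locally to preview the resulting fields.

    Unlike the real processor, lines without the delimiter are simply skipped.
    """
    parsed = {}
    for part in content.split(field_split):
        key, sep, value = part.partition(value_split)
        if sep and key.strip():
            parsed[key.strip()] = value.strip()
    return parsed


if __name__ == "__main__":
    # Made-up sample text standing in for the FSCrawler "content" field.
    sample = "Name: Jane Doe\nMail-id: jane@example.com\nExperience: 5 years"
    print(json.dumps(build_kv_pipeline(), indent=2))
    print(preview_kv(sample))
```

The printed pipeline body could be registered in Elasticsearch under a name of your choice and that name set as the pipeline in the FSCrawler job settings; check the FSCrawler documentation for the exact setting. For free-form resumes without consistent "Key: value" lines, a delimiter split like this will not be enough and the text would need to be parsed before or during ingestion by other means.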