I have a use case where I need to compare resume documents (MS Word, PDF) against a job description. Since resumes are highly unstructured, I am struggling to clean up the documents, remove invalid characters, and create JSON from them.
Then I came across the Ingest Attachment plugin. My questions are:
Is it possible to ingest the attachment directly from a physical drive location?
Does the attachment to be ingested always have to be Base64 encoded? And how should I query the encoded attachment data with a non-encoded query string?
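For context on the two questions above, here is a minimal Python sketch of the client-side step. As far as I know the plugin does not read files from disk itself, so the client reads the file and Base64-encodes it. The attachment processor then writes the extracted plain text to `attachment.content`, and ordinary (non-encoded) query strings are run against that field rather than against the Base64 string. The field name `data` and the helper names below are assumptions for illustration, not something the plugin requires.

```python
import base64
from pathlib import Path


def encode_attachment(raw: bytes, field: str = "data") -> dict:
    """Build a document body whose `field` holds the Base64-encoded bytes.

    `field` is a hypothetical name; it must match the `field` option
    configured on the attachment processor in your ingest pipeline.
    """
    encoded = base64.b64encode(raw).decode("ascii")
    return {field: encoded}


def encode_attachment_from_path(path: str, field: str = "data") -> dict:
    """Read a resume file (Word, PDF, ...) from disk and encode it."""
    return encode_attachment(Path(path).read_bytes(), field)
```

The resulting dict would be sent as the document body of an index request that names the ingest pipeline containing the attachment processor. Searches can then target `attachment.content` with a normal `match` query.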
Thanks David. This tool was helpful, though the resume text in the output JSON has \n characters embedded (understandably):
\nManaged global compliance program for a Fortune 100 company with an emphasis on creating an effective and cost-efficient program, talent development, and internal reporting and investigation procedures. \n\n· Allocated resources efficiently through the use of risk assessment protocols to identify, mitigate, and monitor compliance risks\n\n· Built cross-functional teams to develop compliance policies and communications, controls, audit plans, and training strategies that improve compliance program effectiveness\n\n· Reduced costs and employee time requirements of compliance training while improving employee awareness using internally developed training and communications resources\n·
How can I avoid those? I'm concerned that a skill keyword sitting next to a \n might not be matched in search.
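One possible way to strip them before indexing is sketched below in Python. It assumes the extracted text is available as a string on the client side, and the function name is hypothetical. It replaces the bullet characters seen in the sample output with spaces and collapses every whitespace run, including newlines, into a single space. Note that, as far as I know, the default standard analyzer already treats newlines as token separators, so this cleanup may matter more for display than for matching.

```python
import re


def normalize_resume_text(text: str) -> str:
    """Remove bullet markers and collapse whitespace in extracted text."""
    # Replace common bullet characters with a space so that neighbouring
    # words are not glued together.
    text = re.sub(r"[·•]", " ", text)
    # Collapse runs of whitespace (spaces, tabs, \n) into one space and
    # trim both ends.
    return re.sub(r"\s+", " ", text).strip()
```

For example, `normalize_resume_text("\nManaged program. \n\n· Allocated resources\n·")` yields `"Managed program. Allocated resources"`.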