PUT cvattachment/_doc/1?pipeline=attachment
{
"data": "JVBERi0xLjcNCiW1tbW1DQoxIDAgb2JqDQo8" ( the base64 data is very big. so just pasted the beginning of the converted data)
}
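For completeness, the `attachment` pipeline referenced by `?pipeline=attachment` can be created like this (a minimal sketch based on the ingest attachment processor; the source field name `data` matches the request above):

```
PUT _ingest/pipeline/attachment
{
  "description": "Extract attachment information from the base64 data field",
  "processors": [
    {
      "attachment": {
        "field": "data"
      }
    }
  ]
}
```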
Basically, I just want to replace the base64 data with a local file path, like below.
The ingest attachment processor cannot do that, mainly because you can't know in advance on which node the pipeline will run, so it won't have access to C:/xxx/xxx.pdf.
You can have a look at FSCrawler project which does a similar thing though.
I am able to place it in the root directory of the Elasticsearch server.
For example, in my case the local server is running from the location below.
C:\elasticsearch-6.6.1\bin\elasticsearch.bat
So I can place the documents in any directory there.
Is it possible to upload the PDF document in the above scenario?
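While Elasticsearch itself cannot read a local file path, a small client-side script can do the base64 encoding before sending the request. A minimal sketch in Python (the index and field names are just the ones from this thread):

```python
import base64
import json

def attachment_body(path):
    """Build the JSON body for
    PUT cvattachment/_doc/1?pipeline=attachment
    by base64-encoding the file at `path` into the "data" field,
    which is what the ingest attachment processor expects."""
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return json.dumps({"data": encoded})
```

The returned body can then be sent with curl or any HTTP client to the `cvattachment/_doc/1?pipeline=attachment` endpoint on your node.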
I would like to share the steps I followed for the POC, so that they may be helpful to anyone exploring Elasticsearch as a beginner.
Steps used to achieve the POC:
Title :
Search CVs (PDF or Word files residing in OneDrive or locally) and search their content using Kibana, e.g. for a work location, a previous company, etc.
Open a command prompt, navigate to the FSCrawler folder, then run: .\bin\fscrawler job1
It will ask "Do you want to create it (Y/N)?" - type "Y"
Now we have to change the job configuration so FSCrawler reads our files.
For example, open the settings file "C:\Users\jesumanij\.fscrawler\job1\_settings.yaml" and edit the url as below.
Old : url: "/tmp/es"
New : url: "C:\Users\jesumanij\CV" (don't use the Desktop folder)
Make sure the above folder exists, and copy all the files (in our case, all the CVs) into it.
Now start FSCrawler again with the same command:
.\bin\fscrawler job1
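Putting the edits together, the job's `_settings.yaml` would look roughly like this (a sketch only; the exact keys vary between FSCrawler versions, so check the file generated by your own install):

```yaml
name: "job1"
fs:
  # Folder FSCrawler will crawl (forward slashes avoid YAML escape issues)
  url: "C:/Users/jesumanij/CV"
  # How often FSCrawler re-scans the folder (default is 15 minutes)
  update_rate: "15m"
elasticsearch:
  nodes:
    - url: "http://127.0.0.1:9200"
```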
Create Index pattern:
Kibana -> Management -> Index Patterns -> Create index pattern -> type "job1" (the same job name we used when starting FSCrawler) in the index pattern input -> click Next -> choose "file.created" and click "Create index pattern"
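Before creating the index pattern, you can confirm that FSCrawler actually indexed something, e.g. from Kibana Dev Tools (assuming the index is named after the job, job1):

```
GET job1/_count
```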
Search for the CVs:
Kibana-> Discover
Click the dropdown on the left and choose "job1"
Make sure the time picker at the top right is set to "Year to date" so that all documents since the beginning are shown
Then we can add any of the available fields on the left-hand side, based on the requirement:
content, file.filename, file.extension, file.url, file.filesize, etc.
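For example, a full-text search on the content field from Dev Tools might look like this ("Chennai" is just a placeholder search term):

```
GET job1/_search
{
  "query": {
    "match": {
      "content": "Chennai"
    }
  }
}
```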
Refresh the files in the Folder to be available for search:
Add the new files to the location (in our case, "C:\Users\jesumanij\CV")
FSCrawler will pick up the new files automatically within 15 minutes (its update interval)
After that, we can click the refresh button in Kibana and check whether the new files are available for search
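The 15-minute interval is FSCrawler's default `update_rate`; if you want faster pickup, it can be lowered in the same `_settings.yaml` (a sketch; key name per the FSCrawler docs):

```yaml
fs:
  update_rate: "5m"
```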