I would like to share the steps I followed for this POC, so that they can help anyone exploring Elasticsearch as a beginner.
The goal of the POC: index CVs (PDF or Word files stored in OneDrive or on the local disk) and search for anything in their content using Kibana, for example a location where the candidate worked or a previous company.
Steps used to achieve the POC:
- Install JDK 1.8
- Set the JAVA_HOME path
https://www.elastic.co/downloads - to download Elasticsearch and Kibana
- Start the Elasticsearch server
- Start the Kibana server
- Once the servers have started, verify them by opening the link below
Link - http://localhost:5601
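The same check can be scripted instead of opening a browser. This is only a minimal sketch: the Kibana address is the link above, while the Elasticsearch address http://localhost:9200 is an assumption (the usual default port, with no authentication or TLS), and the helper name is_up is made up for illustration.

```python
import urllib.error
import urllib.request

# 5601 is the Kibana link used above; 9200 is an ASSUMED default for Elasticsearch.
ENDPOINTS = {
    "Kibana": "http://localhost:5601",
    "Elasticsearch": "http://localhost:9200",
}

# Ignore any system proxy settings, since both servers run on this machine.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_up(url, timeout=3.0):
    """Return True if something answers HTTP at url, False if it is unreachable."""
    try:
        with _opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status (e.g. 401), so it is running.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, name resolution failure, and similar.
        return False


if __name__ == "__main__":
    for name, url in ENDPOINTS.items():
        state = "up" if is_up(url) else "not reachable"
        print(f"{name}: {state} ({url})")
```

If Kibana is reported as not reachable right after starting it, it may simply need a little longer to finish starting, so it is worth retrying before looking for a problem.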
https://fscrawler.readthedocs.io/en/latest/installation.html - to download FSCrawler
- Open a command prompt, navigate to the fscrawler folder, then type: .\bin\fscrawler job1
- Because the job does not exist yet, it will ask "Do you want to create it (Y/N)?" - type "Y"
- Now change the job configuration so that FSCrawler reads the folder containing the files
For example, open the file "C:\Users\jesumanij\.fscrawler\job1\_settings.yaml" and edit the line below.
Old : url: "\tmp\es"
New : url: "C:\Users\jesumanij\CV" (don't use the Desktop folder)
If FSCrawler fails to parse the backslashes in the quoted path, the forward-slash form "C:/Users/jesumanij/CV" is an alternative.
Make sure this folder exists, and copy all the files (in our case, all the CVs) into it.
Then start FSCrawler again with the same command.
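Copying the CVs into the crawled folder can also be scripted. The sketch below assumes that "PDF or Word" means the extensions .pdf, .doc and .docx; the function name copy_cvs and the source folder in the example call are hypothetical.

```python
import shutil
from pathlib import Path

# ASSUMPTION: "PDF or Word" files are identified by these extensions.
CV_EXTENSIONS = {".pdf", ".doc", ".docx"}


def copy_cvs(source_dir, target_dir):
    """Copy PDF/Word files from source_dir into target_dir; return the copied names."""
    source, target = Path(source_dir), Path(target_dir)
    # Create the crawled folder if it does not exist yet.
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(source.iterdir()):
        if path.is_file() and path.suffix.lower() in CV_EXTENSIONS:
            shutil.copy2(path, target / path.name)
            copied.append(path.name)
    return copied


# Example call (the source folder is hypothetical, the target is the folder
# configured in the FSCrawler settings):
# copy_cvs(r"C:\Users\jesumanij\Downloads", r"C:\Users\jesumanij\CV")
```

Only the top level of the source folder is scanned; files in subfolders are left alone.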
Create the index pattern:
In Kibana, go to Management -> Index Patterns -> Create index pattern -> type "job1" (the same name we used when starting FSCrawler) in the index pattern input -> click Next -> choose "file.created" as the time field -> click "Create index pattern"
Search for the CVs:
Open Discover, click the drop-down on the left, and choose "job1".
Make sure the time picker at the top right is set to "Year to date" so that everything dated since the start of the year is shown; widen the range if the files are older.
Then add any of the available fields listed on the left-hand side, as required:
content, file.filename, file.extension, file.url, file.filesize, etc.
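The same search can be run against Elasticsearch directly, without Kibana. This is a rough sketch, not the only way to do it: it assumes Elasticsearch listens on http://localhost:9200 without authentication, and that FSCrawler wrote the documents to an index named after the job ("job1"). The function names and the search term in the example are illustrative only.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # ASSUMED default Elasticsearch address
INDEX = "job1"                    # the FSCrawler job name used above


def build_query(text, size=10):
    """Build a full-text query on the extracted `content` field."""
    return {
        "size": size,
        # Return only the file details, not the whole extracted text.
        "_source": ["file.filename", "file.extension", "file.url", "file.filesize"],
        "query": {"match": {"content": text}},
    }


def search_cvs(text, size=10):
    """Send the query to Elasticsearch and return the list of matching hits."""
    request = urllib.request.Request(
        f"{ES_URL}/{INDEX}/_search",
        data=json.dumps(build_query(text, size)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["hits"]["hits"]


# Example (hypothetical search term):
# for hit in search_cvs("London"):
#     print(hit["_source"]["file"]["filename"])
```

A match query on content finds CVs whose extracted text contains the term, which corresponds to typing the term into the Kibana search bar for that field.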
Refresh the files in the folder so that they are available for search:
- Add the new files to the folder (in our case it's "C:\Users\jesumanij\CV")
- FSCrawler rescans the folder automatically; allow 15 minutes for the refresh
- After 15 minutes, click the refresh button in Kibana and check whether the new files are available for search
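Instead of waiting and refreshing by hand, the document count of the index can be polled until it grows. The sketch below assumes the same unauthenticated Elasticsearch address as before (http://localhost:9200) and the "job1" index; indexed_count and wait_for_growth are hypothetical helper names.

```python
import json
import time
import urllib.request

COUNT_URL = "http://localhost:9200/job1/_count"  # ASSUMED default address


def indexed_count(url=COUNT_URL):
    """Return the number of documents currently searchable in the index."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)["count"]


def wait_for_growth(read_count, baseline, timeout_s=20 * 60, interval_s=30,
                    sleep=time.sleep):
    """Poll read_count() until it exceeds baseline.

    Returns the new count, or None if timeout_s elapses first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        current = read_count()
        if current > baseline:
            return current
        if time.monotonic() >= deadline:
            return None
        sleep(interval_s)


# Example: note the count before adding files, then wait for it to increase.
# before = indexed_count()
# ... copy the new CVs into the folder ...
# print(wait_for_growth(indexed_count, before))
```

The 20-minute default timeout is chosen only to leave some margin over the 15-minute refresh mentioned above.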