I would like to experiment with Elasticsearch before using it for a research project I am developing. The project data is contained in unstructured PDF files, but for this exercise I am using Wikipedia.
My computer is running Windows 10.
I have downloaded Wikipedia as HTML files into a folder called 'en.wikipedia.org' on a USB drive called 'brown3', which is connected to a Synology NAS drive called 'synology2'.
The path to my folder is therefore \\synology2\brown3\en.wikipedia.org
I have installed Kibana 6 with the ingest plugin.
I have read countless pages on elastic.co and watched many YouTube videos, but I have not been able to translate their general advice to my specific needs, so I still cannot get Elasticsearch/Kibana to index my files.
Although the text on Wikipedia pages is organized with headings, subheadings, numbered lists and bulleted lists, I want all of it to be indexed as plain text, to match my unstructured PDF files.
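One way to flatten a page into plain text before indexing is to keep only the visible text nodes and throw away the markup. The sketch below uses only Python's standard library (`html.parser`); the class and function names are hypothetical, and it assumes that dropping `<head>`, `<script>` and `<style>` content is acceptable for your pages.

```python
from html.parser import HTMLParser


class PlainTextExtractor(HTMLParser):
    """Collects visible text from an HTML page, ignoring headings/lists structure."""

    # Hypothetical choice: contents of these tags are not treated as page text.
    SKIP_TAGS = {"head", "script", "style"}

    def __init__(self):
        # convert_charrefs=True (the default) turns "&amp;" into "&" for us.
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join fragments with spaces (so list items do not run together),
        # then collapse all whitespace runs into single spaces.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html):
    parser = PlainTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    page = "<h1>Title</h1><ul><li>first</li><li>second</li></ul><p>Body text</p>"
    print(html_to_text(page))
```

Joining fragments with spaces means inline markup such as `Wiki<i>pedia</i>` becomes "Wiki pedia"; for full-text search that is usually a smaller problem than list items being glued together.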
Otherwise, you will need a way to read the files off disk, process them the way you want, and index them into Elasticsearch. It sounds simple, but it's not, given that you're just starting out.
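The "read off disk, process, index" loop described in the reply could look roughly like the sketch below. It is standard-library Python only and makes several assumptions: Elasticsearch is reachable at `http://localhost:9200`, the index is called `wikipedia`, each page is stored as `path` and `content` fields, and the cluster is 6.x (hence `_type: "_doc"`; drop `_type` on 7.x and later). All function names are hypothetical. Documents are sent through the `_bulk` API in batches rather than one request per page.

```python
import json
import os
import urllib.request
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Minimal visible-text extractor (skips head/script/style contents)."""

    SKIP = {"head", "script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks, self.depth = [], 0

    def handle_starttag(self, tag, attrs):
        self.depth += tag in self.SKIP

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if not self.depth:
            self.chunks.append(data)


def html_to_text(html):
    p = _TextExtractor()
    p.feed(html)
    p.close()
    return " ".join(" ".join(p.chunks).split())


def iter_html_files(root):
    """Yield every .html/.htm file below root, recursively."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.lower().endswith((".html", ".htm")):
                yield os.path.join(dirpath, name)


def build_bulk_body(docs, index="wikipedia", doc_type="_doc"):
    """Turn (doc_id, text) pairs into an NDJSON body for the _bulk API."""
    lines = []
    for doc_id, text in docs:
        # Assumption: 6.x-style action line with a mapping type.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        lines.append(json.dumps({"path": doc_id, "content": text}))
    return "\n".join(lines) + "\n"  # _bulk requires a trailing newline


def send_bulk(body, es_url="http://localhost:9200"):
    req = urllib.request.Request(
        es_url + "/_bulk",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    if result.get("errors"):
        print("Some documents failed to index; inspect result['items'].")
    return result


def index_folder(root, batch_size=500, es_url="http://localhost:9200"):
    batch = []
    for path in iter_html_files(root):
        with open(path, encoding="utf-8", errors="replace") as f:
            text = html_to_text(f.read())
        # Use the path relative to root (with forward slashes) as the document id.
        doc_id = os.path.relpath(path, root).replace(os.sep, "/")
        batch.append((doc_id, text))
        if len(batch) >= batch_size:
            send_bulk(build_bulk_body(batch), es_url)
            batch = []
    if batch:
        send_bulk(build_bulk_body(batch), es_url)
```

Usage would be something like `index_folder(r"\\synology2\brown3\en.wikipedia.org")`, assuming the share is reachable under that UNC path from the machine running the script. With no explicit mapping, Elasticsearch's dynamic mapping should treat `content` as an analyzed `text` field, which is enough to start searching it from Kibana.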