New User - too dumb to create first index - please help


(David) #1

I would like to experiment with Elasticsearch before using it for a research project I am developing. The project data is contained in unstructured PDF files, but for this exercise I am using Wikipedia.

My computer is running Windows 10.

I have downloaded Wikipedia as HTML files into a folder called 'en.wikipedia.org' on a USB drive called 'brown3', which is connected to a Synology NAS called 'synology2'.

The path to my folder is therefore \\synology2\brown3\en.wikipedia.org

I have installed Kibana 6 with the ingest plugin.
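For context, the ingest attachment plugin is an Elasticsearch plugin (installed with `bin/elasticsearch-plugin install ingest-attachment`), not a Kibana one. The Python sketch below only builds the two request bodies the plugin is typically used with: a pipeline containing an `attachment` processor, and a document whose `data` field holds a base64-encoded file. The pipeline id `attachment`, the index name `research` and the extra `filename` field are hypothetical placeholders, and the `/_doc/` endpoint assumes a 6.x-or-later cluster.

```python
import base64
from pathlib import Path
from typing import Tuple

# Hypothetical placeholders: neither name comes from the thread.
PIPELINE_ID = "attachment"
INDEX = "research"


def pipeline_body() -> dict:
    """Body for PUT /_ingest/pipeline/<PIPELINE_ID>: one attachment processor
    that decodes the base64 "data" field and extracts the file's text."""
    return {
        "description": "Extract text from PDF and other binary files",
        "processors": [{"attachment": {"field": "data"}}],
    }


def document_request(path: Path, doc_id: str) -> Tuple[str, dict]:
    """Return (URL, body) for indexing one file through the pipeline with PUT."""
    # The attachment processor expects the raw file bytes as a base64 string.
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    url = f"/{INDEX}/_doc/{doc_id}?pipeline={PIPELINE_ID}"
    return url, {"data": encoded, "filename": path.name}
```

The pipeline body would be sent once with `PUT /_ingest/pipeline/attachment`, and each `(url, body)` pair from `document_request` with a `PUT` to that URL; by default the extracted text then ends up under `attachment.content` in the stored document.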

I have read countless pages on elastic.co and watched loads of YouTube videos, but I have been unable to translate their general advice to my specific needs, so I still can't get Elasticsearch/Kibana to index my files.

Although the text on wiki pages is organized with headings, sub-headings, numbered lists and bulleted lists, I want all of it to be indexed as plain text, to match my unstructured PDF files.
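One way to get "just text" out of the downloaded pages before indexing is to strip the markup yourself. Below is a minimal sketch in Python using only the standard library's `html.parser`; the class and function names are hypothetical, and treating `<script>`/`<style>` content as noise to skip is an assumption.

```python
from html.parser import HTMLParser
from typing import List


class TextFlattener(HTMLParser):
    """Collects visible text and drops markup such as <h2>, <ol> and <li>."""

    SKIP = {"script", "style"}  # assumption: content of these tags is not prose

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True, so "&amp;" arrives as "&"
        self._chunks: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self) -> str:
        # Collapse all whitespace so headings and list items run together as prose.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html: str) -> str:
    parser = TextFlattener()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.text()
```

Because every text chunk is joined with a space, an inline tag such as `<b>` in the middle of a word would split that word; this is a deliberate simplification of the sketch.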

I would greatly appreciate your help.

Regards
David


(Mark Walkom) #2

If I may suggest, you're probably better off starting simpler. Check out https://www.elastic.co/guide/en/kibana/current/getting-started.html.

Otherwise, you will need a way to read the files off disk, process them the way you want, and index them into Elasticsearch. It sounds simple, but it isn't, given that you're just starting out.
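A rough Python sketch of those three steps (read, process, index) using Elasticsearch's `_bulk` API over plain HTTP. It assumes an unsecured node at `http://localhost:9200` and a hypothetical index named `wikipedia`, and it indexes each file's raw contents unchanged; on a 6.x cluster the bulk action also needs a type, so `doc_type="_doc"` would be passed there.

```python
import json
import urllib.request
from pathlib import Path
from typing import Iterator, Optional

# Assumptions for this sketch: an unsecured node on localhost:9200 and a
# hypothetical index called "wikipedia".
ES_URL = "http://localhost:9200"
INDEX = "wikipedia"


def bulk_bodies(root: Path, pattern: str = "*.html", batch_size: int = 500,
                doc_type: Optional[str] = None) -> Iterator[str]:
    """Yield NDJSON _bulk bodies, each covering up to batch_size files under root."""
    lines = []
    for path in sorted(root.rglob(pattern)):
        # Read: errors="replace" keeps one badly encoded file from stopping the run.
        text = path.read_text(encoding="utf-8", errors="replace")
        action = {"_index": INDEX, "_id": path.relative_to(root).as_posix()}
        if doc_type is not None:
            action["_type"] = doc_type  # 6.x expects a type; 7.x+ does not
        lines.append(json.dumps({"index": action}))
        # Process: the raw file is kept as-is here; swap in any text extraction.
        lines.append(json.dumps({"path": str(path), "content": text}))
        if len(lines) >= 2 * batch_size:
            yield "\n".join(lines) + "\n"  # _bulk needs a trailing newline
            lines = []
    if lines:
        yield "\n".join(lines) + "\n"


def send_bulk(body: str) -> dict:
    """Index: POST one NDJSON body to the _bulk endpoint and return the parsed reply."""
    request = urllib.request.Request(
        f"{ES_URL}/_bulk",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Each yielded body would be passed to `send_bulk` in turn. Batching keeps a large dump like Wikipedia from going out as one huge request; the batch size of 500 files is an arbitrary starting point, not a recommendation.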


(David Pilato) #3

You could also take a look at the FSCrawler project: https://github.com/dadoonet/fscrawler

But as Mark said, start with something even simpler. :wink:


(David) #4

Thank you both.

I've been reading the FSCrawler web page.

My operating system is on my computer's C: drive. My PDF files (to be indexed) are on one NAS, and I would like the index to be created on another NAS.

Kibana is stored in c:/kibana

Where should I place my FSCrawler snapshot?

If on the C: drive, how/where do I specify the folder containing my PDF files?

Thanks


(David Pilato) #5

> Where should I place my FSCrawler snapshot?

Wherever you want.

> How/where do I specify the folder containing my PDF files?

See https://github.com/dadoonet/fscrawler#root-directory
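As an illustration of that setting, here is a Python sketch that writes a minimal job file containing only `name` and `fs.url` (the root directory). It assumes the `~/.fscrawler/<job>/_settings.json` layout used by the FSCrawler 2.x releases of this period (newer releases use `_settings.yaml`). The job name `research_pdfs` is hypothetical, and whether FSCrawler can read the NAS through a UNC path like the one below, rather than a mapped drive letter, is an assumption to verify.

```python
import json
from pathlib import Path

# Hypothetical job name; FSCrawler would then be run as: bin/fscrawler research_pdfs
JOB_NAME = "research_pdfs"
# Root directory to crawl: the NAS share from this thread, written as a UNC path
# (assumption: FSCrawler accepts UNC paths; a mapped drive letter is the fallback).
ROOT_DIR = r"\\synology2\brown3\en.wikipedia.org"


def write_job_settings(config_dir: Path, job_name: str, root_dir: str) -> Path:
    """Write a minimal FSCrawler job file that sets only the job name and fs.url."""
    job_dir = config_dir / job_name
    job_dir.mkdir(parents=True, exist_ok=True)
    settings = {"name": job_name, "fs": {"url": root_dir}}
    target = job_dir / "_settings.json"
    # json.dumps escapes every backslash, as JSON requires for Windows paths.
    target.write_text(json.dumps(settings, indent=2), encoding="utf-8")
    return target
```

For example, `write_job_settings(Path.home() / ".fscrawler", JOB_NAME, ROOT_DIR)` would create the job file, after which the job is started with `bin/fscrawler research_pdfs`. The FSCrawler snapshot itself can still live on the C: drive, since only `fs.url` points at the NAS.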