New User - too dumb to create first index - please help

I would like to experiment with elastic before using it for a research project I am developing. The project data is contained in unstructured pdf files, but for this exercise I am using Wikipedia.

My computer is running Windows 10.

I have downloaded Wikipedia as HTML files into a folder called 'en.wikipedia.org' on a USB drive called 'brown3' connected to a Synology NAS called 'synology2'.

The path to my folder is therefore \\synology2\brown3\en.wikipedia.org

I have installed Kibana 6 with the ingest plugin.

I have read countless pages on elastic.co and watched many YouTube videos, but I have been unable to translate their general advice to my specific needs, so I still cannot get Elasticsearch/Kibana to index my files.

Though the text on wiki pages is organized using headings, sub-headings, numbered lists and bulleted lists, I want all the text to be indexed as just text, to match my unstructured pdf files.

I would greatly appreciate your help.

Regards
David

If I can suggest, you're probably better off starting simpler. Check out https://www.elastic.co/guide/en/kibana/current/getting-started.html.

Otherwise you will need a way to read the files off disk, process them how you want, and index them into Elasticsearch. It sounds simple, but it isn't when you're just starting out.
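
For when you do get to that point, here is a rough sketch of those three steps (read, process, index) using only the Python standard library. It is an illustration under assumptions, not a recommended setup: the index name `wikipedia`, the field names `path` and `content`, and the URL `http://localhost:9200/_bulk` are all placeholders I made up, and the `_type` entry is only there because Elasticsearch 6.x expects a type (pass `doc_type=None` on versions that no longer use types).

```python
import json
import urllib.request
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect visible text, dropping tags and the contents of <script>/<style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Flatten headings, lists, etc. into one plain-text string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def bulk_lines(folder, index="wikipedia", doc_type="_doc"):
    """Yield NDJSON lines for the _bulk API: an action line, then a document line."""
    for path in sorted(Path(folder).rglob("*.html")):
        text = html_to_text(path.read_text(encoding="utf-8", errors="replace"))
        action = {"_index": index}
        if doc_type is not None:  # assumption: needed on 6.x only
            action["_type"] = doc_type
        yield json.dumps({"index": action})
        yield json.dumps({"path": str(path), "content": text})


def send_bulk(lines, url="http://localhost:9200/_bulk"):
    """POST the NDJSON body to Elasticsearch (URL is an assumed local default)."""
    body = ("\n".join(lines) + "\n").encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Example call (not run here; needs a reachable Elasticsearch node):
# send_bulk(bulk_lines(r"\\synology2\brown3\en.wikipedia.org"))
```

A full Wikipedia dump is far too large for a single request, so a real run would send the lines in batches and check the response for per-document errors. That is part of why starting with the tutorial is easier.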

You can also have a look at the FSCrawler project: https://github.com/dadoonet/fscrawler

But as Mark said, start with something even easier. :wink:

Thank you both.

I've been reading the FSCrawler web page.

My operating system is on my computer's C drive. My pdf files (to be indexed) are on one NAS. I would like my index to be created on another NAS.

Kibana is stored in c:/kibana.

Where should I place my FSCrawler snapshot?

If on the C: drive, then how/where to specify the folder containing my pdf files?

Thanks

Where should I place my FSCrawler snapshot?

Wherever you want.

how/where to specify the folder containing my pdf files?

See the FSCrawler documentation: https://github.com/dadoonet/fscrawler
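
In short, the folder to crawl is the `fs.url` setting in the job's settings file. Below is a small sketch that generates such a file's content. The job name `pdf_docs` and the share `\\nas1\share\pdfs` are made-up placeholders, and the exact file name and format depend on your FSCrawler version (older releases use `_settings.json`, newer ones `_settings.yaml`), so please check the documentation for the version you downloaded. Generating the JSON from Python is only a convenience here, because it takes care of escaping the backslashes in a Windows path.

```python
import json


def fscrawler_settings(job_name, docs_folder):
    """Build a minimal job definition: a name plus the folder to crawl (fs.url)."""
    return {"name": job_name, "fs": {"url": docs_folder}}


# Hypothetical job name and UNC path; replace with your own.
settings = fscrawler_settings("pdf_docs", r"\\nas1\share\pdfs")

# Backslashes are doubled automatically in the JSON output.
print(json.dumps(settings, indent=2))
```

Anything not listed in the file falls back to FSCrawler's defaults, including the Elasticsearch node it connects to.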
