New User - too dumb to create first index - please help

neergttocsdivad · January 3, 2018, 2:12am

I would like to experiment with Elastic before using it for a research project I am developing. The project data is contained in unstructured PDF files, but for this exercise I am using Wikipedia.

My computer is running Windows 10.

I have downloaded Wikipedia as HTML files into a folder called 'en.wikipedia.org' on a USB drive called 'brown3' connected to a Synology NAS drive called 'synology2'.

The path to my folder is therefore \\synology2\brown3\en.wikipedia.org

I have installed Kibana 6 with the ingest plugin.

I have read countless pages on elastic.co and watched many YouTube videos, but I have been unable to translate their general advice to my specific needs, so I still cannot get Elastic/Kibana to index my files.

Though the text on Wikipedia pages is organized using headings, sub-headings, numbered lists and bulleted lists, I want all of it to be indexed as plain text, to match my unstructured PDF files.

I would greatly appreciate your help.

Regards
David

warkolm · January 3, 2018, 6:18am

If I can make a suggestion, you're probably better off starting simpler. Check out https://www.elastic.co/guide/en/kibana/current/getting-started.html.

Otherwise you will need a way to read the files off disk, process them how you want and index them into Elasticsearch. It sounds simple, but it's not when you're just starting out.
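
To make those three steps concrete, here is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not a recommended pipeline: the index name `wiki-test`, the field name `content`, the helper names, and the `http://localhost:9200` address are all hypothetical. It reads `.html` files, flattens headings and lists to plain text, and builds a request body for Elasticsearch's `_bulk` endpoint.

```python
import json
import urllib.request
from html.parser import HTMLParser
from pathlib import Path


class TextOnly(HTMLParser):
    """Collect text content, ignoring tags and <script>/<style> bodies."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Step 2 (process): reduce headings, lists, etc. to one run of text."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def iter_html_files(folder):
    """Step 1 (read): yield (path, plain text) for each .html file found."""
    for path in sorted(Path(folder).rglob("*.html")):
        raw = path.read_text(encoding="utf-8", errors="replace")
        yield str(path), html_to_text(raw)


def bulk_body(docs, index="wiki-test", doc_type=None):
    """Step 3 (index): build the newline-delimited JSON body for _bulk.

    Elasticsearch 6.x expects a document type; later versions dropped it,
    so doc_type is only added when the caller supplies one.
    """
    lines = []
    for doc_id, text in docs:
        meta = {"_index": index, "_id": doc_id}
        if doc_type is not None:
            meta["_type"] = doc_type
        lines.append(json.dumps({"index": meta}))
        lines.append(json.dumps({"content": text}))
    return "\n".join(lines) + "\n"


def send_bulk(body, url="http://localhost:9200/_bulk"):
    """POST the body to a running node (the address is an assumption)."""
    req = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a node running, something like `send_bulk(bulk_body(iter_html_files(folder), doc_type="doc"))` would index a small folder in one request. A full Wikipedia dump would need batching, which this sketch leaves out, and PDF files would need a text extractor in place of `html_to_text`.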

dadoonet · January 3, 2018, 6:27am

You can also have a look at the FSCrawler project: https://github.com/dadoonet/fscrawler

But as Mark said, start with something simpler.

neergttocsdivad · January 13, 2018, 6:07am

Thank you both.

I've been reading the FSCrawler web page.

My operating system is on my computer's C: drive. My PDF files (to be indexed) are on one NAS. I would like my index to be created on another NAS.

Kibana is stored in c:/kibana.

Where should I place my FSCrawler snapshot?

If on the C: drive, then how/where do I specify the folder containing my PDF files?

Thanks

dadoonet · January 13, 2018, 4:02pm

Where should I place my FSCrawler snapshot?

Wherever you want.

how/where do I specify the folder containing my PDF files?

See the FSCrawler README: https://github.com/dadoonet/fscrawler
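
As a rough illustration of what that documentation describes: FSCrawler reads a per-job settings file, and the folder to crawl goes in its `fs.url` entry. The Python sketch below only builds such a settings document as JSON. The job name `pdf_test` and the path `Z:/research/pdfs` (a NAS share assumed to be mapped to a drive letter) are hypothetical, and the file name, location and full list of keys vary between FSCrawler versions, so check them against the README for the snapshot you downloaded.

```python
import json


def fscrawler_settings(job_name, docs_folder):
    """Build a minimal job settings document.

    Only the job name and the folder to crawl are set here; every other
    option is left to FSCrawler's defaults (an assumption of this sketch).
    """
    return {
        "name": job_name,
        "fs": {"url": docs_folder},
    }


# Hypothetical values: replace with your own job name and PDF folder.
settings = fscrawler_settings("pdf_test", "Z:/research/pdfs")
settings_json = json.dumps(settings, indent=2)
print(settings_json)
```

The folder holding the FSCrawler snapshot itself is independent of this setting, which is why it can be unpacked anywhere.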

system · February 10, 2018, 4:02pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
