Hi all, I am trying to set up a file-search capability for a not-for-profit org I am helping out.
The use case is something like this:
* users drop documents (mainly document files such as PDF, TXT, XLS, and Word, but also some audio, video, and image files) into a shared folder
* the folder is crawled every few minutes and the files are indexed into ES for search
* users access search through a web interface (a sketch of the kind of query the UI would run follows below)
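For that last bullet, the web UI would ultimately just be issuing ES queries. A minimal sketch of the kind of request I have in mind, assuming FSCrawler's default document fields (`content`, `file.filename`, `path.real`) and a placeholder index name `docs`:

```sh
# Search extracted text and filenames; return only the fields the UI needs
curl -s "http://localhost:9200/docs/_search?pretty" -H "Content-Type: application/json" -d '
{
  "query": {
    "simple_query_string": {
      "query": "annual report",
      "fields": ["content", "file.filename"]
    }
  },
  "_source": ["file.filename", "path.real"],
  "size": 10
}'
```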
Here is what I tried:
a) set up ES and Kibana
b) used FSCrawler 2.3 to crawl the folder and create the index
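For reference, a minimal `~/.fscrawler/<job_name>/_settings.json` for this kind of setup looks roughly like the sketch below (path, host, and index name are placeholders; `update_rate` is what drives the "every few minutes" crawl):

```json
{
  "name": "docs",
  "fs": {
    "url": "/path/to/shared/folder",
    "update_rate": "5m",
    "excludes": ["~*"],
    "index_content": true
  },
  "elasticsearch": {
    "nodes": [
      { "host": "127.0.0.1", "port": 9200, "scheme": "HTTP" }
    ],
    "index": "docs",
    "bulk_size": 100
  }
}
```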
While it does work, I ran into a couple of problems:
a) ES did not run on Win7 due to a Java version issue that I could not resolve (I will open a separate thread for that). Luckily I had access to a server running Windows Server 2012 where I got ES running, but that means I have only one node for ES.
b) I tested on a sample folder with about 15,000 files (~200-300 GB of data). Unfortunately, performance was a little underwhelming: when I try to access the index through Kibana, it takes about 10 seconds to load!
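To separate ES's own query time from Kibana's overhead (loading the field list for the index pattern, rendering, etc.), I figured I could time a raw query directly; the `took` value in the response is ES's query time in milliseconds (again, `docs` is a placeholder index name):

```sh
# "took" in the response is ES query time in ms, with no Kibana overhead
curl -s "http://localhost:9200/docs/_search?pretty" -H "Content-Type: application/json" -d '
{
  "size": 0,
  "query": { "match": { "content": "test" } }
}'
```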
Given that the target data I am supposed to work with has a volume of about 2-3 TB and is growing at about 2-3 GB/week, that doesn't bode well.
I suspect the number of fields being created in the index (900+) might be a factor in slowing down retrieval. Is it?
If so, I am quite ready to strip the fields down to a bare minimum, because I don't need most of them; the sketch below is what I had in mind.
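The idea would be to pre-create the index with a locked-down mapping before the first crawl, so only the handful of fields I care about get indexed. A sketch, assuming ES 5.x field types and FSCrawler's default `doc` type (field names taken from FSCrawler's document model; worth double-checking against the actual mapping):

```sh
# Create the index up front; "dynamic": false means fields not listed here
# are kept in _source but not mapped or indexed, so the field count stays small
curl -X PUT "http://localhost:9200/docs" -H "Content-Type: application/json" -d '
{
  "mappings": {
    "doc": {
      "dynamic": false,
      "properties": {
        "content": { "type": "text" },
        "file": {
          "properties": {
            "filename":      { "type": "text" },
            "extension":     { "type": "keyword" },
            "last_modified": { "type": "date" }
          }
        },
        "path": {
          "properties": {
            "real": { "type": "keyword" }
          }
        }
      }
    }
  }
}'
```

My understanding is that FSCrawler will reuse an existing index rather than overwrite its mapping, but I would appreciate confirmation that this is the right way to cut the field count, and whether it would actually help with the Kibana load time.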
Sorry for the long post; hoping for some Yoda-wisdom from the community!