Filesearch using fscrawler


(Ramon) #1

Hi all, this is with reference to this now-closed thread.

In a nutshell, I am trying to build a file search solution using fscrawler, but I ran into an explosion of meta.* fields.

First the good news: simply setting raw_metadata to false in the _settings JSON takes care of the metadata issue.

However, I am trying to restrict indexing to specific file types, with something like:

-
-
"includes" : [ "*.txt" ],
"includes" : [ "*.doc" ],
"includes" : [ "*.docx" ],
"includes" : [ "*.pdf" ],
"includes" : [ "*.rtf" ],
-
-

I find that the index only contains files with the last extension listed (in this case *.rtf). I tried this multiple times with the same result. Is this a bug, or am I doing something wrong?

Edit: hmm, the forum software removed all the ["*.filetype"] entries except the last one. I wonder why.


(David Pilato) #2

Could you try something like:

"includes" : [ "*.txt", "*.doc", "*.docx", "*.pdf", "*.rtf" ]

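A note on why only *.rtf was indexed in #1: many JSON parsers keep only the last value when a key is repeated within one object. The sketch below shows this with Python's json module. It is an illustration only; fscrawler is a Java tool, and whether its parser treats duplicate keys the same way is an assumption, although it would be consistent with the result reported. The settings fragments are hypothetical and contain only the "includes" key.

```python
import json

# Hypothetical settings fragment with "includes" repeated, as in post #1.
repeated = """
{
  "includes": ["*.txt"],
  "includes": ["*.doc"],
  "includes": ["*.rtf"]
}
"""

# A single "includes" key holding every pattern, as suggested in post #2.
combined = """
{
  "includes": ["*.txt", "*.doc", "*.docx", "*.pdf", "*.rtf"]
}
"""

repeated_settings = json.loads(repeated)
combined_settings = json.loads(combined)

# Python's json module silently keeps the last duplicate key.
print(repeated_settings["includes"])
# The single-array form keeps every pattern.
print(combined_settings["includes"])
```

If the parser in use behaves like this, the earlier "includes" lines are discarded before the crawler ever sees them, which is why a single array is the safer form.
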
(Ramon) #3

Didn't work. fscrawler didn't run; it threw a Java exception.

Something like:
"... unexpected character ',' at line ..." etc.

(Sorry, I am on a different PC right now and can't access a copy of the error message.)
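
For reference, the array form suggested in #2 is valid JSON on its own, so a parse error complaining about a comma would more likely come from the surrounding text. One guess, not confirmed anywhere in this thread, is a comma left after the last entry of the enclosing object. The sketch below uses Python's json module to check both cases; the strings are hypothetical fragments, and fscrawler's own parser may report the problem differently.

```python
import json

# The suggested form, wrapped in an object so it can be parsed on its own.
valid = '{"includes": ["*.txt", "*.doc", "*.docx", "*.pdf", "*.rtf"]}'

# Same content with a comma left after the last member (hypothetical mistake).
trailing_comma = '{"includes": ["*.txt", "*.doc", "*.docx", "*.pdf", "*.rtf"],}'


def parses(text):
    """Return True if text is valid JSON according to Python's parser."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(parses(valid))
print(parses(trailing_comma))
```

Running the real settings file through any strict JSON validator should point at the exact line and character.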


(David Pilato) #4

Weird.

Could you try then:

"includes" : [ "*.txt,*.doc,*.docx,*.pdf,*.rtf" ]

(Ramon) #5

It runs, but no files are indexed, only folder locations.

I think this looks for a file whose name matches the entire string inside the quotes.
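
A quick way to see why that would match nothing: a comma-joined string is one glob pattern, not five. The sketch below uses Python's fnmatch as a stand-in matcher. That is an assumption for illustration, since fscrawler's own matching code may differ in detail, and the file names are made up.

```python
from fnmatch import fnmatch

# Made-up file names for illustration.
files = ["notes.txt", "report.pdf", "letter.rtf", "photo.jpg"]

# One pattern containing commas, as in post #4.
single_string = ["*.txt,*.doc,*.docx,*.pdf,*.rtf"]

# Five separate patterns, as in post #2.
separate = ["*.txt", "*.doc", "*.docx", "*.pdf", "*.rtf"]


def included(name, patterns):
    """Return True if name matches at least one glob pattern."""
    return any(fnmatch(name, pattern) for pattern in patterns)


matched_single = [f for f in files if included(f, single_string)]
matched_separate = [f for f in files if included(f, separate)]

print(matched_single)    # nothing matches the comma-joined pattern
print(matched_separate)  # the three document files match
```

With the single string, a file would need a name literally containing ".txt," and ending in ".rtf" to match, so ordinary files are skipped.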


(David Pilato) #6

Could you open an issue?

