FSCrawler


(suresh) #1

Hi,
I want to index PDF files. I am using FSCrawler with Elasticsearch 5.2 for that, but I am getting an error.

My _settings.json file is:
{
  "name" : "job_name",
  "fs" : {
    "url" : "d:/pdf/",
    "update_rate" : "15m",
    "includes" : ["."],
    "excludes" : ["*.json"],
    "json_support" : false,
    "filename_as_id" : false,
    "add_filesize" : true,
    "remove_deleted" : true,
    "add_as_inner_object" : false,
    "store_source" : true,
    "index_content" : true,
    "attributes_support" : false,
    "raw_metadata" : true,
    "xml_support" : false,
    "index_folders" : true,
    "lang_detect" : false,
    "continue_on_error" : false,
    "pdf_ocr" : true,
    "ocr" : {
      "language" : "eng"
    }
  },
  "elasticsearch" : {
    "nodes" : [ {
      "host" : "127.0.0.1",
      "port" : 9200,
      "scheme" : "HTTP"
    } ],
    "bulk_size" : 100,
    "flush_interval" : "5s",
    "type" : "doc",
    "pipeline" : "fscrawler"
  },
  "rest" : {
    "scheme" : "HTTP",
    "host" : "127.0.0.1",
    "port" : 8080,
    "endpoint" : "fscrawler"
  }
}

But I am getting an error.

Please help me with this.

Thanks


(David Pilato) #2

Please format your code using the </> icon as explained in this guide. It will make your post more readable.

Or use markdown style like:

```
CODE
```

Don't post images. They are hard to read.

Also, maybe you can share your sample document somewhere (within an FSCrawler issue?).

Thanks.
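
Before sharing the settings as text, it can also help to rule out a malformed settings file. The following is only a minimal sketch, not part of FSCrawler: the helper names `find_json_error` and `summarize_settings` are hypothetical, and the sketch only checks that the text parses as JSON and pulls out a few fields from the settings posted above. It does not validate the values against what FSCrawler actually accepts.

```python
import json


def find_json_error(text):
    """Return None if text parses as JSON, else a 'line X, column Y: message' string."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return f"line {exc.lineno}, column {exc.colno}: {exc.msg}"
    return None


def summarize_settings(text):
    """Extract a few fields worth quoting when asking for help (hypothetical helper)."""
    settings = json.loads(text)
    fs = settings.get("fs", {})
    es = settings.get("elasticsearch", {})
    return {
        "name": settings.get("name"),
        "url": fs.get("url"),
        "includes": fs.get("includes"),
        "pdf_ocr": fs.get("pdf_ocr"),
        "pipeline": es.get("pipeline"),
    }


if __name__ == "__main__":
    # Shortened copy of the settings from the first post, for illustration only.
    sample = """
    {
      "name" : "job_name",
      "fs" : { "url" : "d:/pdf/", "includes" : ["."], "pdf_ocr" : true },
      "elasticsearch" : { "pipeline" : "fscrawler" }
    }
    """
    problem = find_json_error(sample)
    if problem:
        print("Settings are not valid JSON:", problem)
    else:
        print(summarize_settings(sample))
```

If the file parses cleanly, the summary (job name, crawled URL, include patterns, OCR flag, pipeline name) together with the full error text is usually more useful in a post than a screenshot.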

