Hello,
I want to know how I can index a large file. I want the whole content to be added. I already tried "_indexed_chars": "100%", but it doesn't work.
Thank you.
What format is the large file?
How would you like to make use of it once it is indexed in Elasticsearch?
The file size is 3,287 KB. I want to index it into Elasticsearch with all of its content.
What are the logs of FSCrawler please?
The file is being indexed, but not all of the content makes it into Elasticsearch.
What do you mean by logs? Where can I find them? If you are talking about my settings, here they are:
{
  "name" : "test4",
  "fs": {
    "url": "C:\Users\aelkhattabi\Desktop\elastic\testindex",
    "update_rate": "15m",
    "excludes": [ "~*" ],
    "json_support": false,
    "filename_as_id": false,
    "add_filesize": true,
    "remove_deleted": true,
    "add_as_inner_object": false,
    "store_source": false,
    "_indexed_chars": "100%",
    "index_content": true,
    "attributes_support": false,
    "raw_metadata": true,
    "xml_support": false,
    "index_folders": true,
    "lang_detect": false,
    "continue_on_error": false,
    "pdf_ocr": true,
    "ocr": {
      "language": "eng"
    }
  },
  "elasticsearch" : {
    "nodes" : [ {
      "host" : "127.0.0.1",
      "port" : 9200,
      "scheme" : "HTTP"
    } ],
    "bulk_size" : 100,
    "flush_interval" : "5s"
  },
  "rest" : {
    "scheme" : "HTTP",
    "host" : "127.0.0.1",
    "port" : 8080,
    "endpoint" : "fscrawler"
  }
}
The file is being indexed, but not all of the content makes it into Elasticsearch.
Can you share what the indexed document looks like please?
What do you mean by logs? Where can I find them?
The logs are the FSCrawler output.
Can you share the document you are trying to index?
So your document has been indexed. I can see the extracted document content in the content field. So what is the problem?
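If it helps to put a number on it, here is a rough sketch for checking how many characters ended up in the content field of each indexed document. It assumes the index is named after the job (test4) and that Elasticsearch listens on 127.0.0.1:9200 as in the settings above. The helper names content_lengths and fetch_content_lengths are made up for illustration and are not part of FSCrawler.

```python
import json
import urllib.request

# Assumption: FSCrawler writes to an index named after the job ("test4").
SEARCH_URL = "http://127.0.0.1:9200/test4/_search"


def content_lengths(search_response):
    """Map each hit's _id to the number of characters in its `content` field."""
    hits = search_response.get("hits", {}).get("hits", [])
    return {
        hit["_id"]: len(hit.get("_source", {}).get("content") or "")
        for hit in hits
    }


def fetch_content_lengths(url=SEARCH_URL):
    """Query Elasticsearch and report content lengths (needs a running node)."""
    with urllib.request.urlopen(url) as resp:
        return content_lengths(json.load(resp))
```

Calling fetch_content_lengths() against a running node and comparing the numbers with the size of the source file gives a quick idea of whether the text was cut off.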
The problem is that it doesn't index the whole content, maybe because it's a large file.
I see.
The setting name has no leading underscore. Replace
"_indexed_chars": "100%",
with:
"indexed_chars": "100%",
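For anyone with several job files to fix, the rename could be scripted. This is only a minimal sketch: fix_indexed_chars is a hypothetical helper, not something FSCrawler ships, and it works on settings that have already been parsed into a dict.

```python
import json


def fix_indexed_chars(settings):
    """Return a copy of the settings with fs._indexed_chars renamed to fs.indexed_chars."""
    fixed = dict(settings)
    if "fs" not in fixed:
        return fixed
    fs = dict(fixed["fs"])
    if "_indexed_chars" in fs:
        value = fs.pop("_indexed_chars")
        # Keep an explicit indexed_chars if one is already present.
        fs.setdefault("indexed_chars", value)
    fixed["fs"] = fs
    return fixed


# Small in-memory example; a real job file would be read from disk instead.
example = json.loads('{"name": "test4", "fs": {"_indexed_chars": "100%", "index_content": true}}')
print(json.dumps(fix_indexed_chars(example), indent=2))
```

The function copies the dicts instead of editing them in place, so the original settings stay available for comparison.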
It worked for me with "indexed_chars": "100%"
Thank you very much.