Hi,
I would like to run OCR in two or more languages at the same time.
As far as I know, the OCR language for a job is set in the job's config file, as shown below.
config file: ~/.fscrawler/job_name/_settings.json
{
  "name" : "job_name",
  "fs" : {
    "url" : "/home/monitoring_files/",
    "update_rate" : "30s",
    "excludes" : [ "~*" ],
    "json_support" : false,
    "filename_as_id" : false,
    "add_filesize" : true,
    "remove_deleted" : true,
    "add_as_inner_object" : false,
    "store_source" : false,
    "index_content" : true,
    "attributes_support" : false,
    "raw_metadata" : true,
    "xml_support" : false,
    "index_folders" : true,
    "lang_detect" : false,
    "continue_on_error" : false,
    "indexed_chars" : "100%",
    "pdf_ocr" : true,
    "ocr" : {
      "language" : "eng"
    }
  },
  "elasticsearch" : {
    "nodes" : [ {
      "host" : "127.0.0.1",
      "port" : 9200,
      "scheme" : "HTTP"
    } ],
    "bulk_size" : 100,
    "flush_interval" : "5s"
  },
  "rest" : {
    "scheme" : "HTTP",
    "host" : "127.0.0.1",
    "port" : 8081,
    "endpoint" : "fscrawler"
  }
}
With a single language (and its Tesseract language pack installed), OCR works correctly.
But how can OCR be run for several languages at once?
Something like this:
"fs" : {
  "ocr" : {
    "language" : "eng+fra"
  }
}
Or this:
"fs" : {
  "ocr" : {
    "language" : "eng"
  },
  "ocr" : {
    "language" : "fra"
  }
}
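A side note on the second variant: I suspect that repeating the "ocr" key would not give two languages, because many JSON parsers keep only the last value for a duplicated key. Below is a minimal sketch that checks this with Python's standard json module. This is only an illustration: FSCrawler uses its own parser, which may behave differently, and the variable names here are made up for the test.

```python
import json

# The second variant from above, wrapped in braces so it is a complete JSON document.
settings_text = """
{
  "fs": {
    "ocr": {"language": "eng"},
    "ocr": {"language": "fra"}
  }
}
"""

# Python's json module does not reject duplicate keys;
# it keeps only the last name/value pair for a repeated name.
settings = json.loads(settings_text)
ocr_language = settings["fs"]["ocr"]["language"]

print(ocr_language)  # prints "fra"; the "eng" entry is silently dropped
```

If FSCrawler's parser behaves the same way, only the single-string form ("eng+fra") could carry more than one language, assuming FSCrawler accepts that syntax.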
Thanks ...