Hi David, to the rescue again... I appreciate it.
11:54:52,219 INFO [f.p.e.c.f.FsCrawlerImpl] Starting FS crawler
11:54:52,219 INFO [f.p.e.c.f.FsCrawlerImpl] FS crawler started in watch mode. It will run unless you stop it with CTRL+C.
11:54:52,815 WARN [o.e.c.RestClient] request [GET http://127.0.0.1:9200/] returned 1 warnings: [299 Elasticsearch-7.17.1-e5acb99f822233d62d6444ce45a4543dc1c8059a "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html to enable security."]
11:54:52,836 WARN [o.e.c.RestClient] request [GET http://127.0.0.1:9200/] returned 1 warnings: [299 Elasticsearch-7.17.1-e5acb99f822233d62d6444ce45a4543dc1c8059a "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html to enable security."]
11:54:52,839 WARN [o.e.c.RestClient] request [GET http://127.0.0.1:9200/] returned 1 warnings: [299 Elasticsearch-7.17.1-e5acb99f822233d62d6444ce45a4543dc1c8059a "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html to enable security."]
11:54:52,840 INFO [f.p.e.c.f.c.v.ElasticsearchClientV7] Elasticsearch Client for version 7.x connected to a node running version 7.17.1
11:54:53,009 WARN [o.e.c.RestClient] request [GET http://127.0.0.1:9200/] returned 1 warnings: [299 Elasticsearch-7.17.1-e5acb99f822233d62d6444ce45a4543dc1c8059a "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html to enable security."]
11:54:53,013 WARN [o.e.c.RestClient] request [GET http://127.0.0.1:9200/] returned 1 warnings: [299 Elasticsearch-7.17.1-e5acb99f822233d62d6444ce45a4543dc1c8059a "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html to enable security."]
11:54:53,015 WARN [o.e.c.RestClient] request [GET http://127.0.0.1:9200/] returned 1 warnings: [299 Elasticsearch-7.17.1-e5acb99f822233d62d6444ce45a4543dc1c8059a "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html to enable security."]
11:54:53,016 INFO [f.p.e.c.f.c.v.ElasticsearchClientV7] Elasticsearch Client for version 7.x connected to a node running version 7.17.1
11:54:53,019 WARN [o.e.c.RestClient] request [GET http://127.0.0.1:9200/] returned 1 warnings: [299 Elasticsearch-7.17.1-e5acb99f822233d62d6444ce45a4543dc1c8059a "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html to enable security."]
11:54:53,330 WARN [o.e.c.RestClient] request [PUT http://127.0.0.1:9200/ocr_testing?master_timeout=30s&timeout=30s] returned 2 warnings: [299 Elasticsearch-7.17.1-e5acb99f822233d62d6444ce45a4543dc1c8059a "Camel case format name dateOptionalTime is deprecated and will be removed in a future version. Use snake case name date_optional_time instead."],[299 Elasticsearch-7.17.1-e5acb99f822233d62d6444ce45a4543dc1c8059a "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html to enable security."]
11:54:53,338 WARN [o.e.c.RestClient] request [GET http://127.0.0.1:9200/_cluster/health/ocr_testing?master_timeout=30s&level=cluster&timeout=30s&wait_for_status=yellow] returned 1 warnings: [299 Elasticsearch-7.17.1-e5acb99f822233d62d6444ce45a4543dc1c8059a "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html to enable security."]
11:54:53,531 WARN [o.e.c.RestClient] request [PUT http://127.0.0.1:9200/ocr_testing_folder?master_timeout=30s&timeout=30s] returned 1 warnings: [299 Elasticsearch-7.17.1-e5acb99f822233d62d6444ce45a4543dc1c8059a "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html to enable security."]
11:54:53,536 WARN [o.e.c.RestClient] request [GET http://127.0.0.1:9200/_cluster/health/ocr_testing_folder?master_timeout=30s&level=cluster&timeout=30s&wait_for_status=yellow] returned 1 warnings: [299 Elasticsearch-7.17.1-e5acb99f822233d62d6444ce45a4543dc1c8059a "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html to enable security."]
11:54:53,541 INFO [f.p.e.c.f.FsParserAbstract] FS crawler started for [ocr_testing] for [D:\OCRTESTING] every [1m]
11:54:53,543 WARN [o.e.c.RestClient] request [GET http://127.0.0.1:9200/] returned 1 warnings: [299 Elasticsearch-7.17.1-e5acb99f822233d62d6444ce45a4543dc1c8059a "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html to enable security."]
11:54:53,668 INFO [f.p.e.c.f.t.TikaInstance] OCR is enabled. This might slowdown the process.
11:54:54,365 WARN [f.p.e.c.f.t.TikaDocParser] Failed to extract [100000] characters of text for [D:\OCRTESTING\2A32-WHB-001.pdf]: Unable to extract PDF content -> Unable to end a page -> I regret that I couldn't find an OCR parser to handle image/ocr-png.Please set the OCR_STRATEGY to NO_OCR or configure yourOCR parser correctly
11:54:55,525 WARN [f.p.e.c.f.t.TikaDocParser] Failed to extract [100000] characters of text for [D:\OCRTESTING\616318-P79810-0023 - Red Marked.pdf]: Unable to extract PDF content -> Unable to end a page -> I regret that I couldn't find an OCR parser to handle image/ocr-png.Please set the OCR_STRATEGY to NO_OCR or configure yourOCR parser correctly
11:54:55,695 WARN [f.p.e.c.f.t.TikaDocParser] Failed to extract [100000] characters of text for [D:\OCRTESTING\616318-P79810-0031 - Red Marked.pdf]: Unable to extract PDF content -> Unable to end a page -> I regret that I couldn't find an OCR parser to handle image/ocr-png.Please set the OCR_STRATEGY to NO_OCR or configure yourOCR parser correctly
11:54:55,770 WARN [f.p.e.c.f.t.TikaDocParser] Failed to extract [100000] characters of text for [D:\OCRTESTING\616318-P79810-0037.pdf]: Unable to extract PDF content -> Unable to end a page -> I regret that I couldn't find an OCR parser to handle image/ocr-png.Please set the OCR_STRATEGY to NO_OCR or configure yourOCR parser correctly
11:54:55,824 WARN [f.p.e.c.f.t.TikaDocParser] Failed to extract [100000] characters of text for [D:\OCRTESTING\616318-P79810-0039.pdf]: Unable to extract PDF content -> Unable to end a page -> I regret that I couldn't find an OCR parser to handle image/ocr-png.Please set the OCR_STRATEGY to NO_OCR or configure yourOCR parser correctly
11:54:55,879 WARN [f.p.e.c.f.t.TikaDocParser] Failed to extract [100000] characters of text for [D:\OCRTESTING\616318-P79810-0043.pdf]: Unable to extract PDF content -> Unable to end a page -> I regret that I couldn't find an OCR parser to handle image/ocr-png.Please set the OCR_STRATEGY to NO_OCR or configure yourOCR parser correctly
11:54:55,893 WARN [f.p.e.c.f.t.TikaDocParser] Failed to extract [100000] characters of text for [D:\OCRTESTING\Exception list - HP STEAM LINE - .pdf]: Unable to extract PDF content -> Unable to end a page -> I regret that I couldn't find an OCR parser to handle image/ocr-png.Please set the OCR_STRATEGY to NO_OCR or configure yourOCR parser correctly
11:54:55,999 WARN [f.p.e.c.f.t.TikaDocParser] Failed to extract [100000] characters of text for [D:\OCRTESTING\MD-512-TE-2015.pdf]: Unable to extract PDF content -> Unable to end a page -> I regret that I couldn't find an OCR parser to handle image/ocr-png.Please set the OCR_STRATEGY to NO_OCR or configure yourOCR parser correctly
11:54:56,034 WARN [f.p.e.c.f.t.TikaDocParser] Failed to extract [100000] characters of text for [D:\OCRTESTING\Office zoom in.pdf]: Unable to extract PDF content -> Unable to end a page -> I regret that I couldn't find an OCR parser to handle image/ocr-png.Please set the OCR_STRATEGY to NO_OCR or configure yourOCR parser correctly
11:54:56,074 WARN [o.e.c.RestClient] request [POST http://127.0.0.1:9200/ocr_testing/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=true&expand_wildcards=open&allow_no_indices=true&ignore_throttled=false&search_type=query_then_fetch&batched_reduce_size=512] returned 2 warnings: [299 Elasticsearch-7.17.1-e5acb99f822233d62d6444ce45a4543dc1c8059a "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html to enable security."],[299 Elasticsearch-7.17.1-e5acb99f822233d62d6444ce45a4543dc1c8059a "[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices."]
11:54:56,089 WARN [o.e.c.RestClient] request [POST http://127.0.0.1:9200/ocr_testing_folder/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=true&expand_wildcards=open&allow_no_indices=true&ignore_throttled=false&search_type=query_then_fetch&batched_reduce_size=512] returned 2 warnings: [299 Elasticsearch-7.17.1-e5acb99f822233d62d6444ce45a4543dc1c8059a "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html to enable security."],[299 Elasticsearch-7.17.1-e5acb99f822233d62d6444ce45a4543dc1c8059a "[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices."]
11:54:57,994 WARN [o.e.c.RestClient] request [POST http://127.0.0.1:9200/_bulk?timeout=1m] returned 1 warnings: [299 Elasticsearch-7.17.1-e5acb99f822233d62d6444ce45a4543dc1c8059a "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html to enable security."]
11:54:58,040 WARN [o.e.c.RestClient] request [POST http://127.0.0.1:9200/_bulk?timeout=1m] returned 1 warnings: [299 Elasticsearch-7.17.1-e5acb99f822233d62d6444ce45a4543dc1c8059a "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html to enable security."]
11:55:56,131 WARN [o.e.c.RestClient] request [POST http://127.0.0.1:9200/ocr_testing/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=true&expand_wildcards=open&allow_no_indices=true&ignore_throttled=false&search_type=query_then_fetch&batched_reduce_size=512] returned 2 warnings: [299 Elasticsearch-7.17.1-e5acb99f822233d62d6444ce45a4543dc1c8059a "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html to enable security."],[299 Elasticsearch-7.17.1-e5acb99f822233d62d6444ce45a4543dc1c8059a "[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices."]
11:55:56,145 WARN [o.e.c.RestClient] request [POST http://127.0.0.1:9200/ocr_testing_folder/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=true&expand_wildcards=open&allow_no_indices=true&ignore_throttled=false&search_type=query_then_fetch&batched_reduce_size=512] returned 2 warnings: [299 Elasticsearch-7.17.1-e5acb99f822233d62d6444ce45a4543dc1c8059a "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html to enable security."],[299 Elasticsearch-7.17.1-e5acb99f822233d62d6444ce45a4543dc1c8059a "[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices."]
11:56:56,177 WARN [o.e.c.RestClient] request [POST http://127.0.0.1:9200/ocr_testing/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=true&expand_wildcards=open&allow_no_indices=true&ignore_throttled=false&search_type=query_then_fetch&batched_reduce_size=512] returned 2 warnings: [299 Elasticsearch-7.17.1-e5acb99f822233d62d6444ce45a4543dc1c8059a "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html to enable security."],[299 Elasticsearch-7.17.1-e5acb99f822233d62d6444ce45a4543dc1c8059a "[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices."]
11:56:56,187 WARN [o.e.c.RestClient] request [POST http://127.0.0.1:9200/ocr_testing_folder/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=true&expand_wildcards=open&allow_no_indices=true&ignore_throttled=false&search_type=query_then_fetch&batched_reduce_size=512] returned 2 warnings: [299 Elasticsearch-7.17.1-e5acb99f822233d62d6444ce45a4543dc1c8059a "Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html to enable security."],[299 Elasticsearch-7.17.1-e5acb99f822233d62d6444ce45a4543dc1c8059a "[ignore_throttled] parameter is deprecated because frozen indices have been deprecated. Consider cold or frozen tiers in place of frozen indices."]
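Most of the output above is the same Elasticsearch security warning repeated, which makes the Tika lines hard to spot. A small Python sketch for filtering it down follows. The function name and the idea of matching on the `[o.e.c.RestClient]` logger tag are my own assumptions for illustration, not something FSCrawler provides.

```python
def drop_restclient_warnings(log_text: str) -> list[str]:
    """Return the non-empty log lines that do not come from the RestClient logger."""
    return [
        line
        for line in log_text.splitlines()
        # Skip blank lines and anything logged by the Elasticsearch RestClient.
        if line.strip() and "[o.e.c.RestClient]" not in line
    ]
```

Pasting the console output into a string and passing it to this function should leave only the FsCrawlerImpl, TikaInstance and TikaDocParser lines.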
_settings.yml
---
name: "ocr_testing"
fs:
  url: "D:\\OCRTESTING"
  update_rate: "1m"
  excludes:
  - "*/~*"
  json_support: false
  filename_as_id: false
  add_filesize: true
  remove_deleted: true
  add_as_inner_object: false
  store_source: false
  index_content: true
  attributes_support: false
  raw_metadata: false
  xml_support: false
  index_folders: true
  lang_detect: false
  continue_on_error: false
  ocr:
    language: "eng"
    enabled: true
    pdf_strategy: "ocr_and_text"
    path: "D:\\tesseract"
    data_path: "D:\\tesseract\\tessdata"
    output_type: "txt"
  follow_symlinks: false
elasticsearch:
  nodes:
  - url: "http://127.0.0.1:9200"
  bulk_size: 100
  flush_interval: "5s"
  byte_size: "10mb"
  ssl_verification: true
Running with --debug doesn't seem to give any extra information: .\bin\fscrawler ocr_testing --debug
I can't find any log file either: config/log4j2.xml
_status.json is also empty, and it's the same for the other indices. Not sure if that's an issue.
{
  "name" : "ocr_testing",
  "lastrun" : "2022-03-12T12:05:54.4408395",
  "indexed" : 0,
  "deleted" : 0
}
I threw a PNG in there to kick it off, but it didn't work, so I deleted the index and recreated it using fscrawler.
I've also tried restarting Elasticsearch. Could it be something unrelated to this, like a PC setting? Something installed or not installed? Environment settings? Java versions, Python versions?
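To rule out the Tesseract install itself, here is a rough check I could run outside of FSCrawler. It is only a sketch: it assumes the executable is called tesseract.exe and sits directly in the folder given as `path` in _settings.yml, and the function name `check_tesseract` is made up for illustration. It says nothing about how Tika itself looks for Tesseract.

```python
import subprocess
from pathlib import Path


def check_tesseract(install_dir: str, exe_name: str = "tesseract.exe") -> tuple[bool, str]:
    """Try to run `<install_dir>/<exe_name> --version` and report what happened."""
    exe = Path(install_dir) / exe_name
    if not exe.is_file():
        return False, f"not found: {exe}"
    try:
        result = subprocess.run(
            [str(exe), "--version"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except OSError as err:
        # The file exists but could not be started (e.g. not a valid executable).
        return False, f"could not start {exe}: {err}"
    except subprocess.TimeoutExpired:
        return False, f"timed out running {exe}"
    # Some Tesseract builds print the version banner on stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    return result.returncode == 0, output


if __name__ == "__main__":
    # Assumption: same folder as ocr.path in _settings.yml above.
    ok, detail = check_tesseract(r"D:\tesseract")
    print("tesseract OK" if ok else "tesseract problem", "-", detail)
```

If this reports "not found" or fails to start, the problem is probably with the Tesseract install or the configured path rather than with Elasticsearch.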