I run a script that checks processing status by comparing each file's offset in the registry (log.json) with that file's size in bytes on disk.
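A minimal sketch of that comparison, assuming the offsets have already been read out of the registry into a dict of path to offset. The function name `unfinished_files` is hypothetical, and parsing the registry itself is not shown here:

```python
import os


def unfinished_files(offsets):
    """Return (path, offset, size) for every file whose registry offset is behind its on-disk size.

    `offsets` is a hypothetical input: a dict mapping file path -> offset read from the registry.
    """
    behind = []
    for path, offset in offsets.items():
        try:
            size = os.path.getsize(path)  # current size in bytes on disk
        except FileNotFoundError:
            # The file was rotated or deleted; there is nothing left to compare against.
            continue
        if offset < size:
            behind.append((path, offset, size))
    return behind
```

An offset equal to the file size means the file has been read to the end; anything smaller means processing is still behind.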
The offset is generally saved in two places:
log.json -> multiple lines per file
/d/d/d/d/d/d/d.json -> once per file; this file name keeps changing, or the file is re-created
I'm finding that in some cases, after a while, a file no longer appears in log.json, only in the file whose name is all numbers. At first I assumed log.json would be the more accurate source and that the other file worked as a transaction log. Is that right?
log.json is the registry file. Some files might be removed from the registry file if they have been inactive for a long time. Maybe you are hitting this case?
Several filenames that did not exist in log.json were present in 83932726.json, showing the completed offset. Six hours later I simply appended a couple of blank lines to one of those existing files. This appeared to trigger an update to log.json, after which the file appeared in both.
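Since an entry can show up in the numbered file but not in log.json, a status script probably needs to read both. The sketch below loads the numbered file first and then replays log.json on top, so later log.json entries win. The file layouts are assumptions, not a documented format: the numbered file is taken to be a JSON array of objects with "source" and "offset" fields, and log.json one JSON object per line where updates carry a "v" object with the same two fields. Check these against your own files before relying on it. The name `load_offsets` is hypothetical.

```python
import json


def load_offsets(checkpoint_path, log_path):
    """Merge file offsets from the numbered file (e.g. 83932726.json) and log.json.

    ASSUMED layouts, verify against your own registry files:
      - numbered file: a JSON array of objects carrying "source" and "offset"
      - log.json: one JSON object per line; lines with a "v" object carrying
        "source" and "offset" are updates, all other lines are skipped
    """
    offsets = {}
    # Start from the numbered file's snapshot.
    with open(checkpoint_path) as f:
        for entry in json.load(f):
            if "source" in entry and "offset" in entry:
                offsets[entry["source"]] = entry["offset"]
    # Replay log.json on top; a later entry for the same source overrides earlier ones.
    with open(log_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            value = json.loads(line).get("v")
            if isinstance(value, dict) and "source" in value and "offset" in value:
                offsets[value["source"]] = value["offset"]
    return offsets
```

The merged dict can then be passed straight to an offset-versus-size check, so files present only in the numbered file are still covered.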