I've upgraded hardware. Clean install of ES 7.3, importing the same static dataset I was using on my old ES 6 box. Importing on 7 is faster BUT... after completing the import I'm running a _forcemerge on the ~1.1TB dataset. On 6 this runs for some time, slowly but surely decreasing the segment count and the disk space used. On 7 there's an issue: the same command on the same dataset has already been running for quite some time, the segment count is slowly INCREASING, and the disk usage of the index has already more than doubled (from 1.1TB to >2.2TB).
Any clues about what's going on and why ES / Lucene is using >100% temp storage? Is there a workaround to avoid using all this temp storage?
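For context, here is roughly how I'm watching the segment count and on-disk size while the merge runs (Kibana Dev Tools syntax; myindex is a placeholder for the real index name):

```
GET _cat/segments/myindex?v
GET _cat/indices/myindex?v&h=index,pri,docs.count,store.size,pri.store.size
```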
Are you still indexing into this index while the force-merge is running, or did you import everything, wait for that to complete, and then request a force-merge? What exactly was the request you made to the force-merge API? Was the cluster health green throughout?
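For reference, a typical force-merge request on 7.x looks something like this (myindex is a placeholder):

```
POST /myindex/_forcemerge?max_num_segments=1
```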
Update: the merge process continued until 100% of disk space was taken. At that point the merge had consumed more than 2x the index size(!). The _forcemerge however seems to have been successful ("failed" : 0). Ran _flush and _refresh, waited a few minutes. Result: the number of segments went DOWN in the end, to a value lower than requested (I used a rather high value instead of 1 to see whether it would work anyway), and disk space is back to normal. Hmmm, that's pretty strange... Perhaps the merge had already finished earlier but kept on writing data for some magic reason? Is there a way I can provide you with debug information?
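For anyone following along, the flush/refresh check looks roughly like this (again with myindex as a placeholder):

```
POST /myindex/_flush
POST /myindex/_refresh
GET _cat/segments/myindex?v
```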
Something that caught my eye in the storage directory: I see large files with both Lucene50 and Lucene80 in their names. To an amateur like me that looks like v5 and v8 files, where I'd expect v8 only. Is this correct? The system contains a clean ES 7.3 install and a clean generation of the index (no upgrades / re-indexing from older versions). Other info: using best_compression and large ngram tokenizers.
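For completeness, the relevant parts of the index setup look roughly like this (a sketch only, not my exact settings; the analyzer names are made up, and on 7.x index.max_ngram_diff has to be raised when max_gram minus min_gram exceeds 1):

```
PUT /myindex
{
  "settings": {
    "index.codec": "best_compression",
    "index.max_ngram_diff": 5,
    "analysis": {
      "tokenizer": {
        "my_ngram": { "type": "ngram", "min_gram": 3, "max_gram": 8 }
      },
      "analyzer": {
        "my_ngram_analyzer": { "type": "custom", "tokenizer": "my_ngram", "filter": ["lowercase"] }
      }
    }
  }
}
```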
Thanks for the update. It is possible that this is related to https://github.com/elastic/elasticsearch/pull/46066 in which Elasticsearch isn't always as enthusiastic about flushing as perhaps it should be. Did you try flushing while the force-merge was ongoing too, or only at the end? Can you wait for the release of 7.4.0 and then try again?
I will check, but I don't think this is something to worry about.
If that works then it seems like a reasonable workaround. We'd like to know if it does work, because if it doesn't then you might have hit a different (and as-yet-unknown) issue instead.
I've started a new run, using a smaller dataset, for faster results. Similar behavior: the segment count slowly grew (hard to see in the screenshot because of the huge drop, but still the case) and disk usage increased significantly over the hours. Then I applied your suggested workaround (flush and refresh), and...
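To put numbers on that, the segment count and store size before and after the flush can be pulled from the stats API (myindex is a placeholder):

```
GET /myindex/_stats/segments,store?filter_path=indices.*.total
```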
Great, thanks for reporting back. I also expect 7.4 contains a fix for this, yes, but please let us know if the problem isn't fixed there and we'll investigate further.