We're finally migrating from ES 1.7 to 6.4, and I noticed a strange inconsistency between the hashed values of murmur3 fields in the two versions. Specifically, the hashes do not match in the last byte. This is particularly strange because I've diffed MurmurHash3.java in 1.7 and 6.4 and found only a minor formatting difference.
Furthermore, I have called the MurmurHash3 code directly in testing and found that it produces results that match the C++ reference implementation, but the stored value in ES 1.7 is slightly off. The value in ES 6.4 matches what is expected.
For example, given the string "db030357-7a16-41c0-b69a-02a12299f90f", calling `hash128` directly returns -1884620459626981620, but the stored hashed value in 1.7 is -1884620459626981600 (note the 00 at the end instead of 20). The stored hashed value in 6.4 matches the expected value.
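To reproduce the direct call outside Elasticsearch, here is a minimal, self-contained Java sketch of MurmurHash3 x64_128 that follows the C++ reference algorithm. It is not the Elasticsearch class itself; the class name `Murmur3Sketch` is hypothetical, and it assumes (not confirmed here) that the murmur3 field hashes the UTF-8 bytes of the value with seed 0 and keeps `h1` as the stored long.

```java
import java.nio.charset.StandardCharsets;

public class Murmur3Sketch {
    private static final long C1 = 0x87c37b91114253d5L;
    private static final long C2 = 0x4cf5ad432745937fL;

    // Final avalanche mix from the reference implementation.
    static long fmix64(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }

    // Reads 8 bytes starting at index i as a little-endian long.
    static long getLongLE(byte[] b, int i) {
        long v = 0;
        for (int j = 7; j >= 0; j--) {
            v = (v << 8) | (b[i + j] & 0xffL);
        }
        return v;
    }

    // MurmurHash3_x64_128 over key[offset .. offset+length); returns {h1, h2}.
    static long[] hash128(byte[] key, int offset, int length, long seed) {
        long h1 = seed, h2 = seed;
        int nblocks = length >>> 4;
        for (int i = 0; i < nblocks; i++) {
            int base = offset + (i << 4);
            long k1 = getLongLE(key, base);
            long k2 = getLongLE(key, base + 8);
            k1 *= C1; k1 = Long.rotateLeft(k1, 31); k1 *= C2; h1 ^= k1;
            h1 = Long.rotateLeft(h1, 27); h1 += h2; h1 = h1 * 5 + 0x52dce729;
            k2 *= C2; k2 = Long.rotateLeft(k2, 33); k2 *= C1; h2 ^= k2;
            h2 = Long.rotateLeft(h2, 31); h2 += h1; h2 = h2 * 5 + 0x38495ab5;
        }
        // Tail: up to 15 remaining bytes; the switch falls through on purpose.
        int t = offset + (nblocks << 4);
        long k1 = 0, k2 = 0;
        switch (length & 15) {
            case 15: k2 ^= (key[t + 14] & 0xffL) << 48;
            case 14: k2 ^= (key[t + 13] & 0xffL) << 40;
            case 13: k2 ^= (key[t + 12] & 0xffL) << 32;
            case 12: k2 ^= (key[t + 11] & 0xffL) << 24;
            case 11: k2 ^= (key[t + 10] & 0xffL) << 16;
            case 10: k2 ^= (key[t + 9] & 0xffL) << 8;
            case 9:  k2 ^= (key[t + 8] & 0xffL);
                     k2 *= C2; k2 = Long.rotateLeft(k2, 33); k2 *= C1; h2 ^= k2;
            case 8:  k1 ^= (key[t + 7] & 0xffL) << 56;
            case 7:  k1 ^= (key[t + 6] & 0xffL) << 48;
            case 6:  k1 ^= (key[t + 5] & 0xffL) << 40;
            case 5:  k1 ^= (key[t + 4] & 0xffL) << 32;
            case 4:  k1 ^= (key[t + 3] & 0xffL) << 24;
            case 3:  k1 ^= (key[t + 2] & 0xffL) << 16;
            case 2:  k1 ^= (key[t + 1] & 0xffL) << 8;
            case 1:  k1 ^= (key[t] & 0xffL);
                     k1 *= C1; k1 = Long.rotateLeft(k1, 31); k1 *= C2; h1 ^= k1;
        }
        // Finalization.
        h1 ^= length; h2 ^= length;
        h1 += h2; h2 += h1;
        h1 = fmix64(h1); h2 = fmix64(h2);
        h1 += h2; h2 += h1;
        return new long[] {h1, h2};
    }

    public static void main(String[] args) {
        byte[] bytes = "db030357-7a16-41c0-b69a-02a12299f90f".getBytes(StandardCharsets.UTF_8);
        // Assumption: seed 0 and the first 64-bit half (h1) are what the field stores.
        long h1 = hash128(bytes, 0, bytes.length, 0L)[0];
        System.out.println(h1);
    }
}
```

The printed value can be compared against the -1884620459626981620 quoted above; if the assumed seed or byte encoding differs from what the field mapper uses, the output will not match.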
Since we did not store `_source` on hashed fields in 1.7, this presents a bit of a problem. We were hoping to just convert the fields to `long` for 6.4 and do a simple reindex (with some massaging to extract the `fielddata_fields` and reformat them), but we need to be able to do the hashing in a consistent way in the future.
If anyone knows what is actually changing the values slightly in ES 1.7, I can replicate that difference in our code that writes new data to the new 6.4 indices. I'll keep digging, but nothing obviously modifies the value where `hash128` is called.
I should also note that it's not off by a constant 20. I've seen differences of up to ±255, which is why I suspect some corruption of the final byte.
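One possibility worth ruling out, an unverified assumption rather than something confirmed in the 1.7 source, is that the 1.7 value passed through a 64-bit double somewhere (for example in JSON handling on the client side). A double has a 53-bit significand, so longs of this magnitude cannot be represented exactly. The sketch below only demonstrates that arithmetic; `DoubleRoundTripCheck` and `roundTripThroughDouble` are hypothetical names.

```java
import java.math.BigDecimal;

public class DoubleRoundTripCheck {
    // Hypothetical helper: what a long looks like after passing through a 64-bit double.
    static long roundTripThroughDouble(long v) {
        return (long) (double) v;
    }

    public static void main(String[] args) {
        long expected = -1884620459626981620L; // hash128 result quoted in the post
        long seenIn17 = -1884620459626981600L; // value observed in ES 1.7

        double d = (double) expected;
        // Doubles near 1.9e18 are spaced 256 apart, so the low 8 bits cannot survive.
        System.out.println(Math.ulp(d));                            // 256.0
        System.out.println(new BigDecimal(d).toPlainString());      // -1884620459626981632
        System.out.println(roundTripThroughDouble(expected));       // -1884620459626981632
        // Both longs round to the same double.
        System.out.println((double) expected == (double) seenIn17); // true
    }
}
```

A client that prints doubles using the shortest decimal that round-trips, as JavaScript does, would render that double as -1884620459626981600, the same value seen in 1.7. Because the rounding depends on the low bits of each hash, the difference would vary per input rather than being constant.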
One other note: I have verified that the difference is consistent across multiple ES 1.7 clusters for the same inputs. That is, the value stored in 1.7 for the input string above is always -1884620459626981600.