We're finally migrating from ES 1.7 to 6.4, and I noticed a strange inconsistency in the hashed values of Murmur3 fields across the two versions. Specifically, the hashes differ only in the last byte. This is particularly strange because I've diffed the MurmurHash3.java file in 1.7 and 6.4 and found only a minor formatting difference.
Furthermore, I have called the MurmurHash3 code directly in testing and found that it produces results that match the C++ reference implementation, but the stored value in ES 1.7 is slightly off. The value in ES 6.4 matches what is expected.
For example, given the string "db030357-7a16-41c0-b69a-02a12299f90f", the output of calling hash128 directly is -1884620459626981620, but the stored hashed value in 1.7 is -1884620459626981600 (note the 00 at the end instead of 20). The stored hashed value in 6.4 matches the expected value.
Since we did not store _source on hashed fields in 1.7, this presents a bit of a problem. We were hoping to just convert the fields to long for 6.4 and do a simple reindex (with some massaging to extract the fielddata_fields and reformat them), but we need to be able to do the hashing in a consistent way in the future.
If anyone knows what is actually changing the values slightly in ES 1.7, I can then replicate that difference in our code that writes new data to the new 6.4 indices. I'll keep digging around, but there is nothing obviously modifying the value where hash128 is called.
I should also note it's not off by a constant value of -20. I've seen differences of up to ±255, which is why I think it's some corruption of the final byte.
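To characterize the mismatch more precisely, the two values from the example above can be compared numerically. This is only a rough sketch and not ES code: the class and method names are hypothetical. It reports the difference, checks whether the two longs agree in everything but the lowest byte, and checks whether both collapse to the same double. That last check is just one hypothesis worth ruling out (for instance, if the value passes through a floating-point or JSON number representation somewhere on the way out of 1.7), not a confirmed cause.

```java
public class HashDiff {
    // Signed difference between the stored value and the directly computed hash.
    static long delta(long expected, long stored) {
        return stored - expected;
    }

    // True if the upper 56 bits agree, i.e. only the lowest byte differs.
    static boolean differsOnlyInLowByte(long expected, long stored) {
        return (expected >>> 8) == (stored >>> 8);
    }

    // True if both longs convert to the same double. Assumption being tested:
    // the stored value may have lost precision through a double somewhere.
    static boolean sameAsDouble(long expected, long stored) {
        return (double) expected == (double) stored;
    }

    public static void main(String[] args) {
        long expected = -1884620459626981620L; // result of calling hash128 directly
        long stored = -1884620459626981600L;   // value read back from ES 1.7

        System.out.println("delta           = " + delta(expected, stored));
        System.out.println("expected (hex)  = " + Long.toHexString(expected));
        System.out.println("stored (hex)    = " + Long.toHexString(stored));
        System.out.println("low byte only?  = " + differsOnlyInLowByte(expected, stored));
        System.out.println("same as double? = " + sameAsDouble(expected, stored));
    }
}
```

Running the same checks over a larger sample of stored and recomputed values would show whether the pattern holds beyond this one input.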
One other note: I have verified that the difference is consistent across multiple ES 1.7 clusters for the same inputs. That is, the value stored in 1.7 for the input string above is always