Generate murmur3 (hyperloglog) hash outside of ES


#1

Hello,

Is there any way to provide a murmur3 hash as part of a document at indexing time? My use case is below; the only reason I want to generate the hash outside of ES is scale. I would appreciate any hints.

I have:
user1 document1
user2 document2
user3 document1
user4 document1
user5 document1
user6 document1
......
......
......
4 Billion events

Group By document
document1: {user1, user3, user4, user5, user6, ...} (up to 100M users)
document2: {user2, ...}

For Each document generate
document1: murmur3 hash
document2: murmur3 hash

Index document1, document2

Then run a query that matches document1 and document2; the cardinality aggregation should return the approximate number of distinct users, based on the murmur3 hashes generated earlier.
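
To make the query side concrete, here is a rough sketch of the request body I have in mind. The field names `document` and `user_hash`, the aggregation name `distinct_users`, and the helper `build_cardinality_query` are placeholders made up for illustration, and the sketch assumes the precomputed hash is stored in a numeric field on each indexed entry.

```python
import json

def build_cardinality_query(doc_ids, hash_field="user_hash"):
    """Build a search body that matches the given documents and asks for
    an approximate distinct count over a precomputed hash field.

    `document` and `user_hash` are hypothetical field names.
    """
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "query": {"terms": {"document": list(doc_ids)}},
        "aggs": {
            "distinct_users": {
                "cardinality": {"field": hash_field}
            }
        },
    }

body = build_cardinality_query(["document1", "document2"])
print(json.dumps(body, indent=2))
```

The printed JSON would be sent as the body of a `_search` request against the index that holds the entries.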

Thanks,
Jaikit


#2

@Adrien_Grand Could you please provide any tips on how I can generate the murmur3 hash outside of Elasticsearch and later use it to compute cardinality?

Thanks in Advance.


#3

After reading more, I found the documentation on "Precomputed hashes" provided from the client side, but it does not explain how to compute them. I would really appreciate it if anyone could point me to a reference or code sample showing how to compute the hash.

https://www.elastic.co/guide/en/elasticsearch/reference/2.0/search-aggregations-metrics-cardinality-aggregation.html#_pre_computed_hashes
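
For anyone landing here with the same question, below is a minimal pure-Python sketch of the 128-bit x64 variant of MurmurHash3, keeping the first 64 bits as a signed long that could be indexed in a numeric field. It rests on several assumptions: that one hash per field value is what "pre-computed hashes" expects, and that hashing the UTF-8 bytes with seed 0 is appropriate. I have not confirmed that this matches what Elasticsearch computes internally, so treat it as a starting point. The helper names (`murmur3_x64_128`, `hash_for_index`) are my own.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF
C1 = 0x87C37B91114253D5
C2 = 0x4CF5AD432745937F

def _rotl64(x, r):
    # Rotate a 64-bit value left by r bits.
    return ((x << r) | (x >> (64 - r))) & MASK64

def _fmix64(k):
    # Final avalanche mix from the MurmurHash3 reference algorithm.
    k ^= k >> 33
    k = (k * 0xFF51AFD7ED558CCD) & MASK64
    k ^= k >> 33
    k = (k * 0xC4CEB9FE1A85EC53) & MASK64
    k ^= k >> 33
    return k

def _mix_k1(k1):
    k1 = (k1 * C1) & MASK64
    k1 = _rotl64(k1, 31)
    return (k1 * C2) & MASK64

def _mix_k2(k2):
    k2 = (k2 * C2) & MASK64
    k2 = _rotl64(k2, 33)
    return (k2 * C1) & MASK64

def murmur3_x64_128(data: bytes, seed: int = 0):
    """Return (h1, h2) as unsigned 64-bit integers."""
    h1 = h2 = seed & MASK64
    nblocks = len(data) // 16
    # Body: process 16-byte blocks as two little-endian 64-bit words.
    for i in range(nblocks):
        k1 = int.from_bytes(data[16 * i:16 * i + 8], "little")
        k2 = int.from_bytes(data[16 * i + 8:16 * i + 16], "little")
        h1 ^= _mix_k1(k1)
        h1 = _rotl64(h1, 27)
        h1 = (h1 + h2) & MASK64
        h1 = (h1 * 5 + 0x52DCE729) & MASK64
        h2 ^= _mix_k2(k2)
        h2 = _rotl64(h2, 31)
        h2 = (h2 + h1) & MASK64
        h2 = (h2 * 5 + 0x38495AB5) & MASK64
    # Tail: the remaining 0..15 bytes.
    tail = data[nblocks * 16:]
    if len(tail) > 8:
        h2 ^= _mix_k2(int.from_bytes(tail[8:], "little"))
    if tail:
        h1 ^= _mix_k1(int.from_bytes(tail[:8], "little"))
    # Finalization.
    h1 ^= len(data)
    h2 ^= len(data)
    h1 = (h1 + h2) & MASK64
    h2 = (h2 + h1) & MASK64
    h1 = _fmix64(h1)
    h2 = _fmix64(h2)
    h1 = (h1 + h2) & MASK64
    h2 = (h2 + h1) & MASK64
    return h1, h2

def hash_for_index(value: str) -> int:
    """First 64 bits of the hash as a signed long (assumed field type)."""
    h1, _ = murmur3_x64_128(value.encode("utf-8"))
    return h1 - (1 << 64) if h1 >= (1 << 63) else h1

# One hash per user value, not one per document.
for user in ["user1", "user2", "user3"]:
    print(user, hash_for_index(user))
```

Note that this produces one hash per user, so each user value still has to be indexed; it does not build a single merged HyperLogLog sketch per document.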

