Fuzzy Hash plugins (Java API for custom types, mappings?)

I'm interested in developing plugins for fuzzy hashes, including ssdeep, sdhash, and LZJD. I did find something similar, an existing ssdeep plugin for Elasticsearch in Python, but I'd prefer to write the code in Java, since it's likely faster, and some of the fuzzy hash code is in Java anyway.

I did find a plugin for Murmur3, which is similar. But I don't see how the API handles storing the hash, querying with a hash to get a similarity score, returning the similar documents, and adding a new field to each result that holds the similarity value (an integer between 0 and 100).

I'm working on a project where I'd like to find similar documents based on their fuzzy hashes. The documents are metadata for raw binary files. These fuzzy hashes would work for text documents too (ssdeep was developed to detect spam emails).

Is there an API document to handle custom fields, mappings, and adding to the search result JSON?

Hey,

I think your best bet is studying the murmur3 plugin, which implements a custom mapper. I am not a hundred percent sure what else your use case needs beyond what the murmur3 mapper does, so some more information would help.

But looking at the source and creating your own plugin from that as a base sounds like the best way to go forward, as there is no dedicated plugin documentation.

Other plugins might also be worth looking at; see https://www.elastic.co/guide/en/elasticsearch/plugins/7.2/api.html

--Alex

The Murmur3 example doesn't demonstrate how to perform an action on the hash. The Murmur3FieldType.termQuery() function throws an exception saying the field isn't searchable. I'd like to be able to query with an SSDeep or similar hash, have Elasticsearch perform the similarity comparison between the provided SSDeep hash and the SSDeep hash in each document in the index, and provide the document and similarity score for each document where the score is greater than zero.
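To make the behaviour I'm after concrete, here is a minimal sketch in plain Java. It does not use any Elasticsearch API, and it is not the real ssdeep comparison: as a stand-in I'm assuming a normalized edit distance scaled to 0-100. The class name FuzzySimilaritySketch, its methods, and the in-memory "index" map are all hypothetical and only there for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; not part of Elasticsearch or of any ssdeep library.
final class FuzzySimilaritySketch {

    // Classic Levenshtein edit distance, computed with two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            // Swap rows so that prev always holds the last completed row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // ASSUMPTION: stand-in score, 100 for identical strings down to 0.
    // Real ssdeep scoring also considers block sizes and common substrings.
    static int score(String queryHash, String storedHash) {
        int maxLen = Math.max(queryHash.length(), storedHash.length());
        if (maxLen == 0) {
            return 100;
        }
        return 100 * (maxLen - editDistance(queryHash, storedHash)) / maxLen;
    }

    // Compare the query hash with every stored hash (document id -> hash)
    // and keep only the documents whose score is greater than zero.
    static Map<String, Integer> search(String queryHash, Map<String, String> index) {
        Map<String, Integer> matches = new LinkedHashMap<>();
        for (Map.Entry<String, String> doc : index.entrySet()) {
            int s = score(queryHash, doc.getValue());
            if (s > 0) {
                matches.put(doc.getKey(), s);
            }
        }
        return matches;
    }
}
```

Calling search(queryHash, index) returns a map from document id to similarity score, which is roughly the extra per-hit field I'd like Elasticsearch to add to the search result JSON.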

I did find this old SSDeep plugin, which says it's for Elasticsearch 2. What's the difference between Script and Plugin?
