We are a group of students who use ElasticSearch to find possible matches between hash-codes (or numeric values) stored in an index and a hash-code supplied in a query. As we are totally new to ElasticSearch, we were wondering what the best way is to search for hash-codes or very long numeric values. At the moment we store the hash-code as a string and search with the following query:
(Please let us know if the lambda expressions/C# code are unclear.)
var searchResponse = _client.Search(s => s
    .Query(qu => qu
        .Match(m => m
            .Field(f => f.Fp)
This query works and we do find the right matches, but it tends to be rather slow, probably because the hash-code is normally between 40,000 and 110,000 digits long. Besides being slow, the query has one particular problem: it exceeds the maxClauseCount limit of 1024. Raising this limit results in extremely slow response times.
Please note that we currently have around 200,000 fingerprints in a single index with a total size of ~50 GB. We are asking because we are hitting performance issues.
The index has 5 shards and 1 replica, and runs with 30 GB of RAM, a heap size of 13 GB, and a very fast 500 GB HDD.