I started experimenting with the zentity plugin yesterday to see whether we can use it to solve some of our entity resolution problems, instead of building our own custom software to do the same.
I'm getting a pretty high error rate on an index with about 13 million entries, using a model with a single resolver that matches on name and phone number. When I iterate through a test set of 1,000 records, each with a name and a phone number, these exceptions are thrown:
org.elasticsearch.ElasticsearchException$1: maxClauseCount is set to 1024
    at org.elasticsearch.ElasticsearchException.guessRootCauses(ElasticsearchException.java:639) ~[elasticsearch-7.3.2.jar:7.3.2]
    at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:137) [elasticsearch-7.3.2.jar:7.3.2]
    at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:264) [elasticsearch-7.3.2.jar:7.3.2]
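The 1024 in the message matches the default of Elasticsearch's `indices.query.bool.max_clause_count` setting in 7.x, which caps how many clauses a single `bool` query may hold. A minimal sketch, assuming the generated query puts one `term` clause per phone number into a single `bool` `should` list (the field name `phone` and all helper names are hypothetical, not zentity's actual query builder), shows how directly the number of values maps onto that limit:

```python
from typing import Any, Dict, List

# Default of indices.query.bool.max_clause_count in Elasticsearch 7.x.
DEFAULT_MAX_CLAUSE_COUNT = 1024


def build_phone_query(phones: List[str], field: str = "phone") -> Dict[str, Any]:
    """Build a Query DSL body with one `term` clause per phone number.

    The field name is a hypothetical placeholder.
    """
    return {
        "query": {
            "bool": {
                "should": [{"term": {field: phone}} for phone in phones],
                "minimum_should_match": 1,
            }
        }
    }


def count_should_clauses(body: Dict[str, Any]) -> int:
    """Count the clauses in the top-level bool `should` list."""
    return len(body["query"]["bool"]["should"])


def exceeds_clause_limit(body: Dict[str, Any], limit: int = DEFAULT_MAX_CLAUSE_COUNT) -> bool:
    """True if the top-level bool query holds more clauses than the limit allows."""
    return count_should_clauses(body) > limit


if __name__ == "__main__":
    small = build_phone_query([f"555-{i:04d}" for i in range(10)])
    large = build_phone_query([f"555-{i:04d}" for i in range(1500)])
    print(count_should_clauses(small), exceeds_clause_limit(small))  # 10 False
    print(count_should_clauses(large), exceeds_clause_limit(large))  # 1500 True
```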
What's worse, whenever one of these errors is thrown, the affected request takes around 10 to 30 seconds to come back, which makes this too slow for processing the full data set (around 70k entries).
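To put the slowdown in perspective, here is a back-of-the-envelope estimate, assuming records are resolved one at a time. The error rate and the latency of a normal request below are hypothetical placeholders, not measurements; only the 10-second lower bound and the 70k record count come from the numbers above.

```python
def estimate_total_hours(
    n_records: int, error_rate: float, slow_seconds: float, fast_seconds: float
) -> float:
    """Estimate sequential wall-clock hours when a fraction of requests are slow.

    All inputs are illustrative assumptions, not measured values.
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    slow_total = n_records * error_rate * slow_seconds
    fast_total = n_records * (1.0 - error_rate) * fast_seconds
    return (slow_total + fast_total) / 3600.0


if __name__ == "__main__":
    # 70k records; hypothetical 20% slow at 10 s each, the rest at 0.1 s each.
    print(round(estimate_total_hours(70_000, 0.2, 10.0, 0.1), 1))  # 40.4
```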
Just before the exception is thrown, part of the query is dumped to stderr, and it looks like a giant query containing all of the different phone numbers in the index.
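One plausible way a query can balloon like this: zentity resolves in hops, using the attribute values of each hop's matches to build the next query. The toy simulation below is a sketch under loud assumptions: exact matching, a resolver that requires both a shared name and a shared phone, a multi-valued phone field, and one clause per known phone in each hop's query. All records, names, and numbers are made up, and this is not zentity's actual implementation.

```python
from typing import Dict, List

# Hypothetical toy index: some records list several phone numbers.
RECORDS: List[Dict[str, List[str]]] = [
    {"names": ["Ann Lee"], "phones": ["111"]},
    {"names": ["Ann Lee"], "phones": ["111", "222"]},
    {"names": ["Ann Lee"], "phones": ["222", "333"]},
    {"names": ["Ann Lee"], "phones": ["333", "444", "555"]},
    {"names": ["Bob Ray"], "phones": ["999"]},
]


def simulate_hops(
    records: List[Dict[str, List[str]]], name: str, phone: str, max_hops: int = 10
) -> List[int]:
    """Return the number of phone clauses in each hop's query.

    A record matches when it shares at least one known name AND one known phone;
    every value on a matched record is fed into the next hop's query.
    """
    known_names = {name}
    known_phones = {phone}
    clause_counts: List[int] = []
    for _ in range(max_hops):
        # One clause per phone value known at the start of this hop.
        clause_counts.append(len(known_phones))
        matched = [
            r for r in records
            if known_names & set(r["names"]) and known_phones & set(r["phones"])
        ]
        new_names = known_names.union(*(r["names"] for r in matched))
        new_phones = known_phones.union(*(r["phones"] for r in matched))
        if new_names == known_names and new_phones == known_phones:
            break  # no new values, so resolution has converged
        known_names, known_phones = new_names, new_phones
    return clause_counts


if __name__ == "__main__":
    print(simulate_hops(RECORDS, "Ann Lee", "111"))  # [1, 2, 3, 5]
```

In this toy, the clause count grows each hop while shared numbers keep pulling in new ones. If many records in a 13-million-document index were linked this way, a query could plausibly pass 1024 clauses within a few hops.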
Is there something I can do to prevent this from happening? Is this a result of something I have configured incorrectly?
- Elasticsearch 7.3.2
- 4 GB heap (on a 16 GB machine)