Using the Python plugin to create zentity models in Elasticsearch

@gioorso

You're right that zentity was designed to resolve a single entity per request in real time. This contrasts with the more common approach of resolving a population of entities in batch. I made a brief comparison of the two approaches in this presentation (Slide 13).

At some level it would be possible to use zentity to resolve a population of entities. For example, you could scroll over every document in an index, use zentity to resolve each document against the others, associate each document _id from the hits with an entity ID that you generate, and exclude those document _ids from subsequent iterations of the batch process. But this approach has limited scalability: the list of excluded document _ids grows unbounded with each request, and omitting those exclusions would result in many redundant searches. There are more appropriate solutions for population-scale entity resolution that operate in batch, but none are open source as far as I'm aware.
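Here's a minimal sketch of that scroll-and-resolve loop, just to make the idea concrete. It assumes the Python Elasticsearch client and zentity's resolution API; the index name `users`, the entity model `person`, and the attribute field names are placeholders you would replace with your own, and the exclusion list is tracked client-side in a Python set.

```python
import uuid

import requests
from elasticsearch import Elasticsearch, helpers

ES_URL = "http://localhost:9200"
INDEX = "users"                # placeholder index name
ENTITY_TYPE = "person"         # placeholder zentity entity model name
ATTRIBUTE_FIELDS = ["first_name", "last_name", "email"]  # placeholder fields mapped in the model

es = Elasticsearch(ES_URL)
resolved_ids = set()           # _ids already assigned to an entity (the "exclusion list")
entity_assignments = {}        # _id -> generated entity ID

for doc in helpers.scan(es, index=INDEX, query={"query": {"match_all": {}}}):
    doc_id = doc["_id"]
    if doc_id in resolved_ids:
        continue  # already resolved in an earlier iteration; skip the redundant search

    # Build the resolution request from the document's attribute values.
    attributes = {
        field: [doc["_source"][field]]
        for field in ATTRIBUTE_FIELDS
        if doc["_source"].get(field) is not None
    }
    response = requests.post(
        f"{ES_URL}/_zentity/resolution/{ENTITY_TYPE}",
        json={"attributes": attributes},
    )
    response.raise_for_status()

    # Associate every document returned by the resolution job with a generated entity ID.
    entity_id = str(uuid.uuid4())
    for hit in response.json()["hits"]["hits"]:
        entity_assignments[hit["_id"]] = entity_id
        resolved_ids.add(hit["_id"])

print(f"Resolved {len(resolved_ids)} documents into "
      f"{len(set(entity_assignments.values()))} entities")
```

As the paragraph above notes, `resolved_ids` grows without bound as the population grows, which is exactly where this approach stops scaling.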

I view zentity as an appropriate solution in two cases:

  1. When the scope of your analysis is limited to a single entity or a small network of entities from the greater population, and you want to simplify your architecture by skipping batch entity resolution; or
  2. When you have resolved a population of entities in batch and then want to resolve subsequent incoming entities in real time (see the sketch after this list).
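
For the second case, a single real-time resolution request per incoming record is all you need. This is a hedged sketch, again assuming a `person` entity model and placeholder attribute names:

```python
import requests

ES_URL = "http://localhost:9200"   # placeholder cluster address
ENTITY_TYPE = "person"             # placeholder zentity entity model name

# Attribute values of one incoming record, e.g. from a web form or a message queue.
incoming = {
    "first_name": ["Alice"],
    "last_name": ["Jones"],
    "email": ["alice.jones@example.net"],
}

# Resolve this one record against the already-indexed population in real time.
response = requests.post(
    f"{ES_URL}/_zentity/resolution/{ENTITY_TYPE}",
    json={"attributes": incoming},
)
response.raise_for_status()
hits = response.json()["hits"]["hits"]
print(f"Incoming record matched {len(hits)} existing documents")
```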