Using python plugin to create zentity models in elastic

Hi all,

I have come across the zentity plugin for entity resolution in ES. Since I am already using the Python plugin, I wanted to ask whether the official Python plugin for ES has any functionality to support entity model creation and resolution. I couldn't find any documentation on it so far, so any help is appreciated.

Thanks in advance

Hi @gioorso, I'm the maintainer of zentity.

The Elasticsearch Python client doesn't have functions for community plugins like zentity that extend the Elasticsearch API. However, you can use the client to interact with zentity using es.transport.perform_request(). This lets you submit any request, including requests to zentity, using the connections established by the client. You can request any of the zentity REST APIs by changing the method, url, and body arguments passed to es.transport.perform_request().

Here's an example that uses the Elasticsearch Python client to resolve an entity from the zentity sandbox.

```python
import json
from elasticsearch import Elasticsearch

# Create client
es = Elasticsearch(["127.0.0.1:9200"])

# Verify that zentity is installed
url = "/_zentity?pretty"
response = es.transport.perform_request("GET", url)
print(json.dumps(response, indent=2, sort_keys=True))

# Submit a resolution request
url = "/_zentity/resolution/organization?pretty&_source=false"
response = es.transport.perform_request("POST", url, body={
  "attributes": {
    "id": [ "1497077796" ]
  }
})
print(json.dumps(response, indent=2, sort_keys=True))
```

This example was tested with Python 2.7, Elasticsearch 6.7.1, zentity 1.0.3, Windows 10.
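The original question also asked about entity model creation. The same perform_request pattern should work for zentity's Models API. The sketch below assumes zentity 1.x, where a model is created with POST /_zentity/models/{entity_type} and consists of attributes, resolvers, matchers, and indices sections; check the zentity docs for your version. The entity type person, the index my_index, and the field names are hypothetical. create_model only needs an object exposing transport.perform_request, so it can be exercised without a cluster.

```python
# Sketch: create a zentity entity model through the Elasticsearch Python client.
# Assumes the zentity 1.x Models API (POST /_zentity/models/{entity_type}).
# The entity type, index name, and field names are hypothetical.


def build_person_model():
    """Return a minimal zentity entity model as a plain dict."""
    return {
        "attributes": {
            "name": {"type": "string"},
            "phone": {"type": "string"},
        },
        "resolvers": {
            # Two records match if they share both a name and a phone number.
            "name_phone": {"attributes": ["name", "phone"]},
        },
        "matchers": {
            # zentity substitutes {{ field }} and {{ value }} into this clause.
            "exact": {"clause": {"term": {"{{ field }}": "{{ value }}"}}},
        },
        "indices": {
            "my_index": {
                "fields": {
                    "name.keyword": {"attribute": "name", "matcher": "exact"},
                    "phone.keyword": {"attribute": "phone", "matcher": "exact"},
                }
            }
        },
    }


def create_model(es, entity_type, model):
    """Submit the model; es is anything exposing transport.perform_request."""
    url = "/_zentity/models/" + entity_type
    return es.transport.perform_request("POST", url, body=model)
```

With a real client this would be called as create_model(Elasticsearch(["127.0.0.1:9200"]), "person", build_person_model()), after which resolution requests can target /_zentity/resolution/person in the same way as the example above.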

Hi @davemoore,

Thank you for your answer, this is exactly what I was looking for.

One more question about zentity, if you don't mind. As far as I understand, zentity resolution works by starting from some known attribute values. For example, in a simple case, we can find all the entities that are linked to a specific person based on name. However, what if we want to find all the linkages in our data? In other words, what if we want to resolve every entity in our data based on a set of composite keys (i.e. resolvers)? Does zentity have this capability?

Regards,
Gio

@gioorso

You're right that zentity was designed to resolve a single entity per request in real-time. This contrasts with the more common approach of resolving a population of entities in batch. I made a brief comparison of the two approaches in this presentation (Slide 13).

At some level it would be possible to use zentity to resolve a population of entities. For example, you could scroll through every document in an index, resolve each document with others using zentity, associate each document _id from the hits with an entity ID that you generate, and exclude each of those document _ids from subsequent iterations of this batch process. But this approach has limited scalability. The list of excluded document identifiers grows without bound with each request, and omitting those exclusions results in many redundant searches. There are more appropriate solutions for population-scale entity resolution that operate in batch, but none are open source as far as I'm aware.
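A rough sketch of the batch loop described above. The scroll is replaced by any iterable of hits, and resolve is a hypothetical callable standing in for a zentity resolution request seeded from the document (for example, the perform_request call from the earlier example, filled with that document's attribute values). How the excluded identifiers would actually be passed to zentity is left to resolve, since that depends on your model and zentity version. The point of the sketch is to make the ever-growing exclusion list visible.

```python
import itertools


def batch_resolve(docs, resolve):
    """Assign an entity ID to every document, one resolution per new entity.

    docs    -- iterable of hits with "_index" and "_id" (stands in for a scroll)
    resolve -- hypothetical callable(doc, excluded) that runs a zentity
               resolution seeded from doc and returns (_index, _id) hit pairs
    """
    entity_of = {}  # (_index, _id) -> generated entity ID
    next_id = itertools.count(1)
    for doc in docs:
        key = (doc["_index"], doc["_id"])
        if key in entity_of:
            # Already absorbed into an earlier entity, so no request is needed.
            continue
        # Every document resolved so far; this list grows without bound,
        # which is the scalability problem described above.
        excluded = list(entity_of)
        entity_id = next(next_id)
        entity_of[key] = entity_id
        for hit in resolve(doc, excluded):
            # Keep the first assignment if a hit was already placed elsewhere.
            entity_of.setdefault(hit, entity_id)
    return entity_of
```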

I view zentity as an appropriate solution in two cases:

  1. When your analysis is limited to a single entity or a small network of entities, and you want to simplify your architecture by skipping batch entity resolution; or
  2. When you have resolved a population of entities in batch and then want to resolve subsequent incoming entities in real-time.
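For the second case, one way to combine the two is to run a zentity resolution for each incoming record and reuse the entity ID that the batch process already assigned to any of the hits. This is a minimal sketch under assumptions: resolve, entity_of, and new_entity_id are hypothetical names, and picking the lowest ID when hits span several batch entities is an arbitrary stand-in for a real merge policy.

```python
def assign_incoming(attributes, resolve, entity_of, new_entity_id):
    """Return an entity ID for a new record, reusing batch results if possible.

    attributes    -- zentity-style attribute values, e.g. {"id": ["1497077796"]}
    resolve       -- hypothetical callable that runs a zentity resolution
                     request with those attributes and returns (_index, _id) pairs
    entity_of     -- (_index, _id) -> entity ID mapping produced by the batch run
    new_entity_id -- hypothetical callable that mints an unused entity ID
    """
    hits = resolve(attributes)
    known = sorted(set(entity_of[h] for h in hits if h in entity_of))
    if not known:
        # Nothing matched the batch output, so this is a new entity.
        return new_entity_id()
    # Several known IDs mean the new record links entities the batch kept
    # apart; a real system needs a merge policy, here we take the lowest.
    return known[0]
```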

@davemoore

Thank you very much Dave.

Best regards,
Gio
