Solr has a Tagger Handler feature. Is there an out-of-the-box equivalent in ES?
How does the tagger work?
The tagger handler relies on a dedicated collection in which it stores the entities to be extracted. In this collection, one field is used to store the texts used to recognize each entity, and you may create as many other fields as you want to store other useful information about your entities.
Assume we want to recognize city names in our documents. We can have fields storing the timezone, the location (longitude and latitude) as well as the country of the city. The “tag” field could contain the different names that are used to designate the city, such as “New York City” and “NYC” for example.
Once the collection is created and populated, and the handler properly configured, you can use the handler by passing it text and receiving the list of entities found in that text. The matching is done using only the text stored in the “tag” field, but you can ask the tagger to return any fields you want from the entity using the standard fl parameter.
Note: I'm aware of the ES Mapper Annotated Plugin, but it is meant for already annotated/tagged texts.
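To make the request/response flow above concrete, here is a minimal sketch of how such a call could be built in Python. The host, the collection name `cities`, the `/tag` handler path, the field names, and the helper `build_tag_request` are all assumptions for illustration; the real values depend on how the handler is configured. The request is only constructed, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_tag_request(text, fields, base_url="http://localhost:8983/solr/cities"):
    """Build (but do not send) a POST request to a Solr tagger handler.

    Assumptions: the collection is called "cities" and the tagger handler
    is registered at "/tag"; both depend on your own Solr configuration.
    """
    params = urlencode({
        "fl": ",".join(fields),   # fields to return for each matched entity
        "overlaps": "NO_SUB",     # skip tags fully contained in a longer tag
        "wt": "json",             # ask for a JSON response
    })
    return Request(
        f"{base_url}/tag?{params}",
        data=text.encode("utf-8"),              # the text to tag is the request body
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


req = build_tag_request("I flew from NYC to Paris.", ["id", "name", "timezone"])
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it to a running Solr instance; the response should list the matched offsets plus the requested fields for each entity.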
Thanks in advance
Maybe the ingest OpenNLP plugin can be of help here. It uses Apache OpenNLP for NER, and you can use custom models if you want.
Hope this helps!
Thanks for your reply, this is a pretty interesting plugin. But the idea is to not use an NER model. Instead, I have a CSV file holding the entity IDs, names and synonyms. The Solr Tagger Handler can read a CSV file like that, for example.
Also, the intention is not to index the text, but just to get its tags returned. So I was wondering if there's any out-of-the-box ES feature for that, but I'm afraid there isn't.
I hope it makes sense, thanks once again.
the intention is not to index the text, but just get their tags returned
Sounds like the percolate API may be relevant?
AFAIK the Solr text tagger is a kind of gazetteer (dictionary-based) and uses a second index to store the named entities and associated metadata. That's as much as I remember from using it about 5 years ago with Solr. The percolate API might get you some way there, at least for a reasonably low number of entities (probably in the low tens of thousands rather than millions, but I might be mistaken there).
That said, I did some experiments with making the text tagger code work as an ES plugin a while ago, but it always got stalled for some reason (mostly lack of time). It would be nice to get a better feeling for your use case, and for whether you can or cannot use the percolator, to see if this would be a good addition to the ES plugin ecosystem.
Sorry for the late reply. Percolate does indeed seem to be the closest ES tool for what we are aiming at. The project has been put on hold for now, though. We intend to take a better look at it at a later time (if that happens, I'll leave feedback here). Thanks!
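As a rough illustration of the percolator idea, the sketch below builds the three JSON bodies involved: an index mapping with a `percolator` field, one stored query per entity surface form, and a `percolate` search for an incoming text. The field names (`query`, `text`, `entity_id`, `name`) and the helper functions are hypothetical, and using one `match_phrase` query per surface form is just one possible design, not an equivalent of the Solr tagger.

```python
import json


def percolator_mapping():
    """Index body: a percolator field for the stored queries, a text field
    describing the documents to percolate, and entity metadata fields."""
    return {
        "mappings": {
            "properties": {
                "query": {"type": "percolator"},
                "text": {"type": "text"},
                "entity_id": {"type": "keyword"},
                "name": {"type": "keyword"},
            }
        }
    }


def entity_query_doc(entity_id, name, surface_form):
    """Document to index: a phrase query for one surface form of an entity."""
    return {
        "entity_id": entity_id,
        "name": name,
        "query": {"match_phrase": {"text": surface_form}},
    }


def percolate_search(text):
    """Search body: find every stored query that matches the given text."""
    return {
        "query": {"percolate": {"field": "query", "document": {"text": text}}},
        "_source": ["entity_id", "name"],
    }


print(json.dumps(percolator_mapping(), indent=2))
print(json.dumps(entity_query_doc("1", "New York City", "NYC"), indent=2))
print(json.dumps(percolate_search("I flew from NYC to Paris."), indent=2))
```

The text itself is never indexed here: it is only passed in the search body, and the hits are the stored entity documents whose query matched.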
Very interesting, Christoph.
In a nutshell, the intention is to store data from a CSV containing columns such as id, name and synonyms (the most important attribute), which would be the tags. Then, via REST, we'd send text to it and get the related tags as the response (it could be the synonyms, the name or any other stored column).
I haven't looked deeper into the percolator yet to check how well it would serve us. If we get there, I'll leave more accurate feedback here.
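To show the intended flow end to end, here is a naive in-memory sketch in Python. It is not how Solr or ES would do it: it only loads a CSV and looks up surface forms with a regular expression. The sample rows, the `|` separator inside the synonyms column, and the function names are assumptions made for the example.

```python
import csv
import io
import re

# Hypothetical sample data; the "|" separator for synonyms is an assumption.
SAMPLE_CSV = """id,name,synonyms
1,New York City,NYC|New York
2,Paris,City of Light
"""


def load_entities(csv_text):
    """Map each lowercased surface form (name or synonym) to its CSV row."""
    lookup = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        forms = [row["name"]] + [s for s in row["synonyms"].split("|") if s]
        for form in forms:
            lookup[form.lower()] = row
    return lookup


def tag(text, lookup):
    """Return (start, end, id, name) for every surface form found in text."""
    # Try longer forms first so "New York City" wins over "New York".
    forms = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(map(re.escape, forms)) + r")\b", re.IGNORECASE
    )
    tags = []
    for m in pattern.finditer(text):
        row = lookup[m.group().lower()]
        tags.append((m.start(), m.end(), row["id"], row["name"]))
    return tags


lookup = load_entities(SAMPLE_CSV)
print(tag("From NYC to Paris, then back to New York City.", lookup))
```

Any other stored column could be returned the same way, since each match carries the whole CSV row.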
Thanks a lot