Best approach to index words, annotate them with their type (entity, verb), and have a search return those words with their annotations?


(Alfred Reinold Baudisch) #1

I'm trying to build a very simple NLP chat (I could even call it pseudo-NLP), where I want to identify a fixed subset of intentions (verbs, sentiments) and entities (products, etc.). It's a kind of entity identification or named-entity recognition (NER), but I'm not sure I need a full-fledged NER solution for what I want to achieve.

To keep things simple, the user has to type the exact words, and the system won't deal with typos, etc. So if someone types "carr", nothing will be found; only "car" is valid. This is more about finding a search strategy than about NLP.

I want to index something like:

want [type: intent]
buy [type: intent]
computer [type: entity]
car [type: entity]

Then the user will type:

I want to buy a car.

Then I send this phrase to Elasticsearch, and it should return something like the following (it doesn't have to be structured exactly like this, but each word should come with its type):

[
    {"word": "want", "type": "intent"},
    {"word": "buy", "type": "intent"},
    {"word": "car", "type": "entity"}
]
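The desired behaviour can be pinned down without Elasticsearch at all. Below is a minimal Python sketch, assuming a simple lowercase, punctuation-stripping tokenizer (the question doesn't specify one); `VOCABULARY`, `tokenize` and `annotate` are hypothetical names. It does an exact lookup per token, so the example sentence yields the want/buy/car annotations and "carr" yields nothing.

```python
import re

# Hypothetical vocabulary mirroring the list in the question: word -> type.
VOCABULARY = {
    "want": "intent",
    "buy": "intent",
    "computer": "entity",
    "car": "entity",
}


def tokenize(phrase: str) -> list[str]:
    # Assumed tokenizer: lowercase, then keep runs of letters/digits,
    # so "car." becomes "car" and punctuation is dropped.
    return re.findall(r"[a-z0-9]+", phrase.lower())


def annotate(phrase: str) -> list[dict]:
    # Exact lookup only: a typo such as "carr" is not in VOCABULARY, so it is skipped.
    return [
        {"word": token, "type": VOCABULARY[token]}
        for token in tokenize(phrase)
        if token in VOCABULARY
    ]


if __name__ == "__main__":
    print(annotate("I want to buy a car."))
```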

The approach I came up with was to index each word as its own document:

{
    "word": "car",
    "type": "entity"
}
{
    "word": "buy",
    "type": "intent"
}
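One way to load those documents is the Elasticsearch `_bulk` API, which takes newline-delimited JSON: an action line followed by the document source, with the whole body ending in a newline. The sketch below only builds that request body (it would be sent as `POST /_bulk` with `Content-Type: application/x-ndjson`); the index name `words` and the function `bulk_body` are hypothetical, and using the word itself as the document `_id` is an assumed design choice.

```python
import json

# Hypothetical index name; the question does not give one.
INDEX_NAME = "words"

# The word documents from the question, one per vocabulary entry.
DOCUMENTS = [
    {"word": "want", "type": "intent"},
    {"word": "buy", "type": "intent"},
    {"word": "computer", "type": "entity"},
    {"word": "car", "type": "entity"},
]


def bulk_body(index: str, docs: list[dict]) -> str:
    lines = []
    for doc in docs:
        # Action line. Using the word as _id means re-indexing the same
        # word overwrites it instead of creating a duplicate document.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["word"]}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(bulk_body(INDEX_NAME, DOCUMENTS), end="")
```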

Then I send the whole phrase as a query against the "word" field. So far I have had no success: even with multi-word queries, some words aren't matched.
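The question doesn't show the mapping or the query, so the cause can't be pinned down from here. One hedged variant of the same design, sketched in Python: map `word` as a `keyword` field (exact, unanalyzed matching), split the phrase client-side, and send a `terms` query that matches any token exactly. Only request and response bodies are handled; the mapping would go to `PUT /<index>` and the query to `GET /<index>/_search`. `MAPPING`, `tokenize`, `search_body` and `annotations_from_hits` are hypothetical names, and the tokenizer is an assumption.

```python
import json
import re

# Assumed explicit mapping (body for PUT /<index>): both fields are
# keyword, so a stored word only matches an identical string.
MAPPING = {
    "mappings": {
        "properties": {
            "word": {"type": "keyword"},
            "type": {"type": "keyword"},
        }
    }
}


def tokenize(phrase: str) -> list[str]:
    # Assumed tokenizer: lowercase, keep runs of letters/digits, drop punctuation.
    return re.findall(r"[a-z0-9]+", phrase.lower())


def search_body(phrase: str) -> dict:
    # Deduplicate while keeping sentence order.
    tokens = list(dict.fromkeys(tokenize(phrase)))
    return {
        # terms: match documents whose "word" equals any token exactly.
        "query": {"terms": {"word": tokens}},
        # Assumes one document per word; the default of 10 hits could
        # otherwise cut off matches from longer sentences.
        "size": len(tokens),
    }


def annotations_from_hits(phrase: str, hits: list[dict]) -> list[dict]:
    # hits is response["hits"]["hits"]; put matches back in sentence order.
    found = {hit["_source"]["word"]: hit["_source"]["type"] for hit in hits}
    return [{"word": t, "type": found[t]} for t in tokenize(phrase) if t in found]


if __name__ == "__main__":
    print(json.dumps(search_body("I want to buy a car."), indent=2))
```

Tokens such as "i", "to" and "a" are simply sent along and find no document, which is harmless for a small vocabulary.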

Any insights, ideas, or tips for doing this with Elasticsearch alone?

If I do need a dedicated NER solution, what is the best/simplest one to use with Elasticsearch? Curiously, I didn't find much about this on Google.

