Could not start some language models for NLP tasks

Ilia · February 22, 2022, 12:11pm

I've installed Eland and imported a model from Hugging Face as follows:

eland_import_hub_model --url http://localhost:9200/ \
--hub-model-id HooshvareLab/bert-fa-zwnj-base-ner \
--task-type ner

I can see it in http://localhost:5601/app/ml/trained_models, but when I try to start it, either with the start icon or with the API call POST _ml/trained_models/hooshvarelab__bert-fa-zwnj-base-ner/deployment/_start, I get this error:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "status_exception",
        "reason" : "Could not start trained model deployment, the following nodes failed with errors [{f5IiQjhhTs6-ewaKpQ8xFg=Validation Failed: 1: classification label [B_DAT] is not an entity I-O-B tag.;2: classification label [B_EVE] is not an entity I-O-B tag.;3: classification label [B_FAC] is not an entity I-O-B tag.;4: classification label [B_MON] is not an entity I-O-B tag.;5: classification label [B_PCT] is not an entity I-O-B tag.;6: classification label [B_PRO] is not an entity I-O-B tag.;7: classification label [B_TIM] is not an entity I-O-B tag.;8: classification label [I_DAT] is not an entity I-O-B tag.;9: classification label [I_EVE] is not an entity I-O-B tag.;10: classification label [I_FAC] is not an entity I-O-B tag.;11: classification label [I_MON] is not an entity I-O-B tag.;12: classification label [I_PCT] is not an entity I-O-B tag.;13: classification label [I_PRO] is not an entity I-O-B tag.;14: classification label [I_TIM] is not an entity I-O-B tag.;15: Valid entity I-O-B tags are [O, B_MISC, I_MISC, B_PER, I_PER, B_ORG, I_ORG, B_LOC, I_LOC];}]"
      }
    ],
    "type" : "status_exception",
    "reason" : "Could not start trained model deployment, the following nodes failed with errors [{f5IiQjhhTs6-ewaKpQ8xFg=Validation Failed: 1: classification label [B_DAT] is not an entity I-O-B tag.;2: classification label [B_EVE] is not an entity I-O-B tag.;3: classification label [B_FAC] is not an entity I-O-B tag.;4: classification label [B_MON] is not an entity I-O-B tag.;5: classification label [B_PCT] is not an entity I-O-B tag.;6: classification label [B_PRO] is not an entity I-O-B tag.;7: classification label [B_TIM] is not an entity I-O-B tag.;8: classification label [I_DAT] is not an entity I-O-B tag.;9: classification label [I_EVE] is not an entity I-O-B tag.;10: classification label [I_FAC] is not an entity I-O-B tag.;11: classification label [I_MON] is not an entity I-O-B tag.;12: classification label [I_PCT] is not an entity I-O-B tag.;13: classification label [I_PRO] is not an entity I-O-B tag.;14: classification label [I_TIM] is not an entity I-O-B tag.;15: Valid entity I-O-B tags are [O, B_MISC, I_MISC, B_PER, I_PER, B_ORG, I_ORG, B_LOC, I_LOC];}]"
  },
  "status" : 500
}

Why do I get this error?

P.S.

Of the roughly 10 models from the Hugging Face hub that I've tried to import and start, I could get only 3 working (two of Elastic's models and dslim/bert-base-NER-uncased)! The others failed either at import time in Eland or at start time in Elasticsearch/Kibana.
I've noticed that the models must be based on BERT or one of its variants.
My Elasticsearch and Kibana version is 8.0.0, and I have activated the trial license.
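For scripting, the same start request can be built outside the Kibana Console. This is a hedged sketch using only Python's standard library. It assumes that Eland derives the Elasticsearch model ID by lowercasing the Hugging Face ID and replacing "/" with "__" (inferred only from the IDs in this post, where HooshvareLab/bert-fa-zwnj-base-ner became hooshvarelab__bert-fa-zwnj-base-ner; not checked against Eland's source). The helper names are hypothetical, the request is only built and not sent, and a secured cluster would also need credentials, which are not shown.

```python
import urllib.request


def es_model_id(hub_model_id: str) -> str:
    """Guess the Elasticsearch model ID that Eland assigns to a Hugging Face model ID.

    Assumption inferred from this thread: lowercase it and turn "/" into "__".
    """
    return hub_model_id.replace("/", "__").lower()


def start_deployment_request(es_url: str, hub_model_id: str) -> urllib.request.Request:
    """Build (but do not send) POST _ml/trained_models/<id>/deployment/_start."""
    model_id = es_model_id(hub_model_id)
    url = f"{es_url.rstrip('/')}/_ml/trained_models/{model_id}/deployment/_start"
    return urllib.request.Request(url, method="POST")


req = start_deployment_request("http://localhost:9200/", "HooshvareLab/bert-fa-zwnj-base-ner")
print(req.get_method(), req.full_url)
# POST http://localhost:9200/_ml/trained_models/hooshvarelab__bert-fa-zwnj-base-ner/deployment/_start
```

Passing req to urllib.request.urlopen would actually send it; against the 8.0.0 cluster above it would fail with the same 500 error.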

dkyle · February 22, 2022, 2:19pm

Hi Ilia

Yes, in the 8.0.0 release only BERT models are supported; other architectures, including MPNet, are coming in future releases. For a list of supported architectures, refer to Third party NLP models | Machine Learning in the Elastic Stack [8.0] | Elastic.

That page also recommends 3 models you can use for the Named Entity Recognition, I'm guessing those are the ones you had success with.

The error you see is because the I-O-B tagging schema is not recognised. The schema is expected to consist of the tags B_MISC, I_MISC, ... as used by the BERT model dslim/bert-base-NER · Hugging Face.

The tags used by your model such as DAT and EVE are not recognised.
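The 8.0.0 check can be reproduced offline before importing a model. This is a minimal sketch, not Elasticsearch's code: the allowed set is copied from the error message in the first post, the function name is hypothetical, and treating "-" like "_" is an assumption based on the hyphenated tags (B-PER, I-PER, ...) in the dslim/bert-base-NER table later in this thread.

```python
from typing import Iterable, List

# Hypothetical helper mirroring the check implied by the 8.0.0 error message.
# The allowed set is copied from that message, not read from Elasticsearch.
ALLOWED_8_0_TAGS = {
    "O",
    "B_MISC", "I_MISC",
    "B_PER", "I_PER",
    "B_ORG", "I_ORG",
    "B_LOC", "I_LOC",
}


def unsupported_labels(labels: Iterable[str]) -> List[str]:
    """Return, in input order, the labels that are not in the fixed 8.0.0 tag set.

    Assumption: "-" and "_" are interchangeable, so "B-PER" counts as "B_PER".
    """
    return [label for label in labels
            if label.replace("-", "_") not in ALLOWED_8_0_TAGS]


print(unsupported_labels(["O", "B-PER", "I_PER", "B_DAT", "I_DAT", "B_EVE"]))
# ['B_DAT', 'I_DAT', 'B_EVE']
```

Any non-empty result means the model would presumably be rejected at start time on 8.0.0, as happened here.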

There is an open issue in the Elasticsearch repo to support different tagging schemas for NER; please comment on or +1 the issue so that it may be prioritised.

GitHub issue (github.com/elastic/elasticsearch): [ML] Support different entity tagging schemas in NER, opened by davidkyle at 09:15AM UTC on 25 Aug 2021, labels >enhancement :ml Team:ML

The Named Entity Recognition task works for [Inside-Outside-Beginning Tags](https://en.wikipedia.org/wiki/Inside%E2%80%93outside%E2%80%93beginning_(tagging)), but different models can be trained with different tagging schemas or use extra tags. For example [KB/bert-base-swedish-cased-ner](https://huggingface.co/KB/bert-base-swedish-cased-ner/blob/main/config.json) has a different set of labels, which makes it incompatible with the current NER task, where the expectation that IOB tags are used is hard coded. It may be possible to add a configuration option for the different labelling schemas.

Have you had any success using models for different tasks such as text classification?

Ilia · February 22, 2022, 4:23pm

Hi David,
Thank you for the response.

I've seen the Third party NLP models page, but I assumed it was sufficient for a model to be based on BERT or on one of the few other architectures mentioned on that page. If I understand you correctly, there is a fixed list of tags, and any compatible model must use only these tags, right?

> That page also recommends 3 models you can use for the Named Entity Recognition, I'm guessing those are the ones you had success with.

Yes, but I found these models almost by chance! Moreover, I got a Segmentation fault (core dumped) error when I tried to import dslim/bert-base-NER with Eland, but I could import and start dslim/bert-base-NER-uncased successfully. Unfortunately, none of these 3 models helps us: what I need is to deploy a NER model for the Persian language (also called Farsi, fa), but it seems there is no compatible model on Hugging Face.

> There is an open issue in the Elasticsearch repo to support different tagging schemas for NER; please comment on or +1 the issue so that it may be prioritised.

+1'd.

> Have you had any success using models for different tasks such as text classification?

No, at the moment NER is more important for us. We might use text classification later.

So does this all mean that only those 3 NER models can be used for NER tasks in Elasticsearch?!

dkyle · February 23, 2022, 11:09am

Can you tell me more about the core dump, please? What OS and CPU are you using? Did it occur when you started the deployment?

> there is a fixed list of tags, and any compatible model must use only these tags, right?

Yes that is correct.

> +1'd.

Thank you

Ilia · February 23, 2022, 2:56pm

I got that error when I tried to import the model with Eland on WSL. When I ran the command below again to reproduce the error for this post, the model imported successfully this time!
eland_import_hub_model --url http://localhost:9200/ --hub-model-id dslim/bert-base-NER --task-type ner

We are trying to change (map) the tags of the hooshvarelab/bert-fa-zwnj-base-ner model to ones that Elasticsearch accepts. For that, I need to know all the tags Elasticsearch accepts for NER tasks. Where can I find them?
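One place to look for a model's own tags is its config.json on the Hugging Face Hub: token-classification checkpoints normally list them in the id2label field, keyed by class id (with transformers installed, AutoConfig.from_pretrained(model_id).id2label gives the same mapping). The tags Elasticsearch 8.0.0 accepts are the ones listed at the end of the error message in the first post. This sketch reads id2label from a JSON string; the helper names and the sample config are hypothetical, not the real HooshvareLab file.

```python
import json
from typing import Dict, List


def labels_from_config(config_json: str) -> List[str]:
    """Return a model's output labels from config.json text, ordered by class id.

    Class ids are stored as JSON object keys, i.e. strings, so sort them numerically.
    """
    config = json.loads(config_json)
    id2label: Dict[str, str] = config["id2label"]
    return [id2label[key] for key in sorted(id2label, key=int)]


def entity_types(labels: List[str]) -> List[str]:
    """Return the distinct entity types (the part after the 2-character prefix), sorted."""
    return sorted({label[2:] for label in labels if label != "O"})


# Hypothetical, shortened config for illustration only.
sample = '{"id2label": {"0": "O", "1": "B-PER", "2": "I-PER", "10": "B-DAT"}}'
labels = labels_from_config(sample)
print(labels)                # ['O', 'B-PER', 'I-PER', 'B-DAT']
print(entity_types(labels))  # ['DAT', 'PER']
```

Comparing the entity types with PER, ORG, LOC and MISC shows at a glance which labels an 8.0.0 cluster would reject.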

dkyle · February 24, 2022, 9:37am

These are the tags used by dslim/bert-base-NER · Hugging Face:

Abbreviation	Description
O	Outside of a named entity
B-MIS	Beginning of a miscellaneous entity right after another miscellaneous entity
I-MIS	Miscellaneous entity
B-PER	Beginning of a person’s name right after another person’s name
I-PER	Person’s name
B-ORG	Beginning of an organization right after another organization
I-ORG	Organization
B-LOC	Beginning of a location right after another location
I-LOC	Location

I'm afraid I can't see a way to map those to the tags hooshvarelab/bert-fa-zwnj-base-ner uses, as the two models use a different number of tags.
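In the scheme described by the dslim/bert-base-NER table, B- is only used when an entity directly follows another entity of the same type; otherwise every token of an entity carries I-. Below is a hedged sketch of how such per-token tags can be grouped into entities, handling both that style and the style where every entity starts with B-. It is not Elasticsearch's implementation, all names are hypothetical, and it works on whole words for simplicity, whereas BERT models tag WordPiece sub-tokens.

```python
from typing import List, Optional, Tuple


def split_tag(tag: str) -> Tuple[str, Optional[str]]:
    """Split an IOB tag such as 'B-PER' or 'I_LOC' into (prefix, entity type).

    'O' becomes ('O', None). Both '-' and '_' separators are accepted.
    """
    if tag == "O":
        return "O", None
    prefix, sep, entity = tag[0], tag[1:2], tag[2:]
    if prefix not in ("B", "I") or sep not in ("-", "_") or not entity:
        raise ValueError(f"not an IOB tag: {tag!r}")
    return prefix, entity


def group_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group per-token IOB tags into (entity text, entity type) pairs.

    An I- tag continues the open entity only if the type matches; a B- tag,
    an O tag, or a type change closes it.
    """
    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None
    for token, tag in zip(tokens, tags):
        prefix, entity_type = split_tag(tag)
        if prefix == "I" and entity_type == current_type:
            current_tokens.append(token)
            continue
        if current_type is not None:
            entities.append((" ".join(current_tokens), current_type))
        if entity_type is None:
            current_tokens, current_type = [], None
        else:
            current_tokens, current_type = [token], entity_type
    if current_type is not None:
        entities.append((" ".join(current_tokens), current_type))
    return entities


tokens = ["Paris", "London", "and", "New", "York"]
tags = ["I-LOC", "B-LOC", "O", "I-LOC", "I-LOC"]
print(group_entities(tokens, tags))
# [('Paris', 'LOC'), ('London', 'LOC'), ('New York', 'LOC')]
```

Here B-LOC is what separates "Paris" and "London" into two entities; without it they would merge into one.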

Ilia · February 26, 2022, 1:02pm

I guess we can map PER, ORG and LOC, and map the other tags to MIS! Some thoughts:

- we will lose a lot of useful tags by mapping them to MIS, but we have no other choice
- we gave it a try but haven't completed the work yet
- if supporting other tagging schemas is just a matter of adding them to an existing list in Elasticsearch's source code, it might be possible to do so and build Elasticsearch from source. Searching the source code, I only found this and this one, but they are docs!
- this work is a kind of PoC (proof of concept), and we might need to train our own model
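The mapping plan above can be sketched as a rewrite of the model's id2label table. This is a hypothetical helper, not a tested workaround: it targets MISC rather than MIS because MISC is the form listed in the 8.0.0 error message, it assumes every label is O or a one-letter B/I prefix plus a separator, and whether 8.0.0 accepts several class ids sharing one label name has not been confirmed in this thread. The rewritten mapping would presumably also have to be written into the model's config.json before importing it.

```python
from typing import Dict

# Entity types kept as-is; every other type is folded into MISC (the plan above).
KEPT_TYPES = {"PER", "ORG", "LOC"}


def remap_label(label: str) -> str:
    """Map one IOB label onto the 8.0.0 tag set, e.g. 'B_DAT' -> 'B_MISC'."""
    if label == "O":
        return "O"
    prefix, entity_type = label[0], label[2:]
    target = entity_type if entity_type in KEPT_TYPES else "MISC"
    return f"{prefix}_{target}"


def remap_id2label(id2label: Dict[int, str]) -> Dict[int, str]:
    """Rewrite an id2label mapping; several class ids may end up with the same label."""
    return {class_id: remap_label(label) for class_id, label in id2label.items()}


print(remap_id2label({0: "O", 1: "B_DAT", 2: "I_DAT", 3: "B_PER"}))
# {0: 'O', 1: 'B_MISC', 2: 'I_MISC', 3: 'B_PER'}
```

Keeping the class ids unchanged means the model weights stay untouched; only the names attached to its outputs change.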

system · March 26, 2022, 1:03pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

dkyle · May 30, 2022, 3:52pm

@Ilia

A PR has just been merged in Elasticsearch that makes the entity tags configurable and will fix the problem you are having using the hooshvarelab/bert-fa-zwnj-base-ner model.

It will be in the 8.4 release so keep an eye out for that.

GitHub pull request (github.com/elastic/elasticsearch): [ML] expand allowed NER labels to be any I-O-B tagged labels, elastic:master ← benwtrent:feature/ml-expand-ner-tokens-allowed, opened by benwtrent at 08:35PM UTC on 24 May 2022, +230 -198

Named entity recognition (NER) is a special form of token classification. The specific kind of labelling we support is Inside-Outside-Beginning (IOB) tagging. These labels indicate whether a token is inside an entity (with an `I-` or `I_` prefix), at its beginning (`B-` or `B_`) or outside (`O`). Each valid token classification label starts with the required prefix or is `O`.

Before this commit, we restricted the labels to a specific set:

```
O(Entity.NONE),      // Outside a named entity
B_MISC(Entity.MISC), // Beginning of a miscellaneous entity right after another miscellaneous entity
I_MISC(Entity.MISC), // Miscellaneous entity
B_PER(Entity.PER),   // Beginning of a person's name right after another person's name
I_PER(Entity.PER),   // Person's name
B_ORG(Entity.ORG),   // Beginning of an organization right after another organization
I_ORG(Entity.ORG),   // Organisation
B_LOC(Entity.LOC),   // Beginning of a location right after another location
I_LOC(Entity.LOC);   // Location
```

But now any entity is allowed, as long as the naming of the labels adheres to IOB tagging rules. Here is an inference response containing other token labels:

```
{
  "predicted_value": "[Birth defects](ADR&Birth+defects) associated with [thalidomide](DRUG&thalidomide).",
  "entities": [
    {
      "entity": "birth defects",
      "class_name": "ADR",
      "class_probability": 0.9664951378636988,
      "start_pos": 0,
      "end_pos": 13
    },
    {
      "entity": "thalidomide",
      "class_name": "DRUG",
      "class_probability": 0.7323781805751934,
      "start_pos": 30,
      "end_pos": 41
    }
  ]
}
```
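The relaxed rule described in this PR (a label must be O, or start with a B or I prefix followed by - or _) can be approximated with a regular expression. This is a sketch based only on the PR description, not the merged code; details such as case sensitivity are assumptions, and the function name is hypothetical.

```python
import re
from typing import List

# Approximation of the PR's rule: "O", or "B"/"I" + "-" or "_" + a non-empty entity name.
IOB_LABEL = re.compile(r"O|[BI][-_].+")


def invalid_iob_labels(labels: List[str]) -> List[str]:
    """Return the labels that do not follow the IOB naming rule."""
    return [label for label in labels if IOB_LABEL.fullmatch(label) is None]


# The labels rejected by 8.0.0 in the first post of this thread.
rejected_in_8_0 = ["B_DAT", "B_EVE", "B_FAC", "B_MON", "B_PCT", "B_PRO", "B_TIM",
                   "I_DAT", "I_EVE", "I_FAC", "I_MON", "I_PCT", "I_PRO", "I_TIM"]
print(invalid_iob_labels(rejected_in_8_0))             # []
print(invalid_iob_labels(["ADR", "B-DRUG", "X_LOC"]))  # ['ADR', 'X_LOC']
```

Under this rule every label from hooshvarelab/bert-fa-zwnj-base-ner that 8.0.0 rejected would pass, which matches the expectation that 8.4 fixes the problem.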