Are there any spaCy libraries for Elasticsearch available?

Hi guys, I wanted to check whether there are any spaCy plugins or libraries for Elasticsearch out there.

I found the OpenNLP ingest processor, but my professor wants me to find out more about using spaCy...

Much appreciated! (:

There is no spaCy support. However, language detection is built into Elasticsearch as of the recent 7.6.0 release.

See this sample:

POST _ingest/pipeline/_simulate?filter_path=**.predicted_value
{
  "pipeline": {
    "processors": [
      {
        "inference": {
          "model_id": "lang_ident_model_1",
          "inference_config": { "classification": {}},
          "field_mappings": {}
        }
      }
    ]
  },
  "docs": [
    { "_source": { "text": "This is an english text" } },
    { "_source": { "text": "Das ist ein deutscher Text" } },
    { "_source": { "text": "你好世界" } },
    { "_source": { "text": "Ceci est un texte en français" } }
  ]
}
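
For anyone who wants to call this from Python, here is a minimal sketch that builds the same request using only the standard library. The base URL `http://localhost:9200` and the helper names are assumptions for illustration, and since the exact nesting of the response can differ between versions, the sketch simply collects every `predicted_value` it finds.

```python
import json
import urllib.request

SIMULATE_PATH = "/_ingest/pipeline/_simulate?filter_path=**.predicted_value"


def build_simulate_body(texts):
    """Build the _simulate body shown above for a list of input strings."""
    return {
        "pipeline": {
            "processors": [
                {
                    "inference": {
                        "model_id": "lang_ident_model_1",
                        "inference_config": {"classification": {}},
                        "field_mappings": {},
                    }
                }
            ]
        },
        "docs": [{"_source": {"text": t}} for t in texts],
    }


def find_predicted_values(node):
    """Collect every "predicted_value" anywhere in a decoded JSON response."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "predicted_value":
                found.append(value)
            else:
                found.extend(find_predicted_values(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_predicted_values(item))
    return found


def detect_languages(texts, base_url="http://localhost:9200"):
    """POST the texts to the simulate endpoint (base_url is an assumption)."""
    request = urllib.request.Request(
        base_url + SIMULATE_PATH,
        data=json.dumps(build_simulate_body(texts)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return find_predicted_values(json.load(response))
```

With a 7.6.0 or later cluster running, `detect_languages(["This is an english text", "Das ist ein deutscher Text"])` should return one predicted language code per document.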

Here's a Python script I use to download RSS headlines and perform entity extraction on them using spaCy. It uses the annotated_text format used by the Elasticsearch plugin designed to search and highlight entities embedded in text.
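
To give a rough idea of what that format looks like, here is a minimal sketch (not the linked script itself) that turns character spans into the `[surface text](value)` markup that annotated_text fields accept, with values URL-encoded. The function names are hypothetical, and using the entity text as the annotation value is just one possible choice; a label or an ID would work the same way.

```python
from urllib.parse import quote_plus


def to_annotated_text(text, spans):
    """Wrap each (start, end, value) span as [surface text](url-encoded value)."""
    parts = []
    cursor = 0
    for start, end, value in sorted(spans):
        if start < cursor:
            continue  # skip spans that overlap one already emitted
        parts.append(text[cursor:start])
        parts.append("[{}]({})".format(text[start:end], quote_plus(value)))
        cursor = end
    parts.append(text[cursor:])
    return "".join(parts)


def spacy_spans(doc):
    """Hypothetical helper: convert a spaCy Doc's entities into span tuples."""
    return [(ent.start_char, ent.end_char, ent.text) for ent in doc.ents]


# With spaCy and a model installed (an assumption), usage would look like:
#   import spacy
#   nlp = spacy.load("en_core_web_sm")
#   doc = nlp(headline)
#   annotated = to_annotated_text(headline, spacy_spans(doc))
```

The result is an ordinary string, so it can be indexed into the annotated_text field like any other text value. The sketch does not escape square brackets that already appear in the headline, which a real script would need to handle.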


@Mark_Harwood
I will look through it to see whether I can reuse parts of the code. May I ask questions about your code if I get stuck?

What is the difference between type: annotated_text and type: text, as seen in:

mapping = {
    "properties": {
        "id": {"type": "keyword"},
        "url": {"type": "keyword"},
        "headline": {"type": "annotated_text", "analyzer": "analyzer_shingle"},
        "published": {"type": "date"},
        "feedLink": {"type": "keyword"},
        "tags": {"type": "keyword"}
    }
}

Did you read the blog post I linked to?

AH HA... GOT IT!

By any chance, are you going to integrate it into a plugin for Elasticsearch, like the opennlp-ingest-processor?

No, spaCy is Python and the Elasticsearch core is Java-based, so that's not an obvious integration for us to consider.
