Hi guys, wanted to check whether there are any spaCy plugins or libraries for Elasticsearch available out there?
I found the openNLP Ingest processor but my professor wants me to find more on using spaCy...
Much appreciated! (:
There is no spaCy support. There is, however, a language identification model built into Elasticsearch, added in the recent 7.6.0 release.
See this sample:
POST _ingest/pipeline/_simulate?filter_path=**.predicted_value
{
  "pipeline": {
    "processors": [
      {
        "inference": {
          "model_id": "lang_ident_model_1",
          "inference_config": { "classification": {} },
          "field_mappings": {}
        }
      }
    ]
  },
  "docs": [
    { "_source": { "text": "This is an english text" } },
    { "_source": { "text": "Das ist ein deutscher Text" } },
    { "_source": { "text": "你好世界" } },
    { "_source": { "text": "Ceci est un texte en français" } }
  ]
}
Here's a Python script I use to download RSS headlines and run entity extraction on them with spaCy. It writes the headlines in the annotated_text format used by the Elasticsearch plugin designed to search and highlight entities embedded in text.
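For anyone who just wants the gist of the markup step, here is a minimal sketch of it, not the linked script itself. It assumes entities arrive as (start_char, end_char, label) tuples, which is what spaCy exposes on each item of doc.ents (ent.start_char, ent.end_char, ent.label_). The function name to_annotated_text and the choice of annotating each entity with both its surface text and its label are my own illustration; annotated_text itself only requires the [text](value&value) form with URL-encoded values.

```python
from urllib.parse import quote_plus


def to_annotated_text(text, spans):
    """Wrap each entity span in annotated_text markup: [surface](value&value).

    spans: iterable of (start_char, end_char, label) tuples.
    Hypothetical helper for illustration; overlapping spans are skipped.
    """
    out = []
    cursor = 0
    for start, end, label in sorted(spans):
        if start < cursor:
            continue  # overlaps the previous entity, so leave it as plain text
        surface = text[start:end]
        out.append(text[cursor:start])
        # Annotation values must be URL-encoded; '&' separates multiple values.
        out.append("[{}]({}&{})".format(surface, quote_plus(surface), quote_plus(label)))
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


# With spaCy the spans would come from something like:
#   spans = [(e.start_char, e.end_char, e.label_) for e in nlp(headline).ents]
# They are hard-coded here so the sketch runs without a spaCy model installed.
headline = "Apple opens office in New York"
spans = [(0, 5, "ORG"), (22, 30, "GPE")]
annotated = to_annotated_text(headline, spans)
print(annotated)
```

The resulting string is what you would index into a field mapped as annotated_text. Note the sketch does not escape square brackets that already occur in the headline.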
@Mark_Harwood
I will look through it to see whether I can reuse parts of the code... May I ask questions about your code when in doubt?
What is the difference between type: annotated_text and type: text, as seen in
mapping = {
  "properties": {
    "id": {"type": "keyword"},
    "url": {"type": "keyword"},
    "headline": {"type": "annotated_text", "analyzer": "analyzer_shingle"},
    "published": {"type": "date"},
    "feedLink": {"type": "keyword"},
    "tags": {"type": "keyword"}
  }
}
Did you read the blog post I linked to?
AH HA... GOT IT!
By any chance, are you going to integrate it into a plugin to be used in Elasticsearch, like the opennlp-ingest-processor?
No, spaCy is Python and the Elasticsearch core is Java-based, so that's not an obvious integration for us to consider.