Thank you spinscale for the Elasticsearch OpenNLP Ingest Processor. Awesome!
I installed Elasticsearch, installed the plugin, and tested it; it works.
--test
--build
$ curl -XPUT -i "localhost:9200/my_index/" -d "@./mappings.json"
$ cat ./mappings.json
{ "mappings": { "my_type": { "properties": { "my_field": { "type": "string" } } } } }
--create pipeline
$ curl -XPUT localhost:9200/_ingest/pipeline/opennlp-pipeline -d ' { "description": "A pipeline to do named entity extraction", "processors": [ { "opennlp" : { "field" : "my_field" } } ] } '
--add data
$ curl -XPUT 'localhost:9200/my-index/my-type/1?pipeline=opennlp-pipeline' -d ' { "my_field" : "Kobe Bryant was one of the best basketball players of all times. Not even Michael Jordan has ever scored 81 points in one game. Munich is really an awesome city, but New York is as well. Yesterday has been the hottest day of the year." } '
--query data
$ curl -XGET 'localhost:9200/my-index/my-type/1'
{"_index":"my-index","_type":"my-type","_id":"1","_version":1,"found":true,"_source":{"my_field":"Kobe Bryant was one of the best basketball players of all times. Not even Michael Jordan has ever scored 81 points in one game. Munich is really an awesome city, but New York is as well. Yesterday has been the hottest day of the year.","entities":{"persons":["Kobe Bryant","Michael Jordan"],"dates":["Yesterday"],"locations":["Munich","New York"]}}}
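For anyone reading that result from Python, here is a minimal sketch of pulling the entities out of the GET response using only the standard library. The helper name `extract_entities` is my own, not part of the plugin, and the sample string is the response above with the `my_field` text shortened.

```python
import json


def extract_entities(get_response_text):
    """Return the 'entities' object from a GET document response, or {} if absent."""
    response = json.loads(get_response_text)
    return response.get("_source", {}).get("entities", {})


# Response from the GET above, with the my_field text shortened for readability.
sample = (
    '{"_index":"my-index","_type":"my-type","_id":"1","_version":1,"found":true,'
    '"_source":{"my_field":"(text shortened)",'
    '"entities":{"persons":["Kobe Bryant","Michael Jordan"],'
    '"dates":["Yesterday"],"locations":["Munich","New York"]}}}'
)

entities = extract_entities(sample)
print(entities["persons"])  # prints ['Kobe Bryant', 'Michael Jordan']
```

A document that was indexed without the pipeline has no `entities` key, in which case the helper returns an empty dict rather than raising.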
However, it fails when I attempt to use it here, both from Python and through the simulate API. Can someone point me in the right direction? Apologies if I am missing the obvious.
$ curl -X POST 'localhost:9200/_ingest/pipeline/_simulate?verbose' -d '
{
"pipeline" :
{
"description": "A pipeline to do named entity extraction",
"processors": [
{ "opennlp" :
{ "field": "raw_text" }
}
]
},
"docs":
[
{
"_index": "crawl",
"_type": "tor",
"_source": {"raw_text":"the quick brown fox jumped over" }
}
]
}
'
{"docs":[{"processor_results":[{"error":{"root_cause":[{"type":"exception","reason":"Could not find field [dates], possible values []"}],"type":"exception","reason":"Could not find field [dates], possible values []"}}]}]}
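On the Python side, this is roughly the same simulate call as a minimal sketch with the standard library only. The function names (`build_simulate_body`, `simulate`, `summarize`) and the default host are my own assumptions, and `summarize` only handles the `?verbose` response shape shown above.

```python
import json
import urllib.request


def build_simulate_body(field, docs):
    """Build the same request body as the curl call above."""
    return {
        "pipeline": {
            "description": "A pipeline to do named entity extraction",
            "processors": [{"opennlp": {"field": field}}],
        },
        "docs": docs,
    }


def simulate(body, host="http://localhost:9200"):
    """POST the body to the simulate endpoint (host is an assumption)."""
    request = urllib.request.Request(
        host + "/_ingest/pipeline/_simulate?verbose",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def summarize(simulate_response):
    """Return ('error', reason) or ('ok', _source) per processor result."""
    summary = []
    for doc in simulate_response.get("docs", []):
        for result in doc.get("processor_results", []):
            if "error" in result:
                summary.append(("error", result["error"]["reason"]))
            else:
                summary.append(("ok", result["doc"]["_source"]))
    return summary


body = build_simulate_body(
    "raw_text",
    [{"_index": "crawl", "_type": "tor",
      "_source": {"raw_text": "the quick brown fox jumped over"}}],
)
# Needs a running node, so it is left commented out here:
# print(summarize(simulate(body)))
```

Run against the error response above, `summarize` returns a single `("error", "Could not find field [dates], possible values []")` entry, which makes it easier to see which processor failed.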
UPDATE: Somewhere between test and prod, I mangled the ingest pipeline and/or the plugin. I recompiled the plugin, reinstalled it, and reran the request, and all was well.
$ curl -X POST 'localhost:9200/_ingest/pipeline/_simulate?verbose' -d '
{
"pipeline" :
{
"description": "A pipeline to do named entity extraction",
"processors": [
{ "opennlp" :
{ "field": "raw_text" }
}
]
},
"docs":
[
{
"_index": "crawl",
"_type": "tor",
"_source": {"raw_text":"the quick brown fox jumped over" }
}
]
}
'
{"docs":[{"processor_results":[{"doc":{"_index":"crawl","_type":"tor","_id":"_id","_source":{"raw_text":"the quick brown fox jumped over","entities":{}},"_ingest":{"timestamp":"2017-08-15T05:01:16.857Z"}}}]}]}