Hello, thanks to this site I recently succeeded in linking OpenNLP to my PDF.
However, the result wasn't as good as I expected.
This is the PDF:
And this is what I get:
What is wrong with the plugin?
Is there something I need to fix before using it?
Images are hard to read. It would be better to paste text only, especially when it's not a UI problem.
Anyway, what looks wrong here?
Not sure I understand the "problem". Looks like your document has entities for persons, locations, dates.
So what is wrong? What do you expect?
I have many names in my PDF, but OpenNLP didn't return them.
I also have many addresses, and it didn't return all of them.
The only things it got right are the dates.
Does it need some configuration?
Can you reproduce it with just the NLP plugin, providing plain text as the input?
If you can, it may be worth opening an issue in the NLP project.
PUT _ingest/pipeline/opennlp-pipeline
{
  "description": "A pipeline to do named entity extraction",
  "processors": [
    {
      "attachment": {
        "field": "mycontentfield"
      }
    },
    {
      "opennlp": {
        "field": "attachment.content"
      }
    }
  ]
}
PUT /indice14/type/1?pipeline=opennlp-pipeline
{
  "mycontentfield": "base64of the pdf"
}
This was the response I got here to a previous question.
Where can I post an issue for the NLP project: here, or somewhere else?
I can't find a category named NLP project here!
I said: without the attachment plugin. Could you reproduce and share your script?
No, with just the NLP plugin I can't.
Actually, I asked here, and the owner of the plugin, Alexander Reelsen, responded that I need to use the attachment plugin first.
No. He told you that if you want to extract text from a PDF document you need to use ingest-attachment.
If you want to send the text to NLP then you need to use the NLP plugin.
What I'm asking for is to use only NLP and send to it just some text.
And reproduce the issue you are seeing.
What you can also do is use the _simulate ingest endpoint with the verbose option to see what is happening at each step: https://www.elastic.co/guide/en/elasticsearch/reference/6.2/simulate-pipeline-api.html
And share it here and also a full reproduction script if you need help.
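For example, a simulate call that exercises only the NLP processor on plain text might look roughly like the sketch below. The inline pipeline, the field name my_field and the sample text are assumptions for illustration; adjust them to match your own setup.

```
POST _ingest/pipeline/_simulate?verbose
{
  "pipeline": {
    "description": "Only the opennlp processor, no attachment step",
    "processors": [
      {
        "opennlp": {
          "field": "my_field"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "my_field": "Ahmed HADDAD\nAvenue de l’UMA\n2035 Charguia 2"
      }
    }
  ]
}
```

With verbose set, the response lists the document as it looks after each processor, so you can see exactly what text the opennlp step received and which entities it produced.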
Yes, today I put just plain text through the OpenNLP plugin, as you told me, and the result was the same!
This is what I wrote in Dev Tools:
PUT /index20/type/1?pipeline=opennlp-pipeline
{
  "my_field": "Ahmed HADDAD Développement et\nAvenue de l’UMA\n\n\nConception en\n2035 Charguia 2\n\njava/JEE\n\nTél :25932722\n\nhaddadahmed1994@gmail.com\n23 ans\nFORMATION\n2017-2018 3éme année Génie Informatique\n2014-2015Diplôme Ingénieur premier cycle\n2011-2012 Diplôme BAC Science\nLangues\n : Anglais-Français et de L’italien\n\nEXPERIENCES\n\nAôut 2017 : Stage en Advance Web djerba—réalisation d’un tchat en AJAX et JQuery\n\n\nJuilliet 2016 : Stage d’Intiation en Advance Web djerba-un Blog communautaire php et Mysql\n\n\nLOISIRS\nJ’aime la Consultation des News Lettre des Communauté informatique dans le web,\nJouer les Jeux videos Steam online , un peu d’echec."
}
Using simulate is easier.
Anyway, what is the output of GET index20/type/1?
This is the response:
{
"_index": "index20",
"_type": "type",
"_id": "1",
"_version": 1,
"found": true,
"_source": {
"my_field": """
Ahmed HADDAD Développement et
Avenue de l’UMA
Conception en
2035 Charguia 2
java/JEE
Tél :25932722
haddadahmed1994@gmail.com
23 ans
FORMATION
2017-2018 3éme année Génie Informatique
2014-2015Diplôme Ingénieur premier cycle
2011-2012 Diplôme BAC Science
Langues
: Anglais-Français et de L’italien
EXPERIENCES
Aôut 2017 : Stage en Advance Web djerba—réalisation d’un tchat en AJAX et JQuery
Juilliet 2016 : Stage d’Intiation en Advance Web djerba-un Blog communautaire php et Mysql
LOISIRS
J’aime la Consultation des News Lettre des Communauté informatique dans le web,
Jouer les Jeux videos Steam online , un peu d’echec.
""",
"entities": {
"persons": [
"Mysql LOISIRS J",
"Avenue"
],
"dates": [
"2011 - 2012",
"1994",
"2016",
"2014 - 2015",
"2035"
],
"locations": [
"Avenue de l"
]
}
}
}
What do you expect?
I expect:
In persons: Ahmed HADDAD
In dates: the same, but without the "-"
In locations: Avenue de l’UMA, 2035 Charguia 2, djerba
I'm not familiar with this plugin, but I wonder if the problem is the language used.
Your text is in French. Maybe you are using the default models, which are for English (if I understand this part of the docs correctly):
ingest.opennlp.model.file.persons: en-ner-persons.bin
ingest.opennlp.model.file.dates: en-ner-dates.bin
ingest.opennlp.model.file.locations: en-ner-locations.bin
Maybe @spinscale has more ideas?
Yes, it's French, with an Arabic name written in French.
The settings you posted are what I added to be able to use the plugin.
Do you mean that it only works with English names, dates and locations?
I don't know. That's just a guess; when @spinscale is available, he will be able to answer.
This depends on the model. The default models only work with English, but there may be others that work with your language of choice. You need to check the Apache OpenNLP project for that.