Hello, is it possible to use the ingest-opennlp plugin on PDFs?
You can use the ingest attachment plugin first, and then run the opennlp processor against the field that was created by the attachment plugin.
--Alex
You mean like this?
PUT _ingest/pipeline/attachment
{
"description" : "Extract attachment information",
"processors" : [
{
"attachment" : {
"field" : "data"
}
}
]
}
PUT _ingest/pipeline/opennlp-pipeline
{
"description": "A pipeline to do named entity extraction",
"processors": [
{
"opennlp" : {
"field" : "data"
}
}
]
}
PUT /indice12/type/1?pipeline=opennlp-pipeline
{
"data" :"base 64-pdf_conversion "
}
Doesn't OpenNLP normally need the text to be extracted from the PDF before ingesting it?
Please tell me how.
processors is an array. Instead of setting up 2 separate pipelines, set up a single pipeline with the attachment processor first and the opennlp processor next.
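To illustrate why the order inside processors matters, here's a toy model in plain Python. fake_attachment and fake_opennlp are hypothetical stand-ins, not the real plugins. The first decodes a base-64 field into attachment.content (the real ingest-attachment processor writes its extracted text there by default). The second reads attachment.content and pretends to extract entities by keeping capitalised words. Running them in reverse fails because attachment.content does not exist yet.

```python
import base64
from functools import partial

def fake_attachment(doc: dict, field: str) -> dict:
    """Toy stand-in for ingest-attachment: decode base-64 text into attachment.content."""
    text = base64.b64decode(doc[field]).decode("utf-8")
    doc["attachment"] = {"content": text}
    return doc

def fake_opennlp(doc: dict, field: str) -> dict:
    """Toy stand-in for ingest-opennlp: follow a dotted path, keep capitalised words."""
    value = doc
    for part in field.split("."):
        if not isinstance(value, dict) or part not in value:
            raise KeyError(f"field [{field}] not present in document")
        value = value[part]
    doc["entities"] = [word for word in value.split() if word[:1].isupper()]
    return doc

def run_pipeline(doc: dict, processors: list) -> dict:
    """Apply processors strictly in array order, like an ingest pipeline does."""
    for processor in processors:
        doc = processor(doc)
    return doc

encoded = base64.b64encode(b"Alex lives in Paris").decode("ascii")
steps = [
    partial(fake_attachment, field="mycontentfield"),
    partial(fake_opennlp, field="attachment.content"),
]

good = run_pipeline({"mycontentfield": encoded}, steps)
print(good["entities"])  # ['Alex', 'Paris']

# Same processors, reversed: the opennlp stand-in looks for attachment.content too early.
try:
    run_pipeline({"mycontentfield": encoded}, list(reversed(steps)))
    reversed_failed = False
except KeyError as err:
    reversed_failed = True
    print("reversed order fails:", err)
```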
Could you show me how, please? I have lost a day on this problem.
Also, the two plugins are different, so how could that work?
Something like
PUT _ingest/pipeline/opennlp-pipeline
{
"description": "A pipeline to do named entity extraction",
"processors": [
{
"attachment" : {
"field" : "<<base-64 encoded pdf field>>",
<<any additional ingest-attachment parameters>>
}
},
{
"opennlp" : {
"field" : "attachment.content",
<<any additional ingest-opennlp parameters>>
}
}
]
}
OK, the request is valid without errors, thank you! But where would I find the results? In the index?
Yes, you can PUT or POST a document with ?pipeline=... (as you did). After that, you should be able to run something like GET /indice12/_search and see the result.
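A minimal Python sketch of that search, assuming an unsecured cluster at http://localhost:9200 (ES_URL is an assumption; adjust it and add authentication for your setup). It only uses the standard library. A search with no query body matches all documents, the hits come back under hits.hits, and each hit's _source holds the indexed document, including whatever fields the pipeline added.

```python
import json
import urllib.request

# Assumption: a local cluster without security enabled.
ES_URL = "http://localhost:9200"

def search_request(index: str) -> urllib.request.Request:
    """Build a GET /<index>/_search request (no query body, so it matches all)."""
    return urllib.request.Request(f"{ES_URL}/{index}/_search", method="GET")

def print_hits(index: str) -> None:
    """Send the search and print the _source of every hit."""
    with urllib.request.urlopen(search_request(index)) as resp:
        result = json.load(resp)
    for hit in result["hits"]["hits"]:
        print(hit["_id"], json.dumps(hit["_source"], indent=2))

# print_hits("indice12")  # uncomment once the cluster is running
```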
You can also test your pipeline with the simulate API, without having to index a document and then _search for it or GET it.
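A sketch of a simulate call in Python, assuming a local cluster at http://localhost:9200. The endpoint is POST /_ingest/pipeline/<id>/_simulate, and the body is a docs array where each entry has a _source with the fields you would normally index. mycontentfield is just an example field name, and b"hello there" is placeholder bytes standing in for a real PDF.

```python
import base64
import json
import urllib.request

# Assumption: a local cluster without security enabled.
ES_URL = "http://localhost:9200"

def simulate_request(pipeline_id: str, field_name: str, file_bytes: bytes) -> urllib.request.Request:
    """Build a simulate request that runs one base-64 encoded document through a pipeline."""
    encoded = base64.b64encode(file_bytes).decode("ascii")
    body = {"docs": [{"_source": {field_name: encoded}}]}
    return urllib.request.Request(
        f"{ES_URL}/_ingest/pipeline/{pipeline_id}/_simulate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = simulate_request("opennlp-pipeline", "mycontentfield", b"hello there")
# With a running cluster, this prints the documents as the pipeline transformed them:
# with urllib.request.urlopen(req) as resp:
#     print(json.dumps(json.load(resp), indent=2))
```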
It requires a body. What should I put in the body? I'm new to ES.
The base-64 encoded PDF, as a field. Then you'd reference that field name in the field setting of the attachment processor.
No, you didn't understand me.
I'm talking about this:
PUT /indice12/type/1?pipeline=opennlp-pipeline
{
this is the problem: what should I put here? It doesn't work.
}
You should put something like "body": "aGVsbG8gdGhlcmU=", or whatever your base-64 encoded PDF is. Assuming you use body here, the field setting I have under the attachment section of PUT _ingest/pipeline/opennlp-pipeline would be body.
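If it helps, here is a minimal Python sketch for producing that base-64 string from a file on disk. encode_pdf is a hypothetical helper name, and the demo writes the bytes "hello there" to a throwaway file instead of using a real PDF, which is why it yields the same aGVsbG8gdGhlcmU= string as above.

```python
import base64
import tempfile
from pathlib import Path

def encode_pdf(path) -> str:
    """Read a file (e.g. a PDF) and return its bytes as a base-64 ASCII string."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")

# Demo: a throwaway file stands in for a real PDF here.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.pdf"
    sample.write_bytes(b"hello there")
    encoded = encode_pdf(sample)

doc = {"body": encoded}  # "body" must match the attachment processor's field setting
print(doc)  # {'body': 'aGVsbG8gdGhlcmU='}
```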
I did as you told me:
PUT /indice12/type/1?pipeline=opennlp-pipeline
{
"body":"pdf-conversion-to-base64"
}
And it returned an error:
java.lang.IllegalArgumentException
In opennlp-pipeline, the field setting of the attachment processor should be the name of a field that you're going to pass in, not the content of the PDF. You've also put the opennlp and attachment processors in reverse order.
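Both mistakes can be caught before sending anything. Below is a hypothetical client-side check in Python (check_pipeline is not part of Elasticsearch, just an illustration): it flags opennlp running before attachment, and an attachment field value that isn't a key of the document you're indexing, such as the base-64 content pasted in where a field name belongs.

```python
def check_pipeline(pipeline: dict, document: dict) -> list:
    """Hypothetical sanity check: return a list of problems (empty if none found)."""
    problems = []
    # Each processor is a one-key dict such as {"attachment": {...}}.
    names = [next(iter(proc)) for proc in pipeline["processors"]]
    if "attachment" in names and "opennlp" in names:
        if names.index("opennlp") < names.index("attachment"):
            problems.append("opennlp runs before attachment")
    for proc in pipeline["processors"]:
        if "attachment" in proc:
            field = proc["attachment"]["field"]
            if field not in document:
                problems.append(f"attachment field {field!r} is not a field of the document")
    return problems

encoded_pdf = "aGVsbG8gdGhlcmU="  # placeholder for a real base-64 PDF
wrong = {"processors": [
    {"opennlp": {"field": "attachment.content"}},
    {"attachment": {"field": encoded_pdf}},  # content pasted where a field name belongs
]}
right = {"processors": [
    {"attachment": {"field": "mycontentfield"}},
    {"opennlp": {"field": "attachment.content"}},
]}
doc = {"mycontentfield": encoded_pdf}

print(check_pipeline(wrong, doc))
print(check_pipeline(right, doc))  # []
```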
OK, I'll fix the order.
But in your response above:
"attachment" : {
"field" : "<<base-64 encoded pdf field>>",
<<any additional ingest-attachment parameters>>
}
I'm confused.
So you'd have something like:
PUT _ingest/pipeline/opennlp-pipeline
{
"description": "A pipeline to do named entity extraction",
"processors": [
{
"attachment" : {
"field" : "mycontentfield"
}
},
{
"opennlp" : {
"field" : "attachment.content"
}
}
]
}
and then
PUT /indice12/type/1?pipeline=opennlp-pipeline
{
"mycontentfield": "aGVsbG8gdGhlcmU="
}
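For anyone scripting this outside Kibana Console, here's a sketch of the same two requests in Python using only the standard library. It assumes an unsecured cluster at http://localhost:9200 (ES_URL is an assumption), and json_request and send_all are hypothetical helper names. The requests are only built, not sent, until you call send_all.

```python
import json
import urllib.request

# Assumption: a local cluster without security enabled.
ES_URL = "http://localhost:9200"

def json_request(method: str, path: str, body: dict) -> urllib.request.Request:
    """Build a JSON request against the cluster without sending it."""
    return urllib.request.Request(
        ES_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )

pipeline = {
    "description": "A pipeline to do named entity extraction",
    "processors": [
        {"attachment": {"field": "mycontentfield"}},  # must run first
        {"opennlp": {"field": "attachment.content"}},  # reads the extracted text
    ],
}
document = {"mycontentfield": "aGVsbG8gdGhlcmU="}  # your base-64 PDF goes here

requests_to_send = [
    json_request("PUT", "/_ingest/pipeline/opennlp-pipeline", pipeline),
    # On Elasticsearch 7+ use /indice12/_doc/1 instead of a custom type name.
    json_request("PUT", "/indice12/type/1?pipeline=opennlp-pipeline", document),
]

def send_all(reqs) -> None:
    """Send each request in order and print the HTTP status."""
    for req in reqs:
        with urllib.request.urlopen(req) as resp:
            print(req.get_method(), req.full_url, resp.status)

# send_all(requests_to_send)  # uncomment with a running cluster
```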
Thank you so much
