Hello, is it possible to use the "ingest-opennlp" plugin on PDFs?
You can use the ingest attachment plugin first, and then run the opennlp processor against the field that was created by the attachment plugin.
--Alex
You mean like this?
PUT _ingest/pipeline/attachment
{
  "description": "Extract attachment information",
  "processors": [
    {
      "attachment": {
        "field": "data"
      }
    }
  ]
}

PUT _ingest/pipeline/opennlp-pipeline
{
  "description": "A pipeline to do named entity extraction",
  "processors": [
    {
      "opennlp": {
        "field": "data"
      }
    }
  ]
}

PUT /indice12/type/1?pipeline=opennlp-pipeline
{
  "data": "base 64-pdf_conversion "
}
Doesn't OpenNLP normally need the PDF to be filtered before it is ingested?
Please tell me how.
processors is an array. Instead of setting up 2 separate pipelines, set up a single pipeline with the attachment processor first and the opennlp processor next.
Could you show me how, please? I have lost a day on this problem.
Also, the two plugins are different, so how could that work?
Something like
PUT _ingest/pipeline/opennlp-pipeline
{
  "description": "A pipeline to do named entity extraction",
  "processors": [
    {
      "attachment": {
        "field": "<<base-64 encoded pdf field>>",
        <<any additional ingest-attachment parameters>>
      }
    },
    {
      "opennlp": {
        "field": "attachment.content",
        <<any additional ingest-opennlp parameters>>
      }
    }
  ]
}
OK, the request is valid and returns no errors, thank you! But where would I find the results? In the index?
Yes, you can PUT or POST a document with the ?pipeline=... parameter (as you did). After this, you should be able to do something like GET /indice12/_search and see a result.
You can also test your pipeline with the simulate API, without having to index a document and then _search for or GET it.
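As a rough sketch of what a simulate request could look like, the snippet below builds the JSON body in Python and prints it next to the endpoint. The "docs" / "_source" wrapper is the shape the simulate API expects; the helper name build_simulate_body and the field name "data" are only assumptions for illustration, so use whatever field your attachment processor reads from.

```python
import base64
import json


def build_simulate_body(raw_bytes, field_name="data"):
    """Wrap base64-encoded bytes in a simulate-style request body.

    build_simulate_body and the default field name are hypothetical;
    the field must match the attachment processor's "field" setting.
    """
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return {"docs": [{"_source": {field_name: encoded}}]}


# Plain text stands in for real PDF bytes here.
body = build_simulate_body(b"hello there")

print("POST _ingest/pipeline/opennlp-pipeline/_simulate")
print(json.dumps(body, indent=2))
```

The printed request can be pasted into the console; the response shows each document as it would look after the processors have run, without writing anything to the index.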
It requires a body. What should I put in the body? I'm new to ES.
The base-64 encoded PDF as a field. And then you'd reference that field name in the field component of the attachment processor.
No, you didn't understand me.
I'm talking about this:
PUT /indice12/type/1?pipeline=opennlp-pipeline
{
  here is the problem: what should I put here? It doesn't work
}
You should put something like "body": "aGVsbG8gdGhlcmU=" or whatever your base-64 encoded PDF is. Assuming you use body here, the field component I have under the attachment section of PUT _ingest/pipeline/opennlp-pipeline would be body.
I did as you told me:
PUT /indice12/type/1?pipeline=opennlp-pipeline
{
  "body": "pdf-conversion-to-base64"
}
And it returned an error:
java.lang.IllegalArgumentException
In opennlp-pipeline, attachment.field should be the name of the field that you're going to pass in, not the content of the PDF. You've also reversed the order of the opennlp and attachment processors.
OK for the order, but above in your response you wrote:
"attachment" : {
  "field" : "<<base-64 encoded pdf field>>",
  <<any additional ingest-attachment parameters>>
}
I'm confused.
So you'd have something like
PUT _ingest/pipeline/opennlp-pipeline
{
  "description": "A pipeline to do named entity extraction",
  "processors": [
    {
      "attachment": {
        "field": "mycontentfield"
      }
    },
    {
      "opennlp": {
        "field": "attachment.content"
      }
    }
  ]
}
and then
PUT /indice12/type/1?pipeline=opennlp-pipeline
{
  "mycontentfield": "aGVsbG8gdGhlcmU="
}
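If it helps, here is a minimal sketch of how the value for mycontentfield could be produced from a file's raw bytes in Python. The helper name build_document is hypothetical, and plain text stands in for the PDF so the result is easy to check; for a real file you would read the bytes in binary mode first (for example with open(path, "rb").read(), where path is your own PDF).

```python
import base64
import json


def build_document(raw_bytes, field_name="mycontentfield"):
    """Return the document body to PUT through the pipeline.

    build_document is a hypothetical helper; field_name must match the
    "field" configured on the attachment processor.
    """
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return {field_name: encoded}


# b"hello there" encodes to the sample value used in the request above.
doc = build_document(b"hello there")
print(json.dumps(doc))  # {"mycontentfield": "aGVsbG8gdGhlcmU="}
```

The field holds the base-64 text of the whole file, and the pipeline only refers to that field by name.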
Thank you so much