I am having a hard time understanding how to index files such as PDF and RTF with the Ingest Attachment Processor plugin and make them searchable.
My main problem is that I can't find the files by searching after following these steps.
First, I created a pipeline:
client.ingest.putPipeline({
  id: 'attachment',
  body: {
    description: 'Extract attachment information',
    processors: [
      {
        attachment: {
          field: 'data',
        },
      },
    ],
  },
});
Then I indexed my Lorem Ipsum .rtf file through that pipeline:
client.index({
  index: 'books',
  pipeline: 'attachment',
  body: {
    data: 'e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0=',
  },
});
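For reference, the `data` value is just the base64 encoding of the raw .rtf file. Below is a minimal Node.js sketch for checking what is actually being sent, and for encoding file bytes in the other direction. `encodeForAttachment` is a hypothetical helper name used only for illustration; it is not part of the Elasticsearch client.

```javascript
// The base64 string from the index request above, copied verbatim.
const data = 'e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0=';

// Decode it back to the raw RTF source to confirm what is being sent.
const rtf = Buffer.from(data, 'base64').toString('utf8');
console.log(rtf);

// Going the other way: encode raw file bytes for the `data` field.
// `encodeForAttachment` is a hypothetical helper, not a library function.
function encodeForAttachment(bytes) {
  return Buffer.from(bytes).toString('base64');
}
```

With a real file, the bytes would typically come from something like `fs.readFileSync(path)` before being passed to the helper.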
Lastly, I searched for it:
client.search({
  index: 'books',
  body: {
    query: {
      match: { content: "Lorem ipsum" },
    },
  },
});
The problem is that the search returns no matches!
I checked whether the document exists by getting it by ID, and it is indeed there, including the attachment content decoded from the base64 data. See the JSON response below.
"attachment": {
  "content_type": "application/rtf",
  "language": "ro",
  "content": "Lorem ipsum dolor sit amet",
  "content_length": 28
}
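
One thing that stands out in this response is that the extracted text sits under `attachment.content`, not in a top-level `content` field, so a match query on `content` may simply be targeting a field that does not exist. The sketch below builds a search body against the nested field instead. `buildAttachmentQuery` is a hypothetical helper name, and whether this resolves the issue is an assumption rather than something confirmed here.

```javascript
// Hypothetical helper: builds a search body that targets the nested field
// written by the attachment processor (`attachment.content` in the response).
function buildAttachmentQuery(text) {
  return {
    query: {
      match: { 'attachment.content': text },
    },
  };
}

const body = buildAttachmentQuery('Lorem ipsum');
console.log(JSON.stringify(body, null, 2));

// Assumed usage with the same client as in the question:
// client.search({ index: 'books', body });
```

Separately, Elasticsearch search is near real-time, so a search issued immediately after indexing can miss the document until the index has refreshed.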