I am using the ingest-attachment plugin for Elasticsearch to parse documents. The goal is to search for a word and retrieve the documents that contain it.
I created a pipeline:
PUT: http://localhost:9200/_ingest/pipeline/pipeline-for-many-attachements
{
  "description" : "Extract attachment information from arrays",
  "processors" : [
    {
      "foreach": {
        "field": "attachments",
        "processor": {
          "attachment": {
            "target_field": "_ingest._value.attachment",
            "field": "_ingest._value.data"
          }
        }
      }
    }
  ]
}
I then indexed a document containing two attachments and processed it with the pipeline above:
PUT: http://localhost:9200/index-for-many-attachments/doc/0?pipeline=pipeline-for-many-attachements
{
  "attachments" : [
    {
      "filename" : "ipsum.txt",
      "data" : "QWxpbmEgaGFkIGx1bmNoLg=="
    },
    {
      "filename" : "test.txt",
      "data" : "Sm9zaCB3YXMgb24gdmFjYXRpb24u"
    }
  ]
}
Result:
{
  "_index": "index-for-many-attachments",
  "_type": "doc",
  "_id": "0",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "created": true
}
The "data" values are plain text encoded in base64.
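For reference, the two "data" strings can be reproduced with any base64 encoder. A minimal Python sketch follows; the variable names are illustrative and not part of the setup above:

```python
import base64

# Plain-text payloads matching the two attachments indexed above.
texts = {
    "ipsum.txt": "Alina had lunch.",
    "test.txt": "Josh was on vacation.",
}

# Encode each payload into the form the "data" field carries.
encoded = {
    name: base64.b64encode(text.encode("utf-8")).decode("ascii")
    for name, text in texts.items()
}

for name, data in encoded.items():
    print(name, data)
```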
I then retrieved the document:
GET: http://localhost:9200/index-for-many-attachments/doc/0
Result:
{
  "_index": "index-for-many-attachments",
  "_type": "doc",
  "_id": "0",
  "_version": 2,
  "found": true,
  "_source": {
    "attachments": [
      {
        "filename": "ipsum.txt",
        "data": "QWxpbmEgaGFkIGx1bmNoLg==",
        "attachment": {
          "content_type": "text/plain; charset=ISO-8859-1",
          "language": "sk",
          "content": "Alina had lunch.",
          "content_length": 17
        }
      },
      {
        "filename": "test.txt",
        "data": "Sm9zaCB3YXMgb24gdmFjYXRpb24u",
        "attachment": {
          "content_type": "text/plain; charset=ISO-8859-1",
          "language": "en",
          "content": "Josh was on vacation.",
          "content_length": 22
        }
      }
    ]
  }
}
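Given the _source layout above, a search for a word would presumably target the extracted text at attachments.attachment.content. The sketch below only builds the request body for a match query. The helper name build_match_query is hypothetical, and the sketch assumes the index uses the default dynamic mapping (no nested type on "attachments"):

```python
import json


def build_match_query(word: str) -> dict:
    """Build a match query on the text extracted by the attachment processor."""
    return {
        "query": {
            "match": {
                # Field path follows the _source layout shown above.
                "attachments.attachment.content": word
            }
        }
    }


# Body that could be sent to .../index-for-many-attachments/_search
print(json.dumps(build_match_query("lunch"), indent=2))
```

Because "attachments" is an array inside a single document, a hit returns the whole document (here, id 0), not an individual attachment.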
My goal now is to pass real documents from my local machine to ingest-attachment and to search for words in those documents.
My question: how do I make the indexing call use documents stored locally?
For example, doc1.pdf and doc2.pdf, stored at path1/doc1.pdf and path2/doc2.pdf.
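One possible approach, sketched under assumptions: the attachment processor works on base64 content supplied in the request (as in the "data" fields above), so the client would read each local file, encode it, and send the result in the same document layout. The helper names build_attachments_doc and to_request_body are hypothetical, and sending the request is left out:

```python
import base64
import json
from pathlib import Path


def build_attachments_doc(paths):
    """Read each local file and wrap it in the "attachments" layout used above."""
    attachments = []
    for path in map(Path, paths):
        raw = path.read_bytes()  # the client reads the file from local disk
        attachments.append({
            "filename": path.name,
            "data": base64.b64encode(raw).decode("ascii"),
        })
    return {"attachments": attachments}


def to_request_body(paths) -> str:
    """Serialize the document as a JSON body for the PUT ...?pipeline=... call."""
    return json.dumps(build_attachments_doc(paths))
```

Calling to_request_body(["path1/doc1.pdf", "path2/doc2.pdf"]) would produce a body that can be sent with the same PUT request and pipeline parameter shown earlier.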