Hi @dadoonet,
Here it goes; I hope it is as expected. If not, please let me know how I can improve it.
One detail: when creating my pipeline I had to use "indexed_chars" : -1
in order to index the full content of my PDF.
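For reference, here is a minimal Python sketch of the kind of pipeline definition I mean. The exact body is an assumption modelled on the ingest-attachment documentation: the pipeline id "attachment" matches the ?pipeline=attachment parameter used in the request, "data" is the field that holds the base64 content, and the description text and the helper name build_pipeline_body are only illustrative.

```python
import json

# Assumed pipeline id, taken from the "?pipeline=attachment" query parameter.
PIPELINE_ID = "attachment"


def build_pipeline_body(field: str = "data", indexed_chars: int = -1) -> dict:
    """Build an ingest pipeline body with a single attachment processor.

    indexed_chars = -1 removes the limit on how many characters are extracted.
    """
    return {
        "description": "Extract attachment information",  # illustrative text
        "processors": [
            {"attachment": {"field": field, "indexed_chars": indexed_chars}}
        ],
    }


pipeline_body = build_pipeline_body()

# Print the request that would be sent; nothing is sent to a server here.
print(f"PUT /_ingest/pipeline/{PIPELINE_ID}")
print(json.dumps(pipeline_body, indent=2))
```

This only prints the request line and JSON body, so it can be compared against what was actually sent.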
After installing the ingest-attachment plugin, I created an index and a pipeline as in my last post, and indexed my first item as shown below:
// Index 'First Book'
{
"field1" : "First Book"
}
Then I indexed my PDF file content as shown below, with the header Content-Type=application/pdf set in Postman:
// PUT /test/type1/1?pipeline=attachment
{
"data" : "MY_BASE_64_ENCODED_PDF_FILE"
}
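For comparison, this is a minimal Python sketch of how such a request body can be assembled. It is not the code I actually ran: the helper name build_index_body and the sample bytes are hypothetical placeholders, and only the "data" field and the request path come from the request above.

```python
import base64
import json


def build_index_body(raw: bytes) -> dict:
    """Base64-encode the raw file bytes and wrap them in the "data" field."""
    encoded = base64.b64encode(raw).decode("ascii")  # single line, no newlines
    return {"data": encoded}


# Placeholder bytes standing in for the real file content (not a valid PDF).
sample = b"%PDF-1.4 placeholder bytes, not a real PDF"
body = build_index_body(sample)

# Print the request that would be sent; nothing is sent to a server here.
print("PUT /test/type1/1?pipeline=attachment")
print(json.dumps(body))
```

For a real file, the bytes would be read in binary mode, for example with open(path, "rb"), before being passed to the helper.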
I tried base64-encoding the file with both PHP and ASP.
The response was:
{
"_index": "test",
"_type": "type1",
"_id": "1",
"_version": 2,
"result": "updated",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"created": false
}
Then I fetched the document, which shows:
{
"_index": "test",
"_type": "type1",
"_id": "1",
"_version": 2,
"found": true,
"_source": {
"data": "MY_BASE_64_ENCODED_PDF_FILE"
}
}
So the result does not match the documentation, according to which there should be something like this at the bottom:
"attachment": {
"content_type": "application/rtf",
"language": "ro",
"content": "Lorem ipsum dolor sit amet",
"content_length": 28
}
I am probably missing something, I am just not sure what or where. I also tried indexing a simple base64-encoded text; that does return the "attachment" field, but it is empty, as shown below:
{
"_index": "test",
"_type": "type1",
"_id": "1",
"_version": 2,
"found": true,
"_source": {
"data": "dGVzdGluZyBteSBmaXJzdCBlbmNvZGVkIHRleHQ=",
"attachment": {}
}
}
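To rule out a bad encoding on my side, the "data" value above can be decoded locally. A minimal Python sketch, assuming standard base64 with strict validation:

```python
import base64

# The exact "data" value from the document above.
encoded = "dGVzdGluZyBteSBmaXJzdCBlbmNvZGVkIHRleHQ="

# validate=True raises binascii.Error if the input contains non-base64 characters.
decoded = base64.b64decode(encoded, validate=True)

print(decoded.decode("utf-8"))  # prints: testing my first encoded text
```

The string decodes cleanly to plain text, so the base64 input itself looks valid even though "attachment" comes back empty.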
Thanks again for your help and your time.
Edit 1
I am using the Elasticsearch image on Docker.