Indexing articles with attachments

We are using a knowledge management system that allows authors to write articles and attach
documents. Each article contains the following fields:

title
description
content
locale
categories
issue date
attachment_01
attachment_02

We can index the first six article properties with Elasticsearch without any issues. We know that we will need the ingest-attachment plugin for the last two properties. The problem we face is that the data will end up in two different indexes. How do we structure the data so that all the fields listed above come back as a single search hit?

Why is the data in two different indices?
If you want to get that back as one hit, it needs to be in the same document.

Here is a sample of my document.

Listing A

{
    "title": "Example document with multiple attachments",
    "description" : "Some description here...",
    "content": "Main document content",
    "locale": "en_US",
    "categories":   [
           {
              "refkey" : "NOSQL",
              "name" : "NoSQL",
              "objectid" : "016.001.001",
              "guid" : "091ed45fc58045a2b03df1b6d7763ec5"
           },
           {
              "refkey" : "ELASTICSEARCH_NOSQL",
              "name" : "Elasticsearch (NoSQL)",
              "objectid" : "016.001.001.001",
              "guid" : "3a8e36597efe4439bebf7bb0c9297b98"
           }
    ],
    "attachments" : [
           {
                "title" : "Word example",
                "data" : "VGhpcyBpcyBhIHRlc3Qgd29yZCBkb2N1bWVudAo=",
                "content" : "test.docx",
                "size" : 4094,
                "description" : "Word example description goes here."
            },
            {
                "title" : "PDF example",
                "data" : "CkZpbGU6IC9ob21lL2p0YW5nL3RtcC90ZXN0MDEudHh0IFBhZ2UgMSBvZiAxCgogCkhlbGxvIHdvcmxkCgoK",
                "content" : "test01.pdf",
                "size" : 11595,
                "description" : "PDF example description goes here."
           }
     ]
}
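
A document shaped like Listing A can be assembled by base64-encoding each file and embedding it in the "attachments" array, so the article and its attachments stay in one document. Below is a minimal Python sketch of that step. The helper names encode_attachment and build_article are hypothetical, the field names simply mirror Listing A, and taking "size" from the byte length is an assumption.

```python
import base64
import json


def encode_attachment(title, filename, raw_bytes, description=""):
    """Return one entry for the "attachments" array (hypothetical helper)."""
    return {
        "title": title,
        # The attachment processor reads base64-encoded binary from this field.
        "data": base64.b64encode(raw_bytes).decode("ascii"),
        # Listing A stores the file name under "content".
        "content": filename,
        # Assumption: "size" is the byte length of the original file.
        "size": len(raw_bytes),
        "description": description,
    }


def build_article(title, description, content, locale, categories, attachments):
    """Combine article fields and attachments into a single document."""
    return {
        "title": title,
        "description": description,
        "content": content,
        "locale": locale,
        "categories": categories,
        "attachments": attachments,
    }


if __name__ == "__main__":
    word = encode_attachment(
        "Word example",
        "test.docx",
        b"This is a test word document\n",
        "Word example description goes here.",
    )
    doc = build_article(
        "Example document with multiple attachments",
        "Some description here...",
        "Main document content",
        "en_US",
        [{"refkey": "NOSQL", "name": "NoSQL"}],
        [word],
    )
    print(json.dumps(doc, indent=2))
```

The resulting dictionary can be serialized with json.dumps and sent as the request body when indexing.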

I created a pipeline with the following definition.

Listing B

PUT /_ingest/pipeline/attachment
{
  "description" : "Extract attachment information from arrays",
  "processors" : [
    {
      "foreach": {
        "field": "attachments",
        "processor": {
          "attachment": {
            "target_field": "_ingest._value.attachment",
            "field": "_ingest._value.data"
          }
        }
      }
    }
  ]
}
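
Before indexing real documents, a pipeline like this can be exercised with the ingest simulate API (POST /_ingest/pipeline/attachment/_simulate), which runs the processors and returns the transformed document without storing it. The Python sketch below only builds that request. The helper name build_simulate_request and the cluster address http://localhost:9200 are assumptions for illustration, not part of the original setup.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster without authentication


def build_simulate_request(pipeline_id, source_doc, base_url=ES_URL):
    """Build a simulate request for one sample document (hypothetical helper)."""
    # The simulate API takes a "docs" array; each entry wraps the document in "_source".
    body = json.dumps({"docs": [{"_source": source_doc}]}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_ingest/pipeline/{pipeline_id}/_simulate",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    sample = {
        "title": "Example document with multiple attachments",
        "attachments": [
            {
                "title": "Word example",
                "data": "VGhpcyBpcyBhIHRlc3Qgd29yZCBkb2N1bWVudAo=",
            }
        ],
    }
    req = build_simulate_request("attachment", sample)
    print(req.get_method(), req.full_url)
    # To run it against a live cluster, uncomment:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.dumps(json.loads(resp.read()), indent=2))
```

If the pipeline behaves as intended, each element of "attachments" in the simulated result should carry an "attachment" object holding the extracted fields, such as the text content.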

Do I send my whole document to Elasticsearch like this?

PUT howto_en/_doc/howto1005?pipeline=attachment
{
  //Listing A
}

Thanks for the detailed script. It helps.

I did not test it but it looks good to me.

Thank you for the assistance. I got it to work.
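
With the attachments stored inside the article document, a single query can match on both the article fields and the extracted attachment text, and each article comes back as one hit. Here is a minimal Python sketch of such a query body. It assumes the pipeline wrote the extracted text to attachments.attachment.content; the helper name build_search_body is hypothetical, and excluding attachments.data from _source is an optional choice that keeps the base64 payload out of the response.

```python
import json


def build_search_body(text):
    """Build a search body covering article fields and attachment text (hypothetical helper)."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": [
                    "title",
                    "description",
                    "content",
                    # Assumption: text extracted by the attachment processor lives here.
                    "attachments.attachment.content",
                ],
            }
        },
        # Optional: leave the raw base64 data out of the returned hits.
        "_source": {"excludes": ["attachments.data"]},
    }


if __name__ == "__main__":
    print(json.dumps(build_search_body("hello world"), indent=2))
```

The printed JSON can be sent as the body of a search request against the article index, for example POST howto_en/_search.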
