Defining Elasticsearch mapping for ingest-attachment inner field


(Thierry Florac) #1

I'm building an application on top of an Elasticsearch index. I have several "inner" fields which can contain binary data (mainly PDF), and I'm looking for the best way to define my pipeline and mapping, given that:

  • all fields, including file contents, can be provided in several languages (French and English), and binary content can appear in several fields
  • I have to be able to query contents for a given language and/or for a given field.
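
To make the second requirement concrete, here is a minimal sketch of a per-language query builder. It assumes the "<field>.<lang>" layout used in the mapping of this post; the helper name build_query and the TEXT_FIELDS list are hypothetical and only cover the plain text fields, not the extracted attachment content.

```python
# Hypothetical helper: field names follow the "<field>.<lang>" layout
# of the mapping shown in this post.
TEXT_FIELDS = [
    "title",
    "extfile.title",
    "extfile.description",
    "gallery.title",
    "gallery.description",
]


def build_query(text, lang, fields=None):
    """Return a multi_match search body restricted to one language.

    `fields` optionally narrows the search to some of the base fields;
    by default all base fields in TEXT_FIELDS are searched.
    """
    selected = fields if fields is not None else TEXT_FIELDS
    return {
        "query": {
            "multi_match": {
                "query": text,
                # Append the language code to address the per-language sub-field.
                "fields": [f"{name}.{lang}" for name in selected],
            }
        }
    }
```

For example, build_query("budget", "fr", ["extfile.title"]) only searches extfile.title.fr, while build_query("budget", "en") searches every English sub-field listed above.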

This is how I have defined my mapping so far:

{
    "WfNewsEvent": {
        "properties": {
            "title": {
                "type": "object",
                "properties": {
                    "en": {
                        "type": "string"
                    },
                    "fr": {
                        "type": "string",
                        "analyzer": "french",
                        "search_analyzer": "french_search"
                    }
                }
            },
            ...
            "extfile": {
                "type": "object",
                "properties": {
                    "title": {
                        "type": "object",
                        "properties": {
                            "en": {
                                "type": "string"
                            },
                            "fr": {
                                "type": "string",
                                "analyzer": "french",
                                "search_analyzer": "french_search"
                            }
                        }
                    },
                    "description": {
                        "type": "object",
                        "properties": {
                            "en": {
                                "type": "string"
                            },
                            "fr": {
                                "type": "string",
                                "analyzer": "french",
                                "search_analyzer": "french_search"
                            }
                        }
                    },
                    "data": {
                        "type": "object",
                        "properties": {
                            "en": {
                                "type": "attachment"
                            },
                            "fr": {
                                "type": "attachment",
                                "analyzer": "french",
                                "search_analyzer": "french_search"
                            }
                        }
                    }
                }
            },
            "gallery": {
                "type": "object",
                "properties": {
                    "title": {
                        "type": "object",
                        "properties": {
                            "en": {
                                "type": "string"
                            },
                            "fr": {
                                "type": "string",
                                "analyzer": "french",
                                "search_analyzer": "french_search"
                            }
                        }
                    },
                    "description": {
                        "type": "object",
                        "properties": {
                            "en": {
                                "type": "string"
                            },
                            "fr": {
                                "type": "string",
                                "analyzer": "french",
                                "search_analyzer": "french_search"
                            }
                        }
                    },
                    "data": {
                        "type": "object",
                        "properties": {
                            "en": {
                                "type": "attachment"
                            },
                            "fr": {
                                "type": "attachment",
                                "analyzer": "french",
                                "search_analyzer": "french_search"
                            }
                        }
                    }
                }
            }
        }
    }
}

And here is my 'attachment' pipeline definition:

{
  "description" : "Extract attachment information",
  "processors" : [
    {
      "attachment" : {
        "field" : "extfile.data.en",
        "ignore_missing": true
      }
    },
    {
      "attachment" : {
        "field" : "extfile.data.fr",
        "ignore_missing": true
      }
    },
    {
      "attachment" : {
        "field" : "gallery.data.en",
        "ignore_missing": true
      }
    },
    {
      "attachment" : {
        "field" : "gallery.data.fr",
        "ignore_missing": true
      }
    }
  ]
}

Currently, when I try to index a document, ES raises an exception saying that "data" is not an integer. Any help would be greatly appreciated!

Best regards,
Thierry


(David Pilato) #2

You should not use the mapper-attachments plugin anymore, as it has been deprecated and will be removed in 6.0.

Use only ingest-attachment instead.
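
With ingest-attachment, extraction happens in the pipeline, so the mapping only needs to describe the extracted output as ordinary fields. A minimal sketch, assuming Elasticsearch 5.x or later (hence "text" rather than "string"), that the extracted text lives in the content sub-field written by the attachment processor, and a hypothetical per-language layout such as extfile.attachment.fr (that field name is an assumption, not something from this thread):

```python
def attachment_text_mapping(analyzers):
    """Build mapping properties for per-language extracted attachment text.

    `analyzers` maps a language code to extra options for the text field,
    e.g. {"en": {}, "fr": {"analyzer": "french"}}.
    """
    return {
        "type": "object",
        "properties": {
            lang: {
                "type": "object",
                "properties": {
                    # "content" is the sub-field holding the extracted text.
                    "content": {"type": "text", **options},
                },
            }
            for lang, options in analyzers.items()
        },
    }


# "french_search" is the custom analyzer name used in post #1.
extfile_attachment = attachment_text_mapping({
    "en": {},
    "fr": {"analyzer": "french", "search_analyzer": "french_search"},
})
```

The result could be placed under "extfile" (and likewise "gallery") in place of the "data" sub-fields declared with the attachment type.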

So basically don't use:

"type": "attachment"

(Thierry Florac) #3

Hi David,
I finally got the ingest-attachment plugin working.
However, I wanted to handle the "attachment" property as an inner object property (like "extfile.data.fr"), and the only solution I have found so far is to make it a first-level property.
Is there a way to define an ingest-attachment pipeline for inner properties?


(David Pilato) #4

Yes. This should work on inner fields.
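
As a rough sketch of what that could look like: the attachment processor accepts dotted paths in field, and it also has a target_field option, which defaults to a top-level "attachment" field. Several processors left on the default would therefore all write to the same place, so the sketch below gives each one its own inner target. The "<container>.attachment.<lang>" naming is an assumption for illustration, as is the helper name.

```python
import json


def build_attachment_pipeline(containers, langs):
    """Build a pipeline body with one attachment processor per inner field."""
    processors = []
    for container in containers:
        for lang in langs:
            processors.append({
                "attachment": {
                    # Source: base64-encoded binary stored in an inner field.
                    "field": f"{container}.data.{lang}",
                    # Destination: hypothetical inner field, unique per source.
                    "target_field": f"{container}.attachment.{lang}",
                    "ignore_missing": True,
                }
            })
    return {
        "description": "Extract attachment information",
        "processors": processors,
    }


pipeline = build_attachment_pipeline(["extfile", "gallery"], ["en", "fr"])
print(json.dumps(pipeline, indent=2))
```

Generating the list from the container and language names keeps the four processors consistent with each other; the printed JSON is what would be sent when creating the pipeline.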

Do you have a complete non-working example?
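
One way to put such an example together is a request body for the simulate pipeline API (POST _ingest/pipeline/_simulate), which runs a pipeline definition against sample documents without indexing anything. The sketch below only builds the body; the sample bytes, the index and id values, and the target_field name are placeholders chosen for illustration.

```python
import base64
import json


def build_simulate_body(raw_bytes):
    """Build a _simulate request body for one inner attachment field."""
    # The attachment processor expects the binary content base64-encoded.
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return {
        "pipeline": {
            "description": "Extract attachment information",
            "processors": [
                {
                    "attachment": {
                        "field": "extfile.data.fr",
                        # Hypothetical inner destination for the extracted data.
                        "target_field": "extfile.attachment.fr",
                        "ignore_missing": True,
                    }
                }
            ],
        },
        "docs": [
            {
                "_index": "test",  # placeholder index name
                "_id": "1",        # placeholder document id
                "_source": {"extfile": {"data": {"fr": encoded}}},
            }
        ],
    }


body = build_simulate_body(b"Bonjour le monde")
print(json.dumps(body, indent=2))
```

Posting the printed JSON together with the response, or the exact exception, would make the problem reproducible for anyone reading the thread.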

