I'm trying to build an application on top of an Elasticsearch index. Several "inner" fields can contain binary data (mainly PDF files), and I'm looking for the best way to define my ingest pipeline and mapping, given that:
- all field contents can be provided in several languages (French and English) and spread across several fields;
- I have to be able to query contents for a given language and/or a given field.
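To make the second requirement concrete, here is a minimal Python sketch of the kind of query meant. It assumes each text field keeps one sub-field per language (`title.en`, `title.fr`, and so on), as in the mapping below, and uses the standard `multi_match` query. `build_query`, `TEXT_FIELDS` and `LANGUAGES` are hypothetical names, and the extracted PDF text is left out because where it ends up depends on the pipeline.

```python
import json

# Assumed field paths, taken from the mapping in this question; each text
# field has one sub-field per language ("en", "fr").
TEXT_FIELDS = [
    "title",
    "extfile.title",
    "extfile.description",
    "gallery.title",
    "gallery.description",
]
LANGUAGES = ["en", "fr"]


def build_query(text, languages=None, fields=None):
    """Build a multi_match query restricted to some languages and/or fields.

    Both filters default to every known value when left as None.
    """
    languages = languages or LANGUAGES
    fields = fields or TEXT_FIELDS
    # One "<field>.<lang>" path per (field, language) pair
    paths = [f"{field}.{lang}" for field in fields for lang in languages]
    return {"query": {"multi_match": {"query": text, "fields": paths}}}


if __name__ == "__main__":
    # French-only search across every text field
    print(json.dumps(build_query("rapport annuel", languages=["fr"]), indent=2))
```

Restricting by field instead of language is the same call with `fields=["extfile.description"]`, for example.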
This is how I have defined my mapping so far:
{
  "WfNewsEvent": {
    "properties": {
      "title": {
        "type": "object",
        "properties": {
          "en": {
            "type": "string"
          },
          "fr": {
            "type": "string",
            "analyzer": "french",
            "search_analyzer": "french_search"
          }
        }
      },
      ...
      "extfile": {
        "type": "object",
        "properties": {
          "title": {
            "type": "object",
            "properties": {
              "en": {
                "type": "string"
              },
              "fr": {
                "type": "string",
                "analyzer": "french",
                "search_analyzer": "french_search"
              }
            }
          },
          "description": {
            "type": "object",
            "properties": {
              "en": {
                "type": "string"
              },
              "fr": {
                "type": "string",
                "analyzer": "french",
                "search_analyzer": "french_search"
              }
            }
          },
          "data": {
            "type": "object",
            "properties": {
              "en": {
                "type": "attachment"
              },
              "fr": {
                "type": "attachment",
                "analyzer": "french",
                "search_analyzer": "french_search"
              }
            }
          }
        }
      },
      "gallery": {
        "type": "object",
        "properties": {
          "title": {
            "type": "object",
            "properties": {
              "en": {
                "type": "string"
              },
              "fr": {
                "type": "string",
                "analyzer": "french",
                "search_analyzer": "french_search"
              }
            }
          },
          "description": {
            "type": "object",
            "properties": {
              "en": {
                "type": "string"
              },
              "fr": {
                "type": "string",
                "analyzer": "french",
                "search_analyzer": "french_search"
              }
            }
          },
          "data": {
            "type": "object",
            "properties": {
              "en": {
                "type": "attachment"
              },
              "fr": {
                "type": "attachment",
                "analyzer": "french",
                "search_analyzer": "french_search"
              }
            }
          }
        }
      }
    }
  }
}
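The mapping repeats the same en/fr pattern for every field, and `extfile` and `gallery` share an identical layout. A minimal Python sketch can generate it so each block is defined only once. The types and analyzer names are copied unchanged from the mapping above (`french_search` is a custom analyzer assumed to be defined in the index settings); the helper names are hypothetical, and the fields elided with `...` are not reproduced.

```python
import json

# Analyzer settings copied from the mapping above; "french_search" must be
# defined as a custom analyzer in the index settings.
FRENCH = {"analyzer": "french", "search_analyzer": "french_search"}


def lang_object(value_type):
    """Object with an English and a French sub-field of the given type."""
    return {
        "type": "object",
        "properties": {
            "en": {"type": value_type},
            "fr": {"type": value_type, **FRENCH},
        },
    }


def file_object():
    """Shared layout of "extfile" and "gallery": title, description, data."""
    return {
        "type": "object",
        "properties": {
            "title": lang_object("string"),
            "description": lang_object("string"),
            "data": lang_object("attachment"),
        },
    }


def build_mapping():
    return {
        "WfNewsEvent": {
            "properties": {
                "title": lang_object("string"),
                # Fields elided with "..." in the original are not reproduced
                "extfile": file_object(),
                "gallery": file_object(),
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_mapping(), indent=2))
```

This only restructures how the mapping is written; the resulting JSON has the same fields and types as the one above.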
Then, here is my 'attachment' pipeline definition:
{
  "description" : "Extract attachment information",
  "processors" : [
    {
      "attachment" : {
        "field" : "extfile.data.en",
        "ignore_missing": true
      }
    },
    {
      "attachment" : {
        "field" : "extfile.data.fr",
        "ignore_missing": true
      }
    },
    {
      "attachment" : {
        "field" : "gallery.data.en",
        "ignore_missing": true
      }
    },
    {
      "attachment" : {
        "field" : "gallery.data.fr",
        "ignore_missing": true
      }
    }
  ]
}
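The four processors differ only in the object and language they read, so the list can also be generated. In this minimal Python sketch each processor additionally gets its own `target_field`: the attachment processor writes its output to a field named `attachment` by default, so several processors without a `target_field` would all write to that same field. The `<object>.attachment.<lang>` output path and the helper name are hypothetical choices, not something the original pipeline uses.

```python
import json

OBJECTS = ["extfile", "gallery"]
LANGUAGES = ["en", "fr"]


def build_pipeline():
    """One attachment processor per (object, language) pair."""
    processors = []
    for obj in OBJECTS:
        for lang in LANGUAGES:
            processors.append({
                "attachment": {
                    "field": f"{obj}.data.{lang}",
                    # Hypothetical output path; without target_field every
                    # processor writes to the default "attachment" field.
                    "target_field": f"{obj}.attachment.{lang}",
                    "ignore_missing": True,
                }
            })
    return {"description": "Extract attachment information", "processors": processors}


if __name__ == "__main__":
    print(json.dumps(build_pipeline(), indent=2))
```

Generating the paths also rules out copy-paste slips between near-identical processor blocks.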
Currently, when I try to index a document, ES raises an exception saying that "data" is not an integer. Any help would be greatly appreciated!
Best regards,
Thierry