Hi,
I have a list of MIME types associated with custom "doctypes", e.g. (shortened):
{
  "binary": [
    "application/octet-stream"
  ],
  "latex": [
    "application/x-bibtex-text-file",
    "application/x-latex",
    "application/x-tex"
  ],
  "markup": [
    "application/xml",
    "text/html",
    "text/markdown",
    "text/xhtml+xml",
    "text/xml",
    "text/x-web-markdown"
  ],
  "pdf": [
    "application/pdf"
  ],
  "spreadsheet": [
    "text/csv",
    "application/ms-excel"
  ],
  "source": [
    "text/x-c",
    "text/x-python",
    "text/sh"
  ],
  "text": [
    "text/plain"
  ]
}
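For what it's worth, the list above can be inverted into a flat MIME type to doctype lookup, which is the shape needed to decide a doctype per document. A minimal sketch in plain JavaScript, using a further shortened sample; buildMimeLookup, doctypeFor and the 'other' fallback are names made up for illustration, not anything Elasticsearch provides.

```javascript
// Shortened sample of the doctype list from the post.
const doctypes = {
  latex: ['application/x-bibtex-text-file', 'application/x-latex', 'application/x-tex'],
  pdf: ['application/pdf'],
  spreadsheet: ['text/csv', 'application/ms-excel'],
  text: ['text/plain'],
};

// Invert { doctype: [mime, ...] } into a Map of mime -> doctype.
// If a MIME type is listed under two doctypes, the later entry wins.
function buildMimeLookup(doctypeMap) {
  const lookup = new Map();
  for (const [doctype, mimeTypes] of Object.entries(doctypeMap)) {
    for (const mime of mimeTypes) {
      lookup.set(mime.toLowerCase(), doctype);
    }
  }
  return lookup;
}

// Resolve one content type. Unknown or missing types fall back to
// 'other' (an assumption, chosen to mirror other_bucket_key).
function doctypeFor(lookup, contentType) {
  if (typeof contentType !== 'string') return 'other';
  // Drop parameters such as "; charset=utf-8" before the lookup.
  const bare = contentType.split(';')[0].trim().toLowerCase();
  return lookup.get(bare) || 'other';
}

const lookup = buildMimeLookup(doctypes);
console.log(doctypeFor(lookup, 'text/csv'));
console.log(doctypeFor(lookup, 'image/png'));
```

Keeping the doctype list as the single source and deriving the lookup from it means the list stays as readable as it is now.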
The users are only interested in querying by doctype.
The current implementation handles this by building a huge filter and aggregation query on the file.content_type field:
...
filter:
 [ { terms:
      { 'file.content_type':
         [ 'text/rtf',
           'application.msword',
           'application/vnd.ms-word',
           'application/vnd.ms-word.document.macroenabled.12',
           'application/vnd.ms-word.template.macroenabled.12',
           'application/vnd.oasis.opendocument.text',
           'application/vnd.oasis.opendocument.text-flat-xml',
           'application/vnd.oasis.opendocument.text-master',
           'application/vnd.oasis.opendocument.text-template',
           'application/vnd.oasis.opendocument.text-web',
           'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
           'application/vnd.openxmlformats-officedocument.wordprocessingml.template' ] } } ] } },
aggregations:
 { doctypes:
    { filters:
       { other_bucket_key: 'other',
         filters: {
           document:
            { terms:
               { 'file.content_type':
                  [ 'text/rtf',
                    'application.msword',
                    'application/vnd.ms-word',
                    'application/vnd.ms-word.document.macroenabled.12',
                    'application/vnd.ms-word.template.macroenabled.12',
                    'application/vnd.oasis.opendocument.text',
                    'application/vnd.oasis.opendocument.text-flat-xml',
                    'application/vnd.oasis.opendocument.text-master',
                    'application/vnd.oasis.opendocument.text-template',
                    'application/vnd.oasis.opendocument.text-web',
                    'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
                    'application/vnd.openxmlformats-officedocument.wordprocessingml.template' ] } },
...
This is obviously not an ideal situation. I'd prefer to create a file.doctype field at index time, based on my MIME type list. I figured out how to add a field and its value to a single document with the update API, but I'm not sure how to do this conditionally, i.e. how to set the value depending on each document's file.content_type.
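One way this could be made conditional, as far as I can tell, is an ingest pipeline with a script processor, so file.doctype is derived from file.content_type as each document is indexed. The sketch below only builds the request bodies and has not been run against a cluster. Assumptions: file is an object field with a content_type sub-field holding a single exact MIME type string, file.doctype would be mapped as a keyword, 'other' is the fallback value, and the elided part of the query is a bool filter. buildDoctypePipeline and buildDoctypeQuery are hypothetical helper names.

```javascript
// Shortened doctype list; the real one would be the full list from the post.
const doctypes = {
  pdf: ['application/pdf'],
  spreadsheet: ['text/csv', 'application/ms-excel'],
  text: ['text/plain'],
};

// Build a body for PUT _ingest/pipeline/<name> (the name is up to you).
function buildDoctypePipeline(doctypeMap, fallback = 'other') {
  // Flatten { doctype: [mime, ...] } into { mime: doctype } for script params.
  const lookup = {};
  for (const [doctype, mimeTypes] of Object.entries(doctypeMap)) {
    for (const mime of mimeTypes) lookup[mime] = doctype;
  }
  return {
    description: 'Derive file.doctype from file.content_type',
    processors: [
      {
        script: {
          lang: 'painless',
          // Documents without file.content_type are left untouched.
          source: [
            'def mime = ctx.file?.content_type;',
            'if (mime != null) {',
            '  ctx.file.doctype = params.lookup.containsKey(mime)',
            '    ? params.lookup.get(mime) : params.fallback;',
            '}',
          ].join('\n'),
          params: { lookup, fallback },
        },
      },
    ],
  };
}

// With file.doctype indexed, the filter and aggregation could shrink to this.
function buildDoctypeQuery(doctype) {
  return {
    query: { bool: { filter: [{ term: { 'file.doctype': doctype } }] } },
    aggregations: { doctypes: { terms: { field: 'file.doctype' } } },
  };
}

console.log(JSON.stringify(buildDoctypePipeline(doctypes), null, 2));
console.log(JSON.stringify(buildDoctypeQuery('pdf'), null, 2));
```

The pipeline would then be applied by passing pipeline=<name> on index requests or by setting index.default_pipeline on the index. Documents that are already indexed could presumably be backfilled with _update_by_query using the same pipeline parameter.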
Kind Regards,
thurse