Processors: How to add custom fields based on a set of field values?

Hi,

I have a list of MIME types associated with custom "doctypes", i.e. (shortened):

{
   "binary":[
      "application/octet-stream"
   ],
   "latex":[
      "application/x-bibtex-text-file",
      "application/x-latex",
      "application/x-tex"
   ],
   "markup":[
      "application/xml",
      "text/html",
      "text/markdown",
      "text/xhtml+xml",
      "text/xml",
      "text/x-web-markdown"
   ],
   "pdf":[
      "application/pdf"
   ],
   "spreadsheet":[
      "text/csv",
      "application/ms-excel"
   ],
   "source":[
      "text/x-c",
      "text/x-python",
      "text/sh"
   ],
   "text":[
      "text/plain"
   ]
}

The users are only interested in querying doctypes.
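Since the list maps each doctype to its MIME types but a document only carries a MIME type, the lookup needs to run the other way round. A minimal sketch of that inversion, assuming each MIME type belongs to exactly one doctype (the names `DOCTYPES`, `invert_doctypes` and `MIME_TO_DOCTYPE` are made up for illustration, and the list is shortened further):

```python
# Doctype -> MIME types, shortened from the list above.
DOCTYPES = {
    "pdf": ["application/pdf"],
    "spreadsheet": ["text/csv", "application/ms-excel"],
    "text": ["text/plain"],
}


def invert_doctypes(doctypes):
    """Return a MIME type -> doctype dict.

    Assumption: a MIME type may appear under one doctype only, so a
    duplicate is treated as a configuration error.
    """
    lookup = {}
    for doctype, mime_types in doctypes.items():
        for mime in mime_types:
            if mime in lookup:
                raise ValueError(
                    f"{mime} is listed under both {lookup[mime]} and {doctype}"
                )
            lookup[mime] = doctype
    return lookup


MIME_TO_DOCTYPE = invert_doctypes(DOCTYPES)
```

With this, `MIME_TO_DOCTYPE.get(content_type)` gives the doctype for a known MIME type and `None` for anything not in the list.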

The current implementation solves this by creating a huge filter and aggregation query based on the file.content_type field:

...
filter:
         [ { terms:
              { 'file.content_type':
                 [ 'text/rtf',
                   'application/msword',
                   'application/vnd.ms-word',
                   'application/vnd.ms-word.document.macroenabled.12',
                   'application/vnd.ms-word.template.macroenabled.12',
                   'application/vnd.oasis.opendocument.text',
                   'application/vnd.oasis.opendocument.text-flat-xml',
                   'application/vnd.oasis.opendocument.text-master',
                   'application/vnd.oasis.opendocument.text-template',
                   'application/vnd.oasis.opendocument.text-web',
                   'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
                   'application/vnd.openxmlformats-officedocument.wordprocessingml.template' ] } } ] } },
  aggregations:
   { doctypes:
      { filters:
         { other_bucket_key: 'other',
           filters: {
          document:
           { terms:
              { 'file.content_type':
                 [ 'text/rtf',
                   'application/msword',
                   'application/vnd.ms-word',
                   'application/vnd.ms-word.document.macroenabled.12',
                   'application/vnd.ms-word.template.macroenabled.12',
                   'application/vnd.oasis.opendocument.text',
                   'application/vnd.oasis.opendocument.text-flat-xml',
                   'application/vnd.oasis.opendocument.text-master',
                   'application/vnd.oasis.opendocument.text-template',
                   'application/vnd.oasis.opendocument.text-web',
                   'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
                   'application/vnd.openxmlformats-officedocument.wordprocessingml.template' ] } },
    ...

This is obviously not ideal. I'd prefer to create a file.doctype field at index time, based on my MIME type list. I figured out how to add a field and its value to a single document with the update API, but I'm not sure how to do this conditionally.
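One way this might be done is an ingest pipeline with a script processor that looks up file.content_type in a MIME type to doctype map passed in as script parameters. The sketch below only builds the request bodies as Python dicts and sends nothing. The function names, the shortened `DOCTYPES` list and the "other" fallback (borrowed from the other_bucket_key above) are assumptions, and the simplified query assumes file.doctype is mapped as a keyword field:

```python
import json

# Doctype -> MIME types, shortened from the list above.
DOCTYPES = {
    "pdf": ["application/pdf"],
    "spreadsheet": ["text/csv", "application/ms-excel"],
    "text": ["text/plain"],
}

# Painless source for the script processor: set file.doctype only when
# file.content_type is present, falling back to params.fallback.
SCRIPT = (
    "if (ctx.file != null && ctx.file.content_type != null) { "
    "ctx.file.doctype = params.lookup.getOrDefault("
    "ctx.file.content_type, params.fallback); }"
)


def build_doctype_pipeline(doctypes, fallback="other"):
    """Build an ingest pipeline body that derives file.doctype."""
    # Invert doctype -> [mime, ...] into mime -> doctype.
    lookup = {
        mime: doctype
        for doctype, mime_types in doctypes.items()
        for mime in mime_types
    }
    return {
        "description": "Derive file.doctype from file.content_type",
        "processors": [
            {
                "script": {
                    "lang": "painless",
                    "source": SCRIPT,
                    "params": {"lookup": lookup, "fallback": fallback},
                }
            }
        ],
    }


def build_doctype_query(doctype):
    """Build the simplified search body once file.doctype is indexed."""
    return {
        "query": {"bool": {"filter": [{"term": {"file.doctype": doctype}}]}},
        "aggregations": {"doctypes": {"terms": {"field": "file.doctype"}}},
    }


if __name__ == "__main__":
    print(json.dumps(build_doctype_pipeline(DOCTYPES), indent=2))
    print(json.dumps(build_doctype_query("spreadsheet"), indent=2))
```

The pipeline body would be stored with a PUT to _ingest/pipeline/<name> and referenced through the pipeline parameter when indexing. Documents that are already indexed could probably be handled by an update-by-query request that names the same pipeline. With the field in place, the long terms lists in both the filter and the aggregation reduce to a single term filter and a terms aggregation on file.doctype.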

Kind Regards,

thurse
