Processors: How to add custom fields based on a set of field values?

thurse · April 29, 2019, 7:22am

Hi,

I have a list of MIME types associated with custom "doctypes" (shortened):

{
   "binary":[
      "application/octet-stream"
   ],
   "latex":[
      "application/x-bibtex-text-file",
      "application/x-latex",
      "application/x-tex"
   ],
   "markup":[
      "application/xml",
      "text/html",
      "text/markdown",
      "text/xhtml+xml",
      "text/xml",
      "text/x-web-markdown"
   ],
   "pdf":[
      "application/pdf"
   ],
   "spreadsheet":[
      "text/csv",
      "application/ms-excel"
   ],
   "source":[
      "text/x-c",
      "text/x-python",
      "text/sh"
   ],
   "text":[
      "text/plain"
   ]
}
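To tag each document with a single doctype, this mapping is more useful inverted, as a MIME type → doctype lookup. The following minimal JavaScript sketch builds that lookup. The function name `invertDoctypes` is hypothetical. It also rejects a MIME type listed under two doctypes, since such a type has no single doctype and would be counted in two buckets of a `filters` aggregation.

```javascript
// Hypothetical helper: turn the doctype -> [MIME types] mapping into a
// MIME type -> doctype lookup, refusing ambiguous entries.
function invertDoctypes(doctypes) {
  const lookup = {};
  for (const [doctype, mimeTypes] of Object.entries(doctypes)) {
    for (const mime of mimeTypes) {
      // A MIME type under two doctypes has no single answer, so fail early.
      if (Object.prototype.hasOwnProperty.call(lookup, mime)) {
        throw new Error(`${mime} is mapped to both ${lookup[mime]} and ${doctype}`);
      }
      lookup[mime] = doctype;
    }
  }
  return lookup;
}

// A shortened copy of the mapping above.
const doctypes = {
  pdf: ['application/pdf'],
  spreadsheet: ['text/csv', 'application/ms-excel'],
  text: ['text/plain'],
};

const lookup = invertDoctypes(doctypes);
console.log(lookup['text/csv']);  // spreadsheet
console.log(lookup['image/png']); // undefined (no doctype)
```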

The users are only interested in querying doctypes.

The current implementation solves this by creating a huge filter and aggregation query based on the file.content_type field:

...
filter:
         [ { terms:
              { 'file.content_type':
                 [ 'text/rtf',
                   'application/msword',
                   'application/vnd.ms-word',
                   'application/vnd.ms-word.document.macroenabled.12',
                   'application/vnd.ms-word.template.macroenabled.12',
                   'application/vnd.oasis.opendocument.text',
                   'application/vnd.oasis.opendocument.text-flat-xml',
                   'application/vnd.oasis.opendocument.text-master',
                   'application/vnd.oasis.opendocument.text-template',
                   'application/vnd.oasis.opendocument.text-web',
                   'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
                   'application/vnd.openxmlformats-officedocument.wordprocessingml.template' ] } } ] } },
  aggregations:
   { doctypes:
      { filters:
         { other_bucket_key: 'other',
           filters: {
          document:
           { terms:
              { 'file.content_type':
                 [ 'text/rtf',
                   'application/msword',
                   'application/vnd.ms-word',
                   'application/vnd.ms-word.document.macroenabled.12',
                   'application/vnd.ms-word.template.macroenabled.12',
                   'application/vnd.oasis.opendocument.text',
                   'application/vnd.oasis.opendocument.text-flat-xml',
                   'application/vnd.oasis.opendocument.text-master',
                   'application/vnd.oasis.opendocument.text-template',
                   'application/vnd.oasis.opendocument.text-web',
                   'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
                   'application/vnd.openxmlformats-officedocument.wordprocessingml.template' ] } },
    ...
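Until an indexed doctype field exists, the repeated `terms` lists can at least be generated from the mapping instead of being copied by hand into both the filter and the aggregation. The following is a minimal sketch. The helper name `buildDoctypeSearch` is hypothetical, and because the outer part of the query is elided in the excerpt, placing the filter under `query.bool.filter` is an assumption.

```javascript
// Hypothetical helper: build the doctype filter and the `filters` aggregation
// from the doctype -> [MIME types] mapping, so each list is written once.
// Assumes the filter sits in query.bool.filter (that part is elided above).
function buildDoctypeSearch(doctypes, selected) {
  if (!Object.prototype.hasOwnProperty.call(doctypes, selected)) {
    throw new Error(`unknown doctype: ${selected}`);
  }
  // One `terms` filter on file.content_type per doctype, keyed by name.
  const filters = {};
  for (const [doctype, mimeTypes] of Object.entries(doctypes)) {
    filters[doctype] = { terms: { 'file.content_type': mimeTypes } };
  }
  return {
    query: { bool: { filter: [filters[selected]] } },
    aggregations: {
      doctypes: {
        // Documents matching no doctype are counted in the 'other' bucket.
        filters: { other_bucket_key: 'other', filters },
      },
    },
  };
}

const body = buildDoctypeSearch(
  { pdf: ['application/pdf'], text: ['text/plain'] },
  'pdf'
);
console.log(JSON.stringify(body.query));
// {"bool":{"filter":[{"terms":{"file.content_type":["application/pdf"]}}]}}
```

This keeps a single source of truth, but every request still ships the full MIME type lists.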

This is obviously not ideal: every query has to carry the full MIME type lists, once in the filter and again in the aggregation. I'd prefer to add a file.doctype field at index time, derived from my MIME type list. I figured out how to add a field and its value to a single document with the update API, but I'm not sure how to set it conditionally, based on each document's file.content_type.
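One way to handle the conditional part is an ingest pipeline, so that Elasticsearch sets the field on every document as it is indexed instead of relying on per-document updates. The following is a minimal sketch, not a tested recipe. The JavaScript builds the pipeline body from the mapping, and a Painless `script` processor looks up `file.content_type` in a MIME type → doctype map passed in as `params`. The helper `buildDoctypePipeline`, the param name `mime_to_doctype`, and the `'other'` fallback are assumptions of this sketch.

```javascript
// Hypothetical helper: build an ingest pipeline body that sets file.doctype
// from file.content_type at index time, using one Painless script processor.
function buildDoctypePipeline(doctypes) {
  // Invert doctype -> [MIME types] into MIME type -> doctype for the lookup.
  const mimeToDoctype = {};
  for (const [doctype, mimeTypes] of Object.entries(doctypes)) {
    for (const mime of mimeTypes) mimeToDoctype[mime] = doctype;
  }
  return {
    description: 'Derive file.doctype from file.content_type',
    processors: [
      {
        script: {
          lang: 'painless',
          // Unknown or missing content types fall back to 'other', matching
          // the other_bucket_key of the existing aggregation (an assumption).
          source: [
            'if (ctx.file != null) {',
            '  def doctype = null;',
            '  if (ctx.file.content_type != null) {',
            '    doctype = params.mime_to_doctype.get(ctx.file.content_type);',
            '  }',
            "  ctx.file.doctype = doctype != null ? doctype : 'other';",
            '}',
          ].join('\n'),
          params: { mime_to_doctype: mimeToDoctype },
        },
      },
    ],
  };
}

const pipeline = buildDoctypePipeline({
  pdf: ['application/pdf'],
  text: ['text/plain'],
});
// Request body for PUT _ingest/pipeline/doctypes (the name is arbitrary).
console.log(JSON.stringify(pipeline, null, 2));
```

Documents can then be indexed with `?pipeline=doctypes`, or the pipeline can be attached to the index with the `index.default_pipeline` setting (Elasticsearch 6.5+). Already-indexed documents can be reprocessed with `_update_by_query?pipeline=doctypes`. With `file.doctype` mapped as `keyword`, the large filter reduces to a single `term` query on `file.doctype`, and the aggregation reduces to a plain `terms` aggregation on the same field. A script-free alternative is one `set` processor per doctype guarded by an `if` condition (conditional processors, also 6.5+). With many MIME types, though, a single lookup map is easier to keep in sync with the list.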

Kind Regards,

thurse

system · May 27, 2019, 7:22am

This topic was automatically closed 28 days after the last reply.