How to specify mapping to disable index for all fields of an object type


(Philips Kokoh Prasetyo) #1

I have a field of type object in my document. It contains many sub-fields, and more may be added later because we crawl data from external sources. For example:

{
  "field1": 1,
  "field2": 2,
  "obj1": {
     "field1": "value1",
     "field2": "value2",
     ...
     "fieldn": "valuen",
     "new_field": "value",
     "additional_1": 123
  }
}

Instead of specifying the mapping for every field one by one, is there a convenient way to tell the mapping that all fields under "obj1" should not be indexed?

We don't want newly added fields under obj1 to be indexed either. We could disable dynamic mapping, but that may disrupt the crawlers. Ideally, newly added fields would still be added to the mapping, just not indexed.

Thank you


(Glen R Smith) #2

I'm assuming what you want is for these fields to be unanalyzed, not omitted from indexing altogether; otherwise, "dynamic": false on obj1 would suffice. I'm not sure whether your "obj1" is currently mapped as "nested", so the example further down shows that this solution works either way (and shows wildcard matching at the same time).
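For reference, the "dynamic": false variant mentioned above would look roughly like the following. This is a minimal sketch: the index name dynamic_object_off is made up for illustration, and test_type is simply reused from the example further down. With this setting, new fields under obj1 are kept in _source but are not added to the mapping and cannot be searched.

```
# hypothetical index name, for illustration only
PUT dynamic_object_off
{
  "mappings": {
    "test_type": {
      "properties": {
        "obj1": {
          "type": "object",
          "dynamic": false
        }
      }
    }
  }
}
```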

The solution is based on dynamic_templates with a path_match rule.

Let me know if this isn't sufficiently clear.

# delete the index
DELETE dynamic_object_nope

# create the index with a mapping that includes dynamic templates
POST dynamic_object_nope
{
  "settings": {
    "index": {
      "number_of_replicas": "0",
      "number_of_shards": "1"
    }
  },
  "mappings": {
    "test_type": {
      "dynamic_templates": [
        {
          "object1_field": {
            "path_match": "obj*.*",
            "mapping": {
              "index": "not_analyzed"
            }
          }
        }
      ],
      "properties": {
        "field1": {
          "type": "integer"
        },
        "obj1": {
          "type": "object"
        },
        "obj2": {
          "type": "nested"
        }
      }
    }
  }
}

POST dynamic_object_nope/test_type
{
  "field1": 1,
  "field2": 2,
  "obj1": {
    "new_field": "value",
    "additional_1": 123
  },
  "obj2": {
    "new_field": "value",
    "additional_1": 123
  }
}

POST _refresh

# examine mapping
GET dynamic_object_nope/test_type/_mapping

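path_match exists for exactly this situation: it is tested against the full dotted path of a field (for example obj1.new_field), whereas "match" is tested against the field name alone (new_field) and so cannot restrict a template to one object.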
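To make the difference between the two rules concrete, here is a sketch of two templates side by side. The template names and the "new_*" pattern are made up for illustration. The first rule is tested against the field name only, so it would apply to a new_field at the top level as well as one inside obj1 or obj2. The second is tested against the full dotted path, so it applies only to fields under obj1.

```
"dynamic_templates": [
  {
    "by_name_only": {
      "match": "new_*",
      "mapping": { "index": "not_analyzed" }
    }
  },
  {
    "by_full_path": {
      "path_match": "obj1.*",
      "mapping": { "index": "not_analyzed" }
    }
  }
]
```

Templates are checked in order and the first one that matches wins, so the more specific rule should normally come first.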

Notice that it worked on both the object declared as nested (obj2) and the one that is not (obj1). In both cases, "new_field" got the "index": "not_analyzed" treatment. The setting doesn't show up on "additional_1" because numeric fields are never analyzed, so not_analyzed is already their default.

{
   "dynamic_object_nope": {
      "mappings": {
         "test_type": {
            "dynamic_templates": [
               {
                  "object1_field": {
                     "mapping": {
                        "index": "not_analyzed"
                     },
                     "path_match": "obj*.*"
                  }
               }
            ],
            "properties": {
               "field1": {
                  "type": "integer"
               },
               "field2": {
                  "type": "long"
               },
               "obj1": {
                  "properties": {
                     "additional_1": {
                        "type": "long"
                     },
                     "new_field": {
                        "type": "string",
                        "index": "not_analyzed"
                     }
                  }
               },
               "obj2": {
                  "type": "nested",
                  "properties": {
                     "additional_1": {
                        "type": "long"
                     },
                     "new_field": {
                        "type": "string",
                        "index": "not_analyzed"
                     }
                  }
               }
            }
         }
      }
   }
}
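
If the fields should be added to the mapping but not be searchable at all, which is how the original question reads, the same template can in principle use "index": "no" instead of "not_analyzed". This is a sketch only, assuming the same older mapping syntax as above (string / not_analyzed) and plain leaf values under obj1. The template name is made up.

```
"dynamic_templates": [
  {
    "obj1_not_indexed": {
      "path_match": "obj1.*",
      "mapping": {
        "index": "no"
      }
    }
  }
]
```

The values would still come back in _source; they just could not be queried.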

(Philips Kokoh Prasetyo) #3

Thanks @GlenRSmith for the explanation

