Confused about _doc as document type

Elasticsearch 6.8.5.

I'm confused about the use of _doc as a type. It's possible to create an index whose single type is named _doc, but _doc also seems to have a special meaning in some contexts, including templates.

Can someone explain to me exactly how _doc is treated under different circumstances, or how it should or shouldn't be used?

Below are some examples of using _doc, the results being the source of my confusion.

If I start a new instance of Elasticsearch and load these two templates:

$ curl -s  "http://localhost:9200/_template/all?pretty"
{
  "all" : {
    "order" : 0,
    "index_patterns" : [
      "*"
    ],
    "settings" : { },
    "mappings" : {
      "_doc" : {
        "properties" : {
          "geoip" : {
            "type" : "ip"
          }
        }
      }
    },
    "aliases" : { }
  }
}
$ curl -s  "http://localhost:9200/_template/testtest?pretty"
{
  "testtest" : {
    "order" : 99,
    "index_patterns" : [
      "testtest"
    ],
    "settings" : { },
    "mappings" : {
      "foo" : { }
    },
    "aliases" : { }
  }
}
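
For anyone who wants to reproduce this, the two templates above could be loaded with something like the following. This is only a sketch: the bodies are copied from the GET output above, while ES_URL, RUN_AGAINST_ES and put_template are names made up for this example. Nothing is sent unless RUN_AGAINST_ES=1 is set, and it should only be pointed at a disposable node because the "all" template matches every index.

```shell
# Base URL of the test node (assumed; same host and port as in the post).
ES_URL="${ES_URL:-http://localhost:9200}"

# Template bodies, copied from the GET /_template output in the post.
all_template='{
  "order": 0,
  "index_patterns": ["*"],
  "mappings": { "_doc": { "properties": { "geoip": { "type": "ip" } } } }
}'
testtest_template='{
  "order": 99,
  "index_patterns": ["testtest"],
  "mappings": { "foo": { } }
}'

# Hypothetical helper: $1 = template name, $2 = JSON body.
put_template() {
  curl -s -X PUT -H 'Content-Type: application/json' \
    "$ES_URL/_template/$1?pretty" -d "$2"
}

# Only talk to Elasticsearch when explicitly asked to.
if [ "${RUN_AGAINST_ES:-0}" = 1 ]; then
  put_template all "$all_template"
  put_template testtest "$testtest_template"
fi
```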

Then creating a new document in a new index called testtest works.

$ curl -s -X PUT  -H 'Content-Type: application/json' "http://localhost:9200/testtest/foo/1?pretty"  -d '{ "hello" : "world" }'
{
  "_index" : "testtest",
  "_type" : "foo",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

But unless _doc has some special meaning, it shouldn't work: two templates with mappings for two different types both apply to an index called testtest, and a 6.x index can only have one type.

The resulting index has a single document type, foo, but it also contains the geoip mapping, which is only specified in the template for type _doc:

$ curl -s  "http://localhost:9200/testtest/_mapping?pretty"
{
  "testtest" : {
    "mappings" : {
      "foo" : {
        "properties" : {
          "geoip" : {
            "type" : "ip"
          },
          "hello" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          }
        }
      }
    }
  }
}

So it looks as if _doc was used as a placeholder and its mappings were merged into those of the other type.

Also curious: the _type reported for the document I created as type foo changes depending on how I request it.

$ curl -s  "http://localhost:9200/testtest/foo/1?pretty"
{
  "_index" : "testtest",
  "_type" : "foo",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "hello" : "world"
  }
}
$ curl -s  "http://localhost:9200/testtest/_doc/1?pretty"
{
  "_index" : "testtest",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "hello" : "world"
  }
}
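
Since the _type in a GET response seems to follow the request URL, the _mapping output looks like a more reliable place to check which type the index really has. A rough sketch of a helper for that follows. mapping_type is a made-up name, and it assumes the ?pretty layout shown in this post, where the type name is on the line right after "mappings"; it would not cope with an index that has empty mappings.

```shell
# Hypothetical helper: reads pretty-printed _mapping output on stdin and
# prints the key on the line that follows "mappings" (the type name).
mapping_type() {
  awk '/"mappings"/ { getline; gsub(/[" :{]/, ""); print; exit }'
}

# Demonstration on the first lines of the testtest mapping from the post.
printf '%s\n' \
  '{' \
  '  "testtest" : {' \
  '    "mappings" : {' \
  '      "foo" : {' | mapping_type
# prints: foo
```

Against a live node it would be used as: curl -s "http://localhost:9200/testtest/_mapping?pretty" | mapping_type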

If I create a new document in a new index whose name only the all template applies to, and specify the type as _doc, the index gets created with a document type of _doc.

$ curl -s -X PUT  -H 'Content-Type: application/json' "http://localhost:9200/testtest_doc/_doc/1?pretty"  -d '{ "hello" : "world" }'
{
  "_index" : "testtest_doc",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}
$ curl -s  "http://localhost:9200/testtest_doc/_mapping?pretty"
{
  "testtest_doc" : {
    "mappings" : {
      "_doc" : {
        "properties" : {
          "geoip" : {
            "type" : "ip"
          },
          "hello" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          }
        }
      }
    }
  }
}
$ curl -s  "http://localhost:9200/testtest_doc/_doc/1?pretty"
{
  "_index" : "testtest_doc",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "hello" : "world"
  }
}
$ curl -s  "http://localhost:9200/testtest_doc/foo/1?pretty"
{
  "_index" : "testtest_doc",
  "_type" : "foo",
  "_id" : "1",
  "found" : false
}

So in that case it seems _doc in the template wasn't treated like a placeholder; it was treated as a literal string.

Next I delete the testtest index and create it again, this time specifying a document type of _doc.

$ curl -s -X DELETE  "http://localhost:9200/testtest/?pretty"
{
  "acknowledged" : true
}
$ curl -s -X PUT  -H 'Content-Type: application/json' "http://localhost:9200/testtest/_doc/1?pretty"  -d '{ "hello" : "world" }'
{
  "_index" : "testtest",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

The response suggests the index was created with a document type of _doc, except that the only mapping in the index is for type foo:

$ curl -s  "http://localhost:9200/testtest/_mapping?pretty"
{
  "testtest" : {
    "mappings" : {
      "foo" : {
        "properties" : {
          "geoip" : {
            "type" : "ip"
          },
          "hello" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          }
        }
      }
    }
  }
}

And I can request the document using type foo

$ curl -s  "http://localhost:9200/testtest/_doc/1?pretty"
{
  "_index" : "testtest",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "_seq_no" : 0,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "hello" : "world"
  }
}

So in that case _doc was treated like a placeholder, not a literal string.

Types are going to be removed in 8.0. Using _doc as the type name is the way to go for now.
It will help you move smoothly to 8.0.
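
As a rough illustration of that advice on 6.8, an index can be created with its mapping under _doc and then written to through the _doc endpoint. This is a sketch, not a tested recipe: example_index, ES_URL, RUN_AGAINST_ES, create_index and index_doc are all hypothetical names, and the geoip field is simply borrowed from the template in the question.

```shell
# Base URL of the test node (assumed).
ES_URL="${ES_URL:-http://localhost:9200}"

# Index body with the mapping declared under the _doc type name.
index_body='{
  "mappings": { "_doc": { "properties": { "geoip": { "type": "ip" } } } }
}'

# Hypothetical helpers: create an index, then index one document via _doc.
create_index() {
  curl -s -X PUT -H 'Content-Type: application/json' \
    "$ES_URL/$1?pretty" -d "$2"
}
index_doc() {
  curl -s -X PUT -H 'Content-Type: application/json' \
    "$ES_URL/$1/_doc/$2?pretty" -d "$3"
}

# Only talk to Elasticsearch when explicitly asked to.
if [ "${RUN_AGAINST_ES:-0}" = 1 ]; then
  create_index example_index "$index_body"
  index_doc example_index 1 '{ "hello" : "world" }'
fi
```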

It sounds like you're saying to ignore the inconsistent, confusing behaviour I gave examples of and just make sure to specify _doc as the type everywhere.

The advice to use _doc as the type seems reasonable, though Logstash currently defaults to sending events with the type set to doc, as its documentation for document_type says:

document_type

Value type is string
There is no default value for this setting.
This option is deprecated

Note: This option is deprecated due to the removal of types in Elasticsearch 6.0. It will be removed in the next major version of Logstash. This sets the document type to write events to. Generally you should try to write only similar events to the same type. String expansion %{foo} works here. If you don’t set a value for this option:

for elasticsearch clusters 6.x and above: the value of doc will be used;
for elasticsearch clusters 5.x and below: the event’s type field will be used, if the field is not present the value of doc will be used.
