Getting additional fields without _source

Hi,

I was looking for options to improve indexing performance and reduce disk space usage on our ES cluster. When I checked the documentation for the _source field, I saw the warnings mentioned there. However, it occurred to me that in some cases I rely on _source to get additional fields for some of my queries. If I disable _source (knowing the consequences), is there another way to return additional fields?

Thanks!

You can use docvalue_fields.
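
A minimal sketch of what such a request body could look like, built as a plain Python dict so it can be sent with any HTTP client. The field name `status` and the helper name are hypothetical, and the field is assumed to be a keyword field with doc values enabled.

```python
import json


def docvalue_search_body(fields, query=None):
    """Build a search body that returns doc values instead of _source."""
    return {
        # Fall back to matching every document when no query is given.
        "query": query or {"match_all": {}},
        # Do not return the original JSON document with each hit.
        "_source": False,
        # Read the listed fields from doc values instead.
        "docvalue_fields": list(fields),
    }


# "status" is a hypothetical keyword field, not one from this thread.
body = docvalue_search_body(["status"])
print(json.dumps(body, indent=2))
```

The values come back under the `fields` key of each hit rather than under `_source`.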

As far as I can see, docvalue_fields doesn't support text fields, which is partly okay, but if I need one with _source disabled, can I use fields?
Is it possible to use both of those in the same query?
Lastly, if I don't need _source at all, should I always include "_source": false in every query?

Thank you!

Yes. The default mapping created for a string field contains a sub-field named keyword, of type keyword, and that sub-field supports doc values. Use it if the value is short. If the field holds something large, such as the source of HTML pages, set store to true instead, in which case you can retrieve it using stored_fields. See the snippet below.

Both as in _source and docvalues? Then yes.

Yes, if you don't need it. By default a search request returns _source; the exception is a request that specifies stored_fields, where _source is omitted unless you ask for it explicitly. But if you have turned off _source in the mapping, it won't matter.

Mapping:

{
    "mappings" : {
        "properties" : {
            "src" : {
                "type" : "text",
                "store" : true,
                "fields" : {
                    "keyword" : {
                        "type" : "keyword"
                    }
                }
            }
        }
    }
}

Search request:

{
    "_source" : true,
    "docvalue_fields" : [
        "src.keyword"
    ],
    "stored_fields" : [
        "src"
    ]
}
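
Values requested through docvalue_fields or stored_fields are returned in each hit's `fields` section as lists, not under `_source`. A small sketch of reading them, assuming a hit shaped like a response to the request above; the sample hit, the index name `my-index`, and the helper name are made up for illustration.

```python
def hit_fields(hit):
    """Flatten the 'fields' section of one search hit.

    Values arrive as lists; single-element lists are unwrapped so that
    single-valued fields read like plain values.
    """
    flat = {}
    for name, values in hit.get("fields", {}).items():
        flat[name] = values[0] if len(values) == 1 else values
    return flat


# Hypothetical hit; real responses carry more metadata per hit.
sample_hit = {
    "_index": "my-index",
    "_id": "1",
    "fields": {
        "src.keyword": ["<html>...</html>"],
        "src": ["<html>...</html>"],
    },
}
print(hit_fields(sample_hit))
```

A hit that has no `fields` section (for example, when none of the requested fields exist on the document) simply yields an empty dict.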

Thanks!

Nope, I meant "fields" and "docvalue_fields". I'm looking for the easiest way to drop "_source" while still being able to get any field I want. I don't really see why I should store the original JSON and use that to retrieve a field that is already indexed.
There are some indices where I can drop _source easily; however, I won't be able to maintain each field's mapping individually.

The original JSON is required if you want to do partial document updates (merge) or reindex. Since logs are never updated, I have used an index without _source for that use case.
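
For reference, a rough sketch of an index body with _source disabled, again as a Python dict. The field names `message` and `level` and the helper name are hypothetical; the only fixed part is the `"_source": {"enabled": false}` mapping setting.

```python
import json


def mapping_without_source(properties):
    """Build an index-creation body whose mapping disables _source."""
    return {
        "mappings": {
            # The original JSON is not kept, so partial updates and
            # reindexing from this index are no longer possible.
            "_source": {"enabled": False},
            "properties": properties,
        }
    }


index_body = mapping_without_source({
    # Hypothetical text field, stored so stored_fields can return it.
    "message": {"type": "text", "store": True},
    # Hypothetical keyword field, retrievable through docvalue_fields.
    "level": {"type": "keyword"},
})
print(json.dumps(index_body, indent=2))
```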

Also, if you are always fetching all fields, getting _source instead of 100 fields using doc values or stored_fields may be faster. I haven't benchmarked it.

Yep, that's why I said I know the consequences (I mostly have daily indices with short lifecycles).

That's something I'll keep in mind.

Thank you for your input! I'll run some tests to see what options I have.
