'_source' filtering is slower than query without '_source' field

I have an Elasticsearch instance with some data on it. When I query the data, requests that filter '_source' are slower than requests that do not mention the '_source' key at all. Is there a specific reason for this?

Profiling the queries showed FetchSourcePhase taking much longer with source filtering, on the order of a second, than without '_source'.

@stephenb Could you take a look at this? I am using Elasticsearch 8.6.0

This forum is manned by volunteers, even if they work at Elastic, so it is considered rude to ping people not already involved in the thread. You have also provided very little information to go on. Please provide the two queries you are comparing together with some information about the difference in latency, the nature and size of the data queried and information about the cluster itself. It would also be useful if you could post the profiling information from both queries.
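
For reference, profiling output can be requested by adding "profile": true to the search body. A minimal sketch, using a placeholder match_all query rather than your actual request:

```json
{
    "profile": true,
    "query": {"match_all": {}},
    "size": 100
}
```

The response then contains a 'profile' section with per-shard query and fetch timings, which is the part worth posting for both variants.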

Thank you for the information regarding the thread,

As for the additional information, take a simple query with size set to 100:

{
    "query": {"bool": {"must": [{"match_all": {}}], "must_not": [], "should": []}},
    "from": 0,
    "size": 100,
    "sort": [],
    "aggs": {},
    "_source": [
        "field1",
        "field2",
        "field3",
        "field4",
        "field5",
        "field6",
        "field7",
        "field8",
        "field9",
        "field10",
        "field11",
        "field12",
        "field13",
        "field14",
        "field15",
        "field16"
    ]
}

Elasticsearch reports that this query takes more than a second, approximately 1200-1300 ms. The profile information for this query is:

{
    "shards": [
        {
            "id": "[shard_id][sample_index][0]",
            "searches": [
                {
                    "query": [
                        {
                            "type": "ConstantScoreQuery",
                            "description": "ConstantScore(FieldExistsQuery [field=_primary_term])",
                            "time_in_nanos": 194619,
                            "breakdown": {
                                "set_min_competitive_score_count": 0,
                                "match_count": 0,
                                "shallow_advance_count": 0,
                                "set_min_competitive_score": 0,
                                "next_doc": 85247,
                                "match": 0,
                                "next_doc_count": 558,
                                "score_count": 558,
                                "compute_max_score_count": 0,
                                "compute_max_score": 0,
                                "advance": 9136,
                                "advance_count": 10,
                                "score": 23042,
                                "build_scorer_count": 20,
                                "create_weight": 4883,
                                "shallow_advance": 0,
                                "create_weight_count": 1,
                                "build_scorer": 72311
                            },
                            "children": [
                                {
                                    "type": "FieldExistsQuery",
                                    "description": "FieldExistsQuery [field=_primary_term]",
                                    "time_in_nanos": 94503,
                                    "breakdown": {
                                        "set_min_competitive_score_count": 0,
                                        "match_count": 0,
                                        "shallow_advance_count": 0,
                                        "set_min_competitive_score": 0,
                                        "next_doc": 38538,
                                        "match": 0,
                                        "next_doc_count": 558,
                                        "score_count": 0,
                                        "compute_max_score_count": 0,
                                        "compute_max_score": 0,
                                        "advance": 8169,
                                        "advance_count": 10,
                                        "score": 0,
                                        "build_scorer_count": 20,
                                        "create_weight": 1730,
                                        "shallow_advance": 0,
                                        "create_weight_count": 1,
                                        "build_scorer": 46066
                                    }
                                }
                            ]
                        }
                    ],
                    "rewrite_time": 74439,
                    "collector": [
                        {
                            "name": "MultiCollector",
                            "reason": "search_multi",
                            "time_in_nanos": 297006,
                            "children": [
                                {
                                    "name": "SimpleTopScoreDocCollector",
                                    "reason": "search_top_hits",
                                    "time_in_nanos": 93227
                                },
                                {
                                    "name": "BucketCollectorWrapper: [BucketCollectorWrapper[bucketCollector=org.elasticsearch.search.aggregations.BucketCollector$1@ID]]",
                                    "reason": "aggregation",
                                    "time_in_nanos": 39868
                                }
                            ]
                        }
                    ]
                }
            ],
            "aggregations": [],
            "fetch": {
                "type": "fetch",
                "description": "",
                "time_in_nanos": 1468543241,
                "breakdown": {
                    "load_stored_fields": 335731591,
                    "load_source": 597153,
                    "load_stored_fields_count": 100,
                    "next_reader_count": 4,
                    "load_source_count": 100,
                    "next_reader": 475473
                },
                "debug": {
                    "stored_fields": [
                        "_id",
                        "_routing",
                        "_source"
                    ]
                },
                "children": [
                    {
                        "type": "FetchSourcePhase",
                        "description": "",
                        "time_in_nanos": 1128076983,
                        "breakdown": {
                            "process_count": 100,
                            "process": 1128072393,
                            "next_reader": 4590,
                            "next_reader_count": 4
                        },
                        "debug": {
                            "fast_path": 0
                        }
                    },
                    {
                        "type": "StoredFieldsPhase",
                        "description": "",
                        "time_in_nanos": 2392441,
                        "breakdown": {
                            "process_count": 100,
                            "process": 2380853,
                            "next_reader": 11588,
                            "next_reader_count": 4
                        }
                    }
                ]
            }
        }
    ]
}

However, if we remove source filtering:

{
    "query": {"bool": {"must": [{"match_all": {}}], "must_not": [], "should": []}},
    "from": 0,
    "size": 100,
    "sort": [],
    "aggs": {}
}

The query takes only around 280-300 ms. The profile information for the query is:

{
    "shards": [
        {
            "id": "[shard_id][sample_index][0]",
            "searches": [
                {
                    "query": [
                        {
                            "type": "ConstantScoreQuery",
                            "description": "ConstantScore(FieldExistsQuery [field=_primary_term])",
                            "time_in_nanos": 339290,
                            "breakdown": {
                                "set_min_competitive_score_count": 0,
                                "match_count": 0,
                                "shallow_advance_count": 0,
                                "set_min_competitive_score": 0,
                                "next_doc": 161142,
                                "match": 0,
                                "next_doc_count": 558,
                                "score_count": 558,
                                "compute_max_score_count": 0,
                                "compute_max_score": 0,
                                "advance": 10104,
                                "advance_count": 10,
                                "score": 34467,
                                "build_scorer_count": 20,
                                "create_weight": 4705,
                                "shallow_advance": 0,
                                "create_weight_count": 1,
                                "build_scorer": 128872
                            },
                            "children": [
                                {
                                    "type": "FieldExistsQuery",
                                    "description": "FieldExistsQuery [field=_primary_term]",
                                    "time_in_nanos": 164376,
                                    "breakdown": {
                                        "set_min_competitive_score_count": 0,
                                        "match_count": 0,
                                        "shallow_advance_count": 0,
                                        "set_min_competitive_score": 0,
                                        "next_doc": 92178,
                                        "match": 0,
                                        "next_doc_count": 558,
                                        "score_count": 0,
                                        "compute_max_score_count": 0,
                                        "compute_max_score": 0,
                                        "advance": 8662,
                                        "advance_count": 10,
                                        "score": 0,
                                        "build_scorer_count": 20,
                                        "create_weight": 1497,
                                        "shallow_advance": 0,
                                        "create_weight_count": 1,
                                        "build_scorer": 62039
                                    }
                                }
                            ]
                        }
                    ],
                    "rewrite_time": 63997,
                    "collector": [
                        {
                            "name": "MultiCollector",
                            "reason": "search_multi",
                            "time_in_nanos": 461468,
                            "children": [
                                {
                                    "name": "SimpleTopScoreDocCollector",
                                    "reason": "search_top_hits",
                                    "time_in_nanos": 144301
                                },
                                {
                                    "name": "BucketCollectorWrapper: [BucketCollectorWrapper[bucketCollector=org.elasticsearch.search.aggregations.BucketCollector$1@ID]]",
                                    "reason": "aggregation",
                                    "time_in_nanos": 79751
                                }
                            ]
                        }
                    ]
                }
            ],
            "aggregations": [],
            "fetch": {
                "type": "fetch",
                "description": "",
                "time_in_nanos": 284016497,
                "breakdown": {
                    "load_stored_fields": 282148220,
                    "load_source": 150140,
                    "load_stored_fields_count": 100,
                    "next_reader_count": 4,
                    "load_source_count": 100,
                    "next_reader": 435831
                },
                "debug": {
                    "stored_fields": [
                        "_id",
                        "_routing",
                        "_source"
                    ]
                },
                "children": [
                    {
                        "type": "FetchSourcePhase",
                        "description": "",
                        "time_in_nanos": 232649,
                        "breakdown": {
                            "process_count": 100,
                            "process": 228559,
                            "next_reader": 4090,
                            "next_reader_count": 4
                        },
                        "debug": {
                            "fast_path": 100
                        }
                    },
                    {
                        "type": "StoredFieldsPhase",
                        "description": "",
                        "time_in_nanos": 617691,
                        "breakdown": {
                            "process_count": 100,
                            "process": 607069,
                            "next_reader": 10622,
                            "next_reader_count": 4
                        }
                    }
                ]
            }
        }
    ]
}

I have been using Elasticsearch 8.6.0.

The number of documents in the index is 560 and each document is around 550 kilobytes. It has 1 shard and 1 replica.

The JVM heap size of the cluster is 4 GiB and the memory of the cluster is 8 GiB with 4 allocated processors, in a Linux environment (Ubuntu 20.04.5 LTS).

The data mapping consists mostly of text fields and a few vector fields. There is one nested data type with various text subfields and a few numeric types (long and float).

If some more information is needed, please let me know.

The faster query does much less work, as it does not need to parse the source and extract fields from it, so I would expect it to be faster. The first query needs to parse each document and extract 16 fields, which requires a lot of extra work given that your documents are quite large. Your profile output shows this: FetchSourcePhase takes about 1.13 s with filtering (fast_path 0) versus about 0.23 ms without it (fast_path 100). As you have a single primary shard, all of this work is done in a single thread, which is why I suspect you are seeing the difference in latency.

If you always want to retrieve the same set of fields you might want to look into using stored fields, which would avoid parsing the source. I am not sure it will be faster, but it could be worth testing.
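
A minimal sketch of such a request, assuming field1 and field2 (names borrowed from your query) are leaf fields mapped with "store": true. That is a mapping change, and as far as I know it cannot be applied to an existing field, so it would mean reindexing into a new index:

```json
{
    "query": {"match_all": {}},
    "size": 100,
    "_source": false,
    "stored_fields": ["field1", "field2"]
}
```

The values come back as arrays under 'fields' in each hit rather than inside '_source'. Comparing the fetch section of the profile output for this request against your two existing ones would show whether it actually helps.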

Thank you for your answer,

As another example, I tried filtering on a single field in '_source':

{
    "query": {"bool": {"must": [{"match_all": {}}], "must_not": [], "should": []}},
    "from": 0,
    "size": 100,
    "sort": [],
    "aggs": {},
    "_source": ["field1"]
}

This query takes more than 800 milliseconds to finish. With the same query without '_source', as before:

{
    "query": {"bool": {"must": [{"match_all": {}}], "must_not": [], "should": []}},
    "from": 0,
    "size": 100,
    "sort": [],
    "aggs": {}
}

The time taken is the same as mentioned before, 280-300 milliseconds.

Is the reason for such a large difference in the 'took' times the same even when filtering on a single field?

The source is stored as a string and needs to be parsed before any field can be extracted. This naturally takes longer than just returning the string from disk, especially if your documents are large.

This is described in the docs I linked to in my earlier response.

What are you actually trying to accomplish? It's not clear to me.

Source filtering, as @Christian_Dahlqvist says, requires significant additional processing.

Are you aware of the 'fields' option, which is most likely much faster?
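
Something along these lines, as a sketch. The field names are taken from your example, and the speed-up is an assumption that would need checking against the same profile output:

```json
{
    "query": {"match_all": {}},
    "size": 100,
    "_source": false,
    "fields": ["field1", "field2"]
}
```

Each hit then carries the requested values as arrays under 'fields' instead of a '_source' object.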
