Datastream speed

Hello.
Can you help me understand why searching a data stream is slower than searching a regular index?
I have one index:
size: 572 GB
shards: 30 primary shards

It stores about one month of data.

I have created a data stream that rolls over its backing indices when a primary shard reaches 30 GB.
Each backing index has:
shards: 3 primary shards per index
size: 30GB per primary shard
I have reindexed the documents from the old index into the data stream. Rollover works perfectly.
But...
I was unpleasantly surprised when I ran my search tests against the old index and the data stream.
The test cases search over ranges of 1, 7 and 30 days.
Searching the old index is about 2 times faster in each of these test cases. I expected searching the data stream to be faster than, or at least as fast as, searching the old index.

My query is:

{
    "bool": {
        "filter": [
            {
                "bool": {
                    "minimum_should_match": 1,
                    "should": [
                        {
                            "range": date_range_query
                        }
                    ]
                }
            }
        ],
        "must": [
            {
                "bool": {
                    "minimum_should_match": 1,
                    "should": [
                        {
                            "bool": {
                                "minimum_should_match": 1,
                                "should": [
                                    {
                                        "nested": {
                                            "path": "phrases",
                                            "query": {
                                                "bool": {
                                                    "minimum_should_match": 1,
                                                    "should": [
                                                        {
                                                            "match_phrase": {
                                                                "phrases.phrase": {
                                                                    "query": "hello",
                                                                    "_name": "phrase"
                                                                }
                                                            }
                                                        }
                                                    ]
                                                }
                                            },
                                            "score_mode": "sum"
                                        }
                                    }
                                ]
                            }
                        }
                    ]
                }
            }
        ]
    }
}

What did I do wrong?
Thanks.
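
For anyone reproducing the test, the query above can be built programmatically with a concrete date range in place of the `date_range_query` placeholder. This is only a sketch: the helper name `build_query` and the date field `@timestamp` are assumptions (the post does not say which field the range filter uses), and the single-clause `bool`/`should` wrappers are collapsed on the assumption that a `should` with one clause and `minimum_should_match: 1` behaves like the clause itself.

```python
from datetime import datetime, timedelta


def build_query(start, end, phrase, date_field="@timestamp"):
    """Build the bool query from the post for the window [start, end).

    date_field is an assumption; replace it with the real date field.
    """
    date_range_query = {
        date_field: {"gte": start.isoformat(), "lt": end.isoformat()}
    }
    return {
        "bool": {
            # Date restriction in filter context, as in the original query.
            "filter": [{"range": date_range_query}],
            # Scored nested phrase match, as in the original query.
            "must": [
                {
                    "nested": {
                        "path": "phrases",
                        "query": {
                            "match_phrase": {
                                "phrases.phrase": {
                                    "query": phrase,
                                    "_name": "phrase",
                                }
                            }
                        },
                        "score_mode": "sum",
                    }
                }
            ],
        }
    }


# The three test windows from the post: 1, 7 and 30 days.
# The end date is hypothetical, chosen only to have something concrete.
end = datetime(2022, 5, 31)
queries = {
    days: build_query(end - timedelta(days=days), end, "hello")
    for days in (1, 7, 30)
}
```

Sending the same generated body to both `call_v1` and the data stream keeps the comparison fair, since only the target changes between runs.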

Can you share the shard count and average shard size you are querying in the two scenarios? 572 GB across 30 shards is around 19 GB per shard, which naturally is smaller than the shard size used for the data stream.

What is the size and specification of the cluster?

The old index has 30 shards, each about 19 GB.
The data stream has 30 shards too, each about 19 GB.

The cluster has 3 nodes.

"nodes": {
    "-67p_CzBQa2EoSeBoOuQ4A": {
        "ephemeral_id": "pkU2DG8FR_-9GP4XUsOzTQ",
        "attributes": {
            "ml.machine_memory": "270139133952",
            "ml.max_open_jobs": "512",
            "xpack.installed": "true",
            "ml.max_jvm_size": "33285996544",
            "transform.node": "true"
        },
        "roles": [
            "data",
            "data_cold",
            "data_content",
            "data_hot",
            "data_warm",
            "ingest",
            "master",
            "ml",
            "remote_cluster_client",
            "transform"
        ]
    },
    "XAkVYVEeTL64TyXmirj6dQ": {
        "attributes": {
            "ml.machine_memory": "270139133952",
            "ml.max_open_jobs": "512",
            "xpack.installed": "true",
            "ml.max_jvm_size": "33285996544",
            "transform.node": "true"
        },
        "roles": [
            "data",
            "data_cold",
            "data_content",
            "data_hot",
            "data_warm",
            "ingest",
            "master",
            "ml",
            "remote_cluster_client",
            "transform"
        ]
    },
    "PsL_UwQnSKGVP4uhT9WrMA": {
        "ephemeral_id": "FdupJCWIRpuxwUSxEUt9NQ",
        "attributes": {
            "ml.machine_memory": "270139133952",
            "ml.max_open_jobs": "512",
            "xpack.installed": "true",
            "ml.max_jvm_size": "33285996544",
            "transform.node": "true"
        },
        "roles": [
            "data",
            "data_cold",
            "data_content",
            "data_hot",
            "data_warm",
            "ingest",
            "master",
            "ml",
            "remote_cluster_client",
            "transform"
        ]
    }
},

Please show the stats for the indices in question from the cat indices API. Are both sets of indices stored in the same cluster?

GET /_cat/indices
green open .ds-multy_stream-2022.06.10-000009 vRXSxZ5JSeaA4nalJEL7zA 3 0 0 0 2mb 2mb
green open .ds-multy_stream-2022.06.10-000007 L0zaSMoWREOf5frvRaA8uQ 3 0 308136129 0 58.7gb 58.7gb
green open .ds-multy_stream-2022.06.10-000008 v9s2UTsnThyGn4naK7i8aw 3 0 299954735 176 57.1gb 57.1gb
green open .ds-multy_stream-2022.06.10-000005 GKFy3bh1Qoy-4u2OBh1lAg 3 0 237225414 0 46gb 46gb
green open .ds-multy_stream-2022.06.10-000006 EFmZlZBfQ0qeu550oyMQYw 3 0 332976906 31 63.2gb 63.2gb
green open .ds-multy_stream-2022.06.10-000003 xG0jIwlVRd21UooGw-3TOg 3 0 159646867 0 31.6gb 31.6gb
green open .ds-multy_stream-2022.06.10-000004 OyTFsyH8R2edN8HDoWlsiw 3 0 309002350 0 58.3gb 58.3gb
green open .ds-multy_stream-2022.06.10-000001 ZOTvlC0SRy6x__XAjbm2DA 3 0 179172306 0 34.8gb 34.8gb
green open .ds-multy_stream-2022.06.10-000002 Diq0_ZWJQNiyCn0q6v1Cdg 3 0 348360876 0 66gb 66gb
green open call_v1 oKiS6_0cTtmxV8vj7DjpYw 30 0 3000412573 15727907 571.6gb 571.6gb
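
To compare shard sizes directly, the listing above can be turned into an average primary shard size per index. This is a rough sketch that assumes the default `_cat/indices` column order (health, status, index, uuid, pri, rep, docs.count, docs.deleted, store.size, pri.store.size); the helper names `to_gb` and `per_shard_sizes` are made up for illustration.

```python
CAT_INDICES = """\
green open .ds-multy_stream-2022.06.10-000009 vRXSxZ5JSeaA4nalJEL7zA 3 0 0 0 2mb 2mb
green open .ds-multy_stream-2022.06.10-000007 L0zaSMoWREOf5frvRaA8uQ 3 0 308136129 0 58.7gb 58.7gb
green open .ds-multy_stream-2022.06.10-000008 v9s2UTsnThyGn4naK7i8aw 3 0 299954735 176 57.1gb 57.1gb
green open .ds-multy_stream-2022.06.10-000005 GKFy3bh1Qoy-4u2OBh1lAg 3 0 237225414 0 46gb 46gb
green open .ds-multy_stream-2022.06.10-000006 EFmZlZBfQ0qeu550oyMQYw 3 0 332976906 31 63.2gb 63.2gb
green open .ds-multy_stream-2022.06.10-000003 xG0jIwlVRd21UooGw-3TOg 3 0 159646867 0 31.6gb 31.6gb
green open .ds-multy_stream-2022.06.10-000004 OyTFsyH8R2edN8HDoWlsiw 3 0 309002350 0 58.3gb 58.3gb
green open .ds-multy_stream-2022.06.10-000001 ZOTvlC0SRy6x__XAjbm2DA 3 0 179172306 0 34.8gb 34.8gb
green open .ds-multy_stream-2022.06.10-000002 Diq0_ZWJQNiyCn0q6v1Cdg 3 0 348360876 0 66gb 66gb
green open call_v1 oKiS6_0cTtmxV8vj7DjpYw 30 0 3000412573 15727907 571.6gb 571.6gb
"""

# Conversion factors to gigabytes for the size suffixes handled here.
UNITS = {"kb": 1 / 1024**2, "mb": 1 / 1024, "gb": 1.0, "tb": 1024.0}


def to_gb(size):
    """Convert a _cat size string such as '58.7gb' to gigabytes."""
    for unit, factor in UNITS.items():
        if size.endswith(unit):
            return float(size[: -len(unit)]) * factor
    raise ValueError(f"unsupported size: {size}")


def per_shard_sizes(cat_output):
    """Return (index, primary shard count, average GB per primary shard)."""
    rows = []
    for line in cat_output.strip().splitlines():
        # Assumed default column order of GET /_cat/indices.
        _, _, index, _, pri, _, _, _, _, pri_store = line.split()
        rows.append((index, int(pri), to_gb(pri_store) / int(pri)))
    return rows


for index, pri, gb in sorted(per_shard_sizes(CAT_INDICES)):
    print(f"{index:40s} {pri:3d} shards  {gb:6.2f} GB/shard")
```

On this listing, `call_v1` averages about 19 GB per primary shard, while the data stream's backing indices range from roughly 10.5 GB to 22 GB per primary shard across 27 primary shards, including the empty write index.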

Are both sets of indices stored in the same cluster?
Yes they are.

Each backing index of the data stream holds 3 days of data, so together the backing indices cover the month in 3-day slices. For example, .ds-multy_stream-2022.06.10-000001 covers 2022-05-01T00:00:00 to 2022-05-04T00:00:00, .ds-multy_stream-2022.06.10-000002 covers 2022-05-04T00:00:00 to 2022-05-07T00:00:00, and so on.
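
The 3-day layout described above determines how many backing indices can hold documents for a given search window. The sketch below counts them for the 1, 7 and 30 day test cases. It assumes every backing index covers exactly 3 days starting from 2022-05-01 and that there are 8 non-empty backing indices, as in the listing; the real boundaries may differ, and the function name `indices_touched` is made up for illustration.

```python
from datetime import datetime, timedelta

FIRST_DAY = datetime(2022, 5, 1)  # start of backing index 000001, per the post
SLICE = timedelta(days=3)         # assumed: every backing index covers 3 days
SHARDS_PER_INDEX = 3              # primary shards per backing index


def indices_touched(start, end, n_indices=8):
    """Return generations of backing indices whose slice overlaps [start, end)."""
    touched = []
    for gen in range(1, n_indices + 1):
        lo = FIRST_DAY + (gen - 1) * SLICE
        hi = lo + SLICE
        # Two half-open intervals overlap when each starts before the other ends.
        if lo < end and start < hi:
            touched.append(gen)
    return touched


for days in (1, 7, 30):
    gens = indices_touched(FIRST_DAY, FIRST_DAY + timedelta(days=days))
    print(days, "day(s):", len(gens), "indices,",
          len(gens) * SHARDS_PER_INDEX, "shards")
```

This only counts which backing indices could contain matching documents; it says nothing about how long each shard takes to answer, so it is a starting point for comparing the two layouts and not an explanation of the timing difference.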

Help, please :)
