Trying to understand high search latency

I have set up an ES index for user-centric data. Each document contains the relevant user ID (either in an owner field or in a contributor field) and two fields that need to be searched with "contains" semantics. The index contains about 100M documents of roughly 15 KB each, with a complex nested structure. The index is set up with dynamic_templates that index all fields as keywords (since no free-text search is needed, tokenizing seemed redundant); some fields are also normalized with a lowercase filter to enable case-insensitive search (a sketch of such a template follows the first query below). The reasoning behind indexing all fields at this point in time is to avoid having to reindex in order to allow searches on other fields, so that new features can be added quickly (the size of the index makes reindexing a bit painful). The cluster is configured with 3 nodes and 5 shards with a replication factor of 1. The query I use looks like this:

{
    "query": {
        "bool": {
            "must": [
                {
                    "bool": {
                        "should": [
                            {
                                "wildcard": {
                                    "document.name": {
                                        "value": "*SEARCH_TERM*"
                                    }
                                }
                            },
                            {
                                "wildcard": {
                                    "externalData.properties.displayName": {
                                        "value": "*SEARCH_TERM*"
                                    }
                                }
                            }
                        ]
                    }
                }
            ],
            "filter": [
                {
                    "bool": {
                        "should": [
                            {
                                "term": {
                                    "contributorIds": {
                                        "value": "deadbeef-cafe-babe-cafe-deadbeefcafe"
                                    }
                                }
                            },
                            {
                                "term": {
                                    "document.ownerId": {
                                        "value": "deadbeef-cafe-babe-cafe-deadbeefcafe"
                                    }
                                }
                            }
                        ],
                        "filter": [
                            {
                                "term": {
                                    "deleted": {
                                        "value": "false"
                                    }
                                }
                            }
                        ]
                    }
                }
            ]
        }
    },
    "size": 50,
    "sort": [
        {
            "_doc": {
                "order": "asc"
            }
        }
    ]
}
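
For reference, the dynamic template is along these lines (a sketch, not the exact mapping: the template and normalizer names are made up, and lowercase_normalizer is assumed to be defined in the index settings as a custom normalizer with a lowercase filter):

{
    "dynamic_templates": [
        {
            "strings_as_keywords": {
                "match_mapping_type": "string",
                "mapping": {
                    "type": "keyword",
                    "normalizer": "lowercase_normalizer"
                }
            }
        }
    ]
}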

I've noticed that searches (at very low RPM) have high latency, varying between 300 ms and 1,500 ms per search (plus high latency variance, which I assume is related to some caching mechanism). I am trying to understand the pain point in this query, so as to figure out whether a solution that does not require reindexing can lower the latency (as opposed to, say, switching the searchable fields to an ngram tokenizer, which would).
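
Presumably the Profile API is the right tool for pinpointing this: running the search with "profile": true returns a per-shard, per-clause timing breakdown, which should show whether the wildcard clauses dominate. A minimal sketch against one of the searchable fields:

{
    "profile": true,
    "query": {
        "wildcard": {
            "document.name": {
                "value": "*SEARCH_TERM*"
            }
        }
    },
    "size": 50
}
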
I've also tried moving everything into filter context with a constant_score query:

{
    "query": {
        "constant_score": {
            "filter": {
                "bool": {
                    "should": [
                        {
                            "wildcard": {
                                "document.name": {
                                    "value": "*SEARCH_TERM*"
                                }
                            }
                        },
                        {
                            "wildcard": {
                                "externalData.properties.displayName": {
                                    "value": "*SEARCH_TERM*"
                                }
                            }
                        }
                    ],
                    "must": [
                        {
                            "term": {
                                "contributorIds": {
                                    "value": "deadbeef-cafe-babe-cafe-deadbeefcafe"
                                }
                            }
                        },
                        {
                            "term": {
                                "document.ownerId": {
                                    "value": "deadbeef-cafe-babe-cafe-deadbeefcafe"
                                }
                            }
                        },
                        {
                            "term": {
                                "deleted": {
                                    "value": "false"
                                }
                            }
                        }
                    ]
                }
            }
        }
    },
    "size": 50,
    "sort": [
        {
            "_doc": {
                "order": "asc"
            }
        }
    ]
}

but the latency has not changed. Can anyone shed some light on what the pain point in this query is? I am trying to weigh possible scaling paths (adding two more nodes, for instance) against reindexing the data in a different way (with an ngram tokenizer, for instance), which I would rather avoid if possible.
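
For completeness, the ngram path I have in mind would look roughly like this (a sketch only: the analyzer and tokenizer names are placeholders, the gram sizes are arbitrary, and externalData.properties.displayName would get the same treatment). Indexing fixed-size ngrams turns a "contains" search into a match_phrase over ordinary terms instead of a term-dictionary scan:

{
    "settings": {
        "analysis": {
            "analyzer": {
                "ngram_analyzer": {
                    "tokenizer": "ngram_tokenizer",
                    "filter": ["lowercase"]
                }
            },
            "tokenizer": {
                "ngram_tokenizer": {
                    "type": "ngram",
                    "min_gram": 3,
                    "max_gram": 3
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "document": {
                "properties": {
                    "name": {
                        "type": "text",
                        "analyzer": "ngram_analyzer"
                    }
                }
            }
        }
    }
}

The trade-offs are index size and the usual ngram caveats (search terms shorter than min_gram simply won't match), which is part of why I would rather avoid this path.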


wildcard is the pain point IMO. As the docs warn, a pattern that starts with * cannot use the term index at all, so Elasticsearch has to walk the full term dictionary of the field on every shard to find matching terms; on an index this size that easily accounts for your latencies.
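
If your cluster is on 7.9 or later, there is also the wildcard field type, which is built for exactly this kind of leading-wildcard search. It would still mean reindexing the two searchable fields, but the query itself stays unchanged. A sketch for one of them:

{
    "mappings": {
        "properties": {
            "document": {
                "properties": {
                    "name": {
                        "type": "wildcard"
                    }
                }
            }
        }
    }
}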

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.