Trying to understand high search latency

I have set up an ES index for user-centric data. Each document contains the relevant user ID (either in an owner field or in a contributor field) and two fields that need to be searched with "contains" semantics. The index contains about 100M documents of roughly 15 KB each, with a complex nested structure. The index is set up with dynamic_templates that index all fields as keywords (since no free-text search is needed, tokenizing seemed redundant); some fields are also normalized with a lowercase filter to enable case-insensitive search (a sketch of such a template follows the first query below). The reasoning behind indexing all fields at this point in time is to avoid having to reindex in order to allow searches on other fields, so that new features can be added quickly (the size of the index makes reindexing a bit painful). The cluster is configured with 3 nodes and 5 shards with a replication factor of 1. The query I use looks like this:

{
    "query": {
        "bool": {
            "must": [
                {
                    "bool": {
                        "should": [
                            {
                                "wildcard": {
                                    "document.name": {
                                        "value": "*SEARCH_TERM*"
                                    }
                                }
                            },
                            {
                                "wildcard": {
                                    "externalData.properties.displayName": {
                                        "value": "*SEARCH_TERM*"
                                    }
                                }
                            }
                        ]
                    }
                }
            ],
            "filter": [
                {
                    "bool": {
                        "should": [
                            {
                                "term": {
                                    "contributorIds": {
                                        "value": "deadbeef-cafe-babe-cafe-deadbeefcafe"
                                    }
                                }
                            },
                            {
                                "term": {
                                    "document.ownerId": {
                                        "value": "deadbeef-cafe-babe-cafe-deadbeefcafe"
                                    }
                                }
                            }
                        ],
                        "filter": [
                            {
                                "term": {
                                    "deleted": {
                                        "value": "false"
                                    }
                                }
                            }
                        ]
                    }
                }
            ]
        }
    },
    "size": 50,
    "sort": [
        {
            "_doc": {
                "order": "asc"
            }
        }
    ]
}
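
For reference, the dynamic template is along these lines (a sketch, not the exact mapping: the template and normalizer names are made up, and lowercase_normalizer is assumed to be defined in the index settings as a custom normalizer with a lowercase filter):

{
    "dynamic_templates": [
        {
            "strings_as_keywords": {
                "match_mapping_type": "string",
                "mapping": {
                    "type": "keyword",
                    "normalizer": "lowercase_normalizer"
                }
            }
        }
    ]
}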

I've noticed that searches (at very low RPM) have high latency, varying between 300 ms and 1,500 ms per search (plus high latency variance, which I assume is related to some caching mechanism). I am trying to understand the pain point in this query, so as to figure out whether a solution that does not require reindexing can lower the latency (as opposed to, say, switching the searchable fields to an ngram tokenizer, which would).
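
Presumably the Profile API is the right tool for pinpointing this: running the search with "profile": true returns a per-shard, per-clause timing breakdown, which should show whether the wildcard clauses dominate. A minimal sketch against one of the searchable fields:

{
    "profile": true,
    "query": {
        "wildcard": {
            "document.name": {
                "value": "*SEARCH_TERM*"
            }
        }
    },
    "size": 50
}
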
I've also tried moving everything into filter context with a constant_score query:

{
    "query": {
        "constant_score": {
            "filter": {
                "bool": {
                    "should": [
                        {
                            "wildcard": {
                                "document.name": {
                                    "value": "*SEARCH_TERM*"
                                }
                            }
                        },
                        {
                            "wildcard": {
                                "externalData.properties.displayName": {
                                    "value": "*SEARCH_TERM*"
                                }
                            }
                        }
                    ],
                    "must": [
                        {
                            "term": {
                                "contributorIds": {
                                    "value": "deadbeef-cafe-babe-cafe-deadbeefcafe"
                                }
                            }
                        },
                        {
                            "term": {
                                "document.ownerId": {
                                    "value": "deadbeef-cafe-babe-cafe-deadbeefcafe"
                                }
                            }
                        },
                        {
                            "term": {
                                "deleted": {
                                    "value": "false"
                                }
                            }
                        }
                    ]
                }
            }
        }
    },
    "size": 50,
    "sort": [
        {
            "_doc": {
                "order": "asc"
            }
        }
    ]
}

but the latency has not changed. Can anyone shed some light on what the pain point in this query is? I am trying to weigh possible scaling paths (adding two more nodes, for instance) against reindexing the data in a different way (with an ngram tokenizer, for instance), which I would rather avoid if possible.
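
For completeness, the ngram path I have in mind would look roughly like this (a sketch only: the analyzer and tokenizer names are placeholders, the gram sizes are arbitrary, and externalData.properties.displayName would get the same treatment). Indexing fixed-size ngrams turns a "contains" search into a match_phrase over ordinary terms instead of a term-dictionary scan:

{
    "settings": {
        "analysis": {
            "analyzer": {
                "ngram_analyzer": {
                    "tokenizer": "ngram_tokenizer",
                    "filter": ["lowercase"]
                }
            },
            "tokenizer": {
                "ngram_tokenizer": {
                    "type": "ngram",
                    "min_gram": 3,
                    "max_gram": 3
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "document": {
                "properties": {
                    "name": {
                        "type": "text",
                        "analyzer": "ngram_analyzer"
                    }
                }
            }
        }
    }
}

The trade-offs are index size and the usual ngram caveats (search terms shorter than min_gram simply won't match), which is part of why I would rather avoid this path.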


wildcard is the pain point IMO. As the docs warn, a pattern that starts with * cannot use the term index at all, so Elasticsearch has to walk the full term dictionary of the field on every shard to find matching terms; on an index this size that easily accounts for your latencies.
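
If your cluster is on 7.9 or later, there is also the wildcard field type, which is built for exactly this kind of leading-wildcard search. It would still mean reindexing the two searchable fields, but the query itself stays unchanged. A sketch for one of them:

{
    "mappings": {
        "properties": {
            "document": {
                "properties": {
                    "name": {
                        "type": "wildcard"
                    }
                }
            }
        }
    }
}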

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.