Wrong result by big index

Alex_Dgero · May 16, 2025, 9:50am

Hello. I use Elasticsearch v8.17.2. I need to do a simple search on a very big index with 40M+ docs, but it behaves unexpectedly. My index has the default analyzer without tokenization, so I can only use "phrase_prefix" to search by the start of a word.
Indices configuration:

mapping

/_mapping
{
    "prod_index": {
        "mappings": {
            "properties": {
                "aggs": {
                    "type": "object"
                },
                "count": {
                    "type": "integer"
                },
                "from": {
                    "type": "long"
                },
                "hash": {
                    "type": "long"
                },
                "name": {
                    "type": "text"
                },
                "query": {
                    "properties": {
                        "bool": {
                            "properties": {
                                "should": {
                                    "properties": {
                                        "match_all": {
                                            "type": "object"
                                        }
                                    }
                                }
                            }
                        }
                    }
                },
                "size": {
                    "type": "long"
                },
                "sort": {
                    "properties": {
                        "_id": {
                            "properties": {
                                "order": {
                                    "type": "text",
                                    "fields": {
                                        "keyword": {
                                            "type": "keyword",
                                            "ignore_above": 256
                                        }
                                    }
                                }
                            }
                        }
                    }
                },
                "version": {
                    "type": "boolean"
                }
            }
        }
    }
}

index settings

/_settings
{
    "people_positions": {
        "settings": {
            "index": {
                "routing": {
                    "allocation": {
                        "include": {
                            "_tier_preference": "data_content"
                        }
                    }
                },
                "number_of_shards": "1",
                "provided_name": "people_positions",
                "creation_date": "1747217800469",
                "number_of_replicas": "0",
                "uuid": "xfKYGzvRTVG9RHvIwD41tg",
                "version": {
                    "created": "8521000"
                }
            }
        }
    }
}

To debug this problem, I created a small test index with 1k+ docs. This simple example works.

Request small test index

/test_index/_search
{
    "query": {
        "bool": {
            "must": [
                {
                    "match_phrase_prefix": {
                        "name": {
                            "query": "man"
                        }
                    }
                },
                {
                    "range": {
                        "count": {
                            "gte": 100
                        }
                    }
                }
            ]
        }
    }
}

Response small test index

{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 3.6122494,
        "hits": [
            {
                "_index": "test_index",
                "_id": "1000",
                "_score": 3.6122494,
                "_source": {
                    "count": 11000,
                    "name": "IT manager",
                    "hash": null
                }
            }
        ]
    }
}

But on the big production index with 40M+ docs, I get no results:

Request big index

/prod_index/_search
{
    "query": {
        "bool": {
            "must": [
                {
                    "match_phrase_prefix": {
                        "name": {
                            "query": "man"
                        }
                    }
                },
                {
                    "range": {
                        "count": {
                            "gte": 100
                        }
                    }
                }
            ]
        }
    }
}

Response big index

{
    "took": 3,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 0,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    }
}

I have this document, which I expect to see:
/prod_index/_doc/{ID}

Get document response

{
    "_index": "prod_index",
    "_id": "id",
    "_version": 434184,
    "_seq_no": 43418186,
    "_primary_term": 3,
    "found": true,
    "_source": {
        "count": 434184,
        "name": "IT manager",
        "hash": null
    }
}

Christian_Dahlqvist · May 16, 2025, 10:11am

What is the mapping for the index? If you show this, people can recreate the example and not try to guess it based on your description.

If the field you are searching is only mapped as keyword, you probably need to use a potentially expensive wildcard query.

Alex_Dgero · May 16, 2025, 10:31am

Good advice. I added it. My field "name" is of type "text", without an analyzer or tokenizer. It is simply transformed to lowercase when I add a document or search the index.

Christian_Dahlqvist · May 16, 2025, 10:55am

That does not make sense. If it is mapped as text, it does have an analyzer. If you want someone to help with this, I would recommend showing the exact mapping to avoid wasting people's time.

Alex_Dgero · May 16, 2025, 11:03am

You are right. I added the index mapping and settings to the main description. The field "name" uses the standard analyzer, because I didn't set any other.
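Since the field uses the standard analyzer, its text is in fact tokenized. A rough sketch of what ends up in the inverted index for "IT manager" follows. The regex split is an assumption that only approximates the standard tokenizer's Unicode word-boundary rules (UAX #29), and `approx_standard_analyzer` is a hypothetical helper, not an Elasticsearch API.

```python
import re


def approx_standard_analyzer(text):
    """Hypothetical approximation of the standard analyzer:
    split into word tokens, then lowercase each one.
    The real standard tokenizer follows Unicode UAX #29 word
    boundaries, which this simple regex only approximates."""
    return [token.lower() for token in re.findall(r"\w+", text)]


print(approx_standard_analyzer("IT manager"))  # ['it', 'manager']
```

Each word becomes its own lowercase term, but no prefix terms such as "man" are indexed, so a prefix-style query has to expand "man" against the term dictionary at search time.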

Christian_Dahlqvist · May 16, 2025, 11:15am

If it works for the small test index but not the large one I assume the mappings are different?

Alex_Dgero · May 16, 2025, 11:55am

Thank you for your attention to this question. I appreciate it.
The mappings are not different. But okay, I tried to reproduce it in a simpler way, and I managed to.
Steps to reproduce:

Create a simple index:

Curl request

curl --location --request PUT 'localhost:9001/test_index_v3' \
--header 'Content-Type: application/json' \
--data '{
  "mappings": {
    "properties": {
      "count": {
        "type": "integer"
      },
      "hash": {
        "type": "long"
      },
      "name": {
        "type": "text"
      }
    }
  }
}'

Put 101 documents, then send a search request:

Request

curl --location --request GET 'localhost:30090/test_index_v3/_search' \
--header 'Content-Type: application/json' \
--data '{
    "query": {
        "bool": {
            "must": [
                {
                    "match_phrase_prefix": {
                        "name": {
                            "query": "man"
                        }
                    }
                },
                {
                    "range": {
                        "count": {
                            "gte": 30
                        }
                    }
                }
            ]
        }
    }
}'

And I received the expected response:

Response

{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 3.6686912,
        "hits": [
            {
                "_index": "test_index_v3",
                "_id": "1000",
                "_score": 3.6686912,
                "_source": {
                    "count": 2000,
                    "name": "IT manager",
                    "hash": null
                }
            }
        ]
    }
}

But after I increased the document count to 1015, the same request returned empty hits. I tried it in Postman, and I am sure the request is identical. My guess is that it may be because the engine first searches by match_phrase_prefix, receives some batch of results, and then filters them by range. I am not sure.
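One likely explanation, described in the Elasticsearch reference for `match_phrase_prefix`, is its `max_expansions` parameter (default 50): the last term of the query ("man") is expanded only to a limited number of terms from the term dictionary that start with it. The range clause is probably not the culprit, because inside `bool.must` both clauses simply have to match the same document. The sketch below is a simplified model under stated assumptions: the term dictionary is a plain sorted set of hypothetical terms (the `mana000`, `mana001`, ... filler terms stand in for whatever extra words a larger index contains), while Lucene performs the real expansion over the index segments, so details may differ.

```python
def expand_prefix(term_dictionary, prefix, max_expansions=50):
    """Simplified model of match_phrase_prefix expansion: walk the sorted
    term dictionary and keep the first `max_expansions` terms that start
    with `prefix`. Terms beyond that limit are silently ignored."""
    expansions = []
    for term in sorted(term_dictionary):
        if term.startswith(prefix):
            expansions.append(term)
            if len(expansions) == max_expansions:
                break
    return expansions


# Hypothetical data: the small index has few terms starting with "man";
# the larger one adds 60 filler terms that sort before "manager".
small_terms = {"it", "manager", "sales", "engineer"}
large_terms = small_terms | {f"mana{i:03d}" for i in range(60)}

print("manager" in expand_prefix(small_terms, "man"))  # True
print("manager" in expand_prefix(large_terms, "man"))  # False
```

In this model the document containing "IT manager" disappears from the results only because "manager" falls outside the first 50 expansions as the index grows, which matches the symptom above. It would also be consistent with the wildcard query mentioned at the end of the thread working, since that query is not capped at 50 expansions in the same way.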

Note that the document exists in the index throughout all testing. I checked it with GET /test_index_v3/_doc/1000.
Document:

Get by ID response

{
    "_index": "test_index_v3",
    "_id": "1000",
    "_version": 14,
    "_seq_no": 1202,
    "_primary_term": 1,
    "found": true,
    "_source": {
        "count": 14000,
        "name": "IT manager",
        "hash": null
    }
}

Christian_Dahlqvist · May 16, 2025, 12:50pm

You have a high version number. Are you repeatedly updating documents during the test or when you load new data?

If you are making changes and updating/rewriting documents, I would recommend you run the refresh API before querying. The fact that you are able to get a document by ID does not mean it is available for search, as it may still only reside in the transaction log.
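The difference between get by ID and search can be illustrated with a toy model. It is not Elasticsearch code, and `ToyShard` and its methods are hypothetical: newly indexed documents sit in a buffer that a real-time get by ID can read, but search only sees them after a refresh. In Elasticsearch the refresh happens periodically (by default every second while the index is being searched) or on demand with the refresh API (`POST /<index>/_refresh`).

```python
class ToyShard:
    """Toy model (hypothetical, not Elasticsearch code) of refresh semantics:
    indexed documents land in a buffer first and only become visible to
    search after refresh(), while get() is real-time and sees the buffer."""

    def __init__(self):
        self.buffer = {}      # recently indexed, not yet searchable
        self.searchable = {}  # visible to search after a refresh

    def index(self, doc_id, source):
        self.buffer[doc_id] = source

    def refresh(self):
        # Publish everything buffered so far to the searchable view.
        self.searchable.update(self.buffer)
        self.buffer.clear()

    def get(self, doc_id):
        # Real-time get: prefer the newest (buffered) version of the document.
        return self.buffer.get(doc_id, self.searchable.get(doc_id))

    def search(self, predicate):
        # Search only looks at refreshed documents.
        return [doc_id for doc_id, src in self.searchable.items() if predicate(src)]


shard = ToyShard()
shard.index("1000", {"count": 14000, "name": "IT manager"})
print(shard.get("1000") is not None)                # True
print(shard.search(lambda src: src["count"] >= 30))  # []
shard.refresh()
print(shard.search(lambda src: src["count"] >= 30))  # ['1000']
```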

Alex_Dgero · June 23, 2025, 9:37pm

After some time, I solved this issue. My index has the standard analyzer, and it doesn't break words into smaller tokens. I don't know how the previous requests worked. But I changed to a wildcard query like "search*", and it works. Thanks for your attention.
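The wildcard workaround can be modelled in the same simplified way: a pattern such as "man*" is matched against the terms in the term dictionary, and with its default rewrite the query is not capped at a fixed number of expansions, so every matching term is considered. This is a hypothetical sketch: `wildcard_terms` and the filler terms are illustrative, and Python's `fnmatch` stands in for Elasticsearch wildcard matching (`*` and `?` behave alike, but `fnmatch` also accepts `[...]` character classes, which Elasticsearch wildcards do not).

```python
from fnmatch import fnmatchcase


def wildcard_terms(term_dictionary, pattern):
    """Return every term in a (hypothetical) term dictionary that matches
    a wildcard pattern such as "man*". Unlike the prefix expansion of
    match_phrase_prefix, no expansion limit is applied here."""
    return sorted(term for term in term_dictionary if fnmatchcase(term, pattern))


# Hypothetical term dictionary: 60 filler terms that sort before "manager".
terms = {"it", "manager", "sales"} | {f"mana{i:03d}" for i in range(60)}

matches = wildcard_terms(terms, "man*")
print(len(matches))          # 61
print("manager" in matches)  # True
```

Visiting every matching term is also why wildcard queries can be expensive on large indices, as noted earlier in the thread.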

Related topics

Topic		Replies	Views	Activity
Search doesn't find the document that it should find when index is big? Elasticsearch	29	4730	February 17, 2020
Elasticsearch query stops working with big amount of data Elasticsearch	1	383	July 5, 2017
Extremely Large Documents: Querying and Dealing with Elasticsearch	17	3725	October 28, 2021
Weird behavior of Elasticsearch 1.0.1 Elasticsearch	7	485	July 6, 2017
Prefix query search words rather than sentence Elasticsearch	7	913	July 6, 2017