Wrong results on a big index

Hello. I use Elasticsearch v8.17.2. I need to run a simple search against a very big index (40M+ docs), but it behaves unexpectedly. My index has the default analyzer without tokenization, so I can only use "phrase_prefix" to search by the start of a word.
Index configuration:

mapping
/_mapping
{
    "prod_index": {
        "mappings": {
            "properties": {
                "aggs": {
                    "type": "object"
                },
                "count": {
                    "type": "integer"
                },
                "from": {
                    "type": "long"
                },
                "hash": {
                    "type": "long"
                },
                "name": {
                    "type": "text"
                },
                "query": {
                    "properties": {
                        "bool": {
                            "properties": {
                                "should": {
                                    "properties": {
                                        "match_all": {
                                            "type": "object"
                                        }
                                    }
                                }
                            }
                        }
                    }
                },
                "size": {
                    "type": "long"
                },
                "sort": {
                    "properties": {
                        "_id": {
                            "properties": {
                                "order": {
                                    "type": "text",
                                    "fields": {
                                        "keyword": {
                                            "type": "keyword",
                                            "ignore_above": 256
                                        }
                                    }
                                }
                            }
                        }
                    }
                },
                "version": {
                    "type": "boolean"
                }
            }
        }
    }
}
index settings

/_settings
{
    "people_positions": {
        "settings": {
            "index": {
                "routing": {
                    "allocation": {
                        "include": {
                            "_tier_preference": "data_content"
                        }
                    }
                },
                "number_of_shards": "1",
                "provided_name": "people_positions",
                "creation_date": "1747217800469",
                "number_of_replicas": "0",
                "uuid": "xfKYGzvRTVG9RHvIwD41tg",
                "version": {
                    "created": "8521000"
                }
            }
        }
    }
}

To debug this problem, I created a small test index with 1k+ docs. This simple example worked.

Request small test index
/test_index/_search
{
    "query": {
        "bool": {
            "must": [
                {
                    "match_phrase_prefix": {
                        "name": {
                            "query": "man"
                        }
                    }
                },
                {
                    "range": {
                        "count": {
                            "gte": 100
                        }
                    }
                }
            ]
        }
    }
}
Response small test index
{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 3.6122494,
        "hits": [
            {
                "_index": "test_index",
                "_id": "1000",
                "_score": 3.6122494,
                "_source": {
                    "count": 11000,
                    "name": "IT manager",
                    "hash": null
                }
            }
        ]
    }
}

But on the big production index with 40M+ docs I get no results:

Request big index
prod_index/_search
{
    "query": {
        "bool": {
            "must": [
                {
                    "match_phrase_prefix": {
                        "name": {
                            "query": "man"
                        }
                    }
                },
                {
                    "range": {
                        "count": {
                            "gte": 100
                        }
                    }
                }
            ]
        }
    }
}
Response big index
{
    "took": 3,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 0,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    }
}

This is the document I expect to see:
prod_index/_doc/{ID}

Get document response
{
    "_index": "prod_index",
    "_id": "id",
    "_version": 434184,
    "_seq_no": 43418186,
    "_primary_term": 3,
    "found": true,
    "_source": {
        "count": 434184,
        "name": "IT manager",
        "hash": null
    }
}

What is the mapping for the index? If you show this, people can recreate the example and not try to guess it based on your description.

If the field you are searching is only mapped as keyword, you probably need to use a potentially expensive wildcard query.
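
For reference, a wildcard version of the request from the question could look roughly like the sketch below. Python is used only to build and print the JSON body. The helper name `build_wildcard_search` is made up for illustration; the field names `name` and `count` are taken from the question. On a `keyword` field the pattern is matched against the whole value, so a leading `*` is needed to hit a word in the middle of the value (this is the expensive part). On an analyzed `text` field the pattern is matched against individual terms, so `man*` may be enough.

```python
import json


def build_wildcard_search(field, pattern, min_count):
    """Build a search body: wildcard on `field` plus a lower bound on `count`.

    Hypothetical helper for illustration; it mirrors the bool/must layout
    of the request in the question.
    """
    return {
        "query": {
            "bool": {
                "must": [
                    {
                        "wildcard": {
                            field: {
                                "value": pattern,
                                # Match regardless of letter case.
                                "case_insensitive": True,
                            }
                        }
                    },
                    {"range": {"count": {"gte": min_count}}},
                ]
            }
        }
    }


body = build_wildcard_search("name", "*man*", 100)
print(json.dumps(body, indent=4))
```

Note that a pattern such as `*man*` also matches values like "human resources", so it is broader than a start-of-word search.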

Good advice, I added it. My field "name" has type "text", without an analyzer or tokenizer. The value is simply transformed to lowercase when I add a document or search the index.

That does not make sense. If it is mapped as text, it does have an analyzer. If you want someone to help with this, I would recommend showing the exact mapping to avoid wasting people's time.

You are right. I added the index mapping and settings to the main description. The field "name" uses the standard analyzer because I didn't set another one.

If it works for the small test index but not the large one I assume the mappings are different?

Thank you for your attention to this question. I appreciate it. :heart:
The mappings are not different. But OK, I tried to reproduce it in a simpler way and managed to do so.
Steps to reproduce:

  1. Create a simple index:
Curl request
curl --location --request PUT 'localhost:9001/test_index_v3' \
--header 'Content-Type: application/json' \
--data '{
  "mappings": {
    "properties": {
      "count": {
        "type": "integer"
      },
      "hash": {
        "type": "long"
      },
      "name": {
        "type": "text"
      }
    }
  }
}'
  2. Put 101 documents, then send a search request:
Request
curl --location --request GET 'localhost:30090/test_index_v3/_search' \
--header 'Content-Type: application/json' \
--data '{
    "query": {
        "bool": {
            "must": [
                {
                    "match_phrase_prefix": {
                        "name": {
                            "query": "man"
                        }
                    }
                },
                {
                    "range": {
                        "count": {
                            "gte": 30
                        }
                    }
                }
            ]
        }
    }
}'

And I received the expected response:

Response
{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 3.6686912,
        "hits": [
            {
                "_index": "test_index_v3",
                "_id": "1000",
                "_score": 3.6686912,
                "_source": {
                    "count": 2000,
                    "name": "IT manager",
                    "hash": null
                }
            }
        ]
    }
}
  3. But after I increased the document count to 1015, the same request returned empty hits. I tried it in Postman and I am sure that the request is identical. My guess is that the engine first searches by match_phrase_prefix, receives some batch of results, and only then filters it by range. I am not sure.

Note that the document existed in the index throughout all testing. I checked it with GET /test_index_v3/_doc/1000
Document:

get by id response
{
    "_index": "test_index_v3",
    "_id": "1000",
    "_version": 14,
    "_seq_no": 1202,
    "_primary_term": 1,
    "found": true,
    "_source": {
        "count": 14000,
        "name": "IT manager",
        "hash": null
    }
}

You have a high version number. Are you repeatedly updating documents during the test or when you load new data?

If you are making changes and updating/rewriting documents I would recommend you run the refresh API before querying. The fact that you are able to get a document by ID does not mean it is available for search as it may still only reside in the transaction log.
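
A refresh call before the search could be sketched as follows, in Python with only the standard library. The host and port `localhost:9001` and the index name `test_index_v3` come from the create-index request earlier in the thread; the helper names are invented for this example, and the function that actually sends the request is defined but not called here, since it needs a running cluster.

```python
import urllib.request


def build_refresh_request(base_url, index):
    """Build a POST request for the index refresh endpoint (/<index>/_refresh)."""
    url = f"{base_url.rstrip('/')}/{index}/_refresh"
    return urllib.request.Request(url, method="POST")


def refresh_index(base_url, index):
    """Send the refresh request and return the raw response body as text.

    Needs a reachable Elasticsearch node, so it is not invoked in this sketch.
    """
    with urllib.request.urlopen(build_refresh_request(base_url, index)) as resp:
        return resp.read().decode("utf-8")


req = build_refresh_request("http://localhost:9001", "test_index_v3")
print(req.get_method(), req.full_url)
```

Running `refresh_index("http://localhost:9001", "test_index_v3")` right before the search rules out the case where recent writes are not yet visible to search.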

After some time I solved this issue. My index uses the standard analyzer, and it doesn't break words into smaller tokens. I don't know how the previous requests worked, but I changed to a wildcard query like "search*" and it works. Thanks for your attention.
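
One possible explanation for why the earlier requests worked, which nobody confirmed in this thread: `match_phrase_prefix` expands the last term of the query to at most `max_expansions` terms (50 by default), taken in sorted order from the term dictionary. On a small index "manager" can fall inside the first 50 terms starting with "man"; once more documents add terms that sort before it, it can drop out. The Python sketch below is a simplified model of that behaviour, not Lucene's actual implementation, and the `mana000`-style terms are invented purely to crowd the dictionary.

```python
from bisect import bisect_left


def expand_prefix(sorted_terms, prefix, max_expansions=50):
    """Return the first `max_expansions` terms that start with `prefix`.

    `sorted_terms` must be sorted; this mimics a walk over a sorted
    term dictionary that stops after a fixed number of matches.
    """
    start = bisect_left(sorted_terms, prefix)
    expanded = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix) or len(expanded) == max_expansions:
            break
        expanded.append(term)
    return expanded


# Small index: only a few terms start with "man".
small_terms = sorted({"it", "man", "manager", "manual"})

# Larger index: 100 hypothetical terms that sort before "manager".
large_terms = sorted(set(small_terms) | {f"mana{i:03d}" for i in range(100)})

print("manager" in expand_prefix(small_terms, "man"))       # small index
print("manager" in expand_prefix(large_terms, "man"))       # default limit
print("manager" in expand_prefix(large_terms, "man", 200))  # raised limit
```

If this is what happened, setting a larger `max_expansions` on the `match_phrase_prefix` clause, or switching to a wildcard query as above, would bring the document back.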