Elasticsearch returns 10000 rows even when only ~100 documents are relevant

bonyolult · May 3, 2023, 1:23pm

Hello, I need some help with the following issue.
I run wildcard queries on an index in Kibana Dev Tools. If I run one query at a time, it returns only the relevant hits. If the queries are run in parallel (2 browser tabs), both return 10000 hits.
This is true for any of the queries below. It looks like it is related to the load on Elasticsearch, or to something I couldn't find in the documentation so far. The same happens if the queries are run by our application via the Java API.
The index has 6 shards and 2 replicas, but the same happens with 1 shard and no replicas, too.
Any help is appreciated, because our client is a little bit crabby about the many irrelevant hits.
Thank you, Attila

GET someindex*/_search
{
  "query": {
    "wildcard": {
      "somefield": "*sometext*"
    }
  }
} 

GET someindex*/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "wildcard": {
            "somefield": "*sometext*"
          }
        }
      ]
    }
  }
}

GET someindex*/_search
{
  "query": {
    "bool": {
      "minimum_should_match": 1, 
      "should": [
        {
          "wildcard": {
            "somefield": "*sometext*"
          }
        }
      ]
    }
  }
}

Wave · May 3, 2023, 7:37pm

Hi @bonyolult,
You can add size as a query parameter, for example to get 7 results:

GET someindex*/_search?size=7

You could also add size in the body with:

GET someindex*/_search
{
  "size": 7,
  "query": {

bonyolult · May 4, 2023, 7:59am

Hi Andrew, thank you for your answer! We are dealing with an enormous amount of data. A query can return hundreds of thousands of hits, and all of those must be shown to the user paginated (that's the business requirement, I can't help it :)). So the first thing is to count how many hits we can expect, and in that case I can't use the size parameter. When search_after is used, then of course we use the size setting.

The other thing I don't understand is: how come a query returns e.g. the 103 relevant hits, and on the second run (when other queries are running, too) it returns everything? AFAIK it's clearly related to the load on Elasticsearch. When a query runs alone on the cluster it's fine and returns only what it has to.
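
On the counting step mentioned above: in Elasticsearch 7 and later, a plain _search stops counting at 10000 by default and reports hits.total as a lower bound, so an exact count needs either track_total_hits or the _count API. A minimal sketch in Python that only builds the request bodies; the function names are hypothetical, and sending the requests (and the Elasticsearch version) is left as an assumption.

```python
def exact_count_search_body(field: str, pattern: str) -> dict:
    """Build a _search body that asks for an exact total and no documents."""
    return {
        "size": 0,                 # return no hits, only the total
        "track_total_hits": True,  # default tracking stops at 10000
        "query": {"wildcard": {field: pattern}},
    }


def count_body(field: str, pattern: str) -> dict:
    """Build a body for the _count API, which takes only a query."""
    return {"query": {"wildcard": {field: pattern}}}
```

The first body would go to someindex*/_search and the second to someindex*/_count; either should give the number of matches before paginating with search_after.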

Wave · May 4, 2023, 1:15pm

Gotcha. Hmmm, I wouldn't expect the number of primaries or replicas to matter for the issue you are seeing. Just curious: what happens when you run the queries in parallel but on different clients? For example, instead of running them in two browser tabs on the same machine, try two different machines, each with one tab.

bonyolult · May 23, 2023, 8:42am

The reason was a query_string query without proper escaping: when the user searched for "something" with a wildcard, the generated query contained
"*\"something\"*" instead of "*something*", and this caused some strange and nondeterministic behaviour.

The other thing, related to the queries shown in the original post: it seems to be a Kibana bug, because running the same searches in Postman returns the expected hits.
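
For the query_string escaping issue described above, a minimal sketch of escaping user input before wrapping it in wildcards, written in Python for illustration. This is not the actual fix used in the application, which isn't shown in the thread, and the function names are hypothetical. The reserved character list follows the Elasticsearch query_string documentation, which also notes that < and > cannot be escaped and have to be removed.

```python
import re

# Reserved query_string characters that can be backslash-escaped.
# && and || are reserved as pairs; escaping every & and | is a conservative choice.
_ESCAPABLE = re.compile(r'([+\-=&|!(){}\[\]^"~*?:\\/])')


def escape_query_string(user_input: str) -> str:
    """Escape reserved query_string characters in raw user input."""
    # < and > cannot be escaped, so they are dropped entirely.
    without_ranges = user_input.replace("<", "").replace(">", "")
    # Prefix each remaining reserved character with a backslash.
    return _ESCAPABLE.sub(r"\\\1", without_ranges)


def contains_query(user_input: str) -> str:
    """Wrap the escaped input in wildcards for a 'contains' style search."""
    return f"*{escape_query_string(user_input)}*"
```

With this helper, quotes typed by the user are treated as literal characters inside the wildcard pattern. If the intent is to drop the quotes rather than search for them, they would need to be stripped before calling the helper.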

Wave · May 23, 2023, 1:00pm

Oh, good to know. Strange that Postman behaves differently from Kibana, though.
