How to find the percentage of documents matched by a query?

I want to write an aggregation that returns the percentage of documents in an index that a given query hits.
Let's say I have a query Q that returns 20 docs out of the 100 present in the target index.
The aggregation should then return 20% when I add the query to it.

I should clarify that I am using a span_near query, which works fine. I just want to know what percentage of the documents it hits, and it really shouldn't be this hard to figure out.

I found this question, which discusses a similar use case: getting the percentage of successes. However, they used a term filter, and I need span_near, which is not available under filters but only under query.

I also came across this question, which describes my use case exactly. Unfortunately, it does not have an answer.

Here is my working span_near query:

GET /transform_test6_4/_search?size=0
{
  "query": {
    "span_near": {
      "clauses": [
        {
          "span_term": {
            "eventFlow": "click_input"
          }
        },
        {
          "span_term": {
            "eventFlow": "idle"
          }
        }
      ],
      "slop": 120,
      "in_order": true
    }
  }
}

You can wrap your query in a filter aggregation. You can then use a bucket_script aggregation to access the total number of matching documents via _count.

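Wrap all of this inside a filters aggregation that uses a match_all query to get the total number of documents, again via _count. The outer aggregation has to be filters rather than filter because bucket_script is a parent pipeline aggregation and needs a multi-bucket aggregation as its parent.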

You can now calculate the percentage of matching documents by dividing the two counts and multiplying by 100. Something like this should work:

GET transform_test6_4/_search
{
  "size": 0,
  "aggs": {
    "all_docs_filter": {
      "filters": {
        "filters": {
          "all_docs": {
            "match_all": {}
          }
        }
      },
      "aggs": {
        "matching_docs": {
          "filter": {
            "span_near": {
              "clauses": [
                {
                  "span_term": {
                    "eventFlow": "click_input"
                  }
                },
                {
                  "span_term": {
                    "eventFlow": "idle"
                  }
                }
              ],
              "slop": 120,
              "in_order": true
            }
          }
        },
        "match_percentage": {
          "bucket_script": {
            "buckets_path": {
              "matching_doc_count": "matching_docs._count",
              "all_doc_count": "_count"
            },
            "script": "params.matching_doc_count / params.all_doc_count * 100"
          }
        }
      }
    }
  }
}
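
For reference, with the numbers from the question (20 matching documents out of 100), the aggregations part of the response should have roughly this shape. The counts here are illustrative, taken from the question's example rather than from a real run:

"aggregations": {
  "all_docs_filter": {
    "buckets": {
      "all_docs": {
        "doc_count": 100,
        "matching_docs": {
          "doc_count": 20
        },
        "match_percentage": {
          "value": 20.0
        }
      }
    }
  }
}

The percentage is the value under all_docs_filter.buckets.all_docs.match_percentage.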

I notice you've set size to 0, so I'm assuming you don't care about the actual hits. If you do, you can add a top_hits aggregation to the matching_docs filter agg.
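
As a rough sketch of that, the matching_docs part of the request could look something like this. The sub-aggregation name sample_hits and the size of 3 are just placeholders I picked for illustration; the rest of the request stays the same:

"matching_docs": {
  "filter": {
    "span_near": {
      "clauses": [
        {
          "span_term": {
            "eventFlow": "click_input"
          }
        },
        {
          "span_term": {
            "eventFlow": "idle"
          }
        }
      ],
      "slop": 120,
      "in_order": true
    }
  },
  "aggs": {
    "sample_hits": {
      "top_hits": {
        "size": 3
      }
    }
  }
}

The bucket_script still reads matching_docs._count as before, since adding a sub-aggregation does not change the filter's document count.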
