Performance when fetching ids for large result set


(Kshitij Gupta) #1

Hi,

Let's say I am running a filter-only query that matches 100,000 documents.

When I run the query with search_type=count, it returns quickly (~10 ms), which suggests that the filtering itself is fast.

GET /denorm/_search?search_type=count
{
  "query": {
    "filtered": {
      "filter": {
        some filter here
      }
    }
  }
}

But when I run the same query and fetch the IDs of the matching documents (without fetching any fields), it is quite slow (~2 s).

GET /denorm/_search
{
  "size": 100000,
  "query": {
    "filtered": {
      "filter": {
        some filter here
      }
    }
  },
  "fields": []
}
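As a hedged sketch in Python (not part of the original post), the two request bodies above can be built from one filter clause. This keeps the JSON well-formed and makes the only differences explicit: `search_type=count` on the first request, and `size` plus an empty `fields` list on the second. The function names and the example `term` filter on a `status` field are hypothetical placeholders, not taken from the thread.

```python
import json
from typing import Any, Dict, Tuple

Request = Tuple[str, Dict[str, Any]]


def count_request(index: str, filter_clause: Dict[str, Any]) -> Request:
    # Filter-only query; search_type=count returns hits.total but no hits.
    path = f"/{index}/_search?search_type=count"
    body = {"query": {"filtered": {"filter": filter_clause}}}
    return path, body


def ids_request(index: str, filter_clause: Dict[str, Any], size: int) -> Request:
    # Same filter, but ask for up to `size` hits and no stored fields,
    # so each hit carries only metadata such as _id.
    path = f"/{index}/_search"
    body = {
        "size": size,
        "query": {"filtered": {"filter": filter_clause}},
        "fields": [],
    }
    return path, body


if __name__ == "__main__":
    # Hypothetical filter standing in for "some filter here".
    example_filter = {"term": {"status": "active"}}
    for path, body in (count_request("denorm", example_filter),
                       ids_request("denorm", example_filter, 100000)):
        print("GET", path)
        print(json.dumps(body, indent=2))
```

Building both bodies from the same `filter_clause` guarantees that the timing comparison is between identical filters.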

I would like to understand the reason for this, and whether there is any way to speed up fetching the IDs.

Is the time spent mapping Lucene docids to _uid? How does that work? Is there a cache that speeds this up? Is the fielddata cache used for this?
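To make the question concrete, here is a deliberately simplified Python model (an illustration, not Elasticsearch's code) of what differs between the two requests, assuming the default query_then_fetch search type. A count only tallies matching docids. A `size: 100000` request makes each shard collect up to 100,000 hits, makes the coordinating node merge up to shards × 100,000 entries, and then has the fetch phase look up a stored id for every returned hit. `Shard`, `count_only` and `query_then_fetch` are hypothetical names; a Python list stands in for the stored `_uid` field and a list of booleans stands in for the filter bitset.

```python
import heapq
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Shard:
    stored_ids: List[str]  # stand-in for the stored _uid field, indexed by docid
    matches: List[bool]    # stand-in for the filter's bitset of matching docids


def count_only(shards: List[Shard]) -> int:
    # search_type=count: each shard only counts matching docids;
    # nothing is ranked, merged, or loaded.
    return sum(sum(shard.matches) for shard in shards)


def query_then_fetch(shards: List[Shard], size: int) -> Tuple[List[str], int, int]:
    # Query phase: every shard returns up to `size` (shard, docid) entries.
    # A filter-only query gives every hit the same score, so this model
    # simply orders ties by (shard, docid).
    per_shard = []
    for shard_no, shard in enumerate(shards):
        hits = ((shard_no, docid) for docid, hit in enumerate(shard.matches) if hit)
        per_shard.append(heapq.nsmallest(size, hits))  # sorted top `size`
    candidates_at_coordinator = sum(len(h) for h in per_shard)

    # Reduce on the coordinating node: merge the sorted per-shard lists
    # and keep the global top `size`.
    winners = list(heapq.merge(*per_shard))[:size]

    # Fetch phase: one stored-id lookup per returned hit.
    ids = [shards[shard_no].stored_ids[docid] for shard_no, docid in winners]
    return ids, candidates_at_coordinator, len(ids)


if __name__ == "__main__":
    shards = [
        Shard(stored_ids=["a1", "a2", "a3"], matches=[True, False, True]),
        Shard(stored_ids=["b1", "b2"], matches=[True, True]),
    ]
    print(count_only(shards))               # 4
    print(query_then_fetch(shards, size=3))  # (['a1', 'a3', 'b1'], 4, 3)
```

In this model the last two returned numbers grow with `size` and the shard count, while `count_only` does neither; it says nothing about the actual per-lookup cost inside Lucene.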

I am using Elasticsearch 1.4.2 right now. Also note that I am using my own app-specific IDs.

Thanks,
Kshitij


(Ivan Brusic) #2

Are you changing the size parameter to return more than the default?

Ivan


(Kshitij Gupta) #3

Yes. I need ids for all the matching documents.
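For retrieving every matching ID, Elasticsearch 1.x offers scan-and-scroll (`search_type=scan` with a `scroll` keep-alive), which returns hits in batches instead of ranking all of them in one request. Whether it is faster than a single `size: 100000` request for this index is not established in this thread. The sketch below is an assumption-heavy illustration: `send` is a hypothetical transport function (POST a path and body, return the decoded JSON), not a real client API, and `scan_ids` is a hypothetical name.

```python
from typing import Any, Callable, Dict, Iterator

# Hypothetical transport: send(path, body) performs the HTTP request and
# returns the decoded JSON response. Not a real client API.
Send = Callable[[str, Any], Dict[str, Any]]


def scan_ids(send: Send, index: str, filter_clause: Dict[str, Any],
             per_shard: int = 1000, keep_alive: str = "1m") -> Iterator[str]:
    body = {
        "size": per_shard,  # with search_type=scan, size applies per shard
        "query": {"filtered": {"filter": filter_clause}},
        "fields": [],       # only metadata such as _id is needed
    }
    # The initial scan request returns a scroll id (and the total) but no hits.
    resp = send(f"/{index}/_search?search_type=scan&scroll={keep_alive}", body)
    while True:
        # Always continue from the most recent scroll id.
        scroll_id = resp["_scroll_id"]
        # Assumption: the scroll id is passed as the request body and
        # `send` handles encoding it.
        resp = send(f"/_search/scroll?scroll={keep_alive}", scroll_id)
        hits = resp["hits"]["hits"]
        if not hits:  # an empty batch means every match has been returned
            break
        for hit in hits:
            yield hit["_id"]
```

Because `scan_ids` is a generator, IDs can be consumed batch by batch rather than held in one 100,000-hit response.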


Thanks,
Kshitij

