Pinned query limit of 100 configurable?

How can I configure the maximum # of items returned in a 'pinned query'? I'm getting the following error:

   "caused_by": {
       "type": "x_content_parse_exception",
       "reason": "Failed to build [pinned] after last required field arrived",
       "caused_by": {
           "type": "illegal_argument_exception",
           "reason": "[pinned] Max of 100 ids exceeded: 121 provided."

Hi Karl
That's a fixed limit I'm afraid. Can you say any more about your use case where this would be a requirement?

We're using this in conjunction with an external service and need to maintain the order of the items being returned. We essentially have a list of products that we have to return to the client in the order we received them. We wanted to reuse the same response format as our other search responses, so this was by far the easiest way. Our item IDs are received as integers, which makes scripting a little more complex, whereas with pinned queries we could just send them as received.

                    "pinned": {
                        "ids": [
                            146686,
                            2269502,
                            283749,
                            2730510,
                            318627,
                            931316,
                            997379,
                            1422420,
                            1644020,
                            1943821
                        ],

simply worked 🙂

Suggestions?

Are you saying there's no "organic" element at all to these queries?

For this use case, unfortunately, there is no 'organic' value. Just the order as provided in the array. Which is why the 'pinned query' seemed like a big win, until it wasn't. The expectation is that we could have as many as 1,000 items in the result set.

That may be confusing. Essentially, we send it like this:

  "query": {
        "bool": {
            "must": [
                {
                    "pinned": {
                        "ids": [
                            146686,
                            2269502,
                            283749,
                            2730510,
                            318627,
                            931316,
                            997379,
                            1422420,
                            1644020,
                            1943821
                        ],
                        "organic": {
                            "terms": {
                                "product_id": [
                                    146686,
                                    2269502,
                                    283749,
                                    2730510,
                                    318627,
                                    931316,
                                    997379,
                                    1422420,
                                    1644020,
                                    1943821
                                ]
                            }
                        }
                    }
                }

OK - obviously something outside of the index knows that 146686 is first and 2269502 is second and so on.
Can't that information be recorded on the indexed docs in an "order" field, so you just sort results by that field?
Pinned queries are designed to blend "natural" and "unnatural" results where they are not the same set of docs.
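
A minimal sketch of the order-field idea, assuming the external ordering has already been written to each document in a numeric field (named `order` here, following the suggestion above) and that `product_id` is the ID field used elsewhere in this thread. The helper name is hypothetical, and the code only builds the request body as a Python dict; sending it is left to whichever client is in use.

```python
def build_order_field_query(product_ids, size=10):
    """Build a search body that matches the given IDs and sorts by a
    pre-indexed numeric `order` field (an assumed field name)."""
    return {
        "query": {
            "bool": {
                # Filter context: only membership matters, not scoring.
                "filter": [{"terms": {"product_id": list(product_ids)}}]
            }
        },
        # Short sort form: field name -> direction.
        "sort": [{"order": "asc"}],
        "from": 0,
        "size": size,
    }


body = build_order_field_query([146686, 2269502, 283749])
```

The query stays a single `terms` clause no matter how long the list is, but it only works if the `order` values are kept up to date in the index.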

Totally understood about the design of Pinned queries. That's been my feedback to our team. The items we are returning are dynamically ordered by the external service and we have to maintain that order in the response.

These 'ids' are both the document '_id' and stored as a number 'product_id'

The 100 limit seems very low compared to any other ES default. Is there a major performance impact using pinned queries? Is a scripted score query a better option? _MGET?

The design of pinned queries was very focused. I'd recommend taking a look.

It's not intended for serving long lists of documents in a particular order.

If the orders change infrequently there may be a case for updating an order field as I described.
Otherwise, a simple bool query with a should array of term queries would suffice - each with a boost setting that reflects the intended order.

Note however that queries containing thousands of query terms are to be avoided (which is partially why pinned query has a limit). Lucene has a default limit of 1024 clauses. Each search clause typically requires at least one random disk seek and also allocates a memory buffer to read lists of matching documents. These are not free resources, hence the limits.
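
A rough sketch of the boosted `should` approach described above, written as a Python helper that builds the request body. The helper name and the `MAX_CLAUSES` constant are hypothetical. It assumes that results are sorted by descending score (the default) and that each matching `term` clause contributes a score proportional to its boost, so the first ID in the list gets the largest boost. It also refuses lists longer than the 1024-clause default mentioned above.

```python
MAX_CLAUSES = 1024  # Lucene's default clause limit, as noted above


def build_boosted_should_query(ordered_ids, field="product_id"):
    """Build a bool/should query whose boosts encode the list order."""
    ids = list(ordered_ids)
    if len(ids) > MAX_CLAUSES:
        raise ValueError(
            f"{len(ids)} clauses exceeds the default limit of {MAX_CLAUSES}"
        )
    n = len(ids)
    should = [
        # Earlier positions get larger boosts: n, n-1, ..., 1.
        {"term": {field: {"value": pid, "boost": n - i}}}
        for i, pid in enumerate(ids)
    ]
    return {"query": {"bool": {"should": should}}, "from": 0, "size": n}


query = build_boosted_should_query([377497, 931316, 931317])
```

If the scores turn out not to track the boosts exactly on a given field type, wrapping each clause in a `constant_score` query is one option to consider.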

Storing an order field would be preferable if it's infrequently changed - the query should be much cheaper to run.

What about something scripted like this vs boosting... just thinking about simplifying the code? Thoughts on performance?

{
    "query": {
        "bool": {
            "must": [
                {
                    "terms": {
                        "product_id": [377497, 931316, 931317, 127127]
                    }
                }
            ]
        }
    },
    "sort": [
        {
            "_script": {
                "type": "number",
                "script": {
                    "lang": "painless",
                    "inline": "if(params.products.containsKey(doc['_id'][0])) { return params.products[doc['_id'][0]];} return 9999;",
                    "params": {
                        "products": {
                            "377497": 100,
                            "931316": 200,
                            "931317": 300
                        }
                    }
                },
                "order": "asc"
            }
        }
    ],
    "from": 0,
    "size": 10
}
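
Since the `products` map has to be rebuilt for every request, here is a small sketch of generating it from the ordered ID list in Python. The helper name is hypothetical and the script text is the one from the request above. One assumption to flag: the sketch uses `source` as the key for the script text, which as far as I know is what newer Elasticsearch versions expect in place of `inline`.

```python
SCRIPT = (
    "if(params.products.containsKey(doc['_id'][0])) "
    "{ return params.products[doc['_id'][0]];} return 9999;"
)


def build_script_sort_query(ordered_ids, size=10):
    """Build a terms query sorted by each document's position in ordered_ids."""
    ids = list(ordered_ids)
    # Keys are strings because the script looks documents up by _id.
    # Ranks must stay below the script's 9999 fallback for unknown IDs.
    products = {str(pid): rank for rank, pid in enumerate(ids)}
    return {
        "query": {"bool": {"must": [{"terms": {"product_id": ids}}]}},
        "sort": [
            {
                "_script": {
                    "type": "number",
                    "script": {
                        "lang": "painless",
                        "source": SCRIPT,  # assumed key; the post uses "inline"
                        "params": {"products": products},
                    },
                    "order": "asc",
                }
            }
        ],
        "from": 0,
        "size": size,
    }


request = build_script_sort_query([377497, 931316, 931317])
```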

It's a script plus a doc-value retrieval, so it's likely slower than

    "bool": {
        "should": [
            { "term": { "product_id": { "value": 377497, "boost": 100 } } },
            { "term": { "product_id": { "value": 931316, "boost": 200 } } }
        ]
    }

etc.

Great, thanks!

A quick follow-up note for anyone with similar needs: I think sorting these would perform better if my product_ids were stored as 'keyword', but since they are stored as 'long', the painless script that references the doc _ids directly is faster.

With 400 products sorted in the query, the painless script took ~22ms and the should boost query ~90ms.

Thanks for the benchmark!
