Modify sorting with painless script

We would like to store products including their categories and category positions in a document. One product can be in several categories at different positions. Our thought was to use something like this:

index.catalog.id = ['cat1', 'cat2', 'cat3']
index.catalog.position = ['00005|cat1', '00014|cat2', '00072|cat3']

For index.catalog.id we can filter and for index.catalog.position we can use for sorting. But if we use

['term' => ['index.catalog.id' => 'cat2']]
['sort' => 'index.catalog.positon']

the product would be at the fifth position instead of the fourteenth. Therefore, we would need a way to skip the index.catalog.position values for sorting, that are not relevant because they are for different categories.

Question: How can we do that with a painless script or is there a better way to achive the same in a different way?

The answer is: yes, you can do it with a Painless script. Something like this should work:

GET my_index/_search
{
  "query": {
    "term": {
      "index.catalog.id": "cat2"
    }
  },
  "sort": {
    "_script": {
      "type": "string",
      "script": {
        "lang": "painless",
        "source": "for (value in doc['index.catalog.position.keyword'].values) { StringTokenizer str = new StringTokenizer(value, '|'); def returnValue = str.nextElement(); if (params.query_term.equals(str.nextElement())) { return returnValue } } return 0",
        "params": {
          "query_term": "cat2"
        }
      },
      "order": "asc"
    }
  }
}

(Note that it requires you to also pass in your search terms as a parameter to the script)

Now, the question is: should you do it with a script? I would say: if you can model your data in such a way that you do not need scripting, then that's probably a better solution. Especially on larger dataset, running a lot of scripts can put a strain on your cluster.

For example, maybe you could model your data using nested types:

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "index.catalog": {
          "type": "nested",
          "properties": {
            "id": {
              "type": "keyword"
            },
            "position": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}

Now, you could index your documents like this:

PUT my_index/_doc/1
{
  "index.catalog": [
    {
      "id": "cat1",
      "position": "0005|cat"
    },
    {
      "id": "cat2",
      "position": "0014|cat2"
    },
    {
      "id": "cat3",
      "position": "0072|cat3"
    }
  ]
}

Or even:

PUT my_index/_doc/1
{
  "index.catalog": [
    {
      "id": "cat1",
      "position": "0005"
    },
    {
      "id": "cat2",
      "position": "0014"
    },
    {
      "id": "cat3",
      "position": "0072"
    }
  ]
}

Now, you will be able to query and sort like this, without the need for scripting:

GET my_index/_search
{
  "query": {
    "nested": {
      "path": "index.catalog",
      "query": {
        "term": {
          "index.catalog.id": "cat2"
        }
      }
    }
  },
  "sort": [
    {
      "index.catalog.position": {
        "order": "asc",
        "nested": {
          "path": "index.catalog",
          "filter": {
            "term": {
              "index.catalog.id": "cat2"
            }
          }
        }
      }
    }
  ]
}
1 Like

Thank you for your detailed explanations! :slight_smile:
So essence is that the penality of nested types is less than the time needed to run the script to filter out all non-matching sorting keywords, so we will use nested types instead as you recommend

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.