Need assist with Painless scripting

I'm trying to make a script that sorts text that contains text + numbers in numbering order.
Example, we have the the data:
"Box 1", "Box 2", "Box 3", "Box 10", "Box 20"
By using normal alphabetical ordering then it would sort like this:
"Box 1", "Box 10", "Box 2", "Box 20", "Box 3"

So when I have few hundreds boxes and I'm getting pages of 50 items at a time I would like to get the first 50 boxes in the first page of results.

I know how this sorting is done in c-sharp (and react), there is an example function here that does this: www(dot)dotnetperls(dot)com/alphanumeric-sorting
But in painless I just can't get it to work.

Has anyone done this before or can anyone help me with this?

Is there maybe some Painless forum where I could ask about this?

I might add that I'm new to both elasticsearch and painless scripting.

I got a little help from elastic support where they guided me in the right direction.

so I managed to create a script that basically adds zeros to any number in a text so it will be a 10 character long number (2 will be 0000000002) and then orders the text alphabetically.
But there is one thing I would like to change. Instead of sending the script with the query each time I'm querying I would like to either include the sort query in the index mapping or save it as a script on the server and then call it each time I send the query.
I tried both ways, if I include the script in the mapping then I'm not allowed to insert data into the index, it tells me I can't index data in a field that contains a script.
If I save the script to the server and change it a little bit so it takes in parameters (the field should be the parameter), then I see that the sort data is the field name and not the actual data I want it to work with.

Here is an example of what works:

DELETE my_index
PUT my_index
{
  "mappings": {
    "properties": {
      "original_field": {
        "type": "keyword"
      }
    }
  }
}
PUT /my_index/_bulk
{ "index": { "_id": 1 }}
{ "original_field": "Box 10"}
{ "index": { "_id": 2 }}
{ "original_field": "Box 2"}
{ "index": { "_id": 3 }}
{ "original_field": "Box 1"}
{ "index": { "_id": 4 }}
{ "original_field": "Box 20"}
{ "index": { "_id": 5 }}
{ "original_field": "Box 500 Shelf-7 Rack-2"}
{ "index": { "_id": 6 }}
{ "original_field": "Box 20 Shelf-20 Rack, 80"}
// Get all, no sorting
POST /my_index/_search
{
  "query": {
    "match_all": {}
  }
}
/// =================== The query inclulding the script ===========================
POST /my_index/_search
{
  "size": 20,
  "query": {
    "match_all": {}
  },
  "sort": {
    "_script": {
      "type": "string",
      "script": {
        "source": """
          if(doc['original_field'].value != null){
            String inputString = doc['original_field'].value;
            def retText = '';
            Matcher matcher1 = /\d+|\D+/.matcher(inputString);
            while (matcher1.find()) {
              def nItem = matcher1.group();
              Matcher matcher2 = /\d+/.matcher(nItem);
              boolean containsDigits = matcher2.find();
              if (containsDigits){
                def count = nItem.length();
                def i = count;
                while(i < 10){
                  nItem = '0' + nItem;
                  i = i+1;
                }
              }
              retText = retText + nItem;
            }
            retText.trim();
            }
        """,
        "lang": "painless"
      },
      "order": "asc"
    }
  },
  "script_fields": {
    "original_field": {
      "script": {
        "source": "doc['original_field'].value"
      }
    }
  }
}

This works perfectly. I get results like this:

{
        "_index": "my_index",
        "_id": "3",
        "_score": null,
        "fields": {
          "original_field": [
            "Box 1"
          ]
        },
        "sort": [
          "Box 0000000001"
        ]
      },
      {
        "_index": "my_index",
        "_id": "2",
        "_score": null,
        "fields": {
          "original_field": [
            "Box 2"
          ]
        },
        "sort": [
          "Box 0000000002"
        ]
      },

And here is an example of how I save the script to the server and then use it in a query.

// Save the script to the server first. Takes in the parameter 'field'
POST _scripts/sortAlphaNum
{
  "script": {
    "lang": "painless",
    "source": """
          if(params.field != null){
            String inputString = params.field;
            def retText = '';
            Matcher matcher1 = /\d+|\D+/.matcher(inputString);
            while (matcher1.find()) {
              def nItem = matcher1.group();
              Matcher matcher2 = /\d+/.matcher(nItem);
              boolean containsDigits = matcher2.find();
              if (containsDigits){
                def count = nItem.length();
                def i = count;
                while(i < 10){
                  nItem = '0' + nItem;
                  i = i+1;
                }
              }
              retText = retText + nItem;
            }
            retText.trim();
            }
        """
  }
}

// Query that calls the script with the parameter 'field'. Problem with this is that "doc['original_field'].value" will be the data that is sorted by. Not the actual value insite that field.
POST /my_index/_search
{
  "size": 20,
  "query": {
    "match_all": {}
  },
  "sort": {
    "_script": {
      "type": "string",
      "script": {
        "id": "sortAlphaNum",
        "params": {
          "field": "doc['original_field'].value"
        }
      },
      "order": "asc"
    }
  },
  "script_fields": {
    "original_field": {
      "script": {
        "source": "doc['original_field'].value"
      }
    }
  }
}

Example of output:

{
        "_index": "my_index",
        "_id": "1",
        "_score": null,
        "fields": {
          "original_field": [
            "Box 10"
          ]
        },
        "sort": [
          "doc['original_field'].value"
        ]
      },
      {
        "_index": "my_index",
        "_id": "2",
        "_score": null,
        "fields": {
          "original_field": [
            "Box 2"
          ]
        },
        "sort": [
          "doc['original_field'].value"
        ]
      },

There you see that the value for "sort" is the string for the field name, not the actual value.
Any help with saving the script to the server and passing the field data to it would be appreciated.

Never mind. I just needed to write this down and post it here to figure this out.
I now have a script that I saved to the server and I call it when I want to sort values by AlphaNumSort order.
Final version looks like this:

// Save the script to the server first. Takes in the parameter 'field'
POST _scripts/sortAlphaNum
{
  "script": {
    "lang": "painless",
    "source": """
          if(params.field != null){
            String inputString = doc[params.field].value;
            def retText = '';
            Matcher matcher1 = /\d+|\D+/.matcher(inputString);
            while (matcher1.find()) {
              def nItem = matcher1.group();
              Matcher matcher2 = /\d+/.matcher(nItem);
              boolean containsDigits = matcher2.find();
              if (containsDigits){
                def count = nItem.length();
                def i = count;
                while(i < 10){
                  nItem = '0' + nItem;
                  i = i+1;
                }
              }
              retText = retText + nItem;
            }
            retText = retText.trim();
            return retText;
            }
        """
  }
}

// Query useing the script
POST /my_index/_search
{
  "size": 20,
  "query": {
    "match_all": {}
  },
  "sort": {
    "_script": {
      "type": "string",
      "script": {
        "id": "sortAlphaNum",
        "params": {
          "field": "original_field"
        }
      },
      "order": "asc"
    }
  }
}

It looks like I just needed to pass the field name to the script, not the value.
It's finally working for me, hope it can be of any use for someone else.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.