Sort by length of an array

PUT wc
{
  "mappings": {
    "properties": {
      "comments": {
        "type": "text",
        "fields": {
          "word_count": {
            "type": "token_count",
            "analyzer": "standard"
          }
        }
      }
    }
  }
}

PUT wc/_doc/1
{
  "comments" : ["hi", "hello", "how r u"]
}

PUT wc/_doc/2
{
  "comments" : ["hi", "hello", "how r u" , "fine"]
}

GET wc/_search
{
  "sort": [
    {
      "comments.word_count": {
        "order": "desc"
      }
    }
  ]
}

This isn't giving the correct results: I expected document 2, which has more words in comments, to sort first. How should I troubleshoot this?
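One way to troubleshoot is to inspect the values Elasticsearch actually sorts on. The sketch below reuses the wc index and mapping from the first request and only adds `docvalue_fields`. That parameter returns the indexed token counts for each document, and each hit's `sort` array shows the single value that was picked. A token_count field on an array should hold one count per array element rather than one total. With `"order": "desc"` and no `mode`, Elasticsearch sorts a multi-valued field on its maximum value. Both documents would then sort on their longest comment ("how r u", 3 tokens with the standard analyzer), so they tie.

```
# Same search as above, plus the raw per-element token counts for each hit
GET wc/_search
{
  "docvalue_fields": ["comments.word_count"],
  "sort": [
    {
      "comments.word_count": {
        "order": "desc"
      }
    }
  ]
}
```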

I looked into this today to try to find a workaround.
I tried the following, but it has the same problem:

DELETE wc

PUT wc
{
  "mappings": {
    "properties": {
      "wc": {
        "type": "token_count",
        "analyzer": "standard"
      },
      "comments": {
        "type": "text",
        "copy_to": "wc"
      }
    }
  }
}

PUT wc/_doc/1
{
  "comments" : ["hi", "hello", "how r u"]
}

PUT wc/_doc/2
{
  "comments" : ["hi", "hello", "how r u" , "fine"]
}

GET wc/_search
{
  "sort": [
    {
      "wc": {
        "order": "desc"
      }
    }
  ]
}

I found this old issue which did not get a lot of traction:

I wonder whether this could be done with an ingest pipeline that builds a single text field containing all the text from the array. I expected copy_to to do that, but apparently it does not: it copies each array element to the target field as a separate value.
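The ingest pipeline idea can be sketched with the built-in `join` processor, which concatenates an array field into one string. The pipeline name `join_comments`, the index name `wc2` and the field `comments_joined` are hypothetical. The sketch assumes `comments` is always present and always an array; `join` fails on a plain string or a missing field. `index.default_pipeline` makes the pipeline run on every indexing request without passing `?pipeline=`.

```
# Hypothetical pipeline: join all elements of "comments" into one string
PUT _ingest/pipeline/join_comments
{
  "processors": [
    {
      "join": {
        "field": "comments",
        "separator": " ",
        "target_field": "comments_joined"
      }
    }
  ]
}

# Hypothetical index: count tokens on the joined string, not on each element
PUT wc2
{
  "settings": {
    "index.default_pipeline": "join_comments"
  },
  "mappings": {
    "properties": {
      "comments": {
        "type": "text"
      },
      "comments_joined": {
        "type": "text",
        "fields": {
          "word_count": {
            "type": "token_count",
            "analyzer": "standard"
          }
        }
      }
    }
  }
}

PUT wc2/_doc/1
{
  "comments" : ["hi", "hello", "how r u"]
}

PUT wc2/_doc/2
{
  "comments" : ["hi", "hello", "how r u", "fine"]
}

# Each document now has a single token count to sort on
GET wc2/_search
{
  "sort": [
    {
      "comments_joined.word_count": {
        "order": "desc"
      }
    }
  ]
}
```

With the standard analyzer, "hi hello how r u fine" should give 6 tokens and "hi hello how r u" 5, so document 2 should sort first.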

@jpountz I wonder whether this is a bug or a feature. IMHO it looks like a bug. Or maybe we should add an option for arrays? Here we don't want the number of tokens in the longest string of the array, but the total number of tokens across the whole array.
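Without changing the mapping, the sort `mode` parameter may already cover this. It controls which value of a multi-valued field is used for sorting. For numeric fields it accepts `min`, `max`, `sum`, `avg` and `median`, and token_count is stored as a numeric field. The sketch below uses the mapping from the first request. It assumes one token count is indexed per array element, so `"mode": "sum"` should add those counts into the per-document total asked for here.

```
# Sort on the sum of the per-element token counts instead of the maximum
GET wc/_search
{
  "sort": [
    {
      "comments.word_count": {
        "order": "desc",
        "mode": "sum"
      }
    }
  ]
}
```

The same `mode` should also apply to the wc field from the copy_to attempt, since it ends up multi-valued in the same way.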

It's a feature to me. Out of curiosity, what is the high-level use case? I wonder whether this is the right tool for the job.

We are developing an application called data-catalog, which holds information about all the data our department deals with (something like supply chain data).
In this application, users can like or dislike the content of a document (social recommendations).
So, when a user searches for documents, we would like to sort the matching documents by number of likes in descending order, then by number of dislikes in ascending order.

We store who liked or disliked a document as arrays of strings (user IDs). Hence the problem of sorting the result set by the length of an array.
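For the likes/dislikes use case, one possible approach is a script-based sort on the size of each array. The index name `data-catalog` and the `likes`/`dislikes` keyword fields are hypothetical. `doc['likes'].size()` counts the distinct values in the keyword doc values, which matches the number of users when each user ID appears once. The `match_all` query is a placeholder for the real search.

```
# Hypothetical mapping: user IDs stored as keyword arrays
PUT data-catalog
{
  "mappings": {
    "properties": {
      "likes":    { "type": "keyword" },
      "dislikes": { "type": "keyword" }
    }
  }
}

# Most likes first, then fewest dislikes
GET data-catalog/_search
{
  "query": { "match_all": {} },
  "sort": [
    {
      "_script": {
        "type": "number",
        "script": { "source": "doc['likes'].size()" },
        "order": "desc"
      }
    },
    {
      "_script": {
        "type": "number",
        "script": { "source": "doc['dislikes'].size()" },
        "order": "asc"
      }
    }
  ]
}
```

Script sorts run for every matching document, so on large result sets it may be cheaper to store numeric like/dislike counts alongside the arrays when documents are updated and sort on those fields directly.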

Hope this clarifies the requirement.

Has anybody had a chance to look into this?
