Collapse with multi value

Hello,
I tried to collapse on a multi-valued field and it doesn't work.
e.g.:

"field": [id1, id2, id3] 

Is there a workaround to solve this issue?
Any suggestions, please!

Hi @arcturuscom

Just to help us understand.

Can you show the input and the result you want? (What do you want it to look like?)

What did you try that failed?

How did it fail, and what was the error?

This is what I get:

      {
        "shard" : 0,
        "index" : "whitespace-autres-1",
        "node" : "-pvtzg-jSm2tXGNwadcRlw",
        "reason" : {
          "type" : "illegal_state_exception",
          "reason" : "failed to collapse 120067, the collapse field must be single valued"
        }
      },

My query is simple:

POST legan_test/_search?explain
{
  "from": 0,
  "size": 10,
  "collapse": {
    "field": "combiningIds.keyword"
  }
}

The combiningIds mapping is:

            "combiningIds" : {
              "type" : "text",
              "norms" : false,
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },

This field is multi-valued:

            "combiningIds" : [
              "1234567",
              "09747",
              "119999888"
            ]

We process some RDF files. Each document contains a list of combiningIds; we retrieve them and index them in Elasticsearch.
We consider that 2 documents should be collapsed if they share at least one element.

OK, now I understand... Exactly as your title stated :)

Unfortunately, as I understand it, and as the documentation states:

The field used for collapsing must be a single valued keyword or numeric field with doc_values activated

Collapse does not work with arrays of keyword values. It cannot choose a value out of the array.

That is exactly the error you are getting, which makes sense because your field combiningIds.keyword holds an array.

Thanks @stephenb
Do you have any suggestion or workaround?
I'm thinking about indexing each element of the array as a separate document, but I don't think that's a good idea in terms of disk space.
Any hints, please?

Can you provide 5 or 6 sample docs and the desired results? Perhaps we can take a look...

The array might work with a different approach and yes de-normalization can be an option but there could be many docs.

How many documents do you have?

I am more interested in the result you are looking for. Even if collapse "worked", I am unclear what the expected result would be.

de-normalization can be an option but there could be many docs.

This is true. We have millions of documents.

As I mentioned, my main use case is to collapse documents that are related to each other instead of listing them flat in the search results. We want to leave room for other relevant documents.

For example, a document could have 5 versions, and I want to display only the latest version (I know how to do this) and collapse the older versions instead of showing them in the search results. We want to avoid showing 5 different versions of the same document, so that relevance improves and other relevant documents get a place.

In this example, think of the combiningIds field as a versions field that contains something like:

versions: [1,2,3,4,5]

Apologies but I do not have a solution for you ... I cannot tell what you want the output to be.

Perhaps someone else will.

I think without some sample simplified docs and what you want the output to be it will be difficult for anyone to suggest a solution.


There is nothing special in my documents.
They contain fields like title, text, category, date and combiningIds.

We consider a document to be related to another if they share at least one combiningId.

The screenshot above visualizes the relationships between the documents.
Let's say I want to collapse documents id:2 and id:3 with id:1, or document 2 with 4.
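One possible way to make the "shares at least one combiningId" relationship usable with collapse is to precompute a single-valued group key before indexing. The sketch below is only an illustration under assumptions, not an Elasticsearch feature: `assign_group_keys` and the fallback key format are hypothetical names, and it treats the relationship as transitive (if 1 shares an id with 2, and 2 shares one with 4, all three end up in the same group), which may or may not be what is wanted here.

```python
def assign_group_keys(docs):
    """Map each document id to one group key.

    `docs` maps a document id to its list of combining ids. Documents that
    share an id, directly or through a chain of other documents, get the
    same key (the smallest combining id in their group).
    """
    parent = {}

    def find(x):
        # Follow parent links to the root, shortening the path on the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller id as root so the key is deterministic.
            if rb < ra:
                ra, rb = rb, ra
            parent[rb] = ra

    # Every id listed in the same document belongs to the same group.
    for ids in docs.values():
        for other in ids[1:]:
            union(ids[0], other)

    # Documents without any combining id get their own (hypothetical) key.
    return {
        doc_id: find(ids[0]) if ids else f"doc-{doc_id}"
        for doc_id, ids in docs.items()
    }


keys = assign_group_keys({
    1: ["id1", "id2"],
    2: ["id2", "id3"],
    3: ["id9"],
})
print(keys)  # {1: 'id1', 2: 'id1', 3: 'id9'}
```

The resulting key could then be stored in its own single-valued keyword field and used as the collapse field. One caveat with this approach: when a new document links two existing groups, the keys of already indexed documents change and those documents would need to be updated.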

Could you accomplish what you want with Elasticsearch Graph API?

Thanks @Jason_Slater, it sounds interesting.

If you have some data, such as this example:

{"some_id":123,"some_field":["id1","id2","id3"]}
{"some_id":456,"some_field":["id2","id3","id4"]}
{"some_id":789,"some_field":["id3","id4","id5"]}

where you want to find connections by the values in "some_field":

I ingested this sample as newline-delimited JSON into "some-index" and created an index pattern for it.

In Graph, if I query for some_id: 123, the connections are apparent:

For reference, my settings are here:

Clicking on "Inspect" will reveal the API query and response behind the graph.

Hope this helps, or at least provides some food for thought.

@Jason_Slater very interesting. I'm curious to know if it's possible to get the response in this format:

hits : [
  {
      some_id: 123,
      inner_hits: [
          {some_id: 456}, {some_id: 789}
     ]  
  }
]

I want to display something similar to this:

PS: documents 456 and 789 shouldn't appear on the other search cards, as they are already collapsed into the first one.
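If the grouping cannot be done inside Elasticsearch, a response of that shape could in principle be built on the client from an ordinary list of hits. The following is a rough sketch under assumptions: `group_hits` is a hypothetical helper, the hits are plain dicts already sorted by relevance, and a hit is folded into the first earlier card whose top hit shares at least one value with it (direct overlap only). It only sees the hits that were fetched, so page sizes and totals would no longer match the raw search.

```python
def group_hits(hits, ids_field="some_field", key_field="some_id"):
    """Fold each hit into the first earlier card it shares an id with."""
    cards = []  # list of (ids of the card's top hit, card dict)
    for hit in hits:
        hit_ids = set(hit[ids_field])
        for top_ids, card in cards:
            if top_ids & hit_ids:
                # Related to an earlier top hit: show it only inside that card.
                card["inner_hits"].append({key_field: hit[key_field]})
                break
        else:
            # Not related to any earlier top hit: start a new card.
            cards.append((hit_ids, {key_field: hit[key_field], "inner_hits": []}))
    return [card for _, card in cards]


hits = [
    {"some_id": 123, "some_field": ["id1", "id2", "id3"]},
    {"some_id": 456, "some_field": ["id2", "id3", "id4"]},
    {"some_id": 789, "some_field": ["id3", "id4", "id5"]},
]
print(group_hits(hits))
# [{'some_id': 123, 'inner_hits': [{'some_id': 456}, {'some_id': 789}]}]
```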

Just a random thought: what if you create a hash based on the multi-valued field and then collapse on that hashed field?
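A minimal sketch of what such a hash could look like, assuming it is computed before indexing and stored in a separate single-valued keyword field (`combining_ids_hash` is a hypothetical helper name). Note the limitation: two documents get the same hash only when their sets of values are identical, so documents that merely overlap in one element would still not collapse together.

```python
import hashlib


def combining_ids_hash(ids):
    """Return a stable hash for a list of ids, ignoring order and duplicates."""
    # Sort and de-duplicate so that equal sets always hash to the same value.
    canonical = "\x1f".join(sorted(set(ids)))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


a = combining_ids_hash(["1234567", "09747", "119999888"])
b = combining_ids_hash(["119999888", "1234567", "09747"])
c = combining_ids_hash(["1234567", "09747"])
print(a == b)  # True: same set, different order
print(a == c)  # False: overlapping but not identical sets
```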
