Hello,
Is it possible to use ML tokens generated by ELSER (Elastic Learned Sparse EncodeR) to group documents?
Let's imagine I have the following list of documents:
[
  { "name" : "Apple", "price" : 1234, "nameTokens" : <tokens generated by AI> },
  { "name" : "Apple Inc.", "price" : 654, "nameTokens" : <tokens generated by AI> },
  { "name" : "VentionCloud", "price" : 73, "nameTokens" : <tokens generated by AI> },
  { "name" : "vention cloud", "price" : 6534, "nameTokens" : <tokens generated by AI> },
  { "name" : "ventionclud inc.", "price" : 1434, "nameTokens" : <tokens generated by AI> }
]
The documents were created via an ingest pipeline in which ELSER generated the tokens from the 'name' field. I know that I can run a semantic search using ELSER:
GET my-index/_search
{
  "query": {
    "text_expansion": {
      "ml.tokens": {
        "model_id": ".elser_model_1",
        "model_text": "How to avoid muscle soreness after running?"
      }
    }
  }
}
But instead of a flat list of hits, I want it to return similar documents grouped together, let's say the top 7 by similarity score.
The desired output would be:
{
  "Group 1" : [
    { "name" : "Apple", "price" : 1234, "nameTokens" : <tokens generated by AI> },
    { "name" : "Apple Inc.", "price" : 654, "nameTokens" : <tokens generated by AI> }
  ],
  "Group 2" : [
    { "name" : "VentionCloud", "price" : 73, "nameTokens" : <tokens generated by AI> },
    { "name" : "vention cloud", "price" : 6534, "nameTokens" : <tokens generated by AI> },
    { "name" : "ventionclud inc.", "price" : 1434, "nameTokens" : <tokens generated by AI> }
  ]
}
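To make the goal concrete, here is a rough client-side sketch in Python of the kind of grouping I have in mind. It is not an Elasticsearch feature, only an illustration. It assumes each document's tokens are a map of token to weight, and the weights below are made up rather than real ELSER output. The helper names `cosine` and `group_documents` and the 0.5 threshold are hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse token-weight dicts."""
    dot = sum(weight * b[token] for token, weight in a.items() if token in b)
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_documents(docs, threshold=0.5):
    """Greedy grouping: a doc joins the first group whose first member
    is at least `threshold` similar to it, otherwise it starts a new group."""
    groups = []
    for doc in docs:
        for group in groups:
            if cosine(doc["nameTokens"], group[0]["nameTokens"]) >= threshold:
                group.append(doc)
                break
        else:
            # No existing group was similar enough.
            groups.append([doc])
    return {f"Group {i}": g for i, g in enumerate(groups, start=1)}


# Hypothetical token weights, NOT real ELSER output.
docs = [
    {"name": "Apple", "price": 1234,
     "nameTokens": {"apple": 2.0, "fruit": 0.5}},
    {"name": "Apple Inc.", "price": 654,
     "nameTokens": {"apple": 1.8, "company": 0.9}},
    {"name": "VentionCloud", "price": 73,
     "nameTokens": {"vention": 1.5, "cloud": 1.2}},
    {"name": "vention cloud", "price": 6534,
     "nameTokens": {"vention": 1.4, "cloud": 1.3}},
    {"name": "ventionclud inc.", "price": 1434,
     "nameTokens": {"vention": 1.1, "cloud": 0.6, "company": 0.7}},
]

groups = group_documents(docs)
for label, members in groups.items():
    print(label, [d["name"] for d in members])
```

Ideally I would like Elasticsearch to do something equivalent on the server side, so that I don't have to pull every document to the client.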
I guess I need to use aggregations here, but I couldn't find any information about using ELSER tokens in aggregations.