[v8.17.1] Bug in Semantic Reranking using Vertex AI?

Good day,

I am implementing semantic reranking in v8.17.1 using Vertex AI (model semantic-ranker-default@latest), and I am seeing behaviour that looks like a bug.

Context

I followed the guide "Semantic reranking in Elasticsearch with retrievers".

I've created the inference endpoint google_vertex_ai_rerank. I sent the following request to verify it was working.

POST _inference/rerank/google_vertex_ai_rerank
{
  "query": "What is the capital of the USA?",
  "input": [
    "Carson City is the capital city of the American state of Nevada. At the 2010 United States Census, Carson City had a population of 55,274.",
    "Capital punishment (the death penalty) has existed in the United States since before the United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states.",
    "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean that are a political division controlled by the United States. Its capital is Saipan.",
    "Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district.",
    "Charlotte Amalie is the capital and largest city of the United States Virgin Islands. It has about 20,000 people. The city is on the island of Saint Thomas.",
    "North Dakota is a state in the United States. 672,591 people lived in North Dakota in the year 2010. The capital and seat of government is Bismarck."
  ]
}

It returned the following response:

{
  "rerank": [
    {
      "index": 0, // <- Index must be 3
      "relevance_score": 0.9375,
      "text": "Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district."
    },
    {
      "index": 1, // <- Wrong index
      "relevance_score": 0.142,
      "text": "Carson City is the capital city of the American state of Nevada. At the 2010 United States Census, Carson City had a population of 55,274."
    },
    {
      "index": 2, // <- Wrong index
      "relevance_score": 0.1122,
      "text": "North Dakota is a state in the United States. 672,591 people lived in North Dakota in the year 2010. The capital and seat of government is Bismarck."
    },
    {
      "index": 3, // <- Wrong index
      "relevance_score": 0.0689,
      "text": "Charlotte Amalie is the capital and largest city of the United States Virgin Islands. It has about 20,000 people. The city is on the island of Saint Thomas."
    },
    {
      "index": 4, // <- Wrong index
      "relevance_score": 0.0551,
      "text": "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean that are a political division controlled by the United States. Its capital is Saipan."
    },
    {
      "index": 5, // <- Wrong index
      "relevance_score": 0.0476,
      "text": "Capital punishment (the death penalty) has existed in the United States since before the United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states."
    }
  ]
}

Problem

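The ranked records in the result have incorrect index values: each index matches the record's position in the reranked output (0, 1, 2, ...) instead of its position in the original input array.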

I expected the following response:

{
  "rerank": [
    {
      "index": 3,
      "relevance_score": 0.9375,
      "text": "Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district."
    }
    // ... Truncated 
  ]
}

As a result, when I use text_similarity_reranker the order of documents is wrong.
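
To make the mismatch easy to check, here is a minimal Python sketch that maps each returned text back to its position in the original input and lists the entries whose reported index differs. The function name find_index_mismatches is hypothetical (it is not part of Elasticsearch or any client library), the short strings stand in for the full passages above, and the sketch assumes the input texts are unique.

```python
def find_index_mismatches(inputs, rerank_results):
    """Return (reported_index, actual_index) pairs for entries whose index is wrong.

    inputs: the list of strings sent in the "input" field of the request.
    rerank_results: the list found under "rerank" in the response.
    Assumes every input text is unique, so a text identifies its position.
    """
    # Original position of each input text
    position = {text: i for i, text in enumerate(inputs)}
    mismatches = []
    for entry in rerank_results:
        actual = position.get(entry["text"])
        # Skip texts that cannot be matched back to the input
        if actual is not None and actual != entry["index"]:
            mismatches.append((entry["index"], actual))
    return mismatches


# Placeholder texts in the same order as the request above
inputs = ["Carson City", "Capital punishment", "Saipan",
          "Washington, D.C.", "Charlotte Amalie", "Bismarck"]

# First two entries of the response, with the indices as returned
results = [
    {"index": 0, "relevance_score": 0.9375, "text": "Washington, D.C."},
    {"index": 1, "relevance_score": 0.142, "text": "Carson City"},
]

for reported, actual in find_index_mismatches(inputs, results):
    print(f"reported index {reported}, position in input {actual}")
```

With a correct response the function returns an empty list.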

Should I report a problem to Google Vertex team?
Am I doing something wrong?

Thank you in advance!

Hi @ik-southpole, the bug is in Elasticsearch, not Vertex AI, and it is fixed in version 8.18.0. Sorry about this; please upgrade to 8.18 to pick up the fix.

Thank you for your hint! It worked!