Filter on _search performed with text_expansion is not working

I have an Elasticsearch index populated through an inference ingest pipeline that uses ELSER. When searching, I would like to filter the results and show only the documents where my field "Country" has a certain value "XYZ".
This query works:

GET service-index/_search
{
  "query": {
    "bool": {
      "must": [{
        "text_expansion": {
          "ml.tokens": {
            "model_id": ".elser_model_2_linux-x86_64",
            "model_text": "Food tour"
          }
        }
      }]
    }
  },
  "fields": [
    "ServiceDescription"
  ], 
  "_source": false
}

But the query below, which adds a term filter on Locale, does not work:

GET service-index/_search
{
  "query": {
    "bool": {
      "must": [{
        "text_expansion": {
          "ml.tokens": {
            "model_id": ".elser_model_2_linux-x86_64",
            "model_text": "Food tour"
          }
        }
      }],
      "filter": [{
        "term": {
          "Locale": "Paris"
        }
      }]
    }
  },
  "fields": [
    "ServiceDescription"
  ], 
  "_source": false
}

I have checked the index, and Locale is "Paris" for the documents the first query returns. I also looked at the question asked here: How to filter _search performed with text_expansion
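Since the mapping decides how both the Locale filter and the ml.tokens field behave, one quick check is to fetch `GET service-index/_mapping` and look up the two field types. A minimal sketch, assuming the response has already been parsed into a Python dict; the `sample` response below is hypothetical (it mimics Elasticsearch's default dynamic mapping for a string, plus a token field mapped as rank_features) and is not this index's actual mapping.

```python
def field_type(mapping_response: dict, index: str, path: str):
    """Return the mapped type of a dotted field path, or None if it is not mapped."""
    node = mapping_response[index]["mappings"]
    for part in path.split("."):
        # Object sub-fields live under "properties", multi-fields under "fields"
        children = {**node.get("properties", {}), **node.get("fields", {})}
        if part not in children:
            return None
        node = children[part]
    # Object fields have "properties" but no explicit "type"
    return node.get("type", "object")


# Hypothetical GET service-index/_mapping response, for illustration only
sample = {
    "service-index": {
        "mappings": {
            "properties": {
                "Locale": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
                },
                "ml": {"properties": {"tokens": {"type": "rank_features"}}},
            }
        }
    }
}

print(field_type(sample, "service-index", "Locale"))     # text
print(field_type(sample, "service-index", "ml.tokens"))  # rank_features
```

The two things worth reading off a real response are whether Locale is `text` or `keyword`, and whether ml.tokens is `rank_features` or `sparse_vector`.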

Hi @Ankur_Garg, welcome to the community.

I am not sure what your issue is, as you did not show the data or the mapping, but perhaps this will help...

The Docs... Here and Here

These examples work as expected:

DELETE discuss-test-elser

PUT discuss-test-elser
{
  "mappings": {
    "properties": {
      "content_embedding": {
        "type": "sparse_vector"
      },
      "content": {
        "type": "text"
      },
      "region": {
        "type": "keyword"
      }
    }
  }
}
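In this mapping, region is a keyword field, so a term filter compares "Paris" against the stored value verbatim. A term query is not analyzed, so if the filter field were `text` instead, it would be compared against the analyzed tokens. A toy model of that difference, assuming a crude stand-in for the standard analyzer (lowercase, then split on non-word characters) rather than Elasticsearch's real tokenizer:

```python
import re


def index_terms(value: str, field_type: str) -> set[str]:
    """Roughly the terms Elasticsearch stores for a string value (toy model)."""
    if field_type == "keyword":
        # keyword: the whole value is one term, unchanged
        return {value}
    if field_type == "text":
        # Crude approximation of the standard analyzer
        return set(re.findall(r"\w+", value.lower()))
    raise ValueError(f"unsupported field type: {field_type}")


def term_matches(query_value: str, doc_value: str, field_type: str) -> bool:
    # A term query looks for query_value verbatim among the indexed terms
    return query_value in index_terms(doc_value, field_type)


print(term_matches("Paris", "Paris", "keyword"))  # True
print(term_matches("Paris", "Paris", "text"))     # False: the stored token is "paris"
print(term_matches("paris", "Paris", "text"))     # True
```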

PUT _ingest/pipeline/elser-v2-test
{
  "processors": [
    {
      "inference": {
        "model_id": ".elser_model_2",
        "input_output": [ 
          {
            "input_field": "content",
            "output_field": "content_embedding"
          }
        ]
      }
    }
  ]
}
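The pipeline above has ELSER write a map of tokens to weights into content_embedding; at search time, text_expansion expands the query text the same way and scores documents on the tokens they share. To my understanding that score is essentially a dot product over the shared tokens (Lucene stores the weights with reduced precision, so real scores are approximate). A toy sketch, with token weights that are invented for illustration and not real ELSER output:

```python
def sparse_dot(query: dict[str, float], doc: dict[str, float]) -> float:
    """Dot product of two sparse token->weight maps; only shared tokens count."""
    return sum(weight * doc[token] for token, weight in query.items() if token in doc)


# Invented weights; real ELSER output contains many more tokens
doc_paris = {"tour": 1.8, "notre": 2.1, "dame": 2.0, "church": 1.5}
doc_rome = {"dinner": 2.2, "rome": 2.4, "food": 0.9}
query = {"great": 1.1, "tour": 2.0, "trip": 0.7}

print(sparse_dot(query, doc_paris))  # 3.6, from the shared token "tour"
print(sparse_dot(query, doc_rome))   # 0, no shared tokens
```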

POST discuss-test-elser/_doc?pipeline=elser-v2-test
{
  "content" : "I had a great tour of the Notre Dame Church",
  "region" : "Paris"
}
  
POST discuss-test-elser/_doc?pipeline=elser-v2-test
{
  "content" : "I had a great tour of the El Domo",
  "region" : "Florence"
}  

POST discuss-test-elser/_doc?pipeline=elser-v2-test
{
  "content" : "I had dinner in Rome",
  "region" : "Rome"
} 

GET discuss-test-elser/_search
{
  "_source": [
    "content",
    "region"
  ]
}

# Search with no filter
GET discuss-test-elser/_search
{
  "_source": [
    "content",
    "region"
  ],
  "query": {
    "bool": {
      "must": [
        {
          "text_expansion": {
            "content_embedding": {
              "model_id": ".elser_model_2",
              "model_text": "great tour"
            }
          }
        }
      ]
    }
  }
}  

# Search with a filter: performant, because the filter clause is not scored
GET discuss-test-elser/_search
{
  "_source": [
    "content",
    "region"
  ],
  "query": {
    "bool": {
      "must": [
        {
          "text_expansion": {
            "content_embedding": {
              "model_id": ".elser_model_2",
              "model_text": "great tour"
            }
          }
        }
      ],
      "filter": [
        {
          "term": {
            "region": "Paris"
          }
        }
      ]
    }
  }
}  
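When sending the filtered search from code rather than Dev Tools, the same structure can be built as a plain dict. A minimal sketch; the helper name `text_expansion_query` is hypothetical, and it only reproduces the request body shown above (text_expansion in `must`, term clauses in `filter`):

```python
def text_expansion_query(field, model_id, model_text, filters=None):
    """Bool query: scored text_expansion in must, unscored term filters in filter."""
    query = {
        "bool": {
            "must": [
                {"text_expansion": {field: {"model_id": model_id, "model_text": model_text}}}
            ]
        }
    }
    if filters:
        # Each field/value pair becomes a term clause in filter context
        query["bool"]["filter"] = [{"term": {f: v}} for f, v in filters.items()]
    return query


body = {
    "_source": ["content", "region"],
    "query": text_expansion_query(
        "content_embedding", ".elser_model_2", "great tour", {"region": "Paris"}
    ),
}
```

Assuming the 8.x Python client, `body["query"]` should be usable as the `query` argument of `Elasticsearch.search()`.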

# Search with both queries in must: also works, but slightly less performant because the term query is scored too
GET discuss-test-elser/_search
{
  "_source": [
    "content",
    "region"
  ],
  "query": {
    "bool": {
      "must": [
        {
          "text_expansion": {
            "content_embedding": {
              "model_id": ".elser_model_2",
              "model_text": "great tour"
            }
          }
        },
        {
          "term": {
            "region": {
              "value": "Paris"
            }
          }
        }
      ]
    }
  }
}  
  
# Search with should, to show how the scores combine
GET discuss-test-elser/_search
{
  "_source": [
    "content",
    "region"
  ],
  "query": {
    "bool": {
      "should": [
        {
          "text_expansion": {
            "content_embedding": {
              "model_id": ".elser_model_2",
              "model_text": "great tour"
            }
          }
        },
        {
          "term": {
            "region": {
              "value": "Paris"
            }
          }
        }
      ]
    }
  }
}  
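The three variants above differ in which documents match and how they are scored. Below is a simplified model of bool query semantics, not Elasticsearch's implementation: filter clauses must match but add nothing to the score, must clauses must match and their scores are summed, and should clauses add their scores (at least one must match when there are no must or filter clauses). The per-document scores are invented stand-ins for the real text_expansion and term scores.

```python
from typing import Callable, Optional, Sequence

# A clause returns a score if it matches the document, or None if it does not
Clause = Callable[[dict], Optional[float]]


def bool_score(
    doc: dict,
    must: Sequence[Clause] = (),
    should: Sequence[Clause] = (),
    filters: Sequence[Clause] = (),
) -> Optional[float]:
    """Simplified bool query; None means the document is not a hit."""
    # Filter clauses must all match but contribute nothing to the score
    if any(clause(doc) is None for clause in filters):
        return None
    must_scores = [clause(doc) for clause in must]
    if any(s is None for s in must_scores):
        return None
    should_scores = [s for s in (clause(doc) for clause in should) if s is not None]
    # With no must/filter clauses, at least one should clause has to match
    if not must and not filters and should and not should_scores:
        return None
    return sum(must_scores) + sum(should_scores)


# Invented scores, for illustration only
docs = [
    {"content": "Notre Dame tour", "region": "Paris", "expansion": 2.5},
    {"content": "El Domo tour", "region": "Florence", "expansion": 2.3},
    {"content": "dinner in Rome", "region": "Rome", "expansion": None},
]


def expansion(doc):
    return doc["expansion"]  # stand-in for the text_expansion score


def region_is_paris(doc):
    return 1.0 if doc["region"] == "Paris" else None  # stand-in for the term score


variants = {
    "must + filter": dict(must=[expansion], filters=[region_is_paris]),
    "must with both": dict(must=[expansion, region_is_paris]),
    "should": dict(should=[expansion, region_is_paris]),
}
for name, clauses in variants.items():
    print(name, [bool_score(d, **clauses) for d in docs])
```

In this model, "must + filter" and "must with both" return the same hit with different scores, while "should" also returns Florence because the expansion clause alone is enough.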

Thanks @stephenb, let me take a look through this and reindex. Does it have to be "sparse_vector", or can it be "rank_features" too?

sparse_vector is the newer type and was added with ELSER in mind. You can definitely still use rank_features, but that field type also allows kinds of queries other than semantic text search against it, so it may be better to use sparse_vector.
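Switching an existing index from rank_features to sparse_vector means creating a new index with the new mapping and reindexing into it. A minimal sketch that builds the two request bodies; the helper name, the destination index name, mapping Locale as `keyword`, and the optional pipeline are all assumptions for illustration, not taken from the original index:

```python
def migration_requests(source_index, dest_index, pipeline=None):
    """Build (create-index body, _reindex body) for moving ELSER tokens to sparse_vector."""
    mapping = {
        "mappings": {
            "properties": {
                # ELSER output field, now sparse_vector instead of rank_features
                "ml": {"properties": {"tokens": {"type": "sparse_vector"}}},
                # Assumed: exact-match filter field
                "Locale": {"type": "keyword"},
                "ServiceDescription": {"type": "text"},
            }
        }
    }
    reindex = {"source": {"index": source_index}, "dest": {"index": dest_index}}
    if pipeline:
        # Optionally re-run inference while copying instead of reusing the stored tokens
        reindex["dest"]["pipeline"] = pipeline
    return mapping, reindex


mapping, reindex = migration_requests("service-index", "service-index-v2")
```

The first dict would be the body of `PUT service-index-v2` and the second the body of `POST _reindex`. Without a pipeline, this assumes the existing token maps are still present in `_source`.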

Thanks @Sean_Story and @stephenb. I think my filters weren't working because I was using rank_features for my initial indexing. Looks like sparse_vector did the trick.
