I have an Elasticsearch index that was populated through an inference ingest pipeline using ELSER. When searching, I would like to filter the results and show only the documents where my field "Country" has a certain value "XYZ".
My query below works:
GET service-index/_search
{
"query": {
"bool": {
"must": [{
"text_expansion": {
"ml.tokens": {
"model_id": ".elser_model_2_linux-x86_64",
"model_text": "Food tour"
}
}
}]
}
},
"fields": [
"ServiceDescription"
],
"_source": false
}
But the query below, which adds a filter clause, does not work:
GET service-index/_search
{
"query": {
"bool": {
"must": [{
"text_expansion": {
"ml.tokens": {
"model_id": ".elser_model_2_linux-x86_64",
"model_text": "Food tour"
}
}
}],
"filter": [{
"term": {
"Locale": "Paris"
}
}]
}
},
"fields": [
"ServiceDescription"
],
"_source": false
}
I have checked the index, and Locale has the value "Paris" for the documents returned by the first query. I also looked at the question asked here: How to filter _search performed with text_expansion
stephenb (Stephen Brown), March 11, 2024, 2:05am
Hi @Ankur_Garg, welcome to the community.
I am not sure what your issue is, as you did not show the data and mapping, but perhaps this will help...
The Docs... Here and Here
These requests work as expected:
DELETE discuss-test-elser
PUT discuss-test-elser
{
"mappings": {
"properties": {
"content_embedding": {
"type": "sparse_vector"
},
"content": {
"type": "text"
},
"region": {
"type": "keyword"
}
}
}
}
PUT _ingest/pipeline/elser-v2-test
{
"processors": [
{
"inference": {
"model_id": ".elser_model_2",
"input_output": [
{
"input_field": "content",
"output_field": "content_embedding"
}
]
}
}
]
}
POST discuss-test-elser/_doc?pipeline=elser-v2-test
{
"content" : "I had a great tour of the Notre Dame Church",
"region" : "Paris"
}
POST discuss-test-elser/_doc?pipeline=elser-v2-test
{
"content" : "I had a great tour of the El Domo",
"region" : "Florence"
}
POST discuss-test-elser/_doc?pipeline=elser-v2-test
{
"content" : "I had dinner in Rome",
"region" : "Rome"
}
GET discuss-test-elser/_search
{
"_source": [
"content",
"region"
]
}
# Search with no filter
GET discuss-test-elser/_search
{
"_source": [
"content",
"region"
],
"query": {
"bool": {
"must": [
{
"text_expansion": {
"content_embedding": {
"model_id": ".elser_model_2",
"model_text": "great tour"
}
}
}
]
}
}
}
# Search with filter: performant, because the filter clause is not scored
GET discuss-test-elser/_search
{
"_source": [
"content",
"region"
],
"query": {
"bool": {
"must": [
{
"text_expansion": {
"content_embedding": {
"model_id": ".elser_model_2",
"model_text": "great tour"
}
}
}
],
"filter": [
{
"term": {
"region": "Paris"
}
}
]
}
}
}
# Search with two queries in must: works, but slightly less performant because both clauses are scored
GET discuss-test-elser/_search
{
"_source": [
"content",
"region"
],
"query": {
"bool": {
"must": [
{
"text_expansion": {
"content_embedding": {
"model_id": ".elser_model_2",
"model_text": "great tour"
}
}
},
{
"term": {
"region": {
"value": "Paris"
}
}
}
]
}
}
}
# Search with Should to show scoring
GET discuss-test-elser/_search
{
"_source": [
"content",
"region"
],
"query": {
"bool": {
"should": [
{
"text_expansion": {
"content_embedding": {
"model_id": ".elser_model_2",
"model_text": "great tour"
}
}
},
{
"term": {
"region": {
"value": "Paris"
}
}
}
]
}
}
}
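For anyone assembling the filtered request from application code rather than the Dev Tools console, here is a minimal sketch in Python. It only builds the request body as a plain dict, mirroring the "Search with filter" request above. The function name `build_elser_query` and its parameters are made up for illustration, and the default field and model names are simply the ones used in the test index in this post.

```python
def build_elser_query(model_text, filters=None, *,
                      tokens_field="content_embedding",
                      model_id=".elser_model_2"):
    """Build a search body: text_expansion in must (scored), term clauses in filter (not scored).

    Hypothetical helper; field and model names default to the test index above.
    """
    bool_query = {
        "must": [
            {
                "text_expansion": {
                    tokens_field: {
                        "model_id": model_id,
                        "model_text": model_text,
                    }
                }
            }
        ]
    }
    if filters:
        # One exact-match term clause per field; these clauses do not affect the score.
        bool_query["filter"] = [
            {"term": {field: value}} for field, value in filters.items()
        ]
    return {"_source": ["content", "region"], "query": {"bool": bool_query}}


# Same shape as the filtered console request above.
body = build_elser_query("great tour", {"region": "Paris"})
```

The resulting dict can be serialized and sent as the JSON body of `GET discuss-test-elser/_search`. Calling the helper without `filters` gives the unfiltered request.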
Thanks @stephenb, let me take a look through this and reindex. Does it have to be "sparse_vector", or can it be "rank_features" too?
sparse_vector is the newer type, and was added with ELSER in mind. You can definitely still use rank_features, but that field type also allows other kinds of queries against it besides semantic text search, so it may be better to use sparse_vector.
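To make that suggestion concrete, here is a small sketch in Python that produces an index-creation body with a sparse_vector field for the ELSER tokens. The helper name `elser_index_mapping` is hypothetical, and the field names are taken from the test index earlier in the thread. Swap in your own field names as needed.

```python
def elser_index_mapping(tokens_field, text_fields=(), keyword_fields=()):
    """Return an index-creation body with a sparse_vector field for ELSER tokens.

    Hypothetical helper for illustration; it only builds a dict.
    """
    properties = {tokens_field: {"type": "sparse_vector"}}
    for name in text_fields:
        properties[name] = {"type": "text"}
    for name in keyword_fields:
        # keyword fields hold exact values, which is what a term filter matches on
        properties[name] = {"type": "keyword"}
    return {"mappings": {"properties": properties}}


# Reproduces the mapping of the discuss-test-elser index shown earlier.
mapping_body = elser_index_mapping(
    "content_embedding",
    text_fields=["content"],
    keyword_fields=["region"],
)
```

The dict corresponds to the JSON body of the `PUT discuss-test-elser` request above.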
Thanks @Sean_Story and @stephenb. I think my filters weren't working because I was using rank_features for my initial indexing. Looks like sparse_vector did the trick.
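Since the mapping was the deciding factor here, a quick way to check it is to look at the response of `GET service-index/_mapping`. Below is a minimal Python sketch that reads a field's type from such a response. The function name `field_type` is hypothetical, and the sample response is invented for illustration (the real mapping was never posted in this thread). It follows dotted paths through object fields only and does not look into multi-fields.

```python
def field_type(mapping_response, index, path):
    """Return the mapped type of a dotted field path, or None if it is not mapped.

    Expects the dict shape returned by GET <index>/_mapping.
    """
    node = mapping_response[index]["mappings"]
    for part in path.split("."):
        properties = node.get("properties", {})
        if part not in properties:
            return None
        node = properties[part]
    return node.get("type")


# Invented sample response, for illustration only.
sample_response = {
    "service-index": {
        "mappings": {
            "properties": {
                "ml": {"properties": {"tokens": {"type": "rank_features"}}},
                "Locale": {"type": "keyword"},
                "ServiceDescription": {"type": "text"},
            }
        }
    }
}

tokens_type = field_type(sample_response, "service-index", "ml.tokens")
locale_type = field_type(sample_response, "service-index", "Locale")
```

With the invented sample, `tokens_type` comes back as `rank_features`, which is the situation described above. Checking the filter field (`Locale` here) at the same time is cheap, since a term filter is not analyzed and is normally used against a keyword field for exact matches.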