Text_embedding configured for model but rejected for query

I have issued a query like the one below:

{
    "query": {
        "bool": {
            "should": [
                {
                    "text_embedding": {
                        "path_embedding.tokens": {
                            "model_id": "intfloat__multilingual-e5-base",
                            "model_text": "What is the purpose of the EHS Location Hierarchy?"
                        }
                    }
                },
                {
                    "text_embedding": {
                        "passage_embeddding.tokens": {
                            "model_id": "intfloat__multilingual-e5-base",
                            "model_text": "What is the purpose of the EHS Location Hierarchy?"
                        }
                    }
                }
            ]
        }
    }
}

and it gets rejected with this error:

{
    "error": {
        "root_cause": [
            {
                "type": "parsing_exception",
                "reason": "unknown query [text_embedding]",
                "line": 6,
                "col": 39
            }
        ],
        "type": "x_content_parse_exception",
        "reason": "[6:39] [bool] failed to parse field [should]",
        "caused_by": {
            "type": "parsing_exception",
            "reason": "unknown query [text_embedding]",
            "line": 6,
            "col": 39,
            "caused_by": {
                "type": "named_object_not_found_exception",
                "reason": "[6:39] unknown field [text_embedding]"
            }
        }
    },
    "status": 400
}

However, viewing the documents, they contain both the passage_embedding.tokens and path_embedding.tokens arrays. The model shows text_embedding as its task type:


and the pipeline processors have generated these tokens using:

{
  "processors": [
    {
      "inference": {
        "model_id": "intfloat__multilingual-e5-base",
        "target_field": "passage_embedding",
        "field_map": {
          "passage": "text_field"
        },
        "inference_config": {
          "text_embedding": {
            "results_field": "tokens"
          }
        }
      }
    },
    {
      "inference": {
        "model_id": "intfloat__multilingual-e5-base",
        "target_field": "path_embedding",
        "field_map": {
          "path": "text_field"
        },
        "inference_config": {
          "text_embedding": {
            "results_field": "tokens"
          }
        }
      }
    }
  ]
}

Why would the query be rejected?

I was following the example here.

Note: if I instead use the incorrect text_expansion query, the error that comes back says the model is configured for text_embedding:

{
    "error": {
        "root_cause": [
            {
                "type": "status_exception",
                "reason": "Trained model [intfloat__multilingual-e5-base] is configured for task [text_embedding] but called with task [text_expansion]"
            }
        ],
        "type": "status_exception",
        "reason": "Trained model [intfloat__multilingual-e5-base] is configured for task [text_embedding] but called with task [text_expansion]",
        "caused_by": {
            "type": "status_exception",
            "reason": "Trained model [intfloat__multilingual-e5-base] is configured for task [text_embedding] but called with task [text_expansion]"
        }
    },
    "status": 403
}
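
For reference, the text_expansion form of one clause looks roughly like this (a sketch for comparison only, not the exact request that was sent; text_expansion targets sparse-vector fields such as ELSER output, which is why the task check rejects it for this dense-vector model):

{
    "text_expansion": {
        "passage_embedding.tokens": {
            "model_id": "intfloat__multilingual-e5-base",
            "model_text": "What is the purpose of the EHS Location Hierarchy?"
        }
    }
}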

I found lots of errors in the sequence in which I was building the index, so the mappings were FUBAR. No need to review/answer this request. Thank you.

Hi @wnmills3, I have the same problem. I am trying to create the embeddings from a web crawl, but when I try to run an inference I get this error:

BadRequestError(400, 'parsing_exception', 'unknown query [text_embedding]')

I am seeing the embeddings in the documents tab, but I am not able to run an inference. Any advice? Thanks

I initially had separate calls to set up the mappings, processors, etc. Because they were not done in the proper order, the index was not created correctly. I settled on making a single call with all the details supplied at once when creating the index:

    void createIndexAndIngestData(String indexName, Path jsonlFilePath,
        String langFamily) {
        String stage = "creating index " + indexName + " for " + jsonlFilePath;
        try {
            if (_indexNames.contains(indexName) == false) {
                ESUtils.createReplaceIndex(_client, indexName, _removeIndex, langFamily);
                _indexNames.add(indexName);
                if (_thumbsucker) {
                    System.out.println("Created index: " + indexName);
                }
                ESUtils.setPipeline(_client, indexName,
                    langFamily, _logger);
            }
            stage = "Ingesting data for index " + indexName;
            System.out.println(stage);
            ingestPassagesFromFile(jsonlFilePath.toString(), indexName);
        } catch (Exception e) {
            _logger.error(stage, e);
        }
    }

Which makes these calls:

            IndexSettings is = getSettingsRequest(indexName, langFamily);
            TypeMapping mr = getMappingsRequest(indexName, langFamily);
            CreateIndexResponse response = client.indices()
                .create(cir -> cir.timeout(Time.of(t -> t.time("60s")))
                    .index(indexName).settings(is).mappings(mr));
            result = response.acknowledged();

where I read the settings from a json file:

    static public IndexSettings getSettingsRequest(String indexName,
        String langFamily) throws FileNotFoundException {
        String filename = "." + File.separator + "properties" + File.separator
            + langFamily + "_settings.json";
        // check existence
        File test = new File(filename);
        if (!test.exists()) {
            String defaultFilename = "." + File.separator + "properties"
                + File.separator + "en_settings.json";
            test = new File(defaultFilename);
            if (!test.exists()) {
                throw new FileNotFoundException("Can not find \"" + filename
                    + "\" nor default language \"" + defaultFilename + "\"");
            }
            filename = defaultFilename;
        }
        final FileReader file = new FileReader(new File(filename));

        IndexSettings req;

        req = IndexSettings.of(b -> b.withJson(file));

        return req;
    }
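
The contents of the *_settings.json files are not shown here. If the index is meant to run every document through the rdp_pipeline created below, one way to express that in such a settings file is the index.default_pipeline setting; this is only a sketch of what the file might contain, not a copy of it:

{
    "index": {
        "default_pipeline": "rdp_pipeline"
    }
}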

and the mappings from a json file:

    static public TypeMapping getMappingsRequest(String indexName,
        String langFamily) throws FileNotFoundException {
        String filename = "." + File.separator + "properties" + File.separator
            + langFamily + "_mappings.json";
        // check existence
        File test = new File(filename);
        if (!test.exists()) {
            String defaultFilename = "." + File.separator + "properties"
                + File.separator + "en_mappings.json";
            test = new File(defaultFilename);
            if (!test.exists()) {
                throw new FileNotFoundException("Can not find \"" + filename
                    + "\" nor default language \"" + defaultFilename + "\"");
            }
            filename = defaultFilename;
        }
        final FileReader file = new FileReader(new File(filename));

        TypeMapping req;

        req = TypeMapping.of(b -> b.withJson(file));

        return req;
    }
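
The *_mappings.json contents are likewise not shown. For the embedding fields written by the inference processors, the relevant portion needs indexed dense_vector mappings along these lines (a sketch only; the 768 dimensions and cosine similarity are assumptions based on the multilingual-e5-base model, not a copy of the actual file):

{
    "properties": {
        "passage_embedding": {
            "properties": {
                "tokens": {
                    "type": "dense_vector",
                    "dims": 768,
                    "index": true,
                    "similarity": "cosine"
                }
            }
        },
        "path_embedding": {
            "properties": {
                "tokens": {
                    "type": "dense_vector",
                    "dims": 768,
                    "index": true,
                    "similarity": "cosine"
                }
            }
        }
    }
}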

and setting the processors from a json file:

    static public void setPipeline(ElasticsearchClient client, String indexName,
        String langFamily, Logger logger) throws Exception {
        String filename = "." + File.separator + "properties" + File.separator
            + langFamily + "_processors.json";
        List<String> lines = DPUtils.loadTextFile(filename);
        StringBuffer sb = new StringBuffer();
        for (String line : lines) {
            sb.append(line + "\n");
        }
        String rdpPipeline = sb.toString();
        Request request = new Request("PUT", "/_ingest/pipeline/rdp_pipeline");
        request.setJsonEntity(rdpPipeline);
        RestClientTransport restClientTransport = (RestClientTransport) client
            ._transport();
        Response response = restClientTransport.restClient()
            .performRequest(request);
        if (response.getStatusLine().getStatusCode() == 200) {
            ObjectMapper objectMapper = new ObjectMapper();
            String acknowledged = EntityUtils.toString(response.getEntity());
            AcknowledgedResponse ak_response = objectMapper
                .readValue(acknowledged, CreatePipelineResponse.class);
            if (!ak_response.acknowledged()) {
                logger.error("Creating pipeline returned false.");
            }
        } else {
            logger.error("Could not set rdp_pipeline due to request status "
                + response.getStatusLine().getStatusCode());
        }
    }

text_embedding is an option of kNN search; see k-nearest neighbor (kNN) search | Elasticsearch Guide [8.13] | Elastic

The text_expansion query is for sparse vectors like those created by the ELSER model. Use a knn search with the text_embedding option for dense vectors, such as those created by the multilingual-e5-base model.
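
For a dense-vector model like this one, the request goes through the top-level knn section with a text_embedding query_vector_builder rather than a text_embedding query clause. A sketch, reusing the field and model from the original post and assuming passage_embedding.tokens is mapped as an indexed dense_vector (k and num_candidates are placeholder values to tune):

{
    "knn": {
        "field": "passage_embedding.tokens",
        "k": 10,
        "num_candidates": 100,
        "query_vector_builder": {
            "text_embedding": {
                "model_id": "intfloat__multilingual-e5-base",
                "model_text": "What is the purpose of the EHS Location Hierarchy?"
            }
        }
    }
}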

Here is a useful guide to semantic search in Elastic; it covers both the ELSER model and text embedding models: Semantic search | Elasticsearch Guide [8.13] | Elastic

Thanks for your help. I am executing this:

{
  "query": {
    "nested": {
      "path": "passages",
      "query": {
        "text_embedding": {
          "passages.dense.tokens": {
            "model_id": "intfloat__multilingual-e5-base",
            "model_text": "text_example"
          }
        }
      },
      "inner_hits": {"_source": {"excludes": ["passages.dense"]}}
    }
  }
}

and I am getting this error:

Root mapping definition has unsupported parameters

This is the mapping that I've created for the passages field:

And I can see in my documents the vectors associated with each passage. Do you know what I am doing wrong? Thanks
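
Applying the earlier advice, the nested dense-vector case would also go through a top-level knn search with a text_embedding query_vector_builder instead of a text_embedding query clause. This is a sketch only; it assumes passages.dense.tokens is mapped as an indexed dense_vector inside the nested passages field and that the cluster version supports kNN over nested fields with inner_hits:

{
    "knn": {
        "field": "passages.dense.tokens",
        "k": 10,
        "num_candidates": 100,
        "query_vector_builder": {
            "text_embedding": {
                "model_id": "intfloat__multilingual-e5-base",
                "model_text": "text_example"
            }
        },
        "inner_hits": {
            "_source": {
                "excludes": ["passages.dense"]
            }
        }
    }
}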