Why doesn't elser_model_1 generate the vectors during indexing?

mbastarache · October 27, 2023, 1:02pm

No matter what I try, I don't get any errors during indexing, but when I search, the ml_title and ml_description fields are not in the results. When I look at the mappings, they are there with the correct configurations.

I followed the primary documentation, and I also used "Using ELSER for multiple fields - #2 by wei.wang" as a reference.

I confirmed that elser_model_1 has been installed and started, and that it is running.

I confirmed that the pipeline exists.

I confirmed that when I index the doc, I pass the _pipeline.

Is there a log file, or anything else I could look at, to see if there was an error?

Here is a PHP sample of my index.

$params = [
	'index' => 'job_postings',
	'body' => [
		'mappings' => [
			'_source' => [
				'enabled' => true
			],
			'properties' => [
				'publish_id' => [
					'enabled' => false
				],
				'title' => [
					'type' => 'text',
					'index' => true
				],
				'ml_title.tokens' => [
					'type' => 'rank_features'
				],
				'description' => [
					'type' => 'text',
					'index' => true
				],
				'ml_description.tokens' => [
					'type' => 'rank_features'
				]
			]
		]
	]
]; 

try {
	$response = $client->indices()->create($params);
} catch(Exception $e) {
	$response = $e->getMessage();
}

Here is the pipeline I created:

PUT _ingest/pipeline/job_postings-elser
{
  "on_failure": [
    {
      "set": {
        "description": "Record error information",
        "field": "error_information",
        "value": "Processor {{ _ingest.on_failure_processor_type }} with tag {{ _ingest.on_failure_processor_tag }} in pipeline {{ _ingest.on_failure_pipeline }} failed with message {{ _ingest.on_failure_message }}"
      }
    }
  ],
  "processors": [
    {
      "inference": {
        "model_id": ".elser_model_1",
        "target_field": "ml_title",
        "field_map" : {
          "title": "text_field"
        },
        "inference_config": {
          "text_expansion": {
            "results_field": "tokens"
          }
        }
      }
    },
    {
      "inference": {
        "model_id": ".elser_model_1",
        "target_field": "ml_description",
        "field_map": {
          "description": "text_field"
        },
        "inference_config": {
          "text_expansion": {
            "results_field": "tokens"
          }
        }
      }
    }
  ]
}

Here is the sample code I use to index the doc:

$jData = [
	'publish_id' => '1234567890',
	'title' => 'Web Developer',
	'description' => 'Your role in the company .....'
];


$params['body'][] = [
	'index' => [
		'_index' => 'job_postings',
		'_id' => '1234567890'
	],
	'_pipeline' => 'job_postings-elser'
];
$params['body'][] = $jData;

try {
	$response = $client->bulk($params);
} catch(Exception $e) {
	$response = $e->getMessage();
}

After this completes with no errors, I go to Kibana and run:

GET job_postings/_search

The docs are there in the results, but the ml_title and ml_description fields are nowhere in the search results.

Or does it just take a long, long time to process? If that is the case, how do I know a document is being processed and when it will be complete?
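Since the pipeline's on_failure handler writes to error_information, one way to look for inference errors is to search for documents where that field exists. The following is only a minimal sketch: the helper name buildFailedDocsSearch is hypothetical, and it assumes the official elasticsearch-php client is available as $client (the call itself is left as a comment). If the pipeline was never applied to a document, nothing would be recorded, so an empty result does not prove that inference ran.

```php
<?php
// Hypothetical helper (not part of any library): builds search parameters
// that match documents where the pipeline's on_failure handler stored an error.
function buildFailedDocsSearch(string $index, string $errorField = 'error_information', int $size = 10): array
{
    return [
        'index' => $index,
        'body'  => [
            'size'    => $size,
            // Only return the error text, not the whole document.
            '_source' => [$errorField],
            'query'   => [
                'exists' => ['field' => $errorField],
            ],
        ],
    ];
}

$failedParams = buildFailedDocsSearch('job_postings');

// With a configured client (assumed, not shown here):
// $response = $client->search($failedParams);

// Print the request body so it can also be pasted into Kibana.
echo json_encode($failedParams['body'], JSON_PRETTY_PRINT), PHP_EOL;
```

The printed body can be run in Kibana as GET job_postings/_search with the same JSON.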

mbastarache · October 27, 2023, 1:18pm

mbastarache · October 27, 2023, 3:02pm

It took a while, but I finally did get my fields back from the learning model. Can someone tell me if there is a way to monitor what the pipeline is processing, so we can gauge whether there are bottlenecks or errors and how long it will take to complete?

wei.wang · October 27, 2023, 8:29pm

Hi Mike,

This API can be used to retrieve model usage information: GET _ml/trained_models/.elser_model_1/_stats. Or, through the UI, in your attached picture, click ">" to expand the row, and you will see the model and pipeline details.

I don't know much about PHP; however, you might consider indexing in batches (size: 50 or 100) using bulk indexing, which should help in your case.
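The batching idea could look roughly like the sketch below. It is not a tested implementation: buildBulkBatches is a hypothetical helper, the sample documents are made up, and the client call is left as a comment because it assumes a configured elasticsearch-php client in $client. The sketch passes pipeline as a request-level parameter, which the Bulk API documents as a query parameter.

```php
<?php
// Hypothetical helper: splits documents into bulk request parameter sets
// holding at most $batchSize documents each.
function buildBulkBatches(array $docs, string $index, string $pipeline, int $batchSize = 50): array
{
    $batches = [];
    foreach (array_chunk($docs, $batchSize) as $chunk) {
        // 'pipeline' here applies to every action in this bulk request.
        $params = ['pipeline' => $pipeline, 'body' => []];
        foreach ($chunk as $doc) {
            // Each document needs an action line followed by its source.
            $params['body'][] = ['index' => ['_index' => $index, '_id' => $doc['publish_id']]];
            $params['body'][] = $doc;
        }
        $batches[] = $params;
    }
    return $batches;
}

// Made-up sample data: 120 small documents.
$docs = [];
for ($i = 1; $i <= 120; $i++) {
    $docs[] = ['publish_id' => (string) $i, 'title' => "Job $i", 'description' => "Description $i"];
}

$batches = buildBulkBatches($docs, 'job_postings', 'job_postings-elser', 50);

foreach ($batches as $n => $params) {
    // With a configured client (assumed, not shown here):
    // $response = $client->bulk($params);
    // if ($response['errors']) { /* inspect $response['items'] for per-document errors */ }
    echo 'batch ', $n, ': ', count($params['body']) / 2, ' docs', PHP_EOL;
}
```

A bulk request can report per-document failures in its response body without throwing an exception, so checking the errors flag after each batch is worth doing. The pipeline key can also be set per action inside the index metadata, rather than as a sibling of index as in the sample earlier in the thread.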

Regards.
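For reading the stats response from PHP, a rough sketch follows. The helper summarizeModelStats is hypothetical, the sample response is made up and heavily trimmed, and the field names (trained_model_stats, inference_stats, deployment_stats) are assumptions based on the trained model stats API documentation, so they should be checked against a real response. The client call in the comment is also an assumption about the elasticsearch-php client.

```php
<?php
// Hypothetical helper: pulls a few counters out of a decoded
// GET _ml/trained_models/<model_id>/_stats response.
// Field names are assumptions; missing fields come back as null.
function summarizeModelStats(array $stats): array
{
    $summary = [];
    foreach ($stats['trained_model_stats'] ?? [] as $model) {
        $inference = $model['inference_stats'] ?? [];
        $summary[$model['model_id'] ?? 'unknown'] = [
            'pipeline_count'   => $model['pipeline_count'] ?? null,
            'inference_count'  => $inference['inference_count'] ?? null,
            'failure_count'    => $inference['failure_count'] ?? null,
            'deployment_state' => $model['deployment_stats']['state'] ?? null,
        ];
    }
    return $summary;
}

// With a configured client (assumed, not shown here), the response might be fetched as:
// $stats = $client->ml()->getTrainedModelsStats(['model_id' => '.elser_model_1'])->asArray();

// Made-up, trimmed sample used only to exercise the helper.
$sampleStats = [
    'count' => 1,
    'trained_model_stats' => [[
        'model_id'         => '.elser_model_1',
        'pipeline_count'   => 1,
        'inference_stats'  => ['inference_count' => 42, 'failure_count' => 0],
        'deployment_stats' => ['state' => 'started'],
    ]],
];

$summary = summarizeModelStats($sampleStats);
print_r($summary);
```

Polling this while a bulk load runs and comparing inference_count between calls gives a rough sense of throughput, and a rising failure_count would point to errors.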
