Why doesn't elser_model_1 generate the vectors during indexing?

No matter what I try, I don't get any errors during indexing, but when I search, the ml_title and ml_description fields are not in the results. When I look at the mappings, they are there with the correct configurations.

I followed the primary documentation, and I also used the forum thread "Using ELSER for multiple fields" (reply #2 by wei.wang) as a reference.

I confirmed that elser_model_1 is installed, started, and running.

I confirmed that the pipeline exists.

I confirmed that when I index the doc, I pass the _pipeline.

Is there a log file, or anything else I could look at, to see if there was an error?

Here is a PHP sample of my index.

$params = [
	'index' => 'job_postings',
	'body' => [
		'mappings' => [
			'_source' => [
				'enabled' => true
			],
			'properties' => [
				'publish_id' => [
					'enabled' => false
				],
				'title' => [
					'type' => 'text',
					'index' => true
				],
				'ml_title.tokens' => [
					'type' => 'rank_features'
				],
				'description' => [
					'type' => 'text',
					'index' => true
				],
				'ml_description.tokens' => [
					'type' => 'rank_features'
				]
			]
		]
	]
]; 

try {
	$response = $client->indices()->create($params);
} catch(Exception $e) {
	$response = $e->getMessage();
}

Here is the pipeline I created:

PUT _ingest/pipeline/job_postings-elser
{
  "on_failure": [
    {
      "set": {
        "description": "Record error information",
        "field": "error_information",
        "value": "Processor {{ _ingest.on_failure_processor_type }} with tag {{ _ingest.on_failure_processor_tag }} in pipeline {{ _ingest.on_failure_pipeline }} failed with message {{ _ingest.on_failure_message }}"
      }
    }
  ],
  "processors": [
    {
      "inference": {
        "model_id": ".elser_model_1",
        "target_field": "ml_title",
        "field_map" : {
          "title": "text_field"
        },
        "inference_config": {
          "text_expansion": {
            "results_field": "tokens"
          }
        }
      }
    },
    {
      "inference": {
        "model_id": ".elser_model_1",
        "target_field": "ml_description",
        "field_map": {
          "description": "text_field"
        },
        "inference_config": {
          "text_expansion": {
            "results_field": "tokens"
          }
        }
      }
    }
  ]
}

Then when I index the doc, here is the sample code.

$jData = [
	'publish_id' => '1234567890',
	'title' => 'Web Developer',
	'description' => 'Your role in the company .....'
];


$params['body'][] = [
	'index' => [
		'_index' => 'job_postings',
		'_id' => '1234567890'
	],
	'_pipeline' => 'job_postings-elser'
];
$params['body'][] = $jData;

try {
	$response = $client->bulk($params);
} catch(Exception $e) {
	$response = $e->getMessage();
}

After this completes with no errors, I go to Kibana and run:

GET job_postings/_search

The docs are there in the results, but the ml_title and ml_description fields are nowhere in the search results.

Or does it just take a long, long time to process? If that is the case, how do I know a document is being processed and when it will be complete?

It took a while, but I finally did get my fields back from the learning model. Can someone tell me if there is a way to monitor what the pipeline is processing, so we can gauge whether there are bottlenecks, errors, or how long it will take to complete?

Hi Mike,

This API can be used to retrieve model usage information: GET _ml/trained_models/.elser_model_1/_stats. Or, in the UI shown in your attached picture, click ">" to expand the row, and you will see the model and pipeline details.
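
To make that a bit more concrete, here is a rough sketch of the requests I would try in Kibana Dev Tools. The model ID and pipeline name are taken from your post; the sample document in the simulate call is made up, and the exact statistics fields you get back may differ by version, so treat this as a starting point rather than a recipe.

```
# Model deployment statistics (inference counts, pending requests, errors)
GET _ml/trained_models/.elser_model_1/_stats

# Per-node ingest statistics, narrowed to the pipelines section
GET _nodes/stats/ingest?filter_path=nodes.*.ingest.pipelines

# Run the pipeline against a sample document without indexing it
# (the document below is hypothetical test data)
POST _ingest/pipeline/job_postings-elser/_simulate
{
  "docs": [
    {
      "_source": {
        "title": "Web Developer",
        "description": "Sample description text"
      }
    }
  ]
}
```

If the simulate response contains ml_title.tokens and ml_description.tokens, the inference processors are working. If it contains error_information instead, the on_failure handler in your pipeline fired and the message should say why. The ingest statistics report per-pipeline counts, time spent, and failures, which can help you spot a bottleneck.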

I don't know much about PHP; however, you might consider bulk indexing in batches (size: 50 or 100), which should help in your case.
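
As a sketch of what one small batch could look like in Kibana Dev Tools (not PHP), assuming the index and pipeline names from your post: the second document and its ID are made up for illustration. The Bulk API accepts the pipeline name as the `pipeline` query parameter, or as a `pipeline` key inside each `index` action.

```
# Index a small batch through the pipeline; the pipeline is named once as a query parameter
POST _bulk?pipeline=job_postings-elser
{ "index": { "_index": "job_postings", "_id": "1234567890" } }
{ "publish_id": "1234567890", "title": "Web Developer", "description": "Sample description text" }
{ "index": { "_index": "job_postings", "_id": "1234567891" } }
{ "publish_id": "1234567891", "title": "Data Engineer", "description": "Another sample description" }

# Find documents where the pipeline's on_failure handler recorded an error
GET job_postings/_search
{
  "query": {
    "exists": { "field": "error_information" }
  }
}
```

The bulk response has a top-level `errors` flag and a per-item status, so it is worth inspecting the response body even when no exception is thrown on the client side.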

Regards.
