I have been trying to add a condition to my multi-processor pipeline:
{
  "4modelprocessor_peopleagg": {
    "processors": [
      {
        "pipeline": {
          "name": "ner_pipeline_peopleagg"
        }
      },
      {
        "pipeline": {
          "name": "elser_pipeline_peopleagg"
        }
      }
    ]
  }
}
I need to run the ELSER and NER pipelines above only when the resumecontent or aboutme field is present.
Is there a way to specify a condition so that a pipeline runs only if one of these fields is available?
Can someone please point me to a reference for this?
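For illustration, here is a minimal sketch of how such a condition might look. It relies on the `if` option that every ingest processor accepts, including the `pipeline` processor, and uses the pipeline and field names from the post. The null check is an assumption about what "available" means here; an empty string would still pass it.

```json
{
  "4modelprocessor_peopleagg": {
    "processors": [
      {
        "pipeline": {
          "name": "ner_pipeline_peopleagg",
          "if": "ctx.resumecontent != null || ctx.aboutme != null"
        }
      },
      {
        "pipeline": {
          "name": "elser_pipeline_peopleagg",
          "if": "ctx.resumecontent != null || ctx.aboutme != null"
        }
      }
    ]
  }
}
```

With this shape, a document that has neither field skips both sub-pipelines and is indexed unchanged. The sub-pipelines still have to cope with documents that carry only one of the two fields.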
Thanks for that. However, I'm working with the inference processor, and the documentation only shows conditions for the set and drop processors. How can I add my condition to the inference processors below?
{
  "ner_pipeline": {
    "processors": [
      {
        "inference": {
          "model_id": "dslim__bert-base-ner",
          "target_field": "ml.ner_resume_content",
          "field_map": {
            "resumecontent": "text_field",
            "ignore_missing": "true"
          }
        }
      },
      {
        "inference": {
          "model_id": "dslim__bert-base-ner",
          "target_field": "ml.ner_about_me",
          "field_map": {
            "aboutme": "text_field",
            "ignore_missing": "true"
          }
        }
      },
      {
        "script": {
          "lang": "painless",
          "if": "return ctx['ml']['ner_resume_content'].containsKey('entities') && ctx['ml']['ner_about_me'].containsKey('entities')",
          "source": "Map resumeContentTags = new HashMap(); for (item in ctx['ml']['ner_resume_content']['entities']) { if (!resumeContentTags.containsKey(item.class_name)) resumeContentTags[item.class_name] = new HashSet(); resumeContentTags[item.class_name].add(item.entity); } ctx['resume_content_tags'] = resumeContentTags; Map aboutMeTags = new HashMap(); for (item in ctx['ml']['ner_about_me']['entities']) { if (!aboutMeTags.containsKey(item.class_name)) aboutMeTags[item.class_name] = new HashSet(); aboutMeTags[item.class_name].add(item.entity); } ctx['about_me_tags'] = aboutMeTags;"
        }
      }
    ],
    "on_failure": [
      {
        "set": {
          "description": "Index document to 'failed-'",
          "field": "_index",
          "value": "failed-{{{ _index }}}"
        }
      },
      {
        "set": {
          "description": "Set error message",
          "field": "ingest.failure",
          "value": "{{_ingest.on_failure_message}}"
        }
      }
    ]
  }
}
As before, these inference processors should run only when the resumecontent or aboutme field is present.
Is there a way to specify a condition so that a processor runs only if one of these fields is available? Could you please show me what the code looks like with the condition added?
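The `if` option is not limited to set and drop; it can be placed inside any processor, including `inference`. A rough sketch of the two inference processors, each guarded by the field it actually reads, could look like the following. The `ignore_missing` entries are left out on the assumption that keys inside `field_map` are read as document-field-to-model-input mappings rather than as processor options.

```json
{
  "processors": [
    {
      "inference": {
        "if": "ctx.resumecontent != null",
        "model_id": "dslim__bert-base-ner",
        "target_field": "ml.ner_resume_content",
        "field_map": {
          "resumecontent": "text_field"
        }
      }
    },
    {
      "inference": {
        "if": "ctx.aboutme != null",
        "model_id": "dslim__bert-base-ner",
        "target_field": "ml.ner_about_me",
        "field_map": {
          "aboutme": "text_field"
        }
      }
    }
  ]
}
```

Scoping each condition to a single field means a document with only one of the two fields runs just the matching processor, and a document with neither runs none.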
Thanks for the condition above. I tried both "ignore_missing": true and an if condition, as shown below:
{
  "processors": [
    {
      "inference": {
        "field_map": {
          "resumecontent": "text_field"
          //"ignore_missing": "true"
        },
        "if": "ctx.resumecontent != null || ctx.aboutme != null",
        "model_id": "dslim__bert-base-ner",
        "target_field": "ml.ner_resume_content"
      }
I ran the pipeline above through a reindex from a source that has only the aboutme field, and I don't see anything created in the final index.
Can you please let me know what I'm missing?
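One possible explanation, assuming the pipeline still carries the on_failure handler and script processor shown earlier in the thread: with the `||` condition, the processor runs whenever aboutme is present, yet its `field_map` reads resumecontent, which is missing. If the inference call or the later script then fails, the on_failure handler rewrites `_index` to `failed-<index name>`, so the documents would land there instead of in the final index. Checking that index and its `ingest.failure` field should confirm or rule this out. A sketch of the aboutme branch with each step guarded by the field it reads, using Painless null-safe navigation (`?.`) in the script condition:

```json
{
  "processors": [
    {
      "inference": {
        "if": "ctx.aboutme != null",
        "model_id": "dslim__bert-base-ner",
        "target_field": "ml.ner_about_me",
        "field_map": {
          "aboutme": "text_field"
        }
      }
    },
    {
      "script": {
        "lang": "painless",
        "if": "ctx.ml?.ner_about_me?.entities != null",
        "source": "Map aboutMeTags = new HashMap(); for (item in ctx['ml']['ner_about_me']['entities']) { if (!aboutMeTags.containsKey(item.class_name)) aboutMeTags[item.class_name] = new HashSet(); aboutMeTags[item.class_name].add(item.entity); } ctx['about_me_tags'] = aboutMeTags;"
      }
    }
  ]
}
```

The resumecontent branch would follow the same pattern with its own field names. Splitting the original combined script this way avoids dereferencing `ctx.ml.ner_resume_content` on documents where that inference step never ran.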