Hi,
I am using an Elasticsearch ingest pipeline for language identification. In addition, I would like to apply a language-specific analyzer to each language field. As far as I know, there is no ingest processor for language analyzers, so I created an index that uses my pipeline by default and applies the language analyzers from the mapping.
Here is my index:
PUT my_index
{
"settings": {
"index.default_pipeline" : "my_sample_pipeline",
"analysis" : {
"analyzer": {
"my_analyzer" : {
"type" : "custom",
"tokenizer" : "standard",
"filter" : "my_apostrophe"
}
},
"filter" : {
"my_apostrophe" : {
"type" : "asciifolding",
"preserve_original": true
}
}
}
},
"mappings": {
"dynamic": true,
"properties": {
"description": {
"analyzer" : "my_analyzer",
"type" : "text",
"fields" : {
"en_analyzer": {
"type": "text",
"analyzer": "english"
},
"de_analyzer": {
"type": "text",
"analyzer": "simple"
},
"pt_analyzer": {
"type": "text",
"analyzer": "portuguese"
},
"fr_analyzer": {
"type": "text",
"analyzer": "french"
},
"zh_analyzer": {
"type": "text",
"analyzer": "smartcn"
}
}
}
}
}
}
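To see which tokens a given sub-field analyzer produces, the _analyze API can be pointed at a field of the index. This is only a sketch: it assumes my_index has been created with the mapping above, and the sample sentence is made up.

```console
# Hypothetical sample text; swap the field for description.fr_analyzer etc. to compare
GET my_index/_analyze
{
  "field": "description.en_analyzer",
  "text": "The quick brown foxes are jumping"
}
```

The response lists the tokens that would be written to the inverted index for that sub-field, which is separate from what is kept in _source.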
And this is my pipeline:
PUT _ingest/pipeline/my_sample_pipeline
{
"processors" : [
{
"inference" : {
"model_id" : "lang_ident_model_1",
"inference_config": {
"classification" : {
"num_top_classes" : 1
}
},
"field_map" : {
"description" : "text"
},
"target_field" : "_ml.lang_ident"
}
},
{
"rename" : {
"field" : "description",
"target_field" : "description.raw"
}
},
{
"rename" : {
"field" : "_ml.lang_ident.predicted_value",
"target_field": "description.language_processed"
}
},
{
"script" : {
"lang" : "painless",
"source" : "ctx.description.supported = (['de', 'en', 'fr', 'pt', 'zh'].contains(ctx.description.language_processed))"
}
},
{
"set" : {
"if" : "ctx.description.supported",
"field": "description.{{description.language_processed}}",
"value" : "{{description.raw}}",
"override" : false
}
},
{
"set": {
"if" : "ctx.description.language_processed == 'en'",
"field" : "description.{{description.language_processed}}",
"value" : "{{description.en_analyzer}}"
}
}
]
}
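The pipeline can be exercised without indexing anything by using the simulate API. A minimal sketch, assuming the pipeline above has been stored under my_sample_pipeline and using a made-up sample document:

```console
# Hypothetical sample document; only the description field is needed by the pipeline
POST _ingest/pipeline/my_sample_pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "description": "This is a short English sentence about running shoes."
      }
    }
  ]
}
```

The response shows each document as it leaves the pipeline, so it is easy to see what every processor wrote under description. Adding ?verbose=true to the request returns the intermediate result after each processor.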
After the script and the first set processor have stored the description in its language field, I am trying to use the last set processor to apply "description.en_analyzer" to the description I stored in description.en. Is this approach possible? The output of the last set processor is an empty string, since the en_analyzer field from the index mapping is empty. My goal is to apply the appropriate analyzer to each of my classified descriptions. Any ideas how to proceed?