My mappings and processors follow other examples posted here. However, when indexing documents, I get an error like this:
{"date":"2024-05-16 14:53:43,147", "type":"ERROR", "class":"com.ibm.dp.datastore.utilities.ESUtils$2", "thread": "elasticsearch-rest-client-0-thread-2",
"message": "Failed to index document due to this error: ErrorCause: {"type":"document_parsing_exception","reason":"[1:785] failed to parse field [path_expansion.tokens] of type [rank_feature] in document with id '0DF15005E4038DEEBDF13B1774B01E18B9894D2C_001'.
Preview of field's value:
'{cluster=0.15190549, string=2.8398185, data=0.53949, software=0.003122813, section=0.17318511, interface=0.38456964, inventory=0.34014818, malaysia=0.021001812, directory=0.16043624, platform=0.31209457, network=0.36547637, button=0.2672434, split=0.2064711, java=0.025861437, strings=1.767182, web=0.030292396, api=1.0258077, portal=1.5379529, website=0.18131517, sap=2.3315399, portals=0.24848863, module=0.18718521, tree=0.21967274, thread=0.5139853, list=0.12154329, message=0.06096609, m=0.10591241, script=0.06531332, help=1.3187134, component=0.08407042, field=0.16718304, serial=0.21964681, information=1.394739E-5, category=0.09415125, support=0.78008264, ~=0.34376627, status=0.16336815, customer=0.06774526}'",
"caused_by":{"type":"x_content_parse_exception",
"reason":"[1:30] Current token (START_OBJECT) not numeric, can not use numeric value accessors\n at [Source: ...
The "object" referenced in the preview appears to be a set of key=value pairs rather than a proper JSON object (e.g., {cluster=0.15190549, string=2.8398185, ...}), but it is generated from the text field I've passed in.
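To make the mismatch concrete, here is a minimal Python sketch of what I assume the document looks like once the inference processor has run. The nesting under path_expansion.tokens is my assumption, based on the target_field and results_field settings; the three token weights are copied from the preview above; and java_map_preview is a hypothetical helper that only mimics how the log prints the value.

```python
import json

# Assumed shape of the document after the inference processor has run:
# target_field "path_expansion" + results_field "tokens".
# Token weights are copied from the error preview (first three only).
doc = {
    "path": "title ~ heading1 ~ heading2",
    "path_expansion": {
        "tokens": {
            "cluster": 0.15190549,
            "string": 2.8398185,
            "data": 0.53949,
        }
    },
}


def java_map_preview(mapping):
    """Hypothetical helper: render a dict the way the log preview prints it."""
    return "{" + ", ".join(f"{k}={v}" for k, v in mapping.items()) + "}"


tokens = doc["path_expansion"]["tokens"]

# The key=value text in the log looks like a display rendering of the value.
print(java_map_preview(tokens))

# Serialized as JSON, the same value is a well-formed object ...
print(json.dumps(tokens))

# ... so a parser expecting one number would meet START_OBJECT instead.
print(isinstance(tokens, dict), isinstance(tokens, (int, float)))
```

If this assumption holds, the key=value text is just how the preview is printed, and the actual complaint is that the field received an object of token weights where a single numeric value was expected.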
Mapping:
{
  "_source": {
    "enabled": "true"
  },
  "dynamic": "false",
  "properties": {
    "url": {
      "type": "text"
    },
    "title": {
      "type": "text",
      "analyzer": "text_keep_stopwords",
      "search_analyzer": "text_drop_stopwords",
      "term_vector": "with_positions_offsets",
      "index_options": "offsets",
      "store": "true"
    },
    "path": {
      "type": "text",
      "term_vector": "with_positions_offsets",
      "analyzer": "text_keep_stopwords",
      "search_analyzer": "text_drop_stopwords",
      "index_options": "offsets",
      "store": "true"
    },
    "path_expansion.tokens": {
      "type": "rank_feature"
    }
  }
}
Processors:
{
  "processors": [
    {
      "inference": {
        "model_id": ".elser_model_1",
        "target_field": "path_expansion",
        "field_map": {
          "path": "text_field"
        },
        "inference_config": {
          "text_expansion": {
            "results_field": "tokens"
          }
        }
      }
    }
  ]
}
The path field is a string of words delimited by " ~ ", like:
"title ~ heading1 ~ heading2"