I have a problem with reindexing through an ingest pipeline. It works for most documents, but for some it does not, and I have no idea how to debug it or find the root cause.
Here is my scenario:
I have two indices:
1.
PUT /assetsquick-v1
{
"mappings": {
"_source": {
"excludes": [
"systemMetadata.com:System:searchContentHTML"
]
},
"dynamic" : "false",
"properties" : {
"systemMetadata" : {
"dynamic" : "false",
"properties" : {
"com:System:searchContentHTML" : {
"properties" : {
"textValue" : {
"type" : "text",
"fields" : {
"ft3" : {
"type" : "text",
"analyzer" : "my_analyzer3"
},
"ft5" : {
"type" : "text",
"analyzer" : "my_analyzer5"
},
"ft8" : {
"type" : "text",
"analyzer" : "my_analyzer8"
},
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
},
"std" : {
"type" : "text",
"analyzer" : "standard"
}
},
"analyzer" : "my_analyzer"
}
...
2.
PUT /searchcontenthtml
{
"mappings": {
"dynamic": false,
"properties" : {
"asset": {"type": "keyword"},
"textValue" : {
"type" : "text",
"index": false
}
}
},
"settings": {
"index": {
"max_ngram_diff": "1",
"routing": {
"allocation": {
"include": {
"_tier_preference": "data_content"
}
}
},
"number_of_shards": "1",
"number_of_replicas": "1"
}
}
}
and an enrich policy:
PUT /_enrich/policy/searchcontenthtml-policy
{
"match": {
"indices": "searchcontenthtml",
"match_field": "asset",
"enrich_fields": ["textValue"]
}
}
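As I understand it, executing a `match` enrich policy copies the current contents of the source index into a separate enrich index, and the enrich processor only reads that copy. The Python sketch below is a simplified model of that behaviour, not Elasticsearch code (`build_enrich_lookup` and the sample data are hypothetical). It shows that a document added to searchcontenthtml after the policy was executed stays invisible to the pipeline until the policy is executed again with `POST /_enrich/policy/searchcontenthtml-policy/_execute`.

```python
from typing import Any, Dict, Iterable, List


def build_enrich_lookup(
    source_docs: Iterable[Dict[str, Any]],
    match_field: str,
    enrich_fields: List[str],
) -> Dict[Any, Dict[str, Any]]:
    """Simplified model of executing a `match` enrich policy.

    Takes a one-off snapshot of the source documents keyed by `match_field`,
    keeping only the match field and the enrich fields. In this model a later
    duplicate key simply overwrites an earlier one.
    """
    lookup: Dict[Any, Dict[str, Any]] = {}
    for doc in source_docs:
        key = doc.get(match_field)
        if key is None:
            continue  # documents without the match field can never be matched
        entry = {match_field: key}
        for field in enrich_fields:
            if field in doc:
                entry[field] = doc[field]
        lookup[key] = entry
    return lookup


# Hypothetical contents of the searchcontenthtml index at execution time.
searchcontenthtml = [{"asset": "a1", "textValue": "<p>eclipse</p>"}]
lookup = build_enrich_lookup(searchcontenthtml, "asset", ["textValue"])

# A document indexed after the policy was executed...
searchcontenthtml.append({"asset": "a2", "textValue": "<p>eclipse</p>"})

# ...is not part of the snapshot until the policy is executed again.
print(sorted(lookup))
```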
The policy is used by this ingest pipeline:
PUT _ingest/pipeline/searchcontenthtml-pipeline
{
"description": "Adds html content from searchcontenthtml index into systemMetadata.com:System:searchContentHTML",
"processors": [
{
"enrich": {
"field": "_id",
"policy_name": "searchcontenthtml-policy",
"target_field": "systemMetadata.com:System:searchContentHTML",
"override": true
}
},
{
"remove": {
"field": "systemMetadata.com:System:searchContentHTML.asset",
"ignore_missing": true
}
}
]
}
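Because assetsquick-v1 excludes systemMetadata.com:System:searchContentHTML from `_source`, and reindex copies documents from `_source`, the field in assetsquick-v2 can only come from this enrich step. Here is a rough Python model of the two processors (a sketch, not Elasticsearch's implementation; `apply_pipeline` and the sample data are hypothetical). It assumes the default `max_matches` of 1, that the enrich processor writes the match field together with the enrich fields into `target_field` (which is why the remove processor drops `.asset`), that dots in `target_field` are path separators, and that a document with no match passes through unchanged instead of failing.

```python
import copy
from typing import Any, Dict


def apply_pipeline(
    doc: Dict[str, Any],
    lookup: Dict[Any, Dict[str, Any]],
    target_path: str = "systemMetadata.com:System:searchContentHTML",
    match_field: str = "asset",
) -> Dict[str, Any]:
    """Model of the enrich + remove processors for a single document."""
    result = copy.deepcopy(doc)
    match = lookup.get(result.get("_id"))
    if match is None:
        # Assumed behaviour: no enrich data found, document passes through unchanged.
        return result

    # Treat dots in the target path as separators and create nested objects.
    *parents, leaf = target_path.split(".")
    node = result
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = dict(match)  # override: true replaces any existing value

    # remove processor on <target>.asset with ignore_missing: true
    node[leaf].pop(match_field, None)
    return result


lookup = {"a1": {"asset": "a1", "textValue": "eclipse manual"}}
enriched = apply_pipeline({"_id": "a1", "title": "x"}, lookup)
missing = apply_pipeline({"_id": "a2"}, lookup)
print(enriched["systemMetadata"]["com:System:searchContentHTML"])
print("systemMetadata" in missing)
```

Under these assumptions, an `_id` that is missing from the enrich index produces a document without the field and without any error.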
Then I run the reindex:
POST /_reindex?refresh=true&timeout=60m
{
"source": {
"index": "assetsquick-v1"
},
"dest": {
"index": "assetsquick-v2",
"pipeline": "searchcontenthtml-pipeline"
}
}
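One thing worth checking is the body returned by the `_reindex` call itself: besides counters such as `total`, `created` and `updated`, it contains a `failures` array. A small Python helper to pull out the document IDs and reasons (hypothetical name `summarize_reindex`; the sample response is made up, and each failure entry is assumed to carry `id` and `cause.reason` as bulk indexing failures do):

```python
from typing import Any, Dict, List, Tuple


def summarize_reindex(
    response: Dict[str, Any],
) -> Tuple[Dict[str, Any], List[Tuple[str, str]]]:
    """Return the main counters and (id, reason) pairs from a _reindex response."""
    counters = {
        key: response.get(key, 0)
        for key in ("total", "created", "updated", "noops", "version_conflicts")
    }
    problems = []
    for failure in response.get("failures", []):
        # Assumed shape for indexing failures: {"id": ..., "cause": {"reason": ...}}
        doc_id = str(failure.get("id", "<unknown>"))
        reason = failure.get("cause", {}).get("reason", str(failure))
        problems.append((doc_id, reason))
    return counters, problems


# Made-up response, only to illustrate the shape of the output.
sample = {
    "total": 3, "created": 2, "updated": 0, "noops": 0, "version_conflicts": 0,
    "failures": [{"index": "assetsquick-v2", "id": "a3", "status": 400,
                  "cause": {"type": "example_exception", "reason": "example reason"}}],
}
counters, problems = summarize_reindex(sample)
print(counters["created"], problems)
```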
For some documents, a search on the property systemMetadata.com:System:searchContentHTML.textValue in assetsquick-v2 returns nothing, even though I was able to find the same documents in assetsquick-v1 and in searchcontenthtml. For most documents the search works fine. This is the query I use:
GET assetsquick-v2/_search?_source_excludes=*&size=20
{
"query": {
"match": {
"systemMetadata.com:System:searchContentHTML.textValue": "eclipse"
}
}
}
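To narrow down which documents are affected, one option is to run the same match query against assetsquick-v1 and assetsquick-v2 (with a larger `size`, or paginated) and compare the returned `_id` values. A Python sketch that takes two already-parsed search responses and lists the IDs found only in the first (`hit_ids`, `missing_ids` and the sample responses are hypothetical):

```python
from typing import Any, Dict, List, Set


def hit_ids(response: Dict[str, Any]) -> Set[str]:
    """Collect the _id of every hit in a parsed _search response."""
    return {hit["_id"] for hit in response.get("hits", {}).get("hits", [])}


def missing_ids(old: Dict[str, Any], new: Dict[str, Any]) -> List[str]:
    """IDs returned by the query on the old index but not on the new one."""
    return sorted(hit_ids(old) - hit_ids(new))


v1 = {"hits": {"hits": [{"_id": "a1"}, {"_id": "a2"}, {"_id": "a3"}]}}
v2 = {"hits": {"hits": [{"_id": "a1"}, {"_id": "a3"}]}}
print(missing_ids(v1, v2))
```

Each resulting ID could then be inspected individually, for example with `GET assetsquick-v2/_doc/<id>` or by running the document through `POST _ingest/pipeline/searchcontenthtml-pipeline/_simulate`.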
Any idea what the root cause could be, or where I should look?
Note that systemMetadata.com:System:searchContentHTML.textValue contains very long texts (15 MB).
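Since the texts are so large, it may help to check whether the affected IDs are also the largest documents. A Python sketch (hypothetical helper `largest_texts`, operating on `textValue` strings already fetched from searchcontenthtml) that ranks documents by UTF-8 encoded size in bytes rather than by character count, since the two differ for non-ASCII text:

```python
from typing import Dict, List, Tuple


def largest_texts(texts: Dict[str, str], top_n: int = 10) -> List[Tuple[str, int]]:
    """Return (asset id, UTF-8 size in bytes) pairs, largest first."""
    sizes = [(asset, len(text.encode("utf-8"))) for asset, text in texts.items()]
    sizes.sort(key=lambda pair: pair[1], reverse=True)
    return sizes[:top_n]


# "é" takes two bytes in UTF-8, so a2 is the largest despite having ten characters.
sample = {"a1": "short", "a2": "é" * 10, "a3": "x" * 8}
print(largest_texts(sample, top_n=2))
```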