Indexing and reindexing a single-shard index takes too much time

Hi, Elastic Team

I can't figure out why indexing and reindexing my index take so long.

I'm using Elasticsearch 7.9.0 with two nodes. (Both nodes use the default settings, so both are master-eligible and hold data.)

Only one index is actually in use by the service.

It has a single primary shard and no replicas.

It holds about 37 million docs, but its size is only 1.3 GB.

Compared with the indices behind our other services, both the initial load (indexing) and reindexing take far too long.

The initial load takes about 3 hours and reindexing takes about 1 hour.
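For scale, those figures work out to the average rates below. This is a back-of-the-envelope sketch that assumes all 37 million docs are processed in each run and reads 1.3 GB as 1.3 × 10^9 bytes.

```python
# Rough throughput implied by the numbers in the post:
# ~37 million docs, ~1.3 GB, ~3 h initial load, ~1 h reindex.
DOCS = 37_000_000
SIZE_BYTES = 1.3e9  # assumption: 1.3 GB read as decimal gigabytes


def docs_per_second(docs: int, hours: float) -> float:
    """Average indexing rate over a whole run of `hours` hours."""
    return docs / (hours * 3600)


initial_rate = docs_per_second(DOCS, 3)
reindex_rate = docs_per_second(DOCS, 1)
bytes_per_doc = SIZE_BYTES / DOCS

if __name__ == "__main__":
    print(f"initial load:    {initial_rate:,.0f} docs/s")
    print(f"reindex:         {reindex_rate:,.0f} docs/s")
    print(f"avg stored size: {bytes_per_doc:.0f} bytes/doc")
```

That is roughly 3,400 docs/s for the initial load and 10,300 docs/s for the reindex, with only about 35 bytes stored per doc. The docs are small on disk, which fits the later reply that the analysis, not the writing, is the heavy part.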

Each doc has 36 fields, and we use 5 analyzers with synonym filters and user dictionaries.

Whenever the user dictionary is updated, we have to reindex the active search index into a new version of the index.

Are there any ways to reduce this time?

2 nodes is bad; you have no quorum with 2 nodes.
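The quorum point comes down to majority arithmetic: a strict majority of 2 voters is 2, so losing either one leaves no majority, and two master-eligible nodes tolerate no more failures than one. A minimal sketch of the general rule follows; Elasticsearch 7.x manages its voting configuration automatically, so treat this as the generic majority calculation rather than its exact election logic.

```python
def majority_quorum(voters: int) -> int:
    """Smallest number of votes that is a strict majority of `voters`."""
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """How many voters can be lost while a majority can still be formed."""
    return voters - majority_quorum(voters)


if __name__ == "__main__":
    # voters, quorum size, failures tolerated
    for n in (1, 2, 3, 4, 5):
        print(n, majority_quorum(n), tolerated_failures(n))
```

Going from 2 to 3 master-eligible nodes is what first buys tolerance for one lost node; going from 3 to 4 adds nothing.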

That's likely causing significant overhead. Can you share the mapping?

This means that only one node is likely to be working if you are reindexing to the same index.

Do we need to increase the number of nodes?

Here is the mapping.

{
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0,
        "similarity": {
            "default": {
                "type": "BM25",
                "k1": "0.4",
                "b": "0.1"
            }
        },
        "analysis": {
            "char_filter": {
                "delete_sp_char_filter": {
                    "type": "pattern_replace",
                    "pattern": "[^ㄱ-ㅎ가-힣A-Za-z0-9]",
                    "replacement": ""
                }
            },
            "filter": {
                "filter_shingle": {
                    "max_shingle_size": "2",
                    "min_shingle_size": "2",
                    "output_unigrams": "false",
                    "type": "shingle"
                },
                "synonym": {
                    "type": "synonym",
                    "synonyms": [ ... ]
                }
            },
            "analyzer": {
                "platform_nori_shingle_analyzer": {
                    "filter": [
                        "lowercase",
                        "filter_shingle"
                    ],
                    "type": "custom",
                    "tokenizer": "none_nori_tokenizer"
                },
                "platform_nori_mixed_analyzer": {
                    "filter": [
                        "lowercase"
                    ],
                    "type": "custom",
                    "tokenizer": "mixed_nori_tokenizer"
                },
                "platform_nori_discard_analyzer": {
                    "filter": [
                        "lowercase",
                        "synonym"
                    ],
                    "type": "custom",
                    "tokenizer": "discard_nori_tokenizer"
                },
                "platform_nori_none_analyzer": {
                    "filter": [
                        "lowercase",
                        "synonym"
                    ],
                    "type": "custom",
                    "tokenizer": "none_nori_tokenizer"
                },
                "platform_nori_keyword_analyzer": {
                    "char_filter": ["delete_sp_char_filter"],
                    "filter": ["lowercase"],
                    "type": "custom",
                    "tokenizer": "standard"
                }
            },
            "tokenizer": {
                "discard_nori_tokenizer": {
                    "type": "nori_tokenizer",
                    "decompound_mode": "discard",
                    "user_dictionary_rules":[ ... ]
                },
                "mixed_nori_tokenizer": {
                    "type": "nori_tokenizer",
                    "decompound_mode": "mixed",
                    "user_dictionary_rules": [ ... ]
                },
                "none_nori_tokenizer": {
                    "type": "nori_tokenizer",
                    "decompound_mode": "none",
                    "user_dictionary_rules": [ ... ]
                },
                "mixed_nori_tokenizer_no_dict": {
                    "type": "nori_tokenizer",
                    "decompound_mode": "mixed"
                },
                "none_nori_tokenizer_no_dict": {
                    "type": "nori_tokenizer",
                    "decompound_mode": "none"
                },
                "discard_nori_tokenizer_no_dict": {
                    "type": "nori_tokenizer",
                    "decompound_mode": "discard"
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "search_field": {
                "type": "text",
                "analyzer": "platform_nori_mixed_analyzer"
            },
            "f1": {
                "type": "keyword"
            },
            "f2": {
                "type": "keyword"
            },
            "f3": {
                "type": "keyword"
            },
            "f4": {
                "type": "keyword"
            },
            "f5": {
                "type": "text",
                "analyzer": "platform_nori_mixed_analyzer",
                "copy_to": "search_field"
            },
            "f6": {
                "type": "text",
                "analyzer": "platform_nori_mixed_analyzer",
                "copy_to": "search_field"
            },
            "f7": {
                "type": "text",
                "analyzer": "platform_nori_mixed_analyzer",
                "copy_to": "search_field"
            },
            "f2_m_code_nm": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "text",
                        "analyzer": "platform_nori_keyword_analyzer"
                    },
                    "term": {
                        "type": "keyword"
                    }
                },
                "analyzer": "platform_nori_mixed_analyzer",
                "copy_to": "search_field"
            },
            "f8": {
                "type": "text",
                "analyzer": "platform_nori_mixed_analyzer",
                "copy_to": "search_field"
            },
            "f9": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "text",
                        "analyzer": "platform_nori_keyword_analyzer"
                    },
                    "term": {
                        "type": "keyword"
                    }
                },
                "analyzer": "platform_nori_mixed_analyzer",
                "copy_to": "search_field"
            },
            "f10": {
                "type": "text",
                "analyzer": "platform_nori_mixed_analyzer",
                "copy_to": "search_field"
            },
            "f11": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "text",
                        "analyzer": "platform_nori_keyword_analyzer"
                    },
                    "shingle": {
                        "type": "text",
                        "analyzer": "platform_nori_shingle_analyzer"
                    }
                },
                "analyzer": "platform_nori_mixed_analyzer"
            },
            "f12": {
                "type": "text",
                "analyzer": "platform_nori_mixed_analyzer",
                "copy_to": "search_field"
            },
            "f13": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "text",
                        "analyzer": "platform_nori_keyword_analyzer"
                    }
                },
                "analyzer": "platform_nori_mixed_analyzer",
                "copy_to": "search_field"
            },
            "f14": {
                "type": "keyword"
            },
            "f15": {
                "type": "keyword"
            },
            "f16": {
                "type": "keyword"
            },
            "f17": {
                "type": "keyword"
            },
            "f18": {
                "type": "keyword"
            },
            "f19": {
                "type": "keyword"
            },
            "f20": {
                "type": "keyword"
            },
            "f21": {
                "type": "keyword"
            },
            "f23": {
                "type": "keyword"
            },
            "f24": {
                "type": "long"
            },
            "f25": {
                "type": "keyword"
            },
            "f26": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "text",
                        "analyzer": "platform_nori_keyword_analyzer"
                    }
                },
                "analyzer": "platform_nori_mixed_analyzer",
                "copy_to": "search_field"
            },
            "f27": {
                "type": "keyword"
            },
            "f28": {
                "type": "boolean"
            },
            "f29": {
                "type": "keyword"
            }
        }
    }
}
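In a character class like the one in delete_sp_char_filter, the upper bound of the Hangul range matters. Precomposed Hangul syllables run from 가 (U+AC00) to 힣 (U+D7A3); a range that stops at 힇 (U+D787) treats syllables such as 히 or 힘 as special characters and deletes them. The sketch below compares the two bounds using Python's re module as a stand-in (an approximation) for the Java regex that pattern_replace uses; both compare BMP characters by code point in ranges like these.

```python
import re

# Character class with upper bound 힇 (U+D787): stops 28 syllables short.
NARROW = "[^ㄱ-ㅎ가-힇A-Za-z0-9]"
# Same class with upper bound 힣 (U+D7A3), the last precomposed syllable.
FULL = "[^ㄱ-ㅎ가-힣A-Za-z0-9]"


def strip_special(pattern: str, text: str) -> str:
    """Mimic pattern_replace with an empty replacement: delete every match."""
    return re.sub(pattern, "", text)


if __name__ == "__main__":
    sample = "힘 세다!"
    print(strip_special(NARROW, sample))  # 힘 (U+D798) is removed along with " " and "!"
    print(strip_special(FULL, sample))    # only " " and "!" are removed
```

With the narrow bound the sample becomes "세다"; with the full bound it becomes "힘세다".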

We create a new index and reindex the old index into it with the same settings.

Does this also mean only one node is working?

The node with the new index will be doing all the analysis, which is the heavy part of the task. If both indices are located on the same node that node will do all the work.
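To check whether both indices landed on the same node, `GET _cat/shards/<indices>?h=index,shard,prirep,node` lists which node holds each shard. The Python sketch below groups that plain-text output by node; the index names, node names, and sample output are hypothetical.

```python
from collections import defaultdict

# Hypothetical output of:
#   GET _cat/shards/old_index,new_index?h=index,shard,prirep,node
SAMPLE = """\
old_index 0 p node-1
new_index 0 p node-1
"""


def shards_per_node(cat_output: str) -> dict:
    """Group 'index/shard+prirep' labels by the node that holds them."""
    placement = defaultdict(list)
    for line in cat_output.splitlines():
        parts = line.split()
        if len(parts) < 4:
            continue  # skip blank lines and shards with no node (e.g. unassigned)
        index, shard, prirep, node = parts[:4]
        placement[node].append(f"{index}/{shard}{prirep}")
    return dict(placement)


if __name__ == "__main__":
    for node, shards in shards_per_node(SAMPLE).items():
        print(node, shards)
```

If both single-shard indices show up under one node, as in the sample, that node serves the source reads and does all of the analysis for the reindex.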

Have you tried slicing your reindexing request to increase parallelism?
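A sliced reindex splits the copy into sub-tasks that run in parallel. One caveat for this index: `slices=auto` uses one slice per source shard, so with a single-shard source it yields just one slice, and an explicit number is needed to get any parallelism. The sketch below only builds the `POST _reindex` request with Python's standard library; the URL, the index names, and the slice count of 4 are assumptions for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def sliced_reindex_request(base_url: str, source: str, dest: str, slices: int) -> Request:
    """Build (but do not send) a POST _reindex request split into `slices` sub-tasks."""
    # Run as a background task so the client does not hold the connection open.
    query = urlencode({"slices": slices, "wait_for_completion": "false"})
    body = {"source": {"index": source}, "dest": {"index": dest}}
    return Request(
        f"{base_url}/_reindex?{query}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical cluster URL and index names; 4 slices is an arbitrary example.
    req = sliced_reindex_request("http://localhost:9200", "old_index", "new_index", 4)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Sending it with `urllib.request.urlopen(req)` returns a task ID (because of `wait_for_completion=false`) that can be polled through the tasks API. A sensible slice count depends on how many cores the node holding the new index can spare.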

No, we haven't yet. We'll try it, thanks.

If the analysis is too heavy for a single node, would it help to add more data-only nodes?

We delete the previous index after reindexing, so we thought more than two nodes would be too many.

It may be that, because of the lack of parallelism, the node's resources are not fully utilized. A small index like that should be fine on a single node.

