I want to index one billion records. Each record has two attributes (attribute1 and attribute2).
Records that have the same value in attribute1 must be merged. For example, given these two records:
attribute1    attribute2
1             4
1             6
the resulting Elasticsearch document must be:
{
    "attribute1": 1,
    "attribute2": [4, 6]
}
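The in-memory merge step looks roughly like this (a minimal Python sketch; the (attribute1, attribute2) tuple format is just an assumption for illustration, since my actual record format is not shown here):

    from collections import defaultdict

    def merge_bulk(records):
        # records is assumed to be an iterable of (attribute1, attribute2)
        # tuples, e.g. [(1, 4), (1, 6)].
        merged = defaultdict(set)
        for attr1, attr2 in records:
            merged[attr1].add(attr2)
        # Shape each group like the example document above.
        return [{"attribute1": attr1, "attribute2": sorted(values)}
                for attr1, values in merged.items()]

For example, merge_bulk([(1, 4), (1, 6)]) returns [{"attribute1": 1, "attribute2": [4, 6]}].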
Due to the huge amount of data, I read a bulk of about 1,000 records at a time, merge them in memory based on the rule above, then search Elasticsearch for existing documents with the same attribute1 values, merge the in-memory records with the search results, and finally index/reindex the merged documents.
In summary, I perform one search and one bulk index per batch; a sketch of this cycle is below.
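Concretely, the per-bulk cycle looks roughly like this (a sketch only: it assumes the Python elasticsearch client, the merge_bulk helper from above, and that attribute1 doubles as the document _id; none of these details are shown in my actual code):

    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch()  # assumes a local node; adjust hosts as needed

    def process_bulk(records, index="test_index"):
        # Merge the bulk in memory first (merge_bulk is the sketch above).
        docs = {d["attribute1"]: set(d["attribute2"])
                for d in merge_bulk(records)}

        # Search for documents already indexed with the same attribute1 values.
        resp = es.search(index=index, body={
            "size": len(docs),
            "query": {"terms": {"attribute1": list(docs)}},
        })

        # Merge each hit's attribute2 values into the in-memory document.
        for hit in resp["hits"]["hits"]:
            src = hit["_source"]
            docs[src["attribute1"]].update(src["attribute2"])

        # Reindex the merged documents. attribute1 is used as _id here (an
        # assumption) so the reindex overwrites the old document instead of
        # creating a duplicate.
        actions = [{
            "_index": index,
            "_type": "record",  # assumed mapping type name (ES 2.x needs one)
            "_id": attr1,
            "_source": {"attribute1": attr1, "attribute2": sorted(values)},
        } for attr1, values in docs.items()]
        helpers.bulk(es, actions)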
I implemented this logic, but in some cases Elasticsearch does not return all matching documents, so some documents end up indexed more than once.
After each index operation I refresh Elasticsearch so that it is ready for the next search, but in some cases this does not work.
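The refresh itself is a single call (again assuming the Python client; with refresh_interval set to -1 this is the only thing that makes newly indexed documents visible to the next search):

    # Force a refresh so the documents just indexed become searchable
    # before the next bulk's search runs.
    es.indices.refresh(index="test_index")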
My index settings are as follows:
{
    "test_index": {
        "settings": {
            "index": {
                "refresh_interval": "-1",
                "translog": {
                    "flush_threshold_size": "1g"
                },
                "max_result_window": "1000000",
                "creation_date": "1464577964635",
                "store": {
                    "throttle": {
                        "type": "merge"
                    }
                }
            },
            "number_of_replicas": "0",
            "uuid": "TZOse2tLRqGk-vHRMGc2GQ",
            "version": {
                "created": "2030199"
            },
            "warmer": {
                "enabled": "false"
            },
            "indices": {
                "memory": {
                    "index_buffer_size": "40%"
                }
            },
            "number_of_shards": "5",
            "merge": {
                "policy": {
                    "max_merge_size": "2g"
                }
            }
        }
    }
}
How can I resolve this problem?
Is there another setting that would handle this situation?