I am trying to load a Hive table (1.6 billion rows, 380 columns) into Elasticsearch using ES-Hadoop, writing to two indices with 16 shards each.
- Refresh interval is set to 60s
- Replication is set to 0
- Indices buffer size 30%
- Batch size bytes 10mb
- Batch size count 200
- Compression - best compression
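For reference, this is roughly how I am applying these settings — a sketch, not the exact DDL: the index-level settings (refresh interval, replicas, codec) go in via the REST API before the load, and the batch settings via ES-Hadoop `TBLPROPERTIES` on the Hive table. Index name, host, and the column list are placeholders.

```shell
# Index-level settings, applied before the load (host/index are placeholders)
curl -XPUT 'http://es-host:9200/my_index' -H 'Content-Type: application/json' -d '{
  "settings": {
    "number_of_shards": 16,
    "number_of_replicas": 0,
    "refresh_interval": "60s",
    "index.codec": "best_compression"
  }
}'

# Hive-side ES-Hadoop batch settings (column list elided)
hive -e "
CREATE EXTERNAL TABLE es_target (col1 STRING, col2 BIGINT /* ... 380 columns ... */)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.nodes'              = 'es-host',
  'es.resource'           = 'my_index',
  'es.batch.size.bytes'   = '10mb',
  'es.batch.size.entries' = '200'
);
"
```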
Cluster: 3 master nodes, 4 data/ingest nodes (64 GB RAM each, 32 GB heap)
Earlier, with 8 shards, I was getting an average of 5k rows/sec. After increasing to 16 shards, throughput fluctuates between 1k and 3k rows/sec.
Switching back to the default compression codec gave only a minimal improvement.
Any suggestions on how to debug this? And are there any design considerations that would keep the indices performant for search while improving bulk indexing throughput?