I'm new to Elasticsearch and have been experimenting with bulk-loading data from one of our custom apps. I'm doing this on an Ubuntu VM (4 GB RAM) on my laptop.
I'm using fluentd's elasticsearch plugin (which uses the bulk API) to load the data from CSV into ES. I previously tried logstash, but fluentd was significantly faster and a little more robust against some of the errors I was hitting.
There are exactly 100,000 records of the format:
1,"2011-10-11 12:21:01",ds1,1790_cPop,"Census of the United States, 1790: Population Data for US",ds1.dt1,"Total Population","1790_cPop: Total Population"
This data goes back to 2011, and I've got fluentd configured to create a new index for each day (roughly the setup sketched below). I have data for every single day from then until the present, so when a run finishes I should end up with something like 1,300 indexes (one for each day since late 2011), each with some documents in it.
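For context, my config is roughly along these lines. The paths, tag, and column names here are placeholders and I'm on the older v0.12-style syntax, so treat this as a sketch rather than my exact file:

```
<source>
  @type tail
  path /data/export/records.csv           # placeholder path
  pos_file /var/log/td-agent/records.pos
  tag app.records
  format csv
  # made-up column names; the second column is the event time
  keys id,event_time,dataset,series,title,field,short_name,description
  time_key event_time
  time_format %Y-%m-%d %H:%M:%S
</source>

<match app.records>
  @type elasticsearch
  host localhost
  port 9200
  # logstash_format makes the plugin add @timestamp and write each event to a
  # date-suffixed index (one per day), which is where the ~1,300 indexes come from
  logstash_format true
  logstash_prefix app-records
</match>
```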
With default fluentd-elasticsearch-plugin settings, ES would have all of the indexes when the load finished, but only something like 90k documents: it had dropped about 10% of them, so some of the indexes had 0 documents in them. I figured this out because Kibana kept complaining about failed shards it couldn't search (the empty indexes didn't have the @timestamp field it was looking for). I repeated this many times, and there was always a significant number of dropped documents, but never the same number or the same days.
When I finally thought to use more conservative buffer settings in fluentd (specifically, reducing the chunk size to 2MB), I was able to consistently get all 100k documents into ES. Larger chunk sizes (e.g. 4MB) would drop documents, and with those larger chunks I also saw a fair number of timeouts where fluentd had to retry its requests to ES.
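Concretely, the change amounted to adding the older-style buffer options to the match block, something like this (newer fluentd versions spell these differently, e.g. chunk_limit_size inside a <buffer> section, so again this is a sketch):

```
<match app.records>
  @type elasticsearch
  host localhost
  port 9200
  logstash_format true
  logstash_prefix app-records
  # keep each bulk request small enough for the 4 GB VM to absorb
  buffer_chunk_limit 2m
  flush_interval 10s
</match>
```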
Now, I'm obviously using an underpowered ES environment, so I'm not surprised ES was having trouble keeping up. The real issue is that it took me forever to figure out that this was the problem, because the elasticsearch log wasn't hinting at any of it. I'm confused about why these events were being dropped silently. Was this simply a case of a buffer somewhere overfilling, so that messages were partially dropped before ES ever saw them? Is there any way to adjust the logging to make it easier to discover when ES rejects an event?
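My working theory, if I understand the bulk API right, is that per-item failures (e.g. a 429 with es_rejected_execution_exception when the bulk queue fills up) only show up in the HTTP response body, not in the server log, so noticing them is up to the client. This rough Python sketch is the kind of check I imagine a client has to do; the index/type/field names are made up, and the _type line only applies to older ES versions:

```python
import json
import requests  # assumes the requests library; any HTTP client would do

# Two documents in bulk-API (newline-delimited) format; names are placeholders.
bulk_body = (
    '{"index": {"_index": "test-2011.10.11", "_type": "record"}}\n'
    '{"@timestamp": "2011-10-11T12:21:01Z", "title": "Total Population"}\n'
    '{"index": {"_index": "test-2011.10.11", "_type": "record"}}\n'
    '{"@timestamp": "2011-10-11T12:21:02Z", "title": "Total Families"}\n'
)

resp = requests.post(
    "http://localhost:9200/_bulk",
    data=bulk_body,
    headers={"Content-Type": "application/x-ndjson"},
).json()

# The top-level "errors" flag only says that *some* item failed; the details
# (status code and error reason) are reported per item in "items".
if resp.get("errors"):
    for item in resp["items"]:
        result = item.get("index", {})
        if result.get("status", 200) >= 300:
            print(result["status"], json.dumps(result.get("error")))
```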
Basically, I am a newbie wanting to understand more about why this was happening and why my fix worked. Thanks!