HUGE Transaction Log

I'm running ES 2.0.0. During a 48-hour test, my transaction logs have grown to almost 60 GB and are still growing. The translog does not seem to be flushing. I could try a manual flush, but I don't understand why I would need to. My understanding is that a flush happens automatically whenever the translog gets too big (default 512 MB) or after 30 minutes.
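Before blaming the flush logic, it may be worth confirming that nobody overrode those defaults on the index. Below is a minimal sketch that takes the parsed JSON body of GET /report/_settings and overlays any index.translog overrides on the defaults quoted above. The setting names flush_threshold_size and flush_threshold_period are my recollection of the 2.0-era settings and should be treated as assumptions; the helper names are made up for illustration, and only the nested (non-flat) response layout is handled.

```python
# Defaults as described in the post above (assumed setting names).
ASSUMED_DEFAULTS = {
    "flush_threshold_size": "512mb",
    "flush_threshold_period": "30m",
}


def translog_overrides(settings_response, index):
    """Return the index.translog.* settings explicitly set on `index`.

    `settings_response` is the parsed JSON of GET /<index>/_settings in its
    nested form, e.g. {"report": {"settings": {"index": {...}}}}.
    """
    index_settings = (
        settings_response.get(index, {}).get("settings", {}).get("index", {})
    )
    return dict(index_settings.get("translog", {}))


def effective_translog_settings(settings_response, index):
    """Overlay explicit overrides on top of the assumed defaults."""
    effective = dict(ASSUMED_DEFAULTS)
    effective.update(translog_overrides(settings_response, index))
    return effective
```

If translog_overrides comes back empty, the index is running on defaults and the unbounded growth is not a configuration problem.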

We're running a single node. Our index is named "report", and I see five shard folders (plus _state) in esearch/data/iTE/nodes/0/indices/report.

root@ite60-henry-perf02:/data/esearch/data/iTE/nodes/0/indices/report# ls
0  1  2  3  4  _state

root@ite60-henry-perf02:/data/esearch/data/iTE/nodes/0/indices/report# du -h .
1.4G    ./3/index
8.0K    ./3/_state
14G     ./3/translog
16G     ./3
1.4G    ./2/index
8.0K    ./2/_state
14G     ./2/translog
16G     ./2
1.4G    ./1/index
8.0K    ./1/_state
14G     ./1/translog
16G     ./1
1.5G    ./0/index
8.0K    ./0/_state
189M    ./0/translog
1.6G    ./0
8.0K    ./_state
1.5G    ./4/index
8.0K    ./4/_state
14G     ./4/translog
16G     ./4
63G     .
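For tracking this over a long test run, the same per-shard breakdown can be collected with a small script instead of reading du output by hand. This is only a sketch: it assumes the on-disk layout shown above (numbered shard folders, each with index and translog subfolders), the function names are invented for illustration, and it sums apparent file sizes, so the numbers can differ slightly from what du reports.

```python
import os


def dir_size(path):
    """Total size in bytes of all regular files under `path` (0 if missing)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if os.path.isfile(file_path):
                total += os.path.getsize(file_path)
    return total


def shard_sizes(index_dir):
    """Map shard number -> (index_bytes, translog_bytes) for one index folder."""
    sizes = {}
    for entry in sorted(os.listdir(index_dir)):
        shard_path = os.path.join(index_dir, entry)
        # Shard folders are named 0, 1, 2, ...; skip _state and anything else.
        if not (entry.isdigit() and os.path.isdir(shard_path)):
            continue
        sizes[int(entry)] = (
            dir_size(os.path.join(shard_path, "index")),
            dir_size(os.path.join(shard_path, "translog")),
        )
    return sizes


def oversized_shards(sizes, threshold_bytes=512 * 1024 * 1024):
    """Shard numbers whose translog is larger than the threshold (512 MB here)."""
    return [
        shard
        for shard, (_index_bytes, translog_bytes) in sorted(sizes.items())
        if translog_bytes > threshold_bytes
    ]
```

Run against the report index folder, oversized_shards(shard_sizes(path)) lists the shards whose translog has passed the 512 MB mark mentioned earlier.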

Inside the translog folders I see one huge log file. For example:

root@ite60-henry-perf02:/data/esearch/data/iTE/nodes/0/indices/report# ls -l 1/translog
total 14523980
-rwxrwxrwx 1 esearch esearch          20 Feb  1 05:30 translog-15.ckp
-rwxrwxrwx 1 esearch esearch          43 Nov 21 05:22 translog-15.tlog
-rwxr-xr-x 1 esearch esearch          20 Feb  6 04:03 translog-16.ckp
-rwxrwxrwx 1 esearch esearch          43 Feb  1 05:30 translog-16.tlog
-rwxr-xr-x 1 esearch esearch          20 Feb  6 05:30 translog-17.ckp
-rw-r--r-- 1 esearch esearch          43 Feb  6 04:03 translog-17.tlog
-rw-r--r-- 1 esearch esearch 14872489859 Feb  7 19:26 translog-18.tlog
-rwxrwxrwx 1 esearch esearch          20 Nov 15 18:39 translog-1.ckp
-rwxrwxrwx 1 esearch esearch          20 Nov 15 19:08 translog-2.ckp
-rwxrwxrwx 1 esearch esearch          20 Feb  7 19:26 translog.ckp

Over time those big files keep growing.

Appreciate any pointers on what to look at or what to set.


I flushed manually (POST /report/_flush). It took 14s, cleared 56 GB, and rotated the translog (I now have translog-19, which again continues to grow).
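As a stopgap until the underlying bug is fixed, the manual flush could be scripted and the translog size checked before and after. The sketch below uses only the Python standard library. The flush endpoint is the one used above; the /_stats/translog endpoint and the size_in_bytes field are what I would expect from the indices stats API and should be verified against your version. The base URL and helper names are assumptions for illustration.

```python
import json
import urllib.request


def flush_request(base_url, index):
    """Build the POST /<index>/_flush request without sending it."""
    url = "{}/{}/_flush".format(base_url.rstrip("/"), index)
    return urllib.request.Request(url, data=b"", method="POST")


def translog_stats_request(base_url, index):
    """Build a GET /<index>/_stats/translog request (assumed stats metric)."""
    url = "{}/{}/_stats/translog".format(base_url.rstrip("/"), index)
    return urllib.request.Request(url, method="GET")


def translog_bytes(stats, index):
    """Pull the total translog size out of a parsed stats response.

    Assumes the layout indices -> <index> -> total -> translog -> size_in_bytes.
    """
    return stats["indices"][index]["total"]["translog"]["size_in_bytes"]


def send(request, timeout=60):
    """Send a request and return the parsed JSON body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A cron job could call send(flush_request("http://localhost:9200", "report")) periodically and compare translog_bytes before and after. That only works around the symptom; it does not fix whatever stops the automatic flush.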

Looks like this is related to 15814 and fixed by 15830.

What is the best way to pick up that fix?

Upgrade. 2.0.0 was released a long time ago in Elasticsearch terms. It is the first release of the previous major version.

I understand. We're in a development cycle, and I don't want to upgrade just in the hope that things are fixed. This is the first time we've noticed the problem. But it looks like upgrading to 2.2+ will fix it, right?

Yes, 2.2+ would fix it. If you are upgrading you might want to get to a more modern version though.

Already tested; ES 5 breaks us. We use Jest, which seems to have issues, though it might be that we're behind there too. For now this is a customer situation, so ES 5 is not an option. (Unlike the entire rest of the world it is still necessary for us to deliver software.)

