HUGE Transaction Log

I'm running ES 2.0.0. During a 48-hour test I have transaction logs that are almost 60 GB and growing. It doesn't seem to be flushing. I could try a manual flush, but I don't understand why I would need to do that. My understanding is that flushing happens automatically whenever the translog gets too big (default 512 MB) or after 30 minutes.
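(For reference, the flush-related index settings can be checked like this; just a sketch, assuming ES is listening on localhost:9200 and that index.translog.flush_threshold_size is the setting that controls the size trigger:)

curl -s 'localhost:9200/report/_settings?pretty'
# if nothing shows up under index.translog.*, the 512 MB / 30 minute defaults should be in effect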

We're running a single node. Our index is named "report". I see five shard folders in esearch/data/iTE/nodes/0/indices/report.

root@ite60-henry-perf02:/data/esearch/data/iTE/nodes/0/indices/report# ls
0  1  2  3  4  _state

root@ite60-henry-perf02:/data/esearch/data/iTE/nodes/0/indices/report# du -h .
1.4G    ./3/index
8.0K    ./3/_state
14G     ./3/translog
16G     ./3
1.4G    ./2/index
8.0K    ./2/_state
14G     ./2/translog
16G     ./2
1.4G    ./1/index
8.0K    ./1/_state
14G     ./1/translog
16G     ./1
1.5G    ./0/index
8.0K    ./0/_state
189M    ./0/translog
1.6G    ./0
8.0K    ./_state
1.5G    ./4/index
8.0K    ./4/_state
14G     ./4/translog
16G     ./4
63G     .
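(The same per-shard translog sizes should also be visible from the stats API, without shelling into the data directory; a sketch, again assuming ES on localhost:9200:)

curl -s 'localhost:9200/report/_stats/translog?level=shards&pretty'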

Inside the translog folders I see one huge log file. For example:

root@ite60-henry-perf02:/data/esearch/data/iTE/nodes/0/indices/report# ls -l 1/translog
total 14523980
-rwxrwxrwx 1 esearch esearch          20 Feb  1 05:30 translog-15.ckp
-rwxrwxrwx 1 esearch esearch          43 Nov 21 05:22 translog-15.tlog
-rwxr-xr-x 1 esearch esearch          20 Feb  6 04:03 translog-16.ckp
-rwxrwxrwx 1 esearch esearch          43 Feb  1 05:30 translog-16.tlog
-rwxr-xr-x 1 esearch esearch          20 Feb  6 05:30 translog-17.ckp
-rw-r--r-- 1 esearch esearch          43 Feb  6 04:03 translog-17.tlog
-rw-r--r-- 1 esearch esearch 14872489859 Feb  7 19:26 translog-18.tlog
-rwxrwxrwx 1 esearch esearch          20 Nov 15 18:39 translog-1.ckp
-rwxrwxrwx 1 esearch esearch          20 Nov 15 19:08 translog-2.ckp
-rwxrwxrwx 1 esearch esearch          20 Feb  7 19:26 translog.ckp

Over time those big files keep growing.

I'd appreciate any pointers on what to look at or what to set.

Thanks.

I flushed (POST /report/_flush). It took 14 s, cleared 56 GB, and rotated the translog (I now have translog-19, which again continues to grow).
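(For anyone else hitting this, the flush I ran is just the command below. One could also repeat it on a schedule as a stopgap until the fix is picked up; the cron line is only a sketch and assumes cron is available on the box:)

curl -XPOST 'localhost:9200/report/_flush?pretty'
# stopgap: flush every 30 minutes until the upgrade
*/30 * * * * curl -s -XPOST 'localhost:9200/report/_flush' > /dev/null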

Looks like this is related to 15814 and fixed by 15830.

What is the best way to pick up that fix?

Upgrade. 2.0.0 was a long time ago in Elasticsearch terms; it is the first release of the previous major version.

I understand. We're in a development cycle and don't want to upgrade just on the hope that stuff is fixed. This is the first time we've noticed the problem. But it looks like upgrading to 2.2+ will fix it, right?

Yes, 2.2+ would fix it. If you are upgrading, you might want to get to a more modern version, though.

We've already tested: ES 5 breaks us. We use Jest, and it seems to have issues, though it might be that we're behind there too. For now this is a customer situation, so ES 5 is not an option. (Unlike the entire rest of the world, it is still necessary for us to deliver software.)
