Is there any other backtrace info in the logs?
I've included our Postgres logs below.
Is there any indication in the Postgres logs that it restarted a little bit before the log message?
I don't think so, but I'm not sure. The logs I've included are where the errors start.
Is there a network proxy or load balancer between LS and the DB server?
No
2018-01-08 16:17:30 UTC:removed[8341]:LOG: server process (PID 21535) was terminated by signal 11: Segmentation fault
2018-01-08 16:17:30 UTC:removed[8341]:LOG: terminating any other active server processes
2018-01-08 16:17:30 UTC:removed(48672):removed@removed:[10278] terminating connection because of crash of another server process
2018-01-08 16:17:30 UTC:removed(48672):removed@removed:[10278]:DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2018-01-08 16:17:30 UTC:removed(48672):removed@removed:[10278]:HINT: In a moment you should be able to reconnect to the database and repeat your command.
(the log lines above repeat many times)
2018-01-08 16:17:30 UTC:removed[8341]:LOG: archiver process (PID 26438) exited with exit code 1
2018-01-08 16:17:30 UTC:removed(53072):removed@removed:[21450] terminating connection because of crash of another server process
2018-01-08 16:17:30 UTC:removed(53072):removed@removed:[21450]:DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2018-01-08 16:17:30 UTC:removed(53072):removed@removed:[21450]:HINT: In a moment you should be able to reconnect to the database and repeat your command.
(then the log line below repeats many times)
2018-01-08 16:17:30 UTC:removed(53166):removed@removed:[21537]:FATAL: the database system is in recovery mode
Do you think that this log:
2018-01-08 16:17:30 UTC:removed(53072):removed@removed:[21450]:DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
could be because I often run sudo systemctl stop logstash,
and maybe Postgres couldn't stop the large query quickly enough, so the process was killed in a bad way (or something)?
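
In case it helps with going through the full log, here is a rough Python sketch that pulls out the lines where the postmaster reports a server process terminated by a signal. The name `find_crashes` and the regex are my own, and it assumes the log prefix format shown in the logs above (timestamp, timezone, then a colon). It only reports what the log says; it does not explain why the process crashed.

```python
import re

# Matches the postmaster line that reports a server process killed by a signal.
# Assumes the prefix "YYYY-MM-DD HH:MM:SS TZ:" seen in the logs above.
CRASH_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \w+):.*"
    r"server process \(PID (?P<pid>\d+)\) was terminated by signal "
    r"(?P<sig>\d+)"
)

def find_crashes(lines):
    """Return a (timestamp, pid, signal) tuple for each crash line found."""
    crashes = []
    for line in lines:
        m = CRASH_RE.search(line)
        if m:
            crashes.append((m.group("ts"), int(m.group("pid")), int(m.group("sig"))))
    return crashes

# Two lines copied from the logs above, used as sample input.
sample = [
    "2018-01-08 16:17:30 UTC:removed[8341]:LOG: server process (PID 21535) was terminated by signal 11: Segmentation fault",
    "2018-01-08 16:17:30 UTC:removed[8341]:LOG: terminating any other active server processes",
]
print(find_crashes(sample))
```

To run it against a real file, pass an open file object instead of `sample`, for example `find_crashes(open(path))`, where `path` is wherever your Postgres log lives.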