[Howto] PostgreSQL and Linux Memory Management
The OOM-Killer can cause nasty surprises on machines with a heavy memory load; processes are cancelled or terminated without warning. Fortunately, this behaviour can be adjusted with some clever kernel tweaks. Administrators of Linux machines with a very high RAM-Usage are sometimes faced with a terrifying scenario: the Linux OOM-Killer (OOM = Out Of Memory). In situations such as a crashed PostgreSQL instance, the following entry can typically be found in the server log:
Out of Memory: Killed process PID (Prozessname)
Why is this?
Virtual Memory and Overcommit
Virtual Memory used by Linux can be allocated in a number of ways: malloc(), mmap(), Swap, Shared Memory, to mention some examples. It is possible to overcommit virtual memory by allocating more than is actually available in the system. If this happens, a so-called "OOM-Condition" occurs; that is, your system no longer has any available space in the virtual memory area and cannot allocate any more. This is when the OOM-Killer is activated - and does what its name suggests: kills any processes which meet certain conditions in order to free memory. If you have an environment where servers are running PostgreSQL in parallel with other memory-intensive processes on the same machine, it's likely that the OOM-Killer will kill certain PostgreSQL processes. Due to the amount of allocated shared memory and the memory usage of each backend, the OOM-Killer will target PostgreSQL by preference since it counts the complete addressed shared memory area of all backends into summary. The amount of committed memory of your system at a given time can be examined with the /proc-Filesystem:
$ grep Commit /proc/meminfo CommitLimit: 376176 kB Committed_AS: 265476 kB
This example shows the current amount of committed memory at 265476 kB (Committed_AS). Is this equal or even larger than the amount of Committed_AS the OOM-Killer is likely to be woken up. However, the kernel provides some interfaces to adjust the behaviour of the OOM-Killer and Overcommit with regard to PostgreSQL installations.
Turn off Overcommit
A radical method is to turn overcommit off entirely, although this is only recommended on systems dedicated to PostgreSQL. The overcommit feature can be configured within three categories with the following kernel parameter:
vm.overcommit_memory = 0
This can hold three different kinds of categories:
- 0: Allow a careful strategy of overcommitting memory: small and reasonable amounts of overcommitting allocations are allowed, but heavy and wild allocations will be denied. In this mode, root can allocate more space than unprivileged users. This is also the kernel default setting.
- 1: Allow overcommit without any constraints
- 2: Turn off overcommit. The effective allocatable memory space cannot be larger than swap + a configurable percentage of physical RAM.
The fraction of physical RAM used by category 2 is defined by the parameter:
vm.overcommit_ratio = 50
While vm.overcommit_memory=1 is useful when tuning certain applications, the categories 0 or 2 are the best ones to use most of the time. If you turn off overcommit with vm.overcommit_memory=2, a process will get an "out of memory"-Exception (depending of vm_overcommit_ratio) if allocating memory when no more free space is available. Depending on the distribution you are using, we recommend that you save those settings in the configuration file /etc/sysctl.conf to ensure that they are activated on server reboot.
$ echo "vm.overcommit_memory=2 >> /etc/sysctl.conf $ echo "vm.overcommit_ratio=60 >> /etc/sysctl.conf $ sysctl -p /etc/sysctl.conf
Changes to those parameters are activated immediately. You can recheck this by consulting /proc/meminfo:
$ grep Commit /proc/meminfo CommitLimit: 401440 kB Committed_AS: 266456 kB
The machine has 249848 kB of swap and 252656 kB physical RAM. According to the formula swap + vm.overcommit_ratio * RAM this results in a CommitLimit of 401440 kB
Configure OOM-Killer per process
Where PostgreSQL is running without dedicated server hardware and in parallel with memory-intensive middleware (e.g. JBoss- or Tomcat-Installations), most admins would prefer to be able to control the OOM-Killer on a per-process basis and allow overcommitting of memory allocations. Since kernel 2.6.1, Linux has been providing an interface for tuning the OOM-Score of a process, which will in turn increase or decrease the affinity of the process to be killed when running in an OOM-Situation. This interface allows a very flexible configuration of processes in such environments regarding their memory requirements. The interface is exposed by the /proc-Filesystem, for example here on a PostgreSQL-Installation on Debian:
$ cat /proc/$(cat /var/run/postgresql/8.4-main.pid)/oom_adj 0
Values allowed range from -17 to +15, a negative value decreases, while a positive value increases the likelihood of being killed by the OOM-Killer. -17 is a special value and turns killing the process in an OOM-Situation off. The settings are inherited from parent to child processes; in PostgreSQL you'll have to set this one to the PostgreSQL master process:
$ echo -17 >> /proc/$(cat /var/run/postgresql/8.4-main.pid)/oom_adj $ psql -q postgres test=# SELECT pg_backend_pid(); pg_backend_pid ---------------- 3429 (1 line) test=# + Stopped psql -q test $ cat /proc/3429/oom_adj -17
The disadvantage of this method is that all child processes will now be excluded from the OOM-Killer, which is not generally what DBAs prefer. For example, where you want to protect the PostgreSQL system processes (like background writer oder autovacuum) from being killed by the OOM-Killer, but still kill ordinary database connections when running out of memory. To set the OOM-Score you need to have a privileged user, so the best way to implement this setting is to put it into your PostgreSQL start script.
Enhancements in PostgreSQL 9.0
PostgreSQL 9.0 will have additional support for the pictured /proc-Interface. On one hand PostgreSQL 9.0 will come with a new Linux start script, which supports setting the oom_adj value before starting up PostgreSQL; on the other hand it is possible to build PostgreSQL with the special C-Macro LINUX_OOM_ADJ defined, which will allow DBAs to limit the inheritance of the OOM-Score to backend childs as shown in this example:
$ ./configure CC="ccache gcc" CFLAGS="-DLINUX_OOM_ADJ=0"
This method will save the PostgreSQL system process but will allow the OOM-Killer to kill database backend processes running amok.
An alternative solution is available by an additional kernel patch. This extends the existing /proc-Filesystem with a list of process names which should be excluded from the OOM-Killer. However, this patch is an unoffical extension to the Linux kernel and you may have to maintain your own builds of Linux kernels. In addition, it is not nearly as flexible as adjusting the OOM-Score and process names are not useful for uniquely identifying processes (e.g. Java- or Perlbased processes).
The Linuxkernel provides a comprehensive interface to adjust processes regarding their memory usage and the OOM-Killer. The most flexible method is the introduced /proc-Filesystem with the oom_adj-Interface. PostgreSQL 9.0 will have additional support for this interface. Dedicated PostgreSQL-Systems can be configured to avoid overcommit at all, but will need a deeper understanding of the number of memory resources the database system demands and the requirements of the VM of the kernel.