
HowTo: Central and semantic logging for PostgreSQL


Alexander Sosna

Today it is no longer necessary to argue why central logging makes sense or is even required. Most medium-sized companies now have a central logging system or are in the process of introducing one.

Once the infrastructure has been created, it must be used sensibly and efficiently! Especially as an infrastructure operator or service provider, it is important to support different actors with different requirements in the best possible way. For example, the development department should have continuous access to all logs of its test systems. For production, on the other hand, perhaps only the error messages are needed in real time, with further logs made available only after approval.

Such models can easily be implemented and tested with graylog® or Kibana®. Classification and evaluation may work well with PostgreSQL's default settings in test mode or in small environments. However, if a large number of databases is operated or the logs are retained for a long time, working with them quickly becomes difficult.

Problem definition

The log entries arrive centrally and can in principle be used. In practice, however, it is difficult or impossible to extract all relevant information promptly. Searching for specific entries requires full-text searches with wildcards, which is no longer practical with large data sets.

For example, if you want to give certain groups access only to logs that meet certain technical criteria, such as database name, error_severity or the like, you must resort to full-text searches and error-prone filters.

If a DBA wants to see all messages of a particular user, query or session, this requires particularly complex indexing for wildcard searches; otherwise such requests are very slow and cannot be answered promptly.

Starting point: Central logging has already been introduced, e.g. an ELK stack or graylog®.
Target: We want to capture PostgreSQL log messages semantically and thus efficiently classify, group and evaluate them.


Alternative

The alternative is to capture the log messages semantically and store the individual fields in a corresponding data structure.

If the normal stderr log is used, parsing is difficult or impossible because the individual fields cannot be reliably distinguished. However, PostgreSQL also offers the option of producing log messages in CSV format (csvlog), in which all fields are separated by commas.

Excursus: The idea is not new. Some DBAs have been loading their logs directly into a PostgreSQL table for a long time. This way the logs can be searched and processed with SQL and all familiar standard tools. The DBAs are usually very satisfied with this construct, but it remains an isolated solution. An example table for PostgreSQL 11 would look like this:
CREATE TABLE postgres_log
(
  log_time timestamp(3) with time zone,
  user_name text,
  database_name text,
  process_id integer,
  connection_from text,
  session_id text,
  session_line_num bigint,
  command_tag text,
  session_start_time timestamp with time zone,
  virtual_transaction_id text,
  transaction_id bigint,
  error_severity text,
  sql_state_code text,
  message text,
  detail text,
  hint text,
  internal_query text,
  internal_query_pos integer,
  context text,
  query text,
  query_pos integer,
  location text,
  application_name text,
  PRIMARY KEY (session_id, session_line_num)
);
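
Filling and querying such a table is straightforward. A minimal sketch, assuming the CSV files end up in /var/log/postgresql/ (file name and database name are purely illustrative):

-- Load a finished csvlog file into the table
COPY postgres_log FROM '/var/log/postgresql/postgresql.csv' WITH csv;

-- Afterwards the log can be analysed with plain SQL, e.g. all errors
-- of one database from the last 24 hours:
SELECT log_time, user_name, error_severity, message
  FROM postgres_log
 WHERE database_name = 'appdb'
   AND error_severity IN ('ERROR', 'FATAL', 'PANIC')
   AND log_time > now() - interval '1 day'
 ORDER BY log_time;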

Structure

[Figure: Logging construction]

Implementation

The csvlog can serve as the basis for efficiently filling a central logging system. To make further processing as easy as possible and to avoid committing to a particular logging system, we translate the log from CSV to JSON. It can then be fed into the logging system as required. TCP is used in the following example.

To change the log format in PostgreSQL, the following options have to be adjusted in postgresql.conf:

#------------------------------------------------------------------------------
# ERROR REPORTING AND LOGGING
#------------------------------------------------------------------------------
 
# - Where to Log -
 
log_destination = 'csvlog'              # Valid values are combinations of
                                        # stderr, csvlog, syslog, and eventlog,
                                        # depending on platform.  csvlog
                                        # requires logging_collector to be on.
 
# This is used when logging to stderr:
logging_collector = on                  # Enable capturing of stderr and csvlog
                                        # into log files. Required to be on for
                                        # csvlogs.
                                        # (change requires restart)
 
# These are only used if logging_collector is on:
log_directory = '/var/log/postgresql'   # directory where log files are written,
                                        # can be absolute or relative to PGDATA

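In addition, it can be useful to fix the file names and rotation explicitly, since the logstash® input below watches for *.csv files. A possible addition, assuming daily log files are wanted; when csvlog is active, PostgreSQL writes the CSV output to the same name as log_filename with the suffix replaced by .csv:

log_filename = 'postgresql-%Y-%m-%d.log'  # the CSV output then ends up in
                                          # postgresql-YYYY-MM-DD.csv
log_rotation_age = 1d                     # rotate once a day
log_rotation_size = 0                     # disable size-based rotation
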
We use logstash® with the following configuration, maintained as a Jinja2 template, for parsing and translating to JSON as well as for delivery to the logging system:

input {
        file {
                path => "/var/log/postgresql/*.csv"
                sincedb_path => "/tmp/sincedb_pgsql"
                # fix up multiple lines in log output into one entry
                codec => multiline {
                        pattern => "^%{TIMESTAMP_ISO8601}.*"
                        what => "previous"
                        negate => true
                }
        }
}
 
# Filter for PostgreSQL 9.0 - 11
filter {
        csv {
                columns => [ "pg_log_time", "pg_user_name", "pg_database_name",
                             "pg_process_id", "pg_connection_from", "pg_session_id",
                             "pg_session_line_num", "pg_command_tag",
                             "pg_session_start_time", "pg_virtual_transaction_id",
                             "pg_transaction_id", "pg_error_severity", "pg_sql_state_code",
                             "pg_sql_message", "pg_detail", "pg_hint", "pg_internal_query",
                             "pg_internal_query_pos", "pg_context", "pg_query",
                             "pg_query_pos", "pg_location", "pg_application_name" ]
        }

        date {
                # e.g. 2014-05-22 17:02:35.069 CDT
                # the first CSV column was named pg_log_time above
                match => ["pg_log_time", "YYYY-MM-dd HH:mm:ss.SSS z"]
        }

        mutate {
                add_field => {
                        "application_name" => "postgres"
                }
        }
}
 
output {
        tcp {
                host => "{{ log_server }}"
                port => {{ log_port }}
                codec => "json_lines"
        }
}

It is important that the filter for translating the CSV fields is adapted to the PostgreSQL major version in use. The fields can differ from version to version. In most cases, new fields are added.
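
For newer major versions only the columns list has to be extended. A sketch for PostgreSQL 14 and later, under the assumption that csvlog gained backend_type with version 13 and leader_pid as well as query_id with version 14, all appended at the end of the line (please check against the documentation of the version in use):

# Filter for PostgreSQL 14 and later
filter {
        csv {
                columns => [ "pg_log_time", "pg_user_name", "pg_database_name",
                             "pg_process_id", "pg_connection_from", "pg_session_id",
                             "pg_session_line_num", "pg_command_tag",
                             "pg_session_start_time", "pg_virtual_transaction_id",
                             "pg_transaction_id", "pg_error_severity", "pg_sql_state_code",
                             "pg_sql_message", "pg_detail", "pg_hint", "pg_internal_query",
                             "pg_internal_query_pos", "pg_context", "pg_query",
                             "pg_query_pos", "pg_location", "pg_application_name",
                             "pg_backend_type", "pg_leader_pid", "pg_query_id" ]
        }
}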

It should also be noted that logstash® has trouble recognizing the timestamp on its own. The concrete format, including the time zone, should therefore be specified explicitly in the date filter.

The following variables must be set:

{{ log_server }}: log server, in our case a graylog®
{{ log_port }}: port on the log server
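
Rendered with concrete values (host name and port below are purely exemplary), the output section then looks like this:

output {
        tcp {
                host => "graylog.example.com"   # {{ log_server }}
                port => 5555                    # {{ log_port }}, e.g. a raw/plaintext
                                                # TCP input on the graylog® side
                codec => "json_lines"
        }
}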


Configured in this way, the logs can be classified and searched efficiently. Authorization also becomes simpler:

  • Authorization at database level => pg_database_name
  • Severity => pg_error_severity
  • Authorization for logs by certain hosts => pg_connection_from
  • Authorization for logs by certain applications => pg_application_name

Important: The normal stderr log should always be included in the central logging as well. Once PostgreSQL is running, it no longer contains the normal operating messages of PostgreSQL itself, but it does contain error messages from the processes involved. For example, the stderr output of a failed archive_command ends up here. This information is essential for administration.
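
To also ship this stderr log, an additional file input can be added to the logstash® configuration. A minimal sketch with an assumed path and tag name; the csv and date filters above should then be wrapped in a conditional such as if "pg_stderr" not in [tags] { ... } so that these plain-text lines are forwarded unparsed:

input {
        file {
                path => "/var/log/postgresql/*.log"
                sincedb_path => "/tmp/sincedb_pgsql_stderr"
                tags => ["pg_stderr"]   # marks plain stderr events for the filters
        }
}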

Questions and Help

Do you have any questions or need help? Feel free to write to info@credativ.com.

This blog post originally appeared on the German credativ blog on 06.12.2016.

*All brand names and trademarks mentioned and possibly protected by third parties are subject without restriction to the provisions of the applicable trademark law and the ownership rights of their respective owners.
