If you are a web developer, you need to refer to various log files to debug and monitor your application. There are many components working together in a modern web site: Web server, database server, PHP interpreter, cache server to name the most common and important ones in a PHP powered dynamic website. To be able to monitor the health and status of your site it is very important to track your logs. Each of the components of your website generates its own log. In this post, we will show how to configure the system for log generation and sending all of the logs to ZettaLogs log management service for central log monitoring.

We assume a web site operated using an Nginx web server with php5-fpm and MySQL 5.7 as the database on an Ubuntu 14.04 operating system. We also assume that Rsyslog configuration for sending logs to ZettaLogs has been already done on the server as explained in Sending Logs section and working without problem.

Configure Rsyslog for Nginx Logs

We will configure Rsyslog for following Nginx access and error log files as described in Sending Logs section.

Create a file /etc/rsyslog.d/24-zetta-files.conf with the following content.


$ModLoad imfile
$InputFilePollInterval 10 
$PrivDropToGroup adm
$WorkDirectory /var/spool/rsyslog


## File example.com_access.log
$InputFileName /var/log/nginx/example.com_access.log
$InputFileTag example.com_access            # set an identifying tag
$InputFileStateFile stat-example.com_access # state file saved under /var/spool/rsyslog (must be a unique file name)
$InputFileSeverity info                     # Set the severity value
$InputFilePersistStateInterval 10000
$InputRunFileMonitor
#############################################

## File example.com_error.log
$InputFileName /var/log/nginx/example.com_error.log
$InputFileTag example.com_error            # set an identifying tag
$InputFileStateFile stat-example.com_error # state file saved under /var/spool/rsyslog (must be a unique file name)
$InputFileSeverity info                    # Set the severity value
$InputFilePersistStateInterval 10000
$InputRunFileMonitor
#############################################

Note that the $InputFileTag will show as app_name field in ZettaLogs. Therefore, set this to the name of the file or to any other unique identifier that makes you identify the source of this log easily.

With the above setting in place, rsyslog will now follow the files we want. However, it will also output the log lines read from these files to /var/log/syslog. We do not want this because they are already saved in their own log files. Thus add the following filtering configuration to /etc/rsyslog.d/25-zetta.conf to discard the log line from the files we follow right after they are forwarded to ZettaLogs.

...

# Send logs
*.* @@data.zettalogs.com:6514;ZettaFormat
# Discard logs read from files after sending them to ZettaLogs
if $programname == 'example.com_access' then stop
if $programname == 'example.com_error' then stop
# Regular expressions may also be used to match the programname
# if re_match($programname, 'example\.com.*') then stop

Do not forget to restart the rsyslog daemon after configuration changes.

$ sudo service rsyslog restart

Now we login to our ZettaLogs account and verify that web server logs are received and correctly parsed.

Note that Nginx log are also parse with “_apache” format since they both have the same format. Check the details of a an Nginx access log and verify the extracted fields.

We see that all the fields like request URI, client IP, agent, verb, inside an web access log are extracted and ready to be analyzed.

MySQL Configuration

MySQL logging is not enabled by default. First, we will configure MySQL for logging. MySQL has three separate text log files:

  • The Error Log contains information indicating when mysqld was started and stopped and also any critical errors that occur while the server is running. If mysqld notices a table that needs to be automatically checked or repaired, it writes a message to the error log.
  • The General Query Log is a general record of what mysqld is doing. The server writes information to this log when clients connect or disconnect, and it logs each SQL statement received from clients. The general query log can be very useful when you suspect an error in a client and want to know exactly what the client sent to mysqld.
  • The Slow Query Log consists of SQL statements that took more than long_query_time seconds to execute and required at least min_examined_row_limit rows to be examined.

In addition to these text log files, MySQL also has a binary log. We will not be concerned with the binary log in this post. We will not be using the general query log, too. Because, every single query is logged into this file which is too much information. As MySQL folks warn in the MySQL configuration file, turning on the general query log is a real performance killer. Thus, it is not customary to leave it on in a production system. It is only useful to turn it on for debugging purposes to track down the source of a bug.

We will be using the slow query log. The slow query log contains only the queries that took more than long_query_time seconds which can be specified to a resolution of microseconds. Edit the MySQL configuration file /etc/mysql/my.cnf and uncomment (or add) the relevant lines under [mysqld] section:

[mysqld]
slow_query_log = 1
slow_query_log_file = /var/log/mysql/mysql-slow.log
long_query_time = 0.1
log_queries_not_using_indexes = 1

Set the long_query_time to a sensible value for your system. We set it to 0.1 sec here. We also turned on log_queries_not_using_indexes variable which determines whether queries that do not use indexes are logged to the slow query log. For a query to be logged it must have taken at least long_query_time seconds, or log_queries_not_using_indexes must be enabled and the query used no indexes for row look-ups.

Now that we have changed MySQL config file we must restart the service:

$ service mysql restart

After the restart we should see the slow query log file being populated. The slow query log has an odd structure. The information about a single query is distributed over several lines. The slow query log contains log lines similar to the following:

# User@Host: debian-sys-maint[debian-sys-maint] @ localhost [] Id: 3
# Query_time: 0.000173 Lock_time: 0.000095 Rows_sent: 1 Rows_examined: 11
SET timestamp=1482316769;
SELECT count(*) FROM mysql.user WHERE user='root' and password='';
# User@Host: debian-sys-maint[debian-sys-maint] @ localhost [] Id: 4
# Query_time: 0.008729 Lock_time: 0.000111 Rows_sent: 49 Rows_examined: 189
SET timestamp=1482316769;
select concat('select count(*) into @discard from `',
 TABLE_SCHEMA, '`.`', TABLE_NAME, '`') 
 from information_schema.TABLES where ENGINE='MyISAM';

The line starting with # User@Host gives information about the user and the client making the request. The line starting with # Query_time: gives detailed information about the time the query took and the number of rows returned. The next line that starts with SET contains the timestamp information. The fourth line has the query itself. All of these four lines are related to a single query and should be merged together before sending to ZettaLogs log management system. We will accomplish this using Logstash.

Configure Rsyslog for UDP

We have an already working Rsyslog configuration that follows various log files and sends them to our account at ZettaLogs. Thus we will use it as the output for Logstash. For this we will need to start Rsyslog UDP server. Comment out the following lines in /etc/rsyslog.conf:

# provides UDP syslog reception
$ModLoad imudp
$UDPServerRun 514

We will configure Logstash to send its output to localhost:514 as a syslog message. Note that this Rsyslog UDP server is only for our purposes and must NOT be accessed from outside world. Of course, we are assuming that there are iptables rules that deny any incoming traffic from outside world to the server except for a small number of ports explicitly opened like SSH (22), HTTP (80) and HTTPS (443). If you are not using such measures we strongly encourage you to setup UFW for easy management and configure it NOW.

Install Logstash

Logstash is an open source data collection engine with real-time pipelining capabilities. It can be downloaded as a tar archive and run from the command line. However, we will install it to our system since we want it to start with the operating system and run as a service at the background. We will install version 2.4.1. The official installation instructions can be found here. It requires Java 7 or later.

To check your Java version, run the following command:

$ java -version
java version "1.8.0_111"

which should produce output similar to the following:

java version "1.8.0_111"
Java(TM) SE Runtime Environment (build 1.8.0_111-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.111-b14, mixed mode)

Download Logstash:

$ wget https://download.elastic.co/logstash/logstash/packages/debian/logstash-2.4.1_all.deb

Install it:

$ sudo dpkg -i logstash-2.4.1_all.deb

Install Syslog plugin:

$ sudo /opt/logstash/bin/logstash-plugin install logstash-output-syslog

Now, Logstash is installed but not yet configured. We will configure it after we configure MySQL logs.

Configure Logstash for MySQL

Now we will configure Logstash for processing MySQL slow log to combine above mentioned four lines into a single line and direct the output to the syslog daemon.

Before configuring Logstash make sure that MySQL log files in /var/log/mysql directory are readable. MySQL generates logs files with mysql user and Logstash runs with logstash user. If MySQL logs are not readable by other users Logstash will not be able to process them. So run the following to make them readable if they are not:

$ sudo chmod +r /var/log/mysql/*.log

Now create Logstash configuration file /etc/logstash/10-mysql-slow.conf with the following content:

input {
file {
  path => "/var/log/mysql/mysql-slow.log"
  start_position => "beginning"
  type => "mysql-slow"
}
}

filter {
if [type] != "mysql-slow" {
  drop {}
}

# ignore '# Time:' as we use the SET timestamp to get the time. 
if [message] =~ /^# Time:/ {
  drop {}
}

multiline {
  # anything not starting with ^# User@Host: roll it up with
  # the previous entries
  pattern => "^# User@Host:"
  negate => true
  what => "previous"
}

mutate {
  gsub => [ "message", "\n", " " ]
}
}

output {
syslog {
  host => "localhost"
  port => 514
  facility => "local0"
  severity => "informational"
  appname => "mysql-slow"
  codec => line { format => "%{message}" }
}
}

Now start Logstash as a service by running:

$ sudo servie logstash start

To enable Logstash every time the system starts run the following command:

$ sudo initctl start logstash

The auto-generated configuration file for upstart systems is /etc/init/logstash.conf.

Debugging Logstash Service

To debug Logstash service take a look into its log files in /var/log/logstash.
The debug level of Logstash may be increased by passing the “-v” option. Options can be passed to the service via /etc/default/logstash configuration file. Add the following to the file:

# Arguments to pass to logstash agent
LS_OPTS="-v"

If you want everything to be logged then you may add:

# Arguments to pass to logstash agent
LS_OPTS="-vv"

Check ZettaLogs for Incoming MySQL Logs

We will log into our ZettaLogs account and verify that MySQL logs are coming and parsed as expected.

Check the details of a MySQL slow log and verify the extracted fields.

We see that all the fields like query, query time, user, inside an MySQL slow log are extracted and ready to be analyzed.

Configure PHP5-FPM

Php5-FPM logging is normally disabled. Edit your pool configuration file eg. /etc/php5/fpm/pool.d/www.conf as follows:

access.log = /var/log/php5-fpm-$pool.access.log
access.format = "%R - %u %t \"%m %r%Q%q\" %s %f %{mili}d %{kilo}M %C%%"
slowlog = /var/log/php5-fpm-$pool.slow.log
request_slowlog_timeout = 3

We turned on both access log and slow log. We adjusted the access log format to give a little bit of more information compared to the default: “%R – %u %t \”%m %r\” %s”. Now, modify Rsyslog configuration file /etc/rsyslog.d/25-zetta.conf to add the following:

...

$InputFileName /var/log/php5-fpm-www.access.log
$InputFileTag php5fpm-access
$InputFileStateFile stat-php5fpm-access #this must be unique for each file being polled
$InputFileSeverity info
$InputFilePersistStateInterval 20000
$InputRunFileMonitor

...

# Send logs
*.* @@data.zettalogs.com:6514;ZettaFormat
# Discard logs read from files after sending them to ZettaLogs
if $programname == 'php5fpm' then stop

Now, we login to our ZettaLogs account and verify that PHP5-FPM logs are coming and parsed as expected.

Check the details of a an PHP5-FPM access log and verify the extracted fields.

We see that all the fields like request URI, memory usage, CPU usage, response time, inside an PHP5-FPM access log are extracted and ready to be analyzed.

That is it! Now we centralized all the logs that we are interested in, all parsed and ready to be analyzed, into ZettaLogs. Now, we can monitor all the parameters like the requested URLs, response times of web server / PHP / MySQL, slow queries, request status, PHP memory and CPU usages etc. We can graph all these parameters and create dashboards. We can set alarms on any query we’d like including any parameters we’d like.