11 Jan 2016

Mastering the OpenStack logs

How fast do you detect a problem in your deployment? No problem is as serious as it seems when talking about OpenStack errors. The secret is in mastering the logs. OpenStack and the components that run on top of it can generate many different types of messages, which are recorded in various log files.

Whenever a problem occurs in an OpenStack deployment, the first place you should look is the logs. By analyzing the information they provide, you may be able to detect what the problem is and where the error occurs. Often the user interface only shows “An error occurred”, while all the information about that error resides in the log file. You can use these messages for troubleshooting and monitoring system events.

OpenStack has several services, and each of them has its own log file, so there are a large number of log files. A good DevOps team managing an OpenStack deployment, no matter the size, needs to locate the logs and learn how to work with them to track the status and health of the deployment.

Where are the OpenStack logs?

The OpenStack services use a common location for their logs. In a default configuration of an OpenStack deployment, log files are located in subdirectories of the /var/log directory:

 

[Screenshot: table of OpenStack log locations]

 

Table from: http://docs.openstack.org/openstack-ops/content/logging_monitoring.html#openstack-log-locations
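As a rough illustration of that layout, the following Python sketch lists the log files found under per-service subdirectories of /var/log. The service names in SERVICES and the "*.log" file pattern are assumptions made for the example; your deployment may use different directories or file names.

```python
from pathlib import Path

# Assumed service directory names; adjust to the services you actually run.
SERVICES = ("nova", "glance", "cinder", "keystone", "neutron")


def find_service_logs(root="/var/log", services=SERVICES):
    """Return {service: sorted list of *.log paths} for directories that exist."""
    found = {}
    for service in services:
        log_dir = Path(root) / service
        if log_dir.is_dir():
            found[service] = sorted(log_dir.glob("*.log"))
    return found


if __name__ == "__main__":
    for service, paths in find_service_logs().items():
        print(service)
        for path in paths:
            print("  ", path)
```

Services without a directory under the root are simply skipped, so the same script can run on controller and compute nodes alike.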

OpenStack uses the following logging levels: DEBUG, INFO, AUDIT, WARNING, ERROR, CRITICAL, and TRACE. What does each level mean?

  • Debug: Shows everything and is likely not suitable for normal production operation due to the sheer size of logs generated
  • Info: Usually indicates successful service start/stop, versions and such non-error related data. This should include largely positive units of work that are accomplished (such as starting a compute, creating a user, deleting a volume, etc.)
  • Audit: Deprecated; all messages previously logged as AUDIT should be logged as INFO
  • Warning: Indicates that there might be a systemic issue; potential predictive failure notice
  • Error: An error has occurred and an administrator should research the event
  • Critical: An error has occurred and the system might be unstable; immediately get administrator assistance

from: http://stackoverflow.com/questions/2031163/when-to-use-log-level-warn-vs-error/2031209#2031209
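To get a quick picture of how these levels are distributed in a log file, you could count records per level. The sketch below is only an illustration: it assumes each record starts with a timestamp, an optional process ID and then the level name, and the names LEVEL_RE and count_levels are made up for this example.

```python
import re
from collections import Counter

LEVELS = ("DEBUG", "INFO", "AUDIT", "WARNING", "ERROR", "CRITICAL", "TRACE")

# Timestamp, optional process ID, then the level name.
LEVEL_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ (?:\d+ )?("
    + "|".join(LEVELS)
    + r")\b"
)


def count_levels(lines):
    """Count log records per level; lines that do not match are ignored."""
    counts = Counter()
    for line in lines:
        match = LEVEL_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Possible usage (the path is an assumption):
#     with open("/var/log/nova/nova-api.log") as fh:
#         print(count_levels(fh))
```

A sudden rise in the ERROR or CRITICAL counts is usually a good hint about which service to look at first.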

 

A message appears in the log files only if it is at least as “severe” as the log level that is set. For example, setting the debug option to True lets all log statements through. If you set debug to False, only the DEBUG messages are discarded. If you don’t want to see your logs polluted by INFO messages saying “hey, I’m here asking for something!”, you can also set the verbose option to False, which makes WARNING the default level.

*These settings are per service, so you need to change them in the configuration file of each service.
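The threshold rule can be sketched with the level constants from Python's standard logging module. The function names effective_level and is_emitted are hypothetical, and the mapping from the two flags to a level is only the rule as described above, not code taken from OpenStack. Note that the standard module has no AUDIT or TRACE level, so those are left out.

```python
import logging


def effective_level(debug, verbose):
    """Map the two flags to a threshold, following the rule described above."""
    if debug:
        return logging.DEBUG
    if verbose:
        return logging.INFO
    return logging.WARNING


def is_emitted(record_level, debug, verbose):
    """True if a record at record_level passes the configured threshold."""
    return record_level >= effective_level(debug, verbose)


if __name__ == "__main__":
    # With both flags off, INFO is discarded but ERROR still gets through.
    print(is_emitted(logging.INFO, debug=False, verbose=False))
    print(is_emitted(logging.ERROR, debug=False, verbose=False))
```

The comparison works because the standard levels are plain integers ordered by severity (DEBUG is 10, CRITICAL is 50).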

As you may know, non-OpenStack components also write logs. OpenStack relies on many libraries and external components that define logging in their own way, so these logs can differ widely from one another (MySQL, SQLAlchemy, KVM, OVS, Ceph, etc.).

What does an OpenStack log record look like?

The following is an example of a DEBUG log:

2016-01-04 22:41:36.297 DEBUG oslo_db.sqlalchemy.engines [req-af32b586-0aab-4846-b097-12604699d5ec None None] MySQL server mode set to STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,TRADITIONAL,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION _check_effective_sql_mode /usr/local/lib/python2.7/dist-packages/oslo_db/sqlalchemy/engines.py:256  

Managing your logs

It is good practice to centralize the events (logs) from our systems on a server that collects, classifies and stores them. There are two popular log collectors: Fluentd, written in CRuby, used in Kubernetes and maintained by Treasure Data Inc., and Logstash, written in JRuby and maintained by elastic.co. They have similar features. Both collectors have their own transport protocol with failure detection and fallback. Logstash uses the Lumberjack protocol and is Active-Standby only; Fluentd, on the other hand, uses the forward protocol and can be deployed as an Active-Active service (load balancing) or Active-Standby. You can read more about Logstash and Fluentd on their sites.

Whatever your decision is, you will need to parse the OpenStack logs to manipulate them. Yes, regex strikes back. Welcome to regex hell.


To save you a little time, we want to share the regular expression that the Sentinel.la DevOps team wrote to parse the OpenStack logs:

Source: https://github.com/Sentinel-la/OpenstackRegexLog

 

OpenstackRegexLog

Regular expression to parse openstack logs

Example

1.- The following DEBUG log:

2016-01-04 22:41:36.297 DEBUG oslo_db.sqlalchemy.engines [req-af32b586-0aab-4846-b097-12604699d5ec None None] MySQL server mode set to STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,TRADITIONAL,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION _check_effective_sql_mode /usr/local/lib/python2.7/dist-packages/oslo_db/sqlalchemy/engines.py:256

Parsed and stored as json:

{
"time" : "2016-01-04 22:41:36.297",
"description" : "[req-af32b586-0aab-4846-b097-12604699d5ec None None] MySQL server mode set to STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,TRADITIONAL,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION _check_effective_sql_mode /usr/local/lib/python2.7/dist-packages/oslo_db/sqlalchemy/engines.py:256",
"level" : "DEBUG",
"log_id" : null,
"component" : "oslo_db.sqlalchemy.engines "
}

2.- The following WARNING log:

2016-01-04 22:41:35.221 19090 WARNING oslo_config.cfg [-] Option "username" from group "keystone_authtoken" is deprecated. Use option "user-name" from group "keystone_authtoken".

Parsed and stored as json:

{
"time" : "2016-01-04 22:41:35.221",
"description" : "[-] Option \"username\" from group \"keystone_authtoken\" is deprecated. Use option \"user-name\" from group \"keystone_authtoken\".",
"level" : "WARNING",
"log_id" : 19090,
"component" : "oslo_config.cfg"
}
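For readers who want to experiment before adopting the regular expression from the repository, here is a minimal Python sketch that produces the same field names as the JSON above. It is not the Sentinel.la expression: the pattern LOG_RE and the function parse_record are assumptions written for this example, and they only handle single-line records shaped like the two samples. The component is captured here without trailing whitespace.

```python
import json
import re

# Timestamp, optional numeric ID, level, component, then the rest of the line.
LOG_RE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?:(?P<log_id>\d+)\s+)?"
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<component>\S+)\s+"
    r"(?P<description>.*)$"
)

# The WARNING record from the example above.
SAMPLE = (
    '2016-01-04 22:41:35.221 19090 WARNING oslo_config.cfg [-] Option "username" '
    'from group "keystone_authtoken" is deprecated. Use option "user-name" '
    'from group "keystone_authtoken".'
)


def parse_record(line):
    """Return a dict of fields, or None if the line is not a log record."""
    match = LOG_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    if record["log_id"] is not None:
        record["log_id"] = int(record["log_id"])
    return record


if __name__ == "__main__":
    print(json.dumps(parse_record(SAMPLE), indent=2))
```

Lines that do not start with a timestamp, such as the continuation lines of a Python traceback, return None, so a real pipeline would need to decide whether to attach them to the previous record.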

 

As you can see, it is very important to understand and correctly manage the log files when running an OpenStack environment. How are you managing your OpenStack logs?


All rights reserved © 2017 Sentinelle Labs.