31 Mar 2017

Brand new Sentinel.la plugins to control your entire stack

Sentinel.la is a fast way to manage OpenStack, helping you to reduce OpenStack’s learning curve.

Yes, we love and specialize in OpenStack, but what happens if you need to do more? Whether you are a DevOps engineer, sysadmin, developer or simply someone who wants different data about a server or application, we kept asking ourselves how we could make your journey easier. That is how the Plugins idea was born.

Plugins are components written in Python. With them, you can drop in a few lines of Python logic (the only supported language at this moment) to extract server metrics by process or component. It’s also possible to monitor something within a server and reduce it to a single piece of data: for example, the number of volumes in a Docker deployment, a Ceph health check, or whether the MySQL server is running. This information is displayed in the Sentinella App.
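As a rough illustration of the "one piece of data" idea, the sketch below reduces a process listing to a single metric that answers "is the MySQL server running?". It is not the Sentinella plugin template: the function names, the returned key and the mysqld process name are assumptions made only for this example.

```python
import subprocess


def list_process_names():
    """Return the command names of running processes (uses `ps`, Unix only)."""
    output = subprocess.check_output(["ps", "-eo", "comm="])
    return [line.strip() for line in output.decode().splitlines() if line.strip()]


def get_mysql_status(process_names=None):
    """Collapse the process list into one metric: is mysqld running?

    Hypothetical plugin method; the name and return shape are illustrative only.
    """
    if process_names is None:
        process_names = list_process_names()
    running = any(name == "mysqld" for name in process_names)
    return {"mysql_running": 1 if running else 0}
```

Passing the process list as an argument keeps the logic easy to try without touching a real server, for example get_mysql_status(["sshd", "mysqld"]).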

This way, our users collect and save only the information related to their particular needs.

Sentinel.la aims not only to bring you OpenStack, but also to be an efficient tool. We’ve seen OpenStack deployments that use different components such as PostgreSQL, RabbitMQ and Apache, all of which need to be monitored for faster and better troubleshooting.

This is why you can now make your own plugins and share them with the community.

How do plugins work in Sentinel.la?

Sentinel.la takes your metrics and saves them through our API, all via the Sentinella Agent.

Our components are an abstraction of the Sentinel.la functionality, which makes it possible to add new features.

A plugin uses a task in the Sentinella Agent to push new metrics to the Sentinella API. When the metrics arrive, the API validates that they come from a valid plugin. The workflow diagram shows these internal steps.
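The validation step can be pictured roughly as follows. This is only a sketch under assumptions: the payload fields (plugin, metrics), the REGISTERED_PLUGINS table and the validate_submission function are invented for illustration and do not describe the real Sentinella API. Only the plugin_key concept comes from the registration steps in this post.

```python
# Hypothetical registry of plugins, keyed by plugin_key.
REGISTERED_PLUGINS = {
    "key-123": {"name": "sentinella.sentinella-docker", "approved": True},
    "key-456": {"name": "sentinella.test", "approved": False},
}


def validate_submission(payload, registry=REGISTERED_PLUGINS):
    """Return (accepted, reason) for a metrics payload pushed by an agent."""
    plugin = registry.get(payload.get("plugin_key"))
    if plugin is None:
        return False, "unknown plugin_key"
    if plugin["name"] != payload.get("plugin"):
        return False, "plugin name does not match plugin_key"
    if not plugin["approved"]:
        return False, "plugin release not approved"
    if not isinstance(payload.get("metrics"), dict):
        return False, "metrics must be an object"
    return True, "ok"
```

Returning a reason next to the boolean makes it easier for an agent to log why a push was rejected.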

Sentinel.la has a plugin evaluation process, which starts when you register a plugin release.

All Sentinel.la plugins must follow the rules so that we can keep control and quality. That is why we have an approval process, but it is a simple one:

1.- Register release.
2.- Evaluation.
3.- Approve.

Step 2 consists of a code review that checks whether the rules have been applied.

What steps do I need to follow to add my plugin into Sentinel.la?

1.- Register Plugin at Sentinella.
2.- Get your plugin_key.
3.- Download the plugin template.
4.- Put your code logic into your plugin, following the specs.
5.- Make a release.
6.- Install the plugin on your server with the Sentinella Agent.

For more information click here.

You can also use other plugins made by the Sentinella Community.

What are the rules?

Plugins must be registered with Sentinel.la.
Follow the documentation to make a Sentinel.la plugin.
Enjoy.

How do I install a plugin?

Piece of cake:

$ sentinella install <plugin_name> <plugin_version>

How do I configure a plugin?

Once the plugin is installed, open the file /etc/sentinella/sentinella.conf. This file has a configuration section for plugins, an object called plugins.
You must add your plugin to this section.

Section example:

"plugins": {
        "sentinella.openstack_logs": [
            "get_openstack_events"
        ],
        "sentinella.metrics": [
            "get_server_usage_stats"
        ],
        "sentinella.test": [
            "get_stats"
        ],
        "sentinella.sentinella-docker": [ <----- Name of package
            "docker_stats" <----- Name method.
        ]
    },

If you have any questions about the package name, class name, etc., look in /usr/share/python/sentinella/lib/python2.7/site-packages/sentinella/, where all installed plugins live.
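To double-check what the plugins section enables, a short script can list every package and method pair. The sketch below assumes that sentinella.conf is plain JSON (the section example looks like JSON, but the post does not say so explicitly); the helper name list_plugin_methods and the inline sample are made up for this illustration.

```python
import json

# Inline sample shaped like the section example above (without the arrows).
EXAMPLE_CONF = """
{
    "plugins": {
        "sentinella.metrics": ["get_server_usage_stats"],
        "sentinella.sentinella-docker": ["docker_stats"]
    }
}
"""


def list_plugin_methods(conf_text):
    """Return sorted (package, method) pairs from the "plugins" object."""
    conf = json.loads(conf_text)
    pairs = []
    for package, methods in conf.get("plugins", {}).items():
        for method in methods:
            pairs.append((package, method))
    return sorted(pairs)


for package, method in list_plugin_methods(EXAMPLE_CONF):
    print("{0} -> {1}".format(package, method))
```

To inspect a real installation, read /etc/sentinella/sentinella.conf and pass its contents to list_plugin_methods, again assuming the file parses as JSON.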

Questions?

Please, contact us 🙂

13 Oct 2016

How to build a SaaS around OpenStack (I)

Guillermo Alvarado
App Design, Dashboard, Logs, metrics, Monitoring, OpenStack, Sentinel.la, Startup

How does Sentinelle Labs build apps? What pieces interact in our platform in order to successfully capture and process agent’s data to monitor, backtrace and send notifications if something goes wrong in an OpenStack deployment?

We’ve decided that it’s time to share more details around this topic. In this series, we’ll describe our architecture and the technologies used to go from source code to a deployed service that shows you how your OpenStack deployment is working. You can expect this to be the first of many posts detailing the architecture and the challenges of building and deploying a SaaS to enhance OpenStack skills, reduce the OpenStack learning curve and backtrace it much faster. Sentinel.la is the fastest way to OpenStack.

High level design

The High-level design (HLD) explains the architecture used for developing a software product. The architecture diagram provides an overview of the entire system, identifying the main components that will be developed for the product along with their interfaces.

Decoupled architecture

As you can see, we’ve used a decoupled architecture approach. This type of architecture lets components and layers execute independently, so they interact with each other through well-defined interfaces rather than depending tightly on each other.

API

The first step toward a decoupled architecture is to build an API. There’s nothing more important than the application programming interface to link components with each other. Our API is called “Medusa” and is built with Flask. An API is a great way to expose an application’s functionality to external applications in a safe and secure way. In our case that external app is “Apollo”, our UI, which will be reviewed later.

Sentinel.la uses a message queuing system (in this case RabbitMQ). The MQ acts as middleware that lets the different layers and pieces of the software communicate with each other. The systems can remain completely autonomous and unaware of each other. Instead of building one large application, it’s better practice to decouple the different parts of your application and have them communicate only asynchronously, through messages.

We use Celery, a “batteries included” task queue written in Python, to manage our queue. As mentioned above, our broker is RabbitMQ; Celery also manages the workers that consume all the tasks/messages.
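The decoupling idea can be tried out without any infrastructure. In the sketch below, Python's standard queue.Queue and a thread stand in for RabbitMQ and a Celery worker. This is a deliberate substitution for illustration, not how Sentinel.la is wired, and the message fields are invented.

```python
import queue
import threading


def worker(tasks, results):
    """Consume messages until a None sentinel arrives."""
    while True:
        message = tasks.get()
        if message is None:
            break
        # "Processing" here is just tagging the message as handled.
        results.append({"server": message["server"], "handled": True})


def run_demo():
    tasks = queue.Queue()
    results = []
    consumer = threading.Thread(target=worker, args=(tasks, results))
    consumer.start()
    # The producer only knows about the queue, never about the worker.
    for name in ("controller-1", "compute-1"):
        tasks.put({"server": name, "metric": "cpu"})
    tasks.put(None)  # tell the worker to stop
    consumer.join()
    return results
```

The producer and the consumer share nothing but the queue, which is the property that lets each side be replaced or scaled on its own.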

Our UI is called “Apollo”. It’s built in AngularJS and is an “API-centric” web application: it executes all its functionality through API calls. For example, to log in a user, we send the credentials to the API, and the API returns a result indicating whether the user provided the correct username and password combination. We also follow the JAM stack conventions. JAM stands for JavaScript, APIs and Markup, and it is an ideal way of building static websites. We’ll explain it in later posts, but the basic idea is to avoid managing servers: host all your front-end on a CDN and use APIs for any moving parts.

Datastore

Behind the scenes, all data is stored in 4 different databases. We use InfluxDB, a scalable datastore for metrics, events, and real-time analytics. We also use RethinkDB, the open-source database for the realtime web. One of the components we use also needs MongoDB, an open source database with a document-oriented data model. Our relational database is PostgreSQL, an open source relational database management system (DBMS).

Agents

Our platform uses all the information generated by the different OpenStack deployments. To collect it, we’ve built an agent written 100% in Python (available at https://github.com/Sentinel-la/sentinella-agent/). To install it, there are .deb packages for Ubuntu/Debian and .rpm packages for RedHat/CentOS. We also have a pip package to install it on SuSE: https://pypi.org/project/sentinella/

Data processing engines

To evaluate all the thresholds we developed 2 different daemons, one for InfluxDB (called “Chronos”) and another for RethinkDB (called “Aeolus”). These pieces hold all the rules and logic to raise an alert when something wrong is detected.
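As a minimal sketch of what evaluating a threshold means, the snippet below checks one metric value against one rule. The rule format, the operator table and the evaluate_rule function are assumptions made for this example; they are not taken from Chronos or Aeolus.

```python
import operator

# Hypothetical rule format; the operators map to Python comparisons.
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def evaluate_rule(rule, value):
    """Return an alert dict when `value` breaches the rule, else None."""
    compare = OPERATORS[rule["op"]]
    if compare(value, rule["threshold"]):
        return {
            "metric": rule["metric"],
            "severity": rule["severity"],
            "value": value,
        }
    return None


cpu_rule = {"metric": "cpu_percent", "op": ">", "threshold": 90, "severity": "critical"}
```

With this rule, evaluate_rule(cpu_rule, 95) produces an alert while evaluate_rule(cpu_rule, 40) returns None.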

Alerta.io

Obviously we need a component that manages all the alerts raised by Chronos and Aeolus. We are proudly using Alerta.io to consolidate all the alerts, perform de-duplication and simple correlation, and trigger the notifications.
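De-duplication can be pictured with a few lines of Python. The sketch below merges alerts that share the same resource and event and counts the repeats. It is a simplified illustration with invented field names, not Alerta.io's actual implementation.

```python
def deduplicate(alerts):
    """Merge alerts sharing the same (resource, event) into one entry."""
    merged = {}
    for alert in alerts:
        key = (alert["resource"], alert["event"])
        if key in merged:
            merged[key]["duplicate_count"] += 1
            merged[key]["severity"] = alert["severity"]  # keep the latest severity
        else:
            merged[key] = dict(alert, duplicate_count=0)
    return list(merged.values())


incoming = [
    {"resource": "compute-1", "event": "HighCPU", "severity": "warning"},
    {"resource": "compute-1", "event": "HighCPU", "severity": "critical"},
    {"resource": "controller-1", "event": "ServiceDown", "severity": "critical"},
]
```

Here deduplicate(incoming) yields two entries: the HighCPU alert with one duplicate and the latest severity, and the ServiceDown alert with none.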

Notifications delivery

We send 3 different types of notifications for an alert. First, we send an email (we use Mandrill, the transactional email service from Mailchimp), since we’ve decided not to maintain an SMTP server. Second, we send Slack alerts using their webhook integrations. Third, of course, we notify users on the Sentinel.la Dashboard by pushing alerts to Apollo. To accomplish that we use Thunderpush, a Tornado and SockJS based push service. It provides a Beaconpush-inspired HTTP API and client.
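For the Slack part, an incoming webhook expects an HTTP POST with a JSON body that contains a text field. The sketch below only builds such a request with the standard library. The alert fields, the message layout and the helper name are assumptions for illustration, and the webhook URL is a placeholder you would replace with your own.

```python
import json
import urllib.request


def build_slack_request(webhook_url, alert):
    """Build (but do not send) the HTTP request for a Slack incoming webhook."""
    text = "[{severity}] {resource}: {event}".format(**alert)
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL: a real one is issued by Slack when the webhook is created.
request = build_slack_request(
    "https://example.invalid/slack-webhook",
    {"severity": "critical", "resource": "compute-1", "event": "HighCPU"},
)
```

Sending it is then a matter of calling urllib.request.urlopen(request) with a real webhook URL.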

So far, these are the main components that work together to deliver the Sentinel.la service. In future posts we’ll review each of them in more depth. The next post will be about Apollo, our UI, and the JAM stack.

Thanks for reading, your comments are very welcome.

09 Aug 2016

Sentinel.la now available at PyPI

Guillermo Alvarado
agent, devops, Logs, metrics, Monitoring, OpenStack, Roadmap, Sentinel.la

We are glad to announce that the sentinel.la agent is now available on the Python Package Index (PyPI), the official third-party software repository for Python: https://pypi.python.org/pypi/sentinella

PyPI hosts Python packages as archives: source distributions and built distributions such as wheels and the older Python Eggs. Similar to JAR archives in Java, eggs are fundamentally ZIP files, but with the .egg extension, that contain the Python code for the package itself along with the package’s metadata (in a source distribution, that metadata is described by the setup.py file).

You can access PyPI with several package managers, including EasyInstall, PyPM and pip, which use PyPI as the default source for packages and their dependencies.

So you can install the sentinel.la agent with pip as follows:

guillermo@xps13:~/$ pip install sentinella

With this, Sentinel.la is available for:

RedHat/CentOS through rpm package: https://github.com/Sentinel-la/sentinella-agent/releases
Ubuntu/Debian based systems through deb package: https://github.com/Sentinel-la/sentinella-agent/releases
SuSE Enterprise Linux/ OpenSuSE through python package: https://pypi.python.org/pypi/sentinella

Also, remember to vote for our presentation at the OpenStack Summit in Barcelona:

Vote here: Double Win! Helping to consolidate OpenStack implementations (and build a Startup in the meantime)

Keep in touch with us while we’re building the next big thing,

Email: hello@sentinel.la

15 Feb 2016

Sentinel.la App’s Server View Panel: Get insight into your OpenStack servers.

Qui-Gon Jinn
Alameda, Alert, App Design, Dashboard, Logs, metrics, Monitoring, OpenStack, Roadmap, Sentinel.la

This is part of a series of posts describing pieces of our amazing app to monitor OpenStack.

The following screenshot belongs to the server view panel. This panel starts with an overview of the usage and availability of the server’s resources, vital signs, the OpenStack services running on it, opened and closed alerts, and important log events collected over the last 24 hours.

The App collects information from logs, processes and the system as soon as the agent is installed. This information helps to auto-detect and check the status of the OpenStack services running on the server. Once the info is collected, Sentinel.la classifies services among OpenStack projects: Nova, Neutron, Cinder, Heat, Glance, Keystone and Ceilometer.
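One simple way to picture the classification step is to group process names by their prefix. The sketch below does exactly that. It assumes that process names follow the project-service pattern (nova-api, neutron-server and so on), which is a simplification: the App also relies on logs and system information, and its real detection logic is not described here.

```python
OPENSTACK_PROJECTS = (
    "nova", "neutron", "cinder", "heat", "glance", "keystone", "ceilometer",
)


def classify_service(process_name):
    """Map a process name such as "nova-api" to its OpenStack project."""
    prefix = process_name.split("-", 1)[0]
    return prefix if prefix in OPENSTACK_PROJECTS else None


def group_by_project(process_names):
    """Group detected process names under the project they belong to."""
    groups = {}
    for name in process_names:
        project = classify_service(name)
        if project is not None:
            groups.setdefault(project, []).append(name)
    return groups
```

Processes that do not match any project, such as sshd, are simply left out of the result.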

The Server View panel shows the OpenStack version running. It also shows system information such as processor type, memory, kernel version, storage device and capacity. You can identify the server by name and see its status (e.g. maintenance). The cloud group and location are displayed under the name of the server.

Note that you still have access to push notifications from all your geographically distributed cloud groups at the top right corner of your console. You can also add more servers by hitting the “+ New” button next to the name of the server.

You have three buttons to change your server’s status within your overall OpenStack service:

Toggle Maintenance Mode: Hit this button if you need to do important maintenance tasks or changes on your server (e.g. change the OpenStack version), or before removing the server from the App (you will be able to remove the server 10 minutes after the App stops receiving data from it). Your overall uptime will not be affected if the server stops sending data or is removed.
Toggle Blackout Mode: Hit this button if you need to make minor changes on the server for troubleshooting. The idea is to stop sending unnecessary notifications while the server is under control and being fixed. The uptime indicator is still affected in this mode, so you can estimate the impact of the event being handled.
Classify Server: Use this button to regroup the server into another cloud system.

This view has other options to get better insight into the services, log events and vital signs. They can be accessed through the menu below the server’s description:

Overview: This option takes you back to the server’s dashboard.
Alerts: This option takes you to a panel with alert information over the last 24 hours (the panel shows only the last 5 opened alerts). You will be able to see which alerts have been closed and which ones are still open, in chronological order.
Vital Signs: Get details of the server’s vital signs over the last 24 hours.
OpenStack Services: Get better insight into the OpenStack services running on the server and their health.
OpenStack Logs: This option takes you to a panel with all the important events collected over the last 24 hours. Important events are errors, critical events and warnings. This information helps you get a better understanding of any issue and can be used for troubleshooting. The panel lists events in chronological order and offers online search options to group events by keywords.

On the right side, you see the number of alerts that are still open, the server’s uptime and the server’s current load average over the last 24 hours.

A chart showing the number of warnings, errors and critical events over the last 24 hours is located under the menu options. It gives you a sample of how much activity you are having on the server.

Server vital signs are also shown under the log events chart: the average CPU, memory and disk utilization over the last 24 hours, and even the number of alerts that have been closed over the last 24 hours.

Information about the latest alerts is located next to that panel. A column lists the last 5 alerts with some details about the OpenStack processes involved and the subject of the event that caused each one.

Counters showing the current CPU, memory and disk usage are also displayed. Next to these counters, you find the “OpenStack services status”, which gives a quick snapshot of the number of inactive processes for every OpenStack service on the server.

21 Jan 2016

OpenStack international growth

Guillermo Alvarado
Alert, App Design, Logs, metrics, Monitoring, OpenStack, Startup

The OpenStack Foundation has been on the right path all these years, but what comes next is an international leap. Some international companies adopting OpenStack are helping with that. For example, the eBay website runs on an OpenStack private cloud platform. Four years ago, eBay’s website was operated fully on its on-premises datacenter infrastructure. “Today, 95% of eBay marketplace traffic is powered by our OpenStack cloud” said Suneet Nandwani, Sr. Director, Cloud Engineering at eBay Inc.

But this international leap won’t happen without some challenges. One of OpenStack’s main problems is its steep learning curve: you must first achieve a successful installation, and once you think you are done, things can get ugly. There’s no definitive strategy for operating an OpenStack deployment, but with the right tools and the right information in a timely manner, it can be done without pain.

That’s our commitment at Sentinel.la: “Reduce the operational pain of Cloud Administrator/SysOp Teams providing quick, concise and relevant information to solve the problems related to a real OpenStack deployment”.

The other inflection point for a successful international leap is containers and projects such as Murano and Magnum. “People are really excited to see how frameworks like Docker and Kubernetes enable companies to bring containers in and make use of them with the networking and security frameworks that they already have,” said Jonathan Bryce, OpenStack Foundation Executive Director.

Take a look at the interview by theCUBE with Jonathan Bryce and Lauren Sell at the OpenStack Day Seattle 2015 – theCUBE

Read the complete story: “OpenStack Foundation ready for international growth | #OpenStackSeattle” from Silicon Angle

19 Jan 2016

OpenStack services on a Time-Series database

“Measure what is measurable, and make measurable what is not so.”
—
Galileo Galilei

At Sentinel.la, one of the services we provide is the centralization of data and statistics with an OpenStack-centered approach, from OpenStack services (nova-*, neutron-*, keystone and so on) to the performance and status of vital server resources. All this information is acquired regardless of each server’s role in the architecture: All-in-Ones, dedicated Controller/Compute/Storage deployments and converged deployments are all types of deployments we must support and fetch data from.

Managing all this information requires a very flexible way of organization and handling. Our first Proof of Concept attempt was to create an agent that gathers all the server information at Operating System level, so the basic information was being captured: CPU, Disk Usage, Memory Usage and Load Average. All this information was being stored on a Relational Database.

The problem with relational databases is that they are not optimal for handling large amounts of data of this kind. Instead of unleashing the power of having such great information, you feel like you are playing Jenga with it: with every new row that is added, you can’t help feeling that you are losing a little bit of performance and scalability. Imagine having millions of rows with CPU data from thousands of servers… that won’t end well.

“Oh yeah, INSERT INTO measurements…”

What about using a NoSQL database? Well, standard NoSQL databases help a lot with managing large chunks of document data, but time series are different: imagine that instead of growing in vertical rows, your data grows sideways and depends heavily on the time when it was saved. So, if not a standard NoSQL database, what should we use to save our metrics? And what if, instead of just 5 metrics, we want to capture “n” metrics for “n” services on “n” devices?

This is where a time-series database is useful. In this type of database the timestamp is the equivalent of the Id, so your values are always associated with it. Those values are organized in series, each of which combines a measurement (CPU usage, disk usage, etc.) with the tags you use to identify that measurement (server name, cloud id, server location, etc.).
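To make the measurement, tags and timestamp vocabulary concrete, here is a small sketch that renders one point as a line of text, loosely modeled on the InfluxDB line protocol. The helper names are invented, the single field is always called value, and no escaping of spaces or commas is attempted, so treat it as an illustration rather than a client implementation.

```python
def series_key(measurement, tags):
    """A series is a measurement plus its (sorted) tag set."""
    tag_part = ",".join("{0}={1}".format(k, tags[k]) for k in sorted(tags))
    return "{0},{1}".format(measurement, tag_part) if tag_part else measurement


def to_line(measurement, tags, value, timestamp_ns):
    """Render one point as text: series key, a single field, then the timestamp."""
    return "{0} value={1} {2}".format(series_key(measurement, tags), value, timestamp_ns)


point = to_line(
    "cpu_usage",
    {"server": "compute-1", "cloud_id": "42"},
    0.64,
    1453161600000000000,
)
print(point)
# cpu_usage,cloud_id=42,server=compute-1 value=0.64 1453161600000000000
```

Two points with the same measurement and tags belong to the same series and differ only in their timestamp and value.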

Having the data stored in a time-series database enables you to think of the information as points, which are easy to identify, search, display and graph. You have many functions to manipulate the data and get the right information. In our case we realized that we could use some aggregation and transformation functions to get things like behavior over time with great precision and accuracy.
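An aggregation over time can be sketched in plain Python: group points into fixed windows and average each window, which is the kind of result a time-series database returns when you ask for a mean grouped by time. The function name and the sample data are made up for this example.

```python
def mean_by_window(points, window_seconds):
    """Average (timestamp, value) points inside fixed-size time windows."""
    buckets = {}
    for timestamp, value in points:
        window_start = timestamp - (timestamp % window_seconds)
        buckets.setdefault(window_start, []).append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


# (seconds since start, CPU percent)
cpu_points = [(0, 10.0), (30, 20.0), (60, 40.0), (90, 60.0), (120, 80.0)]
print(mean_by_window(cpu_points, 60))
# {0: 15.0, 60: 50.0, 120: 80.0}
```

A database does this work server-side and over far more data, but the shape of the answer is the same: one value per window.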

For these purposes we chose InfluxDB as our time-series database because Monasca uses it, and while we were playing with Monasca we found that it was perfect for what we do. InfluxDB can also be used “as a service”: the same people at InfluxData who created the product offer it as a service. This way we can use (and love) InfluxDB features with High Availability without having to operate it, and thus we can focus on our core business.

We feel very fortunate that our development coincided with the InfluxDB lifecycle. We started using it at the very moment the 0.9 version was released. This version was a turning point because it added support for tags. It is also a little different in terms of syntax and other functionality, such as a new thresholding and alerting component (Kapacitor), which was introduced the very same week we were researching and developing our metrics alerting engine!

A whole new world

After solving the database backend and having no limits on performance and reliability, now comes the sweet part: we can store all the measurements that we want. We began getting I/O values from servers and started collecting OpenStack service related information. How much CPU does nova-api use? Is nova-scheduler having peaks of memory? What’s the uptime of the nova-compute process? The limit is only our (OpenStack) imagination.

References:

Influx Data https://influxdata.com/
