Sentinella

09 Aug 2016

Sentinel.la now available at PyPI

Guillermo Alvarado
agent, devops, Logs, metrics, Monitoring, OpenStack, Roadmap, Sentinel.la

We are glad to announce that the Sentinel.la agent is now available on the Python Package Index (PyPI), the official third-party software repository for Python: https://pypi.python.org/pypi/sentinella

PyPI hosts Python packages as archives, such as Python Eggs. Similar to JAR archives in Java, Eggs are fundamentally ZIP files with the .egg extension that contain the package's Python code along with metadata describing the package.

You can access PyPI with several package managers, including EasyInstall, PyPM and pip, which use PyPI as the default source for packages and their dependencies.

So you can install the Sentinel.la agent with pip as follows:

guillermo@xps13:~/$ pip install sentinella

With this, Sentinel.la is available for:

Red Hat/CentOS through an RPM package: https://github.com/Sentinel-la/sentinella-agent/releases
Ubuntu/Debian-based systems through a DEB package: https://github.com/Sentinel-la/sentinella-agent/releases
SUSE Linux Enterprise/openSUSE through the Python package: https://pypi.python.org/pypi/sentinella

Also, remember to vote for our presentation at the OpenStack Summit in Barcelona:

Vote here: Double Win! Helping to consolidate OpenStack implementations (and build a Startup in the meantime)

Keep in touch with us while we’re building the next big thing,

Email: hello@sentinel.la

17 Feb 2016

OpenStack Survey: The price of not knowing is unpredictable

The price of not knowing is unpredictable. The difference between data and information is that information is useful data. Knowing the air temperature in New York is a fact; knowing what your customers expect from you is information. Taking part in the survey as a community is not only a way for the OpenStack Foundation to learn about the community and the OpenStack ecosystem, but also a way for us, as members of the community, to share information about our organizations, services and priorities, so that we (as a community) can define our path, roadmap and strategy.

In short, the OpenStack Foundation is open and listening to what the community wants. We will all gain from the results, and the OpenStack Foundation will also get helpful information to do its job and to figure out how to satisfy the community's priorities.

Do you want the opportunity to influence the OpenStack roadmap? It should only take about 10 minutes to complete the survey at https://www.openstack.org/user-survey/survey-2016-q1/landing

All of the information you provide is confidential to the Foundation (unless you specify otherwise).

15 Feb 2016

Sentinel.la App’s Server View Panel: Get insight into your OpenStack servers.

Qui-Gon Jinn
Alameda, Alert, App Design, Dashboard, Logs, metrics, Monitoring, OpenStack, Roadmap, Sentinel.la

This is part of a series of posts describing pieces of our amazing app to monitor OpenStack.

The following screenshot shows the server view panel. This panel starts with an overview of the usage and availability of the server's resources, its vital signs, the OpenStack services running on it, opened and closed alerts, and important log events collected over the last 24 hours.

The App collects information from logs, processes and the system once the agent is installed. This information is used to auto-detect and check the status of the OpenStack services running on the server. Once the information is collected, Sentinel.la classifies the services by OpenStack project: Nova, Neutron, Cinder, Heat, Glance, Keystone and Ceilometer.
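
As a rough illustration of that classification step, the sketch below maps process names to projects by their prefix. It is hypothetical, not the agent's actual logic: the prefix table and the helpers classify_process and group_by_project are assumptions, and services that run under a generic process name (for example Keystone served by Apache) would need extra rules.

```python
from typing import Dict, Iterable, List, Optional

# Assumed prefix table; real deployments may use other process names.
PROJECT_PREFIXES = {
    "nova": "Nova",
    "neutron": "Neutron",
    "cinder": "Cinder",
    "heat": "Heat",
    "glance": "Glance",
    "keystone": "Keystone",
    "ceilometer": "Ceilometer",
}


def classify_process(name: str) -> Optional[str]:
    """Return the OpenStack project for a process name, or None if unknown."""
    base = name.rsplit("/", 1)[-1].lower()  # drop any leading path
    prefix = base.split("-", 1)[0]          # "nova-api" -> "nova"
    return PROJECT_PREFIXES.get(prefix)


def group_by_project(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group recognised process names under their OpenStack project."""
    groups: Dict[str, List[str]] = {}
    for name in names:
        project = classify_process(name)
        if project is not None:
            groups.setdefault(project, []).append(name)
    return groups
```

For example, group_by_project(["nova-api", "nova-compute", "neutron-server", "sshd"]) groups the first two under Nova, the third under Neutron and ignores sshd.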

The Server View panel shows the OpenStack version running. It also shows system information such as processor type, memory, kernel version, storage device and capacity. You can identify the server by name and see its status (e.g. maintenance). The cloud group and location are displayed under the server's name.

Note that you still have access to push notifications from all your geographically distributed cloud groups in the upper right corner of your console. You can also add more servers by hitting the “+ New” button next to the server's name.

You have three buttons to change your server’s status within your overall OpenStack service:

Toggle Maintenance Mode: Hit this button if you need to perform important maintenance tasks or changes on your server (e.g. changing the OpenStack version), or before removing it from the App (you will be able to remove the server 10 minutes after the App stops receiving data from it). Your overall uptime will not be affected if the server stops sending data or is removed.
Toggle Blackout Mode: Hit this button if you need to make minor troubleshooting changes on the server. The idea is to stop sending unnecessary notifications while the server is under control and being fixed. The uptime indicator is still affected in this mode, so you can estimate the impact of the event being handled.
Classify Server: Use this button to regroup the server into another cloud system.
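
The uptime rules above can be sketched as follows, assuming per-interval health samples tagged with the server's mode. The Sample type, the mode names and both helpers are hypothetical; the post only states that maintenance does not affect uptime while blackout does, so treating maintenance as silent for notifications is an added assumption.

```python
from dataclasses import dataclass
from typing import List

NORMAL, MAINTENANCE, BLACKOUT = "normal", "maintenance", "blackout"


@dataclass
class Sample:
    up: bool   # was the server healthy during this interval?
    mode: str  # NORMAL, MAINTENANCE or BLACKOUT


def uptime_percent(samples: List[Sample]) -> float:
    """Uptime over all intervals, ignoring those spent in maintenance mode."""
    counted = [s for s in samples if s.mode != MAINTENANCE]
    if not counted:
        return 100.0  # nothing to judge: assume fully available
    return 100.0 * sum(s.up for s in counted) / len(counted)


def should_notify(sample: Sample) -> bool:
    """Notify only for failures outside blackout (and, by assumption, maintenance)."""
    return not sample.up and sample.mode == NORMAL
```

For example, three healthy normal intervals, two failed maintenance intervals and one failed blackout interval give 75% uptime: the maintenance intervals are left out, while the blackout failure still counts.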

This view has other options to get better insight into the services, log events and vital signs. They can be accessed through the menu below the server’s description:

Overview: Takes you back to the server’s dashboard.
Alerts: Takes you to a panel with alert information over the last 24 hours (the panel shows only the last 5 opened alerts). You can see which alerts have been closed and which are still open, in chronological order.
Vital Signs: Shows details of the server’s vital signs over the last 24 hours.
OpenStack Services: Gives better insight into the OpenStack services running on the server and their health.
OpenStack Logs: Takes you to a panel with all the important events collected over the last 24 hours. Important events are errors, critical events and warnings. This information helps you better understand any issue and can be used for troubleshooting. The panel lists events in chronological order and offers online search options to group events by keyword.
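
The filtering behind a panel like OpenStack Logs can be sketched as below: keep warnings, errors and critical events from the last 24 hours, optionally narrow them by keyword, and sort them chronologically. The LogEvent type and the important_events helper are hypothetical and stand in for whatever the App actually does.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, NamedTuple, Optional

# Severities treated as "important"; names follow standard Python logging levels.
IMPORTANT_LEVELS = {"WARNING", "ERROR", "CRITICAL"}


class LogEvent(NamedTuple):
    timestamp: datetime
    level: str
    message: str


def important_events(
    events: Iterable[LogEvent],
    now: datetime,
    keyword: Optional[str] = None,
    window: timedelta = timedelta(hours=24),
) -> List[LogEvent]:
    """Important events inside the window, optionally matching a keyword
    (case-insensitive), oldest first."""
    since = now - window
    selected = [
        e for e in events
        if e.level.upper() in IMPORTANT_LEVELS
        and e.timestamp >= since
        and (keyword is None or keyword.lower() in e.message.lower())
    ]
    return sorted(selected, key=lambda e: e.timestamp)
```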

On the right side, you can see the number of alerts that are still open, the server’s uptime and the server’s load average over the last 24 hours.

A chart showing the number of warnings, errors and critical events over the last 24 hours is located under the menu options. It gives you a sense of how much activity is happening on the server.

Server vital signs are also shown under the log events chart: the average CPU, memory and disk utilization over the last 24 hours, as well as the number of alerts closed over the last 24 hours.

Information about the latest alerts is located next to this panel. A column lists the last 5 alerts with some details about the OpenStack processes involved and the subject of the event that caused each one.

Counters showing the current CPU, memory and disk usage are also displayed. Next to these counters you will find the “OpenStack services status”, a quick snapshot of the number of inactive processes for each OpenStack service on the server.

19 Jan 2016

OpenStack services on a Time-Series database

“Measure what is measurable, and make measurable what is not so.”
—
Galileo Galilei

At Sentinel.la, one of the services we provide is the centralization of data and statistics with an OpenStack-centered approach, from OpenStack services (nova-*, neutron-*, keystone and so on) to the performance and status of vital server resources. All this information is acquired regardless of each server's role in the architecture (all-in-ones, dedicated controller/compute/storage deployments, converged deployments: we must support and fetch data from all those types of deployments).

Managing all this information requires a very flexible way of organizing and handling it. Our first proof of concept was an agent that gathered server information at the operating system level, capturing the basics: CPU, disk usage, memory usage and load average. All this information was stored in a relational database.
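
A first-pass agent like the one described above can be sketched with the Python standard library alone, assuming a Linux host (it reads /proc/stat and /proc/meminfo). The function names are hypothetical and this is not the Sentinel.la agent's code; it only shows where each of the four basic figures can come from.

```python
import os
import shutil
import time
from typing import Dict, Optional, Tuple


def read_cpu_times(path: str = "/proc/stat") -> Optional[Tuple[int, int]]:
    """Return (idle, total) jiffies from the aggregate 'cpu' line, or None."""
    try:
        with open(path) as f:
            fields = f.readline().split()
    except OSError:
        return None
    if not fields or fields[0] != "cpu":
        return None
    # user, nice, system, idle, iowait, irq, softirq, steal
    values = [int(v) for v in fields[1:9]]
    idle = values[3] + (values[4] if len(values) > 4 else 0)
    return idle, sum(values)


def cpu_percent(interval: float = 0.5) -> Optional[float]:
    """CPU busy percentage measured over `interval` seconds."""
    first = read_cpu_times()
    if first is None:
        return None
    time.sleep(interval)
    second = read_cpu_times()
    if second is None:
        return None
    total = second[1] - first[1]
    if total <= 0:
        return 0.0
    return 100.0 * (1.0 - (second[0] - first[0]) / total)


def read_meminfo(path: str = "/proc/meminfo") -> Dict[str, int]:
    """Parse /proc/meminfo into {field: value in kB}; empty if unavailable."""
    info: Dict[str, int] = {}
    try:
        with open(path) as f:
            for line in f:
                key, _, rest = line.partition(":")
                parts = rest.split()
                if parts:
                    info[key] = int(parts[0])
    except OSError:
        pass
    return info


def snapshot(mount: str = "/", interval: float = 0.5) -> Dict[str, Optional[float]]:
    """Collect CPU, memory, disk and load-average figures for one host."""
    load1, load5, load15 = os.getloadavg()  # Unix only
    disk = shutil.disk_usage(mount)
    mem = read_meminfo()
    mem_pct = None
    if mem.get("MemTotal") and "MemAvailable" in mem:
        mem_pct = 100.0 * (1.0 - mem["MemAvailable"] / mem["MemTotal"])
    return {
        "cpu_pct": cpu_percent(interval),
        "mem_pct": mem_pct,
        "disk_pct": 100.0 * disk.used / disk.total,
        "load_1m": load1,
        "load_5m": load5,
        "load_15m": load15,
    }
```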

The problem with relational databases is that they are not optimal for handling large, continuously growing volumes of measurement data. Instead of unleashing the power of such great information, you feel like you are playing Jenga with it: with every new row added, you can’t help feeling that you are losing a little bit of performance and scalability. Imagine having millions of rows of CPU data from thousands of servers… that won’t end well.

“Oh yeah, INSERT INTO measurements…”

What about using a NoSQL database? Standard NoSQL databases help a lot with managing large chunks of document data, but time series are different: imagine that instead of growing vertically in rows, your data grows sideways and depends heavily on the time at which it was saved. So, if not a standard NoSQL database, what should we use to store our metrics? And what if, instead of just 5 metrics, we want to capture “n” metrics for “n” services on “n” devices?

This is where a time-series database is useful. In this type of database the timestamp is the equivalent of the ID, so your values are always associated with it. Values are organized in series, each of which combines a measurement (CPU usage, disk usage, etc.) with the tags you use to identify that measurement (server name, cloud ID, server location, etc.).
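
InfluxDB (0.9 and later) accepts points in its line protocol: the measurement name, comma-separated key=value tags, the fields, and an optional timestamp in nanoseconds. The sketch below formats one point that way. It is simplified (float fields only, no string fields or backslash handling), and the measurement and tag names are illustrative, not Sentinel.la's schema.

```python
from typing import Mapping, Optional


def _escape_tag(text: str) -> str:
    # Tag keys, tag values and field keys escape commas, equals signs and spaces.
    return text.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def _escape_measurement(text: str) -> str:
    # Measurement names escape commas and spaces.
    return text.replace(",", r"\,").replace(" ", r"\ ")


def to_line_protocol(
    measurement: str,
    tags: Mapping[str, str],
    fields: Mapping[str, float],
    timestamp_ns: Optional[int] = None,
) -> str:
    """Format one point as: measurement[,tag=value...] field=value[,...] [timestamp]."""
    # Tags are sorted by key, as InfluxDB recommends for write performance.
    tag_part = "".join(
        f",{_escape_tag(k)}={_escape_tag(v)}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(
        f"{_escape_tag(k)}={float(v)!r}" for k, v in sorted(fields.items())
    )
    line = f"{_escape_measurement(measurement)}{tag_part} {field_part}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line
```

Here the measurement ("cpu") plus its tag set (host, region) identifies the series, and each call produces one point in it, e.g. to_line_protocol("cpu", {"region": "mx", "host": "server01"}, {"usage": 12.5}, 1465839830100400200) returns "cpu,host=server01,region=mx usage=12.5 1465839830100400200".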

Storing the data in a time-series database lets you think of the information as points, which are easy to identify, search, display and graph. There are many functions to manipulate the data and extract the right information. In our case, we realized we could use aggregation and transformation functions to capture behavior over time with great precision and accuracy.
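
In InfluxQL, an aggregation like this is written as, for example, SELECT mean("usage") FROM "cpu" WHERE time > now() - 1h GROUP BY time(5m). The sketch below reproduces the same idea in plain Python by bucketing (timestamp, value) points into fixed windows and averaging each one. The helper name is hypothetical and this is not how InfluxDB computes it internally.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def mean_by_window(points: Iterable[Tuple[int, float]], window_s: int) -> Dict[int, float]:
    """Average values per fixed window, keyed by each window's start (epoch seconds)."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for ts, value in points:
        start = ts - (ts % window_s)  # align to the window boundary, like GROUP BY time()
        buckets[start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}
```

With 5-minute windows, points at 0 s (10.0) and 60 s (20.0) average to 15.0 in the first bucket, and points at 300 s (30.0) and 599 s (50.0) average to 40.0 in the second.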

For these purposes we chose InfluxDB as our time-series database: Monasca uses it, and while we were playing with Monasca we found that it was perfect for what we do. InfluxDB can also be used “as a service”: InfluxData, the company that created the product, offers it as a hosted service. This way we can use (and love) InfluxDB features with high availability without having to operate it, and focus on our core business.

We feel very fortunate that our development coincided with the InfluxDB lifecycle. We started using it right when version 0.9 was released. That version was a turning point because it added support for tags. It also differs in syntax and other functionality, such as a new thresholding and alerting component (Kapacitor), which was introduced the very same week we were researching and developing our metrics alerting engine!

A whole new world

With the database backend solved and no limits on performance and reliability, now comes the sweet part: we can store all the measurements we want. We began collecting I/O values from servers and, soon after, OpenStack service-related information. How much CPU does nova-api use? Is nova-scheduler having memory peaks? What’s the uptime of the nova-compute process? The only limit is our (OpenStack) imagination.

References:

InfluxData: https://influxdata.com/

14 Jan 2016

Can OpenStack trust OpenStack? (Monasca & Ceilometer)

Qui-Gon Jinn
Alert, App Design, Monitoring, OpenStack, Roadmap, Sentinel.la

“It all begins and ends in your mind. What you give power to has power over you”

– Leon Brown

Many users depend on your service at this very moment. Users don’t tolerate failures: a couple of seconds is enough to make them start googling other options. You are willing to invest as much as you need to keep them, and your revenue, up. On the other hand, you have to reduce your operating costs to survive.

OpenStack is an amazing way to start gaining agility and savings. Once you have what you need, you have to keep it up. You can rely on Ceilometer and Monasca to track its performance level. Both projects provide important features to get the required insight into your app’s infrastructure: memory/CPU usage for every instance, disk capacity/operations for every volume, or network traffic.

Ceilometer was the first taste

Projects like Heat use Ceilometer to launch additional instances for your service, providing convenient ways to auto-scale it depending on customer demand. Stay prepared for the unpredictable (check this YAML file out as a good example). It may whet your appetite for such use cases. Thanks to the Ceilometer API, it can be used in countless ways beyond Heat. Create your own scripts to automate your apps. Do it with Python.
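
Heat templates typically pair a Ceilometer alarm (OS::Ceilometer::Alarm) with OS::Heat::ScalingPolicy resources attached to an OS::Heat::AutoScalingGroup. The decision those resources encode can be sketched in a few lines of Python. This is a simplified, hypothetical model (no cooldown and no alarm evaluation periods), not Heat's implementation; ScalingPolicy here is a plain dataclass, not the Heat resource.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ScalingPolicy:
    high: float      # average CPU % above which the group scales up
    low: float       # average CPU % below which the group scales down
    adjustment: int  # instances added or removed per decision
    min_size: int
    max_size: int


def desired_capacity(current: int, cpu_samples: Sequence[float], policy: ScalingPolicy) -> int:
    """Return the new group size after evaluating one alarm period."""
    if not cpu_samples:
        return current  # no data: leave the group alone
    average = sum(cpu_samples) / len(cpu_samples)
    if average > policy.high:
        target = current + policy.adjustment
    elif average < policy.low:
        target = current - policy.adjustment
    else:
        target = current
    # Clamp to the group's minimum and maximum size.
    return max(policy.min_size, min(target, policy.max_size))
```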

An agent, a notification bus, a collector and MongoDB make up what we call Ceilometer. The agent gathers metrics from a bunch of projects like Nova, Glance, Swift and Cinder. Some projects publish their own metrics through the notification bus (RabbitMQ); others have to be polled directly.

The collector then takes the data from the agent through the Ceilometer bus. MongoDB stores everything it receives until it is queried through the Ceilometer API. This API can be called directly to get a better understanding of your platform at this very moment.
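
The data flow described above can be modeled with an in-process queue standing in for RabbitMQ and a Python list standing in for MongoDB. This toy sketch only illustrates the shape of the pipeline (projects publish notifications, the agent polls, the collector persists, the API queries); the function names are hypothetical and none of it is Ceilometer's code.

```python
import queue
from typing import Callable, Dict, List

Sample = Dict[str, object]


def polling_agent(bus: "queue.Queue[Sample]", pollsters: Dict[str, Callable[[], float]]) -> None:
    """Poll each meter directly and publish one sample per meter to the bus."""
    for meter, read in pollsters.items():
        bus.put({"meter": meter, "volume": read(), "source": "polling"})


def collector(bus: "queue.Queue[Sample]", storage: List[Sample]) -> None:
    """Drain the bus and persist every sample (MongoDB in the real design)."""
    while True:
        try:
            storage.append(bus.get_nowait())
        except queue.Empty:
            return


def api_query(storage: List[Sample], meter: str) -> List[Sample]:
    """Return the stored samples for one meter, as the API would."""
    return [s for s in storage if s["meter"] == meter]
```

In use, a project would put a notification such as {"meter": "instance", "volume": 1, "source": "notification"} on the bus, polling_agent adds polled samples, and collector moves everything into storage for api_query to read.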

However, Ceilometer doesn’t scale the way you grow, and the information you can get from it is still limited. Also, queries take a long time to complete, which can make your service less responsive than you expect.

Monasca arose from higher expectations

Monasca brings a multi-tenant, monitoring-as-a-service model based on Keystone authentication (self-service). It is a multi-purpose monitoring project that can watch more than just OpenStack resources. Much effort has gone into its alarm/thresholding engine. Many available plugins can be easily deployed; Libvirt is one example, which helps you get better insight into what is happening inside the hypervisor. It can already run Nagios plugins. System active checks (HTTP, ping, SSH) and response time measurements are part of its basic features.
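
An active check boils down to trying to reach a service and timing the attempt. The sketch below is a rough, hypothetical stand-in using a plain TCP connection; it is not Monasca's plugin, and HTTP or ICMP checks would need more than this.

```python
import socket
import time
from typing import Tuple


def tcp_check(host: str, port: int, timeout: float = 2.0) -> Tuple[bool, float]:
    """Try to open a TCP connection; return (reachable, elapsed milliseconds)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            up = True
    except OSError:  # refused, unreachable or timed out
        up = False
    elapsed_ms = (time.monotonic() - start) * 1000.0
    return up, elapsed_ms
```

For an SSH-style check you would call tcp_check(server_address, 22) and alert when the first value is False or the response time crosses a threshold.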

An essential element in Monasca is Kafka, which provides a more scalable and faster message queue than RabbitMQ. Monasca uses resources like InfluxDB to store time series efficiently. It supports data retention policies for later analysis, as well as real-time anomaly detection.
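
A retention policy simply discards data older than a configured duration. In InfluxDB this is declared in InfluxQL, e.g. CREATE RETENTION POLICY "one_week" ON "metrics" DURATION 7d REPLICATION 1, and the server then drops expired data on its own. The sketch below shows only the rule itself on (timestamp, value) pairs; the database and policy names above and the helper below are hypothetical.

```python
from typing import List, Tuple


def apply_retention(
    points: List[Tuple[int, float]], now: int, retention_s: int
) -> List[Tuple[int, float]]:
    """Keep only points whose timestamp falls within the retention window.

    InfluxDB enforces retention per shard group, so its real deletion is
    coarser than this per-point filter.
    """
    cutoff = now - retention_s
    return [(ts, value) for ts, value in points if ts >= cutoff]
```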

Ceilometer and Monasca have teamed into Ceilosca

Ceilosca is a smart combination of the best properties of both projects. Ceilometer is widely used and has made important progress in getting metrics from several OpenStack projects. Monasca, on the other hand, brings a scalable way to collect, process and present metrics.

Cisco and HP have joined forces around a project called Ceilosca (Ceilometer + Monasca). Fabio Giannetti (Cisco) presented it at the last summit. He showed Ceilosca outperforming Ceilometer by a factor of two to three or more, and how Ceilometer degrades as the number of tenants grows. Much less data is stored through Ceilosca for the same number of queries.

Ceilosca keeps Ceilometer to gather metric data and send it to the Monasca API, replacing the Ceilometer bus and its collector. MongoDB disappears. The Monasca API then passes the data to Kafka, and so on.

Who looks after those who look after themselves?

A simple and powerful question. Ceilometer and Monasca are amazing tools that go beyond just monitoring your app’s assets. However, who looks after them? Who watches those APIs? Are these OpenStack projects actually being monitored? Who looks after their schedulers, processes, logs and files? Who looks after their availability? Who checks whether they are trustworthy enough to run critical services on top of them? Is there any single point of failure? Is there a risk of running out of resources when scaling out further? Are my logs or schedulers about to run out of disk? Are their databases resilient enough?

A monitoring service on top of Monasca could be a good start: define the metrics and the thresholds. The question is: do you really have the experience to do that? Do you have enough insight into every OpenStack project? Do you have the time? Wouldn’t it take you far away from your core responsibility?

Would you really trust your OpenStack configuration? Issues in your app’s resources are easily detected with tools like Ceilometer or Monasca. However, issues in OpenStack projects could be out of your league. Or, as I’ve just said, they would take you far away from your core business.

I have no doubt you have the skills (if you are reading this post at this point, I’m sure you do). However, your company needs you to look after its apps and services, not just to build OpenStack and keep it up and running.

And we are committed to supporting you in this duty. We are sure you’ll find it makes a lot of sense to hand this responsibility to us. Also, you’ll have fun. OpenStack is not an out-of-the-box solution; you won’t find the same configuration twice. Our service just provides the building blocks. And, as OpenStack does, you can take it all, or just the parts that make the most sense to you. One way or the other, you will save a lot of time, and you’ll have fun creating your own stuff to get more out of our service.

Why is Sentinel.la not using Monasca or Ceilometer?

As we’ve said in our previous posts, we are committed to providing a hyper-scalable service. We have also adopted the on-demand staff pillar, as any other ExO does.

We would have liked Monasca to be our platform. However, our core business is not operating Monasca. “Our mission is to help any mortal to unleash OpenStack at every corner of the universe. Help him/her to do it with confidence”. Just as we expect our customers to use only the components they like most from our service, we use the components that get us closer to this mission. Some components, like InfluxDB, are part of our solution; others, like Kafka, aren’t.

Unlike Kafka, InfluxDB can be contracted and used on demand. Operating and maintaining Kafka would take a lot of energy and focus from our development team. Kafka also runs on the JVM (it is written in Scala and Java), which brings another challenge in managing and operating it. You can’t be an expert in every technology, and we’ve decided to stay with Python to be more fluent within the OpenStack community. RabbitMQ can be paid for as you go, and there are many hosted options on the market. Our modular design will let us change any component in the future, even the MQ or the DB, with no disruption.

That being said, Monasca API support will be added to our service in the mid-term. You will be able to send system data to it and pull system data from it. You might choose our dashboard/engine to pull some Monasca-monitored resources (through the Monasca API), or just choose Monasca’s alarm/notification engine instead of ours. That is part of the benefit of being flexible, don’t you think?

01 Jan 2016

Alameda: Our journey has begun, spread the word

Qui-Gon Jinn
Alameda, Alert, App Design, Dashboard, Monitoring, OpenStack, Roadmap, Sentinel.la, Startup

‘Would you tell me, please, which way I ought to go from here?’
‘That depends a good deal on where you want to get to,’ said the Cat.
‘I don’t much care where -‘ said Alice.
‘Then it doesn’t matter which way you go,’ said the Cat.
― Lewis Carroll, Alice in Wonderland

The first version of our Alameda release is almost cooked. It has been an interesting journey, from a piece of paper with some written ideas to a real plan, witnessed by a few coffee shops between Coyoacán and Polanco (BTW, those are in Mexico City).

You have the IDEA. What’s next?

Vision is everything. If you don’t know where you want to get to, the path you take is irrelevant and all the energy spent is wasted. You must define a vision and its supporting values. Thanks to that, our product roadmap was built in a matter of days.

Our company’s pillars have been successfully fulfilled so far.

Product design decisions have been made to uphold our company’s pillars: hyper-scalability, interoperability and a smart use of on-demand staff. Avoid any performance bottleneck, even under an unpredicted demand for resources. Cover any need through on-demand resources. In fact, we got our image and logo through designcrowd.com. Check out my previous post for details about our company’s values and foundations.

Our core service (codename: Medusa) has not been developed from scratch. We chose specific open-source projects and online platforms to stick together, and we just code the glue between these blocks, in Python in the best of cases.

Collecting and managing time-series data is the underlying support for any monitoring service. Creating a core platform to do that effectively and scalably would have taken forever. Influxdata.com seems to have everything we were looking for. It is available as an on-demand resource: we don’t want to spend resources operating and tuning a database like this. It would draw us away from our core purpose, and someone else can do it for us. InfluxData also offers Kapacitor, a data processing engine: “Kapacitor lets you define custom logic to process alerts with dynamic thresholds, match metrics for patterns…” Kapacitor is a perfect match for our alert system needs.

We support three types of thresholds: BINARY (“up” or “down”), TAILABLE (logging information) and GAUGE (for measurements). Alerts are evaluated and triggered through Kapacitor (InfluxDB). However, we’ve also decided to use capped collections (MongoDB).
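
Evaluating the three threshold types could look roughly like this. The post names the types but not their exact semantics, so the rules below (BINARY fires on “down”, TAILABLE fires when a log line matches a regular expression, GAUGE fires above a limit) and the breached helper are assumptions, not the engine Sentinel.la runs on Kapacitor.

```python
import re
from enum import Enum
from typing import Optional, Union


class ThresholdType(Enum):
    BINARY = "binary"      # a service state: "up" or "down"
    TAILABLE = "tailable"  # a log line
    GAUGE = "gauge"        # a numeric measurement


def breached(
    kind: ThresholdType,
    value: Union[str, float],
    limit: Optional[float] = None,
    pattern: Optional[str] = None,
) -> bool:
    """Return True when `value` should trigger an alert (assumed rules)."""
    if kind is ThresholdType.BINARY:
        return str(value).lower() == "down"
    if kind is ThresholdType.TAILABLE:
        return pattern is not None and re.search(pattern, str(value)) is not None
    if kind is ThresholdType.GAUGE:
        return limit is not None and float(value) > limit
    raise ValueError(f"unknown threshold type: {kind}")
```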

MongoDB has been added to the equation. It is offered as a monthly subscription by a significant number of providers. MongoDB scales out amazingly, following the not-so-recent trend of NoSQL databases built in response to the demands of new applications. Its capped collection feature makes data “automatically age out”. Capped collections will help us manage tons of information over time in the simplest and most efficient way.
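
A capped collection keeps a fixed amount of data in insertion order and overwrites the oldest documents once it is full. With PyMongo, one is created through Database.create_collection with the capped=True and size options (size in bytes is the primary bound; max optionally limits the document count). The in-memory stand-in below mimics only the document-count limit using collections.deque; the class is hypothetical and does not talk to MongoDB.

```python
from collections import deque
from typing import Any, Deque, Dict, List


class CappedCollection:
    """In-memory stand-in for a capped collection bounded by document count.

    Documents stay in insertion order; once max_docs is reached, each new
    insert silently evicts the oldest document.
    """

    def __init__(self, max_docs: int) -> None:
        self._docs: Deque[Dict[str, Any]] = deque(maxlen=max_docs)

    def insert(self, doc: Dict[str, Any]) -> None:
        self._docs.append(doc)

    def find(self) -> List[Dict[str, Any]]:
        """Return all retained documents, oldest first."""
        return list(self._docs)
```

Inserting five alerts into a collection capped at three leaves only the last three, which is the “automatic aging out” described above.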

Figure: Dashboard’s mockup

Our dashboard is being built with AngularJS (JavaScript), which is maintained by Google. It offers great portability and a smooth flow in applications: a “single-page” experience, so the page never needs to be reloaded. We follow the Model-View-Controller (MVC) pattern, which makes it easier to add components in the future and also improves maintainability, since developers manage the core service and the dashboard code independently. This is the best example of our flexibility, one of our three pillars: users could even create their own dashboards that interface with our core monitoring system.

Figure: A glimpse of our dashboard

We are working hard to deliver lightweight, secure and highly functional agents. They must be installed on-premises on every OpenStack node, and they are the only piece of code to install in your infrastructure. They remotely reach our online core service, sending data at a configurable interval. Authentication between the agents and our core solution must be strong, so we’ve decided to base this important part of our development on JSON Web Tokens (JWT).

JWT is used to pass the identity of authenticated users between an identity provider and a service provider, which in our case means from the web dashboard (Apollo) to the RESTful API (Medusa). The browser doesn’t store sessions, which makes the login functionality fully compatible with mobile devices without any other change or effort (we are preparing to release a Sentinella Android/iOS app in the mid-term). That way you don’t need to manage sensitive API keys on-premises or on devices, which matters because mobile devices can easily be stolen. API keys don’t expire, and rotating them brings a lot of management issues and security concerns. JWTs expire regularly and are renewed transparently to the user.
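
The token flow described above can be sketched with the standard library alone. This is a minimal, hypothetical HS256 implementation following RFC 7519 (a shared secret, a base64url-encoded header and payload, an HMAC-SHA256 signature and an exp claim); the secret, claim names and helper names are assumptions, and a real service would more likely rely on a maintained library such as PyJWT.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Any, Dict, Optional


def _b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    padding = "=" * (-len(segment) % 4)  # restore the stripped "=" padding
    return base64.urlsafe_b64decode(segment + padding)


def encode_jwt(claims: Dict[str, Any], secret: bytes, ttl_s: int = 300,
               now: Optional[float] = None) -> str:
    """Sign `claims` with HS256, adding iat and exp (expiry after ttl_s seconds)."""
    issued = int(time.time() if now is None else now)
    payload = dict(claims, iat=issued, exp=issued + ttl_s)
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url_encode(signature)}"


def decode_jwt(token: str, secret: bytes, now: Optional[float] = None) -> Dict[str, Any]:
    """Verify signature, algorithm and expiry; return the payload or raise ValueError."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    payload = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if current >= payload["exp"]:
        raise ValueError("token expired")
    return payload
```

Because every token carries its own expiry, a stolen token is only useful until exp passes, and the client simply requests a fresh one, which is the property the post contrasts with long-lived API keys.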

All these components talk through an external, secure MQ service (which, BTW, could also be outsourced to a service provider). The idea is to avoid hard dependencies between components. This helps us, for example, to modify database structures or even change the DB provider or the DB itself with no service interruption.

What have we got in Alameda?

Our MVP (minimum viable product) covers monitoring for Nova and Glance so far. Of course, we’ll add more OpenStack projects like Keystone and Neutron in the next versions. The way we are managing versioning is shown in the next picture.

Figure: Alameda’s roadmap overview

The first version will include all the components we’ve just mentioned. We are excited to take this online as soon as possible, and we are sure it will be an important contribution to the community.
