When AIops tools outsmart you

Our ability to augment technology with artificial intelligence and machine learning does not seem to have limits. We now have AI-powered analytics, smart Internet of Things, AI at the edge, and of course AIops tools.

At their core, AIops tools provide smart automation. This includes self-healing, proactive maintenance, and even working with security and governance systems to coordinate actions, such as recognizing that an apparent performance issue is actually a breach.

We need to consider discovery as well: the capability of gathering data on an ongoing basis and leveraging that data to train the knowledge engine. This allows the knowledge bases to become savvier. Greater knowledge about how the systems under management behave, or are likely to behave, makes it easier to predict issues and to be proactive about fixes and reporting.

Some of the other advantages of AIops automation:

  • Removing the humans from cloudops processes, only alerting them when things require manual intervention. This means fewer operational personnel and lower costs.
  • Automatic generation of trouble tickets and direct interaction with support operations, removing all manual and nonautomated processes.
  • Finding the root cause of an issue and fixing it, through either automated (self-healing) or manual mechanisms.
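
As a rough illustration of the automation points above, here is a minimal Python sketch of a triage loop that tries an automated fix first, always records a ticket, and alerts a human only when the fix is missing or fails. Everything in it (the `Incident` and `Triage` classes, the `clear_temp_files` playbook, the incident kinds) is hypothetical and invented for this example; it is not the API of any real AIops product.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Incident:
    resource: str
    kind: str  # hypothetical labels, e.g. "disk_full" or "unknown_latency"


@dataclass
class Triage:
    # Maps an incident kind to a remediation function that returns True on success.
    playbooks: Dict[str, Callable[[Incident], bool]]
    tickets: List[str] = field(default_factory=list)
    alerts: List[str] = field(default_factory=list)

    def handle(self, incident: Incident) -> str:
        playbook = self.playbooks.get(incident.kind)
        if playbook is not None and playbook(incident):
            # Self-healed: keep a ticket for the audit trail, but alert nobody.
            self.tickets.append(f"auto-resolved {incident.kind} on {incident.resource}")
            return "self_healed"
        # No playbook, or the playbook failed: open a ticket and alert a human.
        self.tickets.append(f"open {incident.kind} on {incident.resource}")
        self.alerts.append(f"manual intervention needed: {incident.resource}")
        return "escalated"


def clear_temp_files(incident: Incident) -> bool:
    # Stand-in for a real remediation step; it always "succeeds" in this sketch.
    return True


triage = Triage(playbooks={"disk_full": clear_temp_files})
results = [
    triage.handle(Incident("vm-01", "disk_full")),
    triage.handle(Incident("db-02", "unknown_latency")),
]
print(results)
```

The design choice worth noting is that every incident produces a ticket, but only unresolved ones produce an alert, which is what keeps the humans out of the loop until they are actually needed.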

Some of the advantages of AIops discovery:

  • Integrating AIops with other enterprise tools, such as devops, governance, and security operations.
  • Looking for trends that allow the operational team to be proactive, as covered above.
  • Examining huge amounts of data from the resources under management and providing meaningful summaries, which allows for automated action based on summary data.
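
To make the last point concrete, the sketch below reduces raw utilization samples to a small summary and then picks an action from the summary alone. It is only an illustration under stated assumptions: the resource names, the sample values, the thresholds, and the action labels are all made up, and a real tool would use far richer models than a mean, a peak, and a simple trend.

```python
from statistics import mean


def summarize(samples):
    """Reduce raw samples (at least two) to an average, a peak, and a trend."""
    half = len(samples) // 2
    return {
        "mean": mean(samples),
        "peak": max(samples),
        # A positive trend means the second half ran hotter than the first half.
        "trend": mean(samples[half:]) - mean(samples[:half]),
    }


def decide(summary, peak_limit=90.0, trend_limit=10.0):
    """Choose an action from the summary only; the limits are assumed values."""
    if summary["peak"] >= peak_limit:
        return "scale_out"
    if summary["trend"] >= trend_limit:
        return "warn_ops_team"
    return "no_action"


# Hypothetical CPU utilization percentages per resource.
cpu_by_resource = {
    "web-01": [40, 42, 41, 43, 42, 44],
    "web-02": [50, 55, 60, 70, 78, 85],
    "db-01": [70, 75, 92, 88, 80, 76],
}

actions = {name: decide(summarize(s)) for name, s in cpu_by_resource.items()}
print(actions)
```

Here `web-02` never crosses the peak limit, but its rising trend is enough to warn the operations team before it does, which is the kind of proactive behavior the list describes.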

AIops is powerful technology. So what keeps organizations from taking full advantage of AIops and the power of the tools? The quick answer is the humans. I’m finding that AIops tools are not being used or even considered, mostly due to shortsighted budget decisions. When they are being used, they are not leveraged in optimal ways.

Although it would be easy to blame the IT organizations themselves, the larger issue is the lack of a critical mass of best practices for using AIops the right way. Even some of the providers are pushing their own customers in the wrong direction, and I’m spending a lot of time these days attempting to course correct.

The core issue is the complexity of the AIops tools themselves, which is ironic considering that they are supposed to combat the operational complexity of cloud computing. The difficulty of configuring the tools properly is systemic.

What are the best practices that are being ignored or misunderstood? I have a few problem areas to share this time, with more to come in the future:

  • No centralized understanding of the systems under management. The people using AIops tools don’t have a holistic understanding of what all of the systems, applications, and databases mean.
  • Lack of integration with other ops tools, such as security and governance. A lack of coordination across tool silos can actually lead to more vulnerabilities.
  • Inexperience with how the tools work beyond the basics taught in the initial training. These complex tools require that you understand the workings of AI engines, the correct use of automation, and, most importantly, the correct way to test these tools.

You would hate to have your own AIops solution be smarter than you. The best way to avoid that is to try not to be dumb—just saying.

Posted by Contributor