Part 3 – Advanced Troubleshooting and Automation

Pramod Sridharamurthy
Monday, August 5, 2013

In the previous post, we looked at systems and processes that can help L1/L2 support teams work more efficiently. In this post, we continue that line of thought and look at how L3/L4 troubleshooting can be optimized.

For L3/L4 issues, what if there were:

  • A log vault that gives the support engineer access to both current and historical logs
  • An interface to create and view a system's configuration as defined by the support engineer. The configuration need not be a single view; it can be defined in multiple ways
  • An interface to define and view attributes whose changes need to be tracked
  • An interface to view related cases, bugs, KB articles and e-mails. This requires integration with the bug, KB, case and e-mail databases.
  • An interface to plot relevant statistical attributes and, if required, overlay critical events and configuration changes on the time-series graph to cross-correlate them.
  • An interface to search the logs and to view and plot trends across the install base
  • A rules engine that allows the creation of complex rules and KPIs
  • An interface that allows easy analysis across the stack, where the components of the stack can be defined dynamically.
  • An interface that provides a canvas holding all the data required to solve a given problem (one recipe for each type of problem). This would require an interface that lets the support engineer add content from various sources to the canvas, filter each source as needed, and present each source in a way that speeds up problem resolution. The canvas could contain configuration data, configuration changes, events/syslog entries, statistical trends of various parameters, known-issue reports, a list of systems facing similar issues, and related bugs, cases, knowledge base articles, e-mail discussions, etc.
  • An option to save the entire canvas as a recipe for solving a specific type of problem. Support engineers could also assign a KPI to the recipe to indicate whether the problem has been triggered. A set of such saved canvases/dashboards would give support engineers access to all the details needed to solve a particular type of problem, so they can solve problems faster without reinventing the wheel every time.
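To make the tracked-attribute, rules-engine and KPI-tagged recipe ideas a little more concrete, here is a minimal Python sketch. Everything in it is hypothetical: `Recipe`, `Rule`, `tracked_changes`, the metric names and the thresholds are illustrative assumptions, not part of any existing product. It assumes configuration snapshots are flat dictionaries and that statistics have already been reduced to named numeric metrics.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical representation: a configuration snapshot is a flat
# mapping of attribute name -> value for one system.
Snapshot = Dict[str, Any]
Metrics = Dict[str, float]


def tracked_changes(old: Snapshot, new: Snapshot,
                    tracked: List[str]) -> Dict[str, Tuple[Any, Any]]:
    """Return {attribute: (old_value, new_value)} for each tracked
    attribute whose value differs between the two snapshots."""
    return {
        name: (old.get(name), new.get(name))
        for name in tracked
        if old.get(name) != new.get(name)
    }


@dataclass
class Rule:
    """A named KPI rule: a predicate over a set of metrics (hypothetical)."""
    name: str
    predicate: Callable[[Metrics], bool]


@dataclass
class Recipe:
    """A saved 'canvas' for one problem type (hypothetical): the
    attributes to watch plus the rules whose firing signals the problem."""
    problem_type: str
    tracked_attributes: List[str]
    rules: List[Rule] = field(default_factory=list)

    def evaluate(self, metrics: Metrics, old_cfg: Snapshot,
                 new_cfg: Snapshot) -> Dict[str, Any]:
        # The KPI is considered triggered if any rule fires.
        fired = [r.name for r in self.rules if r.predicate(metrics)]
        # Report tracked configuration changes next to the fired rules
        # so the two can be cross-correlated.
        changes = tracked_changes(old_cfg, new_cfg, self.tracked_attributes)
        return {"triggered": bool(fired), "fired_rules": fired,
                "config_changes": changes}


# Example recipe; metric names and thresholds are illustrative only.
recipe = Recipe(
    problem_type="high-latency",
    tracked_attributes=["mtu", "cache_size"],
    rules=[
        Rule("latency_over_50ms", lambda m: m.get("avg_latency_ms", 0) > 50),
        Rule("error_rate_over_1pct", lambda m: m.get("error_rate", 0) > 0.01),
    ],
)
result = recipe.evaluate(
    metrics={"avg_latency_ms": 72.0, "error_rate": 0.002},
    old_cfg={"mtu": 1500, "cache_size": 512},
    new_cfg={"mtu": 9000, "cache_size": 512},
)
print(result)
# {'triggered': True, 'fired_rules': ['latency_over_50ms'], 'config_changes': {'mtu': (1500, 9000)}}
```

Representing rules as plain predicates keeps the sketch small; a real rules engine would more likely expose a rule language that support engineers can edit without writing code, and would evaluate it against the log vault and time-series data rather than a single metrics dictionary.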

Again, the above list is neither comprehensive nor detailed, but it provides an overview of the interfaces, systems and processes that can be put in place to make things easier for L3/L4 support, not only by providing a common set of tools but also by allowing the approach to be customized to one’s own needs.

In the final part of this series, I will talk about how this approach can be taken beyond internal support groups.