Glassbeam for root cause analysis

Wednesday, June 11, 2014

As a Solutions Architect at Glassbeam, I am often asked during pre-sales demos to showcase how our product performs a root cause analysis. Here is an example that I’ve pretty much standardized for such situations. It makes a compelling case for the Glassbeam Explorer application.

Imagine a hypothetical wireless company called HiWi that has received a complaint from a customer at Acme International Corporation. The customer has attached a “show tech” bundle for a controller that seems to be rejecting all login attempts. It is now up to the TAC engineer at HiWi to perform a root cause analysis.

When the TAC engineer for HiWi is presented with a “show tech” bundle containing hundreds of logs to sift through, searching for patterns becomes a formidable task. Glassbeam offers a tool purpose-built to find that needle in the haystack. It’s called Glassbeam Explorer. It allows the user to run full text or parametric search on data accumulated across the entire install base.

Let’s walk through this use case to show how Glassbeam Explorer can reduce your mean time to resolution (MTTR).

Step 1 – Verification

The first step is to make sure that the correct diagnostic bundle has been shared and that the issue being reported is captured in the data.

Choose the customer from the list of facets

Search for messages that contain "failed authentication"

Drill down to the specific time window when the event occurred

As you can see, several failed authentications appear in both the security logs and the error logs. This confirms the issue reported by the customer.
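The verification step boils down to a faceted filter plus a text search, followed by a timeline view. Here is a minimal Python sketch of that idea. It is not Glassbeam's API: the record layout, the field names, the sample log lines and the helper functions `search` and `hits_per_hour` are all assumptions made up for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical parsed log records; field names and contents are invented.
RECORDS = [
    {"customer": "Acme International Corporation", "source": "security.log",
     "time": "2014-06-10T09:15:02", "message": "User admin failed authentication"},
    {"customer": "Acme International Corporation", "source": "error.log",
     "time": "2014-06-10T09:15:03", "message": "Login rejected: failed authentication"},
    {"customer": "Acme International Corporation", "source": "security.log",
     "time": "2014-06-10T11:40:00", "message": "User guest failed authentication"},
    {"customer": "Another Customer", "source": "security.log",
     "time": "2014-06-10T09:20:00", "message": "User root failed authentication"},
    {"customer": "Acme International Corporation", "source": "system.log",
     "time": "2014-06-10T09:00:00", "message": "Controller started"},
]


def search(records, customer, phrase):
    """Keep records for one customer whose message contains the phrase."""
    phrase = phrase.lower()
    return [r for r in records
            if r["customer"] == customer and phrase in r["message"].lower()]


def hits_per_hour(hits):
    """Bucket matching records by hour to see when the events cluster."""
    return Counter(
        datetime.fromisoformat(r["time"]).strftime("%Y-%m-%d %H:00")
        for r in hits
    )


hits = search(RECORDS, "Acme International Corporation", "failed authentication")
timeline = hits_per_hour(hits)
sources = {r["source"] for r in hits}
```

With this sample data, the matches come from both the security log and the error log, and the hourly buckets show where to drill down.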

Step 2 – Check configuration

The failed authentication messages don’t necessarily mean that there is an issue with the system. They could very well be invalid login attempts. So, the next step is to check the configuration that the system is running, which could point us to the next step in our investigation. As we can see, the controller is a 6933 model running firmware version 3.x.x. This could be the problem, because that firmware is known to have several bugs that throw unexpected errors on 6933 models.
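A configuration check like this amounts to looking up the model and firmware version in a list of known issues. The sketch below shows the idea in Python. The `KNOWN_ISSUES` table and the `check_config` helper are hypothetical, and the lookup matches on the major version only, since the exact 3.x.x release is not specified here.

```python
# Hypothetical known-issue table keyed by (model, major firmware version).
KNOWN_ISSUES = {
    ("6933", "3"): "firmware known to throw unexpected errors on this model",
}


def check_config(model, version):
    """Return a known-issue note for this model and firmware, or None."""
    major = version.split(".")[0]
    return KNOWN_ISSUES.get((model, major))


note = check_config("6933", "3.x.x")
```

A non-empty `note` does not prove the root cause. It only tells the engineer which kind of error to look for next.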

Step 3 – Look for unexpected errors

Searching for unexpected errors in the error logs immediately yields a few results. However, these errors are not necessarily related to the authentication failures.

Step 4 – Correlate events

Unless we can show that the unexpected errors occur within the same time window as the failed authentications, there is no way to confirm the root cause. Glassbeam has a special compound search that allows us to look for multiple patterns happening in succession within a time window. Running the compound search immediately yields the result we are looking for. Glassbeam stitches the pattern in the error log together with the pattern in the security log and shows cause and effect next to each other.
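To make the correlation idea concrete, here is a minimal Python sketch of a "pattern A followed by pattern B within a time window" search. It illustrates the general technique only and is not how Glassbeam implements its compound search. The event layout, the sample log lines, the 30-second window and the `correlate` function are all assumptions.

```python
from datetime import datetime, timedelta

# Hypothetical events from two log files; contents are invented.
EVENTS = [
    {"source": "error.log", "time": "2014-06-10T09:14:58",
     "message": "Unexpected error in auth module"},
    {"source": "security.log", "time": "2014-06-10T09:15:02",
     "message": "User admin failed authentication"},
    {"source": "security.log", "time": "2014-06-10T11:40:00",
     "message": "User guest failed authentication"},
    {"source": "error.log", "time": "2014-06-10T13:00:00",
     "message": "Unexpected error in radio driver"},
]


def correlate(events, first_pattern, then_pattern, window):
    """Return (first, then) pairs where `then` follows `first` within `window`."""
    parsed = sorted(
        ((datetime.fromisoformat(e["time"]), e) for e in events),
        key=lambda pair: pair[0],
    )
    pairs = []
    for t1, e1 in parsed:
        if first_pattern not in e1["message"].lower():
            continue
        for t2, e2 in parsed:
            if e2 is e1 or then_pattern not in e2["message"].lower():
                continue
            # Keep only matches that occur at or after the first event
            # and no later than the end of the window.
            if timedelta(0) <= t2 - t1 <= window:
                pairs.append((e1, e2))
    return pairs


pairs = correlate(EVENTS, "unexpected error", "failed authentication",
                  timedelta(seconds=30))
```

In this sample, only the 09:14:58 error is followed by a failed authentication inside the window. The later failed login and the later unexpected error are left out because neither has a matching partner nearby.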

Step 5 – Resolution

The bug has been resolved in a recent firmware release. So, the recommended fix is to upgrade the controller to the latest firmware release. The customer is asked to perform the upgrade and verify that the issue is resolved.

Once the issue is resolved, you can save the search pattern for future reuse and share it with other engineers in the department by exposing it as a public filter.

We have conducted time and motion studies with customers who have spent several hours diagnosing such issues manually or with rudimentary text editors. Glassbeam can help reduce your MTTR and increase both support efficiency and customer satisfaction by allowing you to find the root cause of a problem intuitively.

If you found the above write-up insightful, I would encourage you to read more on Glassbeam Explorer, since it has a lot more functionality for solving support troubleshooting problems with machine data analytics.