All logs are not created equal Part 2

Author: 
Pramod Sridharamurthy
Friday, October 13, 2017

In the previous blog, I discussed time series logs and showed some sample formats and edge cases. In this blog, the focus is on non-time series logs.

The majority of devices, such as servers, storage, networking and medical devices, create logs that capture the current configuration or state of the system through the output of multiple commands run on that device. These log outputs will not have a time associated with them.


A non-time series log is one where the logged data doesn’t have a date associated with it. Non-time series logs come in a variety of flavours; here are some examples.

Data logged as regular text

VLAN1 is up line protocol is up
Hardware is CPU Interface, Interface address is 00:0B:86:51:AB:00 (bia 00:0B:86:51:AB:00)
Description: 802.1Q VLAN
Internet address is 10.1.10.5  255.255.255.0
Routing interface is enable, Forwarding mode is enable
Directed broadcast is disabled, BCMC Optimization disabled ProxyARP disabled 
Encapsulation 802, loopback not set 
MTU 1500 bytes 
Last clearing of "show interface" counters 0 day 3 hr 29 min 23 sec
link status last changed 0 day 3 hr 27 min 24 sec 
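
Regular text like this usually has to be handled with one pattern per field of interest. Here is a minimal sketch in Python, assuming we only care about three values from the sample above. The names PATTERNS and extract_fields are hypothetical and chosen for illustration, and the regular expressions are guesses fitted to this one sample rather than to any vendor's documented format.

```python
import re

# Hypothetical per-field patterns, fitted to the sample output above.
PATTERNS = {
    "interface": re.compile(r"^(\S+) is (\w+) line protocol is (\w+)", re.M),
    "mac": re.compile(r"Interface address is ([0-9A-Fa-f:]{17})"),
    "mtu": re.compile(r"MTU (\d+) bytes"),
}

def extract_fields(text):
    """Return the first match of each pattern found in free-form text."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            groups = match.groups()
            # Single-group patterns yield a string, multi-group ones a tuple.
            fields[name] = groups[0] if len(groups) == 1 else groups
    return fields

sample = (
    "VLAN1 is up line protocol is up\n"
    "Hardware is CPU Interface, Interface address is 00:0B:86:51:AB:00 "
    "(bia 00:0B:86:51:AB:00)\n"
    "MTU 1500 bytes\n"
)
fields = extract_fields(sample)
print(fields)
```

Every new sentence layout needs its own pattern, which is why this kind of log is expensive to support at scale.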

Data logged as a Name/Value pair

Data Partition       	: 0:0 (/dev/0)
Software Version	: GBOS 5.5.3.1 (Production Build)
Build number    	: 27833
Label           	: Core OS
Built on        	: 2011-03-01 17:41:20 PST
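
A name/value section like the one above is among the easier formats to handle. The sketch below, in Python, assumes the first colon on each line separates the name from the value; the function name parse_name_value is hypothetical. Splitting on the first colon only matters here because values such as "0:0 (/dev/0)" and the build timestamp contain colons themselves.

```python
def parse_name_value(text):
    """Parse 'Name : Value' lines into a dict, splitting on the first colon only."""
    record = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep or not name.strip():
            continue  # skip blank lines and lines without a separator
        record[name.strip()] = value.strip()
    return record

sample = """\
Data Partition       : 0:0 (/dev/0)
Software Version     : GBOS 5.5.3.1 (Production Build)
Build number         : 27833
Label                : Core OS
Built on             : 2011-03-01 17:41:20 PST
"""
record = parse_name_value(sample)
print(record["Built on"])
```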

Data logged as a table with header

Cont                      		      Avail   	Queued/Pkts 
Type 	 Id   	Bits/sec  	Policed        Bytes    	Bytes     	 Flags
---- ---- --------- ---------- ------- ------------ --------- ---- --------- ---------
0    	1     	10000000      	0  	     312500      	0/0     
0   	2     	 2000000        0	      62500       	0/0 

Data logged as a table without header

TunnInvl(     1ef) 0x0        0x1042      0xa010a05  0x10101fe  0x0        0x0  

Data logged as a table with complex header

------+---------+---------+-----------------+
     | Cpu utilization during past  |
 Cpu |  1 Sec     4 Secs    64 Secs     |
------+---------+---------+-----------------+
 0   |      5% |      5% |       5%         |

Data logged as a table without header and with variable column length

1. any  any  6 0-65535 1723-1723  P4
2. 10.1.112.7 255.255.255.255  any  any  PS4    hits 6
3. any  any  6 0-65535 23-23  4 
4. any  any  17 0-65535 8209-8209  4    hits 1996

Data logged as a table with header as one of the columns

Opdb  SP->DBhi GP->SPmu SP->NPhi SP->CPlo GP->IPhi SP->FPmu FP->KPhi LP->SPmu
-----------------------------------------------------------------------------------
RAW/FREE                                    				 	00022d7a
00000893  
FLOOD                                                00002806          00002806
BRIDGE  	0000019e 00014396                  		 	0000019e 00014396 
ROUTE            000001ee                            00004230 
00000019

A common running configuration format in most networking devices

ip access-list session unix
  any network 216.235.80.0 255.255.240.0 any permit 
!
ip access-list session user
  any any svc-sec-papi permit 
  any any any permit 
!
ip access-list session guest-internet-DMZ
  user   alias guestusers any deny 
  user any svc-http redirect tunnel 100 
  user any svc-https redirect tunnel 100
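
One way to read a configuration like this is to treat every unindented line as a section header, every indented line as an entry of that section, and "!" as the section terminator. The Python sketch below follows that assumption; parse_running_config is a hypothetical name, and real device configurations may nest deeper or use other terminators.

```python
def parse_running_config(text):
    """Group indented entries under the unindented header line that precedes them."""
    sections = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped == "!":
            current = None  # a blank line or '!' closes the current section
            continue
        if line[0].isspace():
            if current is not None:
                # Entries are whitespace-separated tokens.
                sections[current].append(stripped.split())
        else:
            current = stripped
            sections.setdefault(current, [])
    return sections

sample = """\
ip access-list session unix
  any network 216.235.80.0 255.255.240.0 any permit
!
ip access-list session user
  any any svc-sec-papi permit
  any any any permit
!
ip access-list session guest-internet-DMZ
  user   alias guestusers any deny
  user any svc-http redirect tunnel 100
  user any svc-https redirect tunnel 100
"""
sections = parse_running_config(sample)
print(list(sections))
```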

A hierarchical name/value pair

`display server  inventory details`
Chassis 1:
    Servers:
        Server 1/1:
            Equipped Product Name: GB AD00 M3
            Equipped Serial (SN): ABC1832JP25
            Slot Status: Equipped
            Memory (MB): 393216
            Number of Cores: 20
            Number of Adapters: 1

        Server 1/2:
            Equipped Product Name: GB AD00 M3
            Equipped Serial (SN): ABC23432JP54
            Slot Status: Equipped
            Memory (MB): 393216
            Number of Cores: 20
            Number of Adapters: 1
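
Hierarchical output like this can be turned into a nested structure by tracking indentation. The following Python sketch assumes that a line ending in a colon with no value opens a new level, and that deeper indentation means a child of the nearest shallower line. The function name parse_hierarchy is hypothetical, and the command line that precedes the output is left out of the sample.

```python
def parse_hierarchy(text):
    """Build nested dicts from indented 'Name: Value' lines."""
    root = {}
    stack = [(-1, root)]  # (indent of the owning line, container dict)
    for line in text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip())
        name, _, value = line.strip().partition(":")
        name, value = name.strip(), value.strip()
        # Drop containers that are not shallower than this line.
        while stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1]
        if value:
            parent[name] = value
        else:
            child = {}
            parent[name] = child
            stack.append((indent, child))
    return root

sample = """\
Chassis 1:
    Servers:
        Server 1/1:
            Equipped Serial (SN): ABC1832JP25
            Number of Cores: 20

        Server 1/2:
            Equipped Serial (SN): ABC23432JP54
            Number of Cores: 20
"""
tree = parse_hierarchy(sample)
print(tree["Chassis 1"]["Servers"]["Server 1/2"]["Equipped Serial (SN)"])
```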

I guess you get the picture. There are hundreds of such formats even for a single device from a given manufacturer, and thousands across different devices. Handling all this variety, even for a single device’s logs, is a complex problem to solve. But it is a key capability that the platform you choose for machine data analytics has to provide, as a significant portion of most machine logs is non-time series data.

While handling complex log formats is one challenge, dealing with how logs are packaged brings a whole new set of requirements. In my next blog, I will cover common packaging and transport mechanisms of logs in complex machines.