All logs are not created equal Part 2

In the previous post, I discussed time series logs and showed some sample formats and edge cases. This post looks at non-time series logs.
The majority of devices, such as servers, storage, networking and medical devices, create logs that capture the current configuration or state of the system through the output of multiple commands run on the device. These outputs do not have a timestamp associated with them.
A non-time series log is one where the logged data has no date associated with it. Non-time series logs come in a variety of flavours; here are some examples.
Data logged as regular text
VLAN1 is up line protocol is up
Hardware is CPU Interface, Interface address is 00:0B:86:51:AB:00 (bia 00:0B:86:51:AB:00)
Description: 802.1Q VLAN
Internet address is 10.1.10.5 255.255.255.0
Routing interface is enable, Forwarding mode is enable
Directed broadcast is disabled, BCMC Optimization disabled ProxyARP disabled
Encapsulation 802, loopback not set
MTU 1500 bytes
Last clearing of "show interface" counters 0 day 3 hr 29 min 23 sec
link status last changed 0 day 3 hr 27 min 24 sec
Data logged as a Name/Value pair
Data Partition : 0:0 (/dev/0)
Software Version : GBOS 5.5.3.1 (Production Build)
Build number : 27833
Label : Core OS
Built on : 2011-03-01 17:41:20 PST
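To make the name/value case concrete, here is a minimal Python sketch of how such a section could be turned into a dictionary. It assumes each pair sits on its own line and that the separator is a colon surrounded by spaces; the function name `parse_name_value` is hypothetical and not part of any particular platform.

```python
def parse_name_value(text):
    """Parse 'Name : Value' lines into a dict (assumes one pair per line)."""
    record = {}
    for line in text.splitlines():
        # partition() splits on the first ' : ' only, so values such as
        # '0:0 (/dev/0)' or '17:41:20' keep their own colons intact.
        name, sep, value = line.partition(" : ")
        if sep:  # skip lines that do not contain the separator
            record[name.strip()] = value.strip()
    return record


# Sample taken from the name/value example, assuming one pair per line.
sample = """Data Partition : 0:0 (/dev/0)
Software Version : GBOS 5.5.3.1 (Production Build)
Build number : 27833
Label : Core OS
Built on : 2011-03-01 17:41:20 PST"""

record = parse_name_value(sample)
print(record["Software Version"])
```

Splitting on the first separator only is the main design choice here, because values in machine logs frequently contain the separator character themselves.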
Data logged as a table with header
Cont Avail Queued/Pkts Type Id Bits/sec Policed Bytes Bytes Flags ---- ---- --------- ---------- ------- ------------ --------- ---- --------- --------- 0 1 10000000 0 312500 0/0 0 2 2000000 0 62500 0/0
Data logged as a table without header
TunnInvl( 1ef) 0x0 0x1042 0xa010a05 0x10101fe 0x0 0x0
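With no header, the column names have to come from outside the log. The sketch below only separates the row label from its values and keeps the values positional. It assumes the token in parentheses and the `0x` fields are hexadecimal, which the sample suggests but does not confirm; `parse_headerless_row` and the key names are hypothetical.

```python
import re

# Label, a parenthesised code, then whitespace-separated values.
ROW = re.compile(
    r"^(?P<label>\w+)\(\s*(?P<code>[0-9a-fA-F]+)\)\s+(?P<values>.+)$"
)


def parse_headerless_row(line):
    """Split one headerless row into a label, a code and positional values."""
    match = ROW.match(line.strip())
    if match is None:
        return None  # the line does not follow the assumed layout
    return {
        "label": match.group("label"),
        # Assumption: the code in parentheses is hexadecimal.
        "code": int(match.group("code"), 16),
        # int(..., 16) accepts the '0x' prefix used in the sample.
        "values": [int(tok, 16) for tok in match.group("values").split()],
    }


row = parse_headerless_row("TunnInvl( 1ef) 0x0 0x1042 0xa010a05 0x10101fe 0x0 0x0")
print(row["label"], len(row["values"]))
```

Mapping each position to a meaningful name would need knowledge of the device that the log itself does not carry.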
Data logged as a table with complex header
------+---------+---------+-----------------+ | Cpu utilization during past | Cpu | 1 Sec 4 Secs 64 Secs | ------+---------+---------+-----------------+ 0 | 5% | 5% | 5% |
Data logged as a table without header and with variable column length
1. any any 6 0-65535 1723-1723 P4
2. 10.1.112.7 255.255.255.255 any any PS4 hits 6
3. any any 6 0-65535 23-23 4
4. any any 17 0-65535 8209-8209 4 hits 1996
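Rows like these cannot be split into fixed columns, because the number of fields changes from row to row. A hedged sketch of one approach: pull out the parts that are recognisable (the leading row number and the optional trailing `hits N`) and keep the rest as a list of tokens. It assumes one row per line; `parse_rule_row` and the key names are hypothetical, and the meaning of the remaining fields is not something the sample tells us.

```python
import re

# Row number, a lazily matched body, then an optional trailing 'hits N'.
ROW = re.compile(r"^(?P<num>\d+)\.\s+(?P<body>.*?)(?:\s+hits\s+(?P<hits>\d+))?$")


def parse_rule_row(line):
    """Parse one variable-width row into a number, raw fields and a hit count."""
    match = ROW.match(line.strip())
    if match is None:
        return None
    hits = match.group("hits")
    return {
        "num": int(match.group("num")),
        "fields": match.group("body").split(),  # left uninterpreted
        "hits": int(hits) if hits is not None else None,
    }


# Sample rows from the example, assuming one row per line.
sample = """1. any any 6 0-65535 1723-1723 P4
2. 10.1.112.7 255.255.255.255 any any PS4 hits 6
3. any any 6 0-65535 23-23 4
4. any any 17 0-65535 8209-8209 4 hits 1996"""

rows = [parse_rule_row(line) for line in sample.splitlines()]
for r in rows:
    print(r["num"], len(r["fields"]), r["hits"])
```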
Data logged as a table with header as one of the columns
Opdb SP->DBhi GP->SPmu SP->NPhi SP->CPlo GP->IPhi SP->FPmu FP->KPhi LP->SPmu ----------------------------------------------------------------------------------- RAW/FREE 00022d7a 00000893 FLOOD 00002806 00002806 BRIDGE 0000019e 00014396 0000019e 00014396 ROUTE 000001ee 00004230 00000019
A common running configuration format in most networking devices
ip access-list session unix any network 216.235.80.0 255.255.240.0 any permit ! ip access-list session user any any svc-sec-papi permit any any any permit ! ip access-list session guest-internet-DMZ user alias guestusers any deny user any svc-http redirect tunnel 100 user any svc-https redirect tunnel 100
A hierarchical name/value pair
`display server inventory details`
Chassis 1:
    Servers:
        Server 1/1:
            Equipped Product Name: GB AD00 M3
            Equipped Serial (SN): ABC1832JP25
            Slot Status: Equipped
            Memory (MB): 393216
            Number of Cores: 20
            Number of Adapters: 1
        Server 1/2:
            Equipped Product Name: GB AD00 M3
            Equipped Serial (SN): ABC23432JP54
            Slot Status: Equipped
            Memory (MB): 393216
            Number of Cores: 20
            Number of Adapters: 1
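A hierarchical section like this maps naturally onto a nested dictionary. The following Python sketch assumes the hierarchy is expressed through indentation, with a name that has no value opening a new level; that layout is an assumption about the raw output, and `parse_indented` is a hypothetical helper.

```python
def parse_indented(text):
    """Build a nested dict from indented 'Name: Value' lines."""
    root = {}
    stack = [(-1, root)]  # (indent of the section line, its dict)
    for line in text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        name, _, value = line.strip().partition(":")
        name, value = name.strip(), value.strip()
        # Close every section that is not a parent of this line.
        while stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1]
        if value:
            parent[name] = value  # leaf: plain name/value pair
        else:
            child = {}  # no value: the line opens a nested section
            parent[name] = child
            stack.append((indent, child))
    return root


# Assumed layout: four spaces of indentation per level.
sample = """Chassis 1:
    Servers:
        Server 1/1:
            Equipped Product Name: GB AD00 M3
            Equipped Serial (SN): ABC1832JP25
            Memory (MB): 393216
        Server 1/2:
            Equipped Product Name: GB AD00 M3
            Equipped Serial (SN): ABC23432JP54
            Memory (MB): 393216"""

inventory = parse_indented(sample)
servers = inventory["Chassis 1"]["Servers"]
print(sorted(servers))
```

Keeping a stack of open sections means the same name, such as the serial number, can appear under every server without the entries overwriting one another.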
I guess you get the picture. There are hundreds of such formats even for a single device from a given manufacturer, and thousands across different devices. Handling all this variety, even for a single device's log, is a complex problem to solve. But it is key functionality that the platform you choose for machine data analytics has to provide, because a significant portion of most machine logs is non-time series data.
While handling complex log formats is one challenge, dealing with how logs are packaged brings a whole new set of requirements. In my next post, I will cover common packaging and transport mechanisms for logs in complex machines.