Semiotic Parsing Language (SPL) - Breakthrough DSL for IIoT Analytics - Part Two

Ashok Agarwal
Wednesday, June 13, 2018

In this section we will investigate how Glassbeam’s DSL called SPL (Semiotic Parsing Language) helps in parsing multi-structured machine logs.

SPL Terminology:

Namespace:

SPL allows a log file to be treated as a hierarchical document consisting of multiple segments (or sections). Each hierarchical segment is called a namespace. This makes it possible to zero in on the exact section from which specific elements are parsed, localizing the scope of each extract.

To define the boundaries of a namespace, SPL provides the BEGINS WITH and ENDS WITH clauses. The developer defines the beginning of a namespace with a regular expression that uniquely identifies the start of the section being treated as that namespace.

In the SPL code samples later in this post, note the DEFINE NAMESPACE statements and their BEGINS WITH clauses, which key on the beginning of each section.

A namespace may also have an ENDS WITH clause. If it is absent, the start of the next namespace “closes” the previous one. Anything that falls inside a namespace but is not parsed is considered UNPARSED and is made available separately.
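The closing rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Glassbeam's implementation: the `NAMESPACES` dictionary and the `split_into_namespaces` function are hypothetical names, and the sketch assumes no ENDS WITH clause is defined, so each namespace runs until the next one begins.

```python
import re

# Hypothetical registry: namespace name -> its BEGINS WITH pattern.
NAMESPACES = {
    "ex1.sysInfo": re.compile(r"=== SYSTEM INFORMATION ==="),
    "ex1.volumeInfo": re.compile(r"=== VOLUME INFORMATION ==="),
}

def split_into_namespaces(lines):
    """Group lines under the namespace whose BEGINS WITH pattern matched last."""
    sections = {}
    current = None  # lines seen before any namespace begins are skipped
    for line in lines:
        for name, begins_with in NAMESPACES.items():
            if begins_with.search(line):
                current = name          # a new namespace closes the previous one
                sections[current] = []
                break
        else:
            # No pattern matched: the line belongs to the open namespace, if any.
            if current is not None:
                sections[current].append(line)
    return sections

log = """\
==========================
=== SYSTEM INFORMATION ===
==========================
Model: G200
==========================
=== VOLUME INFORMATION ===
==========================
Volume Size(GB)
""".splitlines()

sections = split_into_namespaces(log)
```

Each value in `sections` holds only the lines of its own section, which is what lets a table definition limit its extracts to a single namespace.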

Table:

A table is the logical entity where parsed data is stored temporarily during parsing. This is where columns and their data types (string, int, real, etc.) are defined. This is also where descriptive labels for the UI and indexing rules are defined.

The DEFINE TABLE directive defines the ICON, a parsing methodology and columns where the parsed data is stored. Note that the TABLE is an easy-to-understand representation of parsed data, even though SCALAR does not have tables in the database sense.

ICON:

ICONs provide a simple way to parse supported log formats without the use of complex regular expressions. SPL defines multiple types of ICONs for specific log formats, and the platform allows more ICONs to be created. Each ICON handles one type of log format.

Supported Icons: NVPair, Align Basic, List Basic, Syslog, CSV, XML, JSON

COL Functions:

COL functions transform the columns of the table being parsed by SPL. Some examples of column functions are:

  1. Transformation functions such as colsplit, coljoin, colcase, colcopy
  2. Computational functions such as colcalc
  3. A specific function to add global variables (context): addcontext

Supported column functions:

Transformation:

COLFILL: Fill an empty column with the value from the previous row

COLDROP: Drop the specified column from the table

COLJOIN: Join multiple columns or literals and assign the result to the result column

COLREP: Replace each regular expression match in a column with the specified string

COLSPLIT: Split a column into pieces specified by the back references of a regular expression

COLCOPY: Copy the data from one column to one or more columns

COLCASE: Conditionally assign new values to the result column
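To make the transformation functions more concrete, here is a rough Python analogue of a few of them, operating on a table held as a list of dictionaries. The function names mirror the SPL names, but the signatures, argument order and edge-case behavior are assumptions made for this sketch and are not taken from the SPL reference.

```python
import re

def colfill(rows, col):
    """Fill empty cells in `col` with the value from the previous row."""
    previous = ""
    for row in rows:
        if not row.get(col):
            row[col] = previous
        previous = row[col]

def colcopy(rows, src, *dests):
    """Copy `src` into one or more destination columns."""
    for row in rows:
        for dest in dests:
            row[dest] = row[src]

def colrep(rows, pattern, replacement, col):
    """Replace every regular expression match in `col` with `replacement`."""
    for row in rows:
        row[col] = re.sub(pattern, replacement, row[col])

def colsplit(rows, col, pattern, *dests):
    """Assign each capture group (back reference) of `pattern` to a column."""
    for row in rows:
        match = re.search(pattern, row[col])
        if match:
            for dest, value in zip(dests, match.groups()):
                row[dest] = value

def coldrop(rows, col):
    """Remove `col` from every row."""
    for row in rows:
        row.pop(col, None)

rows = [
    {"host": "UOIKWT", "domain": "medisoft.com"},
    {"host": "", "domain": "medisoft.com"},
]
colfill(rows, "host")                                        # second row inherits the host
colcopy(rows, "domain", "company")
colrep(rows, r"\.com", "", "company")                        # strip the ".com" suffix
colsplit(rows, "domain", r"^([^.]+)\.(.+)$", "name", "tld")  # two back references
coldrop(rows, "domain")
```

After these calls each row carries `host`, `company`, `name` and `tld`, and the original `domain` column is gone.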

Computational:

COLCALC: Perform various computations and conversions on operational data. For example:

  • ADJYEAR
  • CONCAT
  • PLUS
  • MINUS
  • TIMES
  • DIVIDEBY
  • TENTOPOW
  • GMTIME
  • LOCALTIME
  • MD5
  • HEX2DEC
  • INT
  • RANDINT
  • STR2SUM
  • SDF2EPOCH (Pattern)
  • LOWERCASE
  • UPPERCASE
  • ZEROPAD
  • LENGTH
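As a rough illustration of what a few of these operations might compute, the following Python sketch maps some of the names above to ordinary Python expressions. The semantics are guessed from the operation names alone, and the `colcalc` helper with its argument convention is hypothetical, so treat it as a reading aid rather than a description of SPL itself.

```python
import hashlib

# Hypothetical Python analogues; names mirror the COLCALC list above,
# but the semantics are assumptions inferred from the names.
COLCALC = {
    "PLUS": lambda a, b: a + b,
    "MINUS": lambda a, b: a - b,
    "TIMES": lambda a, b: a * b,
    "DIVIDEBY": lambda a, b: a / b,
    "HEX2DEC": lambda s: int(s, 16),
    "ZEROPAD": lambda s, width: str(s).zfill(width),
    "UPPERCASE": lambda s: s.upper(),
    "LOWERCASE": lambda s: s.lower(),
    "LENGTH": lambda s: len(s),
    "MD5": lambda s: hashlib.md5(s.encode("utf-8")).hexdigest(),
}

def colcalc(rows, result_col, operation, *args):
    """Apply `operation` to every row and store the outcome in `result_col`.

    A string argument that names an existing column is replaced by that
    column's value; any other argument is passed through as a literal.
    """
    func = COLCALC[operation]
    for row in rows:
        values = [row[a] if isinstance(a, str) and a in row else a for a in args]
        row[result_col] = func(*values)

rows = [{"size_gb": 1000, "used_gb": 11, "serial": "037DF674"}]
colcalc(rows, "avail_gb", "MINUS", "size_gb", "used_gb")
colcalc(rows, "serial_dec", "HEX2DEC", "serial")
colcalc(rows, "used_padded", "ZEROPAD", "used_gb", 4)
```

Resolving arguments by column name keeps the sketch short, at the cost of ambiguity when a literal string happens to match a column name.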

Context:

ADDCONTEXT

  1. Values obtained through back references can be used in the table
  2. Values assigned as global variables can be added to columns in a table

Example:

Variable Name/Value Pair

==========================

=== SYSTEM INFORMATION ===

==========================

Log Date: Thu Aug 21 23:59:59 2008

OS Release: 1.1.0

Serial Number: 037DF674

Model: G200

Hostname: UOIKWT

Domain Name: medisoft.com

 

SPL code sample

 

DEFINE NAMESPACE ex1 DESCRIPTION 'Module 1'

;

DEFINE NAMESPACE ex1.sysInfo DESCRIPTION 'System Information'

BEGINS WITH /=== SYSTEM INFORMATION ===/

;

DEFINE TABLE Sys_Info NAMESPACE ex1.sysInfo DESCRIPTION 'System Information'

ICON nvpair_basic

COLUMN sys_log_date [s(64):n] <label = 'Date'> AS 'Log Date'

COLUMN sys_os_release [s(64):n] <label = 'OS Release'> AS 'OS Release'

COLUMN sys_serial_no [s(64):n] <label = 'Serial'> AS 'Serial Number'

COLUMN sys_model [s(64):n] <label = 'Model'> AS 'Model'

COLUMN sys_host_name [s(64):n] <label = 'Host'> AS 'Hostname'

COLUMN sys_domain_name [s(64):n] <label = 'Domain'> AS 'Domain Name'

COLUMN sys_company [s(64):n] <label = 'Company'>

COLCOPY (sys_domain_name, sys_company)

COLREP (/\.com/,'',sys_company)

;
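For readers who think in general-purpose code, the effect of the Sys_Info table definition can be approximated in Python as follows. This is a sketch under stated assumptions: it treats a name/value ICON as "split each line at the first colon", and the `COLUMNS` mapping and `parse_nvpairs` function are hypothetical names, not part of SPL.

```python
import re

# Column name -> key as it appears in the log (the AS clause in the SPL sample).
COLUMNS = {
    "sys_log_date": "Log Date",
    "sys_os_release": "OS Release",
    "sys_serial_no": "Serial Number",
    "sys_model": "Model",
    "sys_host_name": "Hostname",
    "sys_domain_name": "Domain Name",
}

SECTION = """\
Log Date: Thu Aug 21 23:59:59 2008
OS Release: 1.1.0
Serial Number: 037DF674
Model: G200
Hostname: UOIKWT
Domain Name: medisoft.com
"""

def parse_nvpairs(text):
    """Split each 'Name: Value' line at the first colon only."""
    pairs = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # ignore lines that carry no colon
            pairs[name.strip()] = value.strip()
    return pairs

pairs = parse_nvpairs(SECTION)
row = {column: pairs.get(key, "") for column, key in COLUMNS.items()}

# Equivalent of COLCOPY followed by COLREP in the SPL sample.
row["sys_company"] = row["sys_domain_name"]
row["sys_company"] = re.sub(r"\.com", "", row["sys_company"])
```

Splitting at the first colon matters here because the log date value itself contains colons.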

Tabular Data

==========================

=== VOLUME INFORMATION ===

==========================

Volume Size(GB) Used(GB) Avail(GB) Raid Group

------ -------- -------- --------- ----------

/vol0  1000     11       989       rdg001

/vol1  1000     108      892       rdg002

/vol2  1000     95       905       rdg003

/vol3  1000     31       969       rdg004

 

SPL Code Sample

 

DEFINE NAMESPACE ex1.volumeInfo DESCRIPTION 'Volume Information'

BEGINS WITH /=== VOLUME INFORMATION ===/

;

# Above namespace defines the beginning of the section for Volume information

DEFINE TABLE Volume_Info NAMESPACE ex1.volumeInfo DESCRIPTION 'Volume Information'

ICON aligned_basic

COLUMN vol_name [s(32):n] <label = 'Volume'> AS 'Volume' [L]

COLUMN vol_size [s(32):n] <label = 'Size', units = 'GB'> AS 'Size(GB)' [L]

COLUMN vol_used [s(32):n] <label = 'Used', units = 'GB'> AS 'Used(GB)' [L]

COLUMN vol_available [s(32):n] <label = 'Available', units = 'GB'> AS 'Avail(GB)' [L]

COLUMN vol_raid_grp [s(32):n] <label = 'RAID Group'> AS 'Raid Group' [L]

;
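One way to picture what an alignment-based ICON has to do is the Python sketch below. It assumes left-aligned columns, matching the [L] markers in the SPL sample, and uses the row of dashes under the header to find where each column starts. The sample text is spaced so that values line up with the dashes, and `parse_aligned` is a hypothetical helper, not Glassbeam's parser.

```python
import re

SECTION = """\
Volume Size(GB) Used(GB) Avail(GB) Raid Group
------ -------- -------- --------- ----------
/vol0  1000     11       989       rdg001
/vol1  1000     108      892       rdg002
/vol2  1000     95       905       rdg003
/vol3  1000     31       969       rdg004
"""

# Column names taken from the Volume_Info table definition.
COLUMNS = ["vol_name", "vol_size", "vol_used", "vol_available", "vol_raid_grp"]

def parse_aligned(text, columns):
    """Slice each data line at the offsets where the dash runs begin."""
    lines = text.splitlines()
    ruler = lines[1]
    # Each run of dashes marks the start of one left-aligned column.
    starts = [m.start() for m in re.finditer(r"-+", ruler)]
    ends = starts[1:] + [None]  # a column extends to the start of the next one
    rows = []
    for line in lines[2:]:
        cells = [line[start:end].strip() for start, end in zip(starts, ends)]
        rows.append(dict(zip(columns, cells)))
    return rows

rows = parse_aligned(SECTION, COLUMNS)
```

Slicing by offset rather than splitting on whitespace is what keeps a multi-word header such as "Raid Group" or a value containing spaces inside a single column.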

Conclusion

In essence, SPL in combination with Glassbeam’s processing platform makes it possible to build visually appealing dashboards from data obtained from machine logs.

In fact, once the data is structured, it can also be used for generating machine learning models which are then applied in real time to fresh incoming data.

If you would like to catch up on Part 1 of this series, please click here.