Opening our platform to our customers and letting them build on top of it has been on our minds for some time. With this capability now built into the platform, it opens up a variety of possibilities for our customers. In this two-part series, I will explain the why and the how of it.
Domain-specific languages (DSLs) go a long way toward improving developer productivity. The first thing you need when creating a DSL is a parser that takes a piece of text and transforms it into a structured format (such as an Abstract Syntax Tree) so that your program can understand it and do something useful with it. A DSL tends to live for years, so when choosing a tool for building your DSL's parser, you need to make sure it is easy to maintain and to evolve the language.
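To make the text-to-AST step concrete, here is a minimal sketch of a recursive-descent parser for a tiny arithmetic language. The grammar, token names, and tuple-based AST shape are invented for illustration; a real DSL toolchain would be generated from a grammar or built with a parser library.

```python
# A toy recursive-descent parser: text -> tokens -> AST.
# Grammar (illustrative): expr := term (('+' | '-') term)* ; term := NUM | '(' expr ')'
import re

TOKEN = re.compile(r"\d+|[+\-()]")

def tokenize(text):
    # Split the input into number tokens and single-character operators.
    for tok in TOKEN.findall(text):
        yield ("NUM", int(tok)) if tok.isdigit() else ("OP", tok)

def parse(text):
    tokens = list(tokenize(text))
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)

    def expr():
        # expr := term (('+' | '-') term)*
        nonlocal pos
        node = term()
        while peek() in (("OP", "+"), ("OP", "-")):
            op = tokens[pos][1]
            pos += 1
            node = (op, node, term())  # left-associative AST node
        return node

    def term():
        # term := NUM | '(' expr ')'
        nonlocal pos
        kind, val = peek()
        if kind == "NUM":
            pos += 1
            return val
        if (kind, val) == ("OP", "("):
            pos += 1
            node = expr()
            pos += 1  # consume ')'
            return node
        raise SyntaxError(f"unexpected token {val!r}")

    return expr()
```

For example, `parse("1 + 2 - 3")` yields the nested tuple `("-", ("+", 1, 2), 3)`, a structure a later stage can walk, type-check, or compile. The maintainability point in the text is exactly about keeping this layer cheap to change as the language grows.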
Continuing our discussion on edge computing and analytics: remember we said that a key benefit of the edge was local decision making. Typically, that precludes access to the install-base data. However, there is a wealth of information that can be gleaned from install-base data (such as machine learning output), and it seems a shame not to be able to utilize that on the edge.
As the Internet of Things inevitably comes into its own, the origin of data has evolved from people to machines to "things". Technologies emerged from leaders like Google and Facebook to enable analyzing tons of data in massive data farms deployed in the cloud. All that is well and good, but the approach itself required moving this "ton" of data to a central location and partitioning it across a large number of nodes so that analysis could be parallelized. Netflix, for example, runs over 1,000 nodes in its cluster. Doable, but at some point the laws of physics start to interfere.
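The partition-and-parallelize pattern described above can be sketched in a few lines. Everything here is a toy illustration: the "nodes" are in-memory lists and the per-node analysis is a simple event count, standing in for what a real cluster distributes across machines.

```python
# Toy partition/analyze/merge pipeline, mimicking how a cluster
# parallelizes analysis: shard the data, analyze shards independently,
# then combine partial results into a global answer.
from collections import Counter

def partition(records, num_nodes):
    # Hash-partition records across "nodes", as a cluster would.
    shards = [[] for _ in range(num_nodes)]
    for rec in records:
        shards[hash(rec) % num_nodes].append(rec)
    return shards

def analyze(shard):
    # Per-node analysis: count events by type (the "map" step).
    return Counter(shard)

def merge(partials):
    # Combine partial results into a global view (the "reduce" step).
    total = Counter()
    for p in partials:
        total += p
    return total

# In a real deployment, analyze() would run concurrently on each node;
# here a plain map() keeps the sketch sequential and self-contained.
totals = merge(map(analyze, partition(["error", "ok", "error"], 4)))
```

The physics problem the paragraph alludes to lives in `partition`: every record must first travel to some node before any analysis can begin, which is exactly the data-movement cost that edge approaches try to avoid.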
Ever wonder how we power those "which controller went down today" queries that span thousands of databases, amounting to hundreds of terabytes of log data every day? How do we deal with terabytes of data in a robust and efficient manner? We call it harmonic in-memory query management.
We’ve been working with a distributed Cassandra cluster for almost a year. During that time, we have learned a bit about scalability, and along the way we have collected some insights on achieving optimal query performance.
Big data applications are no longer a nice-to-have but a must-have for many organizations. Many enterprises are already using the massive amounts of data collected in their organizations to understand and serve their customers better. Those that have not yet learned how to use it will, over time, be left behind. Companies are now looking for platforms that not only provide analytical capabilities over their data but also help them become proactive and predictive. Hence, machine learning is becoming an important component of any analytical platform.
I like the word "ontology". It has a nice ring to it. Wikipedia defines an ontology as "knowledge as a set of concepts within a domain, and the relationships among those concepts". When applied to machine data analytics (the "domain"), we see that unless we isolate the concepts and understand the relationships, we cannot obtain "knowledge".
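That definition maps naturally onto a tiny data structure: a set of concepts plus typed relationships between them. The machine-data concept names below (`System`, `Controller`, and so on) are made up for the example, not taken from any real schema.

```python
# A minimal ontology sketch: concepts within a domain, and the
# relationships among those concepts, per the definition above.
ontology = {
    "concepts": {"System", "Controller", "LogEvent", "Fault"},
    "relations": [
        ("Controller", "part_of", "System"),
        ("LogEvent", "emitted_by", "Controller"),
        ("Fault", "inferred_from", "LogEvent"),
    ],
}

def related(onto, concept):
    # Walk outgoing relations to see what a concept connects to.
    return {(rel, dst) for src, rel, dst in onto["relations"] if src == concept}
```

Asking `related(ontology, "Controller")` surfaces that a controller is part of a system, which is the kind of relationship that turns isolated log records into "knowledge" about the machine that produced them.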
Glassbeam Engineering is working heads-down on our next-gen architecture using Cassandra and its column-family structures. There are many reasons for this evolution, but one of the key drivers is a compelling support use case. Here is some background on this topic.
We have established beyond a reasonable doubt that knowledge comes from structure. Therefore, parsing IoT logs to create structure is a must for making sense of this data. Remember the definition of big data: volume, variety, and velocity. If you combine that with the business requirement of near real-time analytics, you are looking at a need for high data ingestion speeds. However, the issue is not ingestion speed itself but the total cost of ownership (TCO) of providing it.
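The "parsing logs to create structure" step can be sketched very simply. The log line format and the field names (`ts`, `level`, `component`, `message`) below are invented for illustration; real device logs vary wildly, which is the "variety" part of the problem.

```python
# Illustrative sketch: turn a raw log line into a structured record.
import re

LINE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "   # timestamp
    r"(?P<level>[A-Z]+) "                             # severity
    r"(?P<component>\S+): (?P<message>.*)"            # source + free text
)

def parse_line(line):
    # Return a dict of named fields, or None if the line doesn't match.
    m = LINE.match(line)
    return m.groupdict() if m else None
```

Once a line becomes a dict of named fields, it can be filtered, aggregated, and indexed; the TCO question is about doing this at ingestion rates of terabytes per day, where per-line parsing cost multiplies into real hardware spend.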