In my previous blog, I discussed the framework we built for testing large-scale machine log data. In this post, I will share the results of our tests. Every test run, across varying server cluster sizes, was executed through our automation framework, so it required minimal effort on our side beyond deciding how many servers we needed and clicking a button.
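To give a flavor of what "clicking a button" translates to under the hood, here is a minimal sketch of what such a test driver might look like. Everything here (provision_cluster, run_parse_test, the cluster sizes) is a hypothetical placeholder, not our actual framework API:

```python
# Hypothetical sketch of an automated scalability test driver.
# provision_cluster() and run_parse_test() stand in for real
# framework calls; they are illustrative, not Glassbeam's API.
from dataclasses import dataclass


@dataclass
class RunResult:
    nodes: int
    gb_per_hour: float  # parsing throughput observed for this run


def provision_cluster(nodes: int) -> str:
    """Spin up a worker cluster of the requested size (stub)."""
    return f"cluster-{nodes}"


def run_parse_test(cluster_id: str) -> float:
    """Feed a fixed corpus of machine logs and return GB/hour (stub)."""
    return 0.0


def sweep(cluster_sizes: list[int]) -> list[RunResult]:
    """Run one parsing test per cluster size and collect the results."""
    results = []
    for n in cluster_sizes:
        cluster = provision_cluster(n)
        results.append(RunResult(nodes=n, gb_per_hour=run_parse_test(cluster)))
    return results


if __name__ == "__main__":
    for r in sweep([2, 4, 8, 16]):
        print(f"{r.nodes} nodes -> {r.gb_per_hour:.1f} GB/hour")
```

The point of the sketch is the shape of the loop: once provisioning and test execution are scripted, sweeping cluster sizes really is a one-button affair.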
The scalability of our platform is broadly dependent on two things:
1. Scalability and Throughput of Pure Parsing
We ran two types of tests: one measuring sustained throughput over time on a fixed cluster, and one measuring throughput as we scaled the number of nodes.
The results of both tests are captured in the graphs above. That the platform scales linearly across both time and node count came as no surprise. Glassbeam's parsing platform is compute intensive and requires a minimum of two nodes (note that it was not writing to any egress data stores as part of this test). The nodes used in this test were AWS m5.xlarge instances (4 vCPU / 16 GB RAM) in a Docker Swarm cluster.
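For readers who want to sanity-check linear scaling in their own runs, here is a small sketch of the kind of check that can be applied to (nodes, throughput) pairs: fit a line through the origin and look at how far each measurement deviates from it. The sample numbers below are made up for illustration, not our actual results:

```python
# Minimal linearity check over (nodes, throughput) measurements.
# The sample data below is illustrative only.
measurements = [(2, 40.0), (4, 81.0), (8, 158.0), (16, 321.0)]  # (nodes, GB/hour)

# Least-squares slope for a line through the origin: throughput = k * nodes
num = sum(n * t for n, t in measurements)
den = sum(n * n for n, _ in measurements)
k = num / den  # estimated GB/hour contributed per node

for n, t in measurements:
    deviation = 100.0 * (t - k * n) / (k * n)
    print(f"{n:2d} nodes: {t:6.1f} GB/hour ({deviation:+.1f}% off linear)")
print(f"Per-node throughput estimate: {k:.1f} GB/hour")
```

If the deviations stay within a few percent as the cluster grows, the workload is scaling linearly; a widening gap at higher node counts would point to a shared bottleneck.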
2. Scalability and Throughput of Parsing with Data Stores
When the Glassbeam Analytics platform writes to data stores, the following aspects come into play:
Note: the actual tests are a little more complex since Glassbeam's parsing platform uses more than a single type of service to write data.
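As a rough illustration of why multiple write services matter, here is a sketch of parsed records being fanned out concurrently to more than one store. The sink names (SearchIndexSink, ColumnStoreSink) are hypothetical stand-ins for whatever egress stores a deployment uses; this is not Glassbeam's implementation:

```python
# Hypothetical fan-out of parsed records to multiple data stores.
# Each sink class is a stand-in for a real egress store client.
from concurrent.futures import ThreadPoolExecutor


class SearchIndexSink:
    def write_batch(self, records: list[dict]) -> None:
        pass  # e.g., bulk-index into a search engine


class ColumnStoreSink:
    def write_batch(self, records: list[dict]) -> None:
        pass  # e.g., batched inserts into a columnar store


def flush(records: list[dict], sinks) -> None:
    # Write to all sinks in parallel; end-to-end throughput is
    # gated by the slowest sink, which is why adding egress
    # stores changes the scaling picture relative to pure parsing.
    with ThreadPoolExecutor(max_workers=len(sinks)) as pool:
        futures = [pool.submit(s.write_batch, records) for s in sinks]
        for f in futures:
            f.result()  # surface any write errors


flush([{"event": "example"}], [SearchIndexSink(), ColumnStoreSink()])
```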
The results were not surprising, as we know the Glassbeam Analytics platform scales linearly. The exciting part of this exercise was the ease with which we were able to run these tests, and at a cost that was a fraction of doing the same work manually.
Our next step is to automate large-scale read performance tests, so that we can run those too at the click of a button!
Interested in other blogs from our Engineering group? Check out Mohammed Guller's blog on machine learning, and see how we're raising the bar by combining the high-throughput computing I discussed here with AI on medical imaging machine data.