Case Study
Powering Search Capabilities with a Serverless Data Architecture
Executive Summary
Backup365® is the simple, secure Microsoft 365 backup solution designed for Managed Service Providers (MSPs). Backup365 was a 2021 iAwards Merit Award winner, a 2021 iAwards NSW Start-Up finalist and a 2021 ARN Start-Up Innovation Awards finalist.
Backup365 automatically discovers and continuously backs up all mailboxes, SharePoint sites and OneDrive accounts, so neither the end customer nor the MSP needs to manage these accounts for backup.
Backup365 performs regular snapshot backups of all cloud business data multiple times a day and saves it to local data centres in Australia.
Backed-up data cannot be deleted and can be easily searched, accessed and restored by an MSP, IT provider or administrator using the Backup365 portal. End users can perform the same functions via the Backup365 portal or the Outlook Add-in for both Mac and PC.
The Challenge
Risks of doing nothing
Why AWS?
Why Orinoco Solutions?
Implementation
AWS Services and Architecture
Backup365 runs on AWS infrastructure, performing regular backups for thousands of customers and allowing items to be restored at any time. Emails are among the items backed up by the system, supported by a search service that lets customers find, select and restore one or more items based on complex search criteria.
The email search service had been successfully implemented with OpenSearch (formerly Elasticsearch) for efficient text search and was working as expected. After a couple of years of use, however, Backup365's technical and business teams realised that the service was used very infrequently. This was normal, as customers would not restore email items unless there was an issue with their active data.
As Backup365’s customer base started to grow significantly, the infrastructure costs of the email search service rose considerably in terms of storage and computing resources. These costs were not in line with actual customer usage or the expected ROI.
“We need a solution that allows us to continue to serve our customer needs, that is scalable, secure and cost effective,” says Andrew Johnson, Chief Executive Officer of Backup365. “We’ve realised that the usage pattern is very infrequent; near-real-time data is desirable, though. AWS Athena might be a solution.”
The cost of running Elasticsearch over 2 billion email items, a number that was continuously growing, without proportional income to support it would have caused massive financial losses, making Backup365’s email backup business unviable in both the short and long term.
Backup365 has used AWS since it began operations and wanted to continue using it to solve its challenges.
Backup365 engaged Orinoco Solutions, an AWS partner with strong experience solving software problems with AWS services, skilled and certified professionals in data analytics and serverless solutions, and a track record of successfully delivered projects.
Discovery Phase
Orinoco Solutions’ specialists and the Backup365 team held a series of meetings to assess the current situation in terms of technology and people. The assessment considered different aspects of the data, such as volume, ingestion and search frequency classification and costs. The teams also evaluated how best to work together to deliver the solution in a timely and cost-effective manner. As a result, a plan with specific objectives and delivery milestones was put together.
Implementation Phase
The target solution was to replace Elasticsearch with Athena for email search given the substantial cost reduction and the infrequent search usage pattern.
Proof of concept: Orinoco Solutions validated the solution in a Backup365 development environment with production-like data to have a more realistic sense of the configuration to be used.
Pilot in production: Once the POC gave us enough confidence, we moved to production using a parallel approach in which the old and new systems ran simultaneously, allowing us to observe and adjust with minimal disruption.
Final release to production: After several days of observation and adjustments, the old system was decommissioned and the new one was fully switched on.
The solution entailed a data pipeline to ingest data from an S3 bucket with ~2 billion objects and an Athena database configured to serve searches from an API endpoint called from the Backup365 portal. The data pipeline was designed to be used for both the initial bulk migration and near-real-time ingestion of new items.
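One way a single pipeline can serve both the bulk migration and live ingestion is to normalise both sources into the same queue message. The sketch below is illustrative only: the function names, the message shape (`bucket`, `key`) and the bucket name in the usage note are assumptions, not details taken from the Backup365 implementation. It relies on the documented structure of S3 event notifications and on the SQS limit of 10 entries per `SendMessageBatch` call.

```python
import json
from typing import Iterable, Iterator
from urllib.parse import unquote_plus

SQS_MAX_BATCH = 10  # SendMessageBatch accepts at most 10 entries per call


def messages_from_s3_event(event: dict) -> list[dict]:
    """Live path: one message per object in an S3 event notification."""
    messages = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        messages.append({
            "bucket": s3["bucket"]["name"],
            # Object keys arrive URL-encoded in S3 event notifications.
            "key": unquote_plus(s3["object"]["key"]),
        })
    return messages


def messages_from_listing(bucket: str, keys: Iterable[str]) -> Iterator[dict]:
    """Backfill path: the same (hypothetical) message shape from a key listing."""
    for key in keys:
        yield {"bucket": bucket, "key": key}


def to_sqs_batches(messages: Iterable[dict]) -> Iterator[list[dict]]:
    """Group messages into entry lists sized for SendMessageBatch."""
    batch = []
    for i, message in enumerate(messages):
        # Ids only need to be unique within a batch; a running index is enough.
        batch.append({"Id": str(i), "MessageBody": json.dumps(message)})
        if len(batch) == SQS_MAX_BATCH:
            yield batch
            batch = []
    if batch:
        yield batch
```

Each yielded list could then be passed as the `Entries` argument of an SQS batch send. Because both paths emit the same message, everything downstream of the queue does not need to know whether an object came from the backfill or from a fresh upload.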
Below is a diagram with a detailed explanation of the architecture and the AWS services used:
- Every time an object is created in the Original Emails bucket, an event initiates the S3 Event Listener AWS Lambda function.
- The S3 Event Listener AWS Lambda function publishes events to an SQS queue.
- SQS delivers events to the Email Parser AWS Lambda function in 2000-item batches.
- The Email Parser AWS Lambda function consumes the SQS queue events, fetches the objects from the Original Emails bucket, parses them and puts a record batch to a Kinesis Data Firehose delivery stream.
- Amazon Kinesis Data Firehose buffers incoming data, partitions it by organisation, year and month, and delivers it to the Parsed Emails bucket.
- Every time an object is created in the Parsed Emails bucket, an event initiates the Partitions Updated AWS Lambda function.
- The Partitions Updated AWS Lambda function creates or updates partitions in an AWS Glue table.
- Portal users perform email searches.
- Amazon Athena queries the Parsed Emails bucket.
- Amazon EventBridge invokes the File Compactor AWS Lambda function on a scheduled basis.
- The File Compactor AWS Lambda function triggers an AWS Glue job.
- The AWS Glue job compacts small files into larger ones in the Parsed Emails bucket.
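To illustrate how the organisation/year/month partition scheme can tie the parsing, partition registration and search steps together, here is a minimal sketch using only the Python standard library. The record fields, the Hive-style `name=value` prefix layout, the table name and all function names are assumptions made for illustration; the case study does not describe the actual schema. The sketch also assumes every email carries a `Date` header.

```python
from datetime import timezone
from email import message_from_bytes, policy
from email.utils import parsedate_to_datetime

PARTITION_KEYS = ("organisation", "year", "month")


def parse_email(raw: bytes, organisation: str) -> dict:
    """Extract searchable fields plus the partition columns from a raw email."""
    msg = message_from_bytes(raw, policy=policy.default)
    sent = parsedate_to_datetime(str(msg["Date"]))
    if sent.tzinfo is None:  # a "-0000" offset yields a naive datetime
        sent = sent.replace(tzinfo=timezone.utc)
    sent = sent.astimezone(timezone.utc)
    body = msg.get_body(preferencelist=("plain",))
    return {
        "organisation": organisation,
        "year": sent.year,
        "month": sent.month,
        "sender": str(msg["From"]),
        "subject": str(msg["Subject"]),
        "sent_at": sent.isoformat(),
        "body": body.get_content() if body is not None else "",
    }


def partition_prefix(record: dict) -> str:
    """Hypothetical Hive-style prefix under which the record would be stored."""
    return "/".join(f"{name}={record[name]}" for name in PARTITION_KEYS) + "/"


def partition_values_from_key(key: str) -> list[str]:
    """Recover partition values from an object key, in partition-key order."""
    parts = dict(seg.split("=", 1) for seg in key.split("/") if "=" in seg)
    return [parts[name] for name in PARTITION_KEYS]


def build_search_sql(table: str, organisation: str, year: int, month: int,
                     term: str) -> tuple[str, list]:
    """Build a search query that filters on the partition columns first."""
    sql = (
        f"SELECT sender, subject, sent_at FROM {table} "
        "WHERE organisation = ? AND year = ? AND month = ? "
        "AND lower(subject) LIKE ?"
    )
    # Placeholders are left for the query client to bind.
    return sql, [organisation, year, month, f"%{term.lower()}%"]
```

Filtering on the partition columns lets the query engine skip every prefix that cannot match, which matters when a service is billed by the amount of data scanned. In this sketch a search for one organisation and month would only read the objects under that single prefix.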
Results and Benefits
As a result of this project, Backup365 was able to migrate 2 billion emails to the new system in a short period of time and enabled near-real-time data ingestion that makes newly backed-up emails searchable within seconds.
After the successful implementation of the data pipeline and search capability with Athena, Backup365 was able to decommission the Elasticsearch cluster, saving more than 90% in infrastructure costs.
Backup365 is now equipped with a solution that provides the same level of service to its customers without incurring unnecessary costs.