Cloud Data Lake

Data Lakes: Navigating Storage and Analytics in Cloud Contact Centers

According to recent Aberdeen research, the average company is seeing the volume of its data grow by more than 50% per year. These companies are also managing an average of 33 unique data sources used for analysis. If your company operates a contact center, this onslaught of data is coming in via customer/agent interactions on digital channels, call recordings, AI processes and CRM actions.

Since cloud contact centers are hosted services, companies that use them have already bought into the notion of outsourcing data management, so how a provider stores and analyzes data is becoming an increasingly important part of choosing a cloud contact center. The CCaaS platforms that will rise to the top going forward are the ones that can derive insights, discover trends and support actionable business decisions based on their data stores.

What is a Data Lake?

A data lake is a repository for storing massive amounts of raw data in its native form, in a single location.

Data storage has evolved over the years, from local storage to data warehouses and now to the data lake, with each step giving companies better capabilities to store, manage and analyze the growing amount of data generated each day.

A data lake can store all of a company’s data, in both structured and unstructured forms, at petabyte scale. Because the data lake is built on cloud storage, there is effectively no limit to how much data can be stored. Companies can now apply analytics, including AI and machine learning, to new types of structured and unstructured data stored in the data lake, such as log files, data streams, social media, and data created by IoT devices.

Companies are realizing the benefits: the Aberdeen survey found that a well-conceived data lake helped companies translate analytics into substantial ROI in the form of business growth and higher profits. Those companies saw markedly higher year-over-year improvements in operating profit and organic revenue growth.

The modern cloud data lake has evolved into a scalable, low-cost data management system that allows organizations to easily store all data types from a diverse set of sources, and then analyze that data to make evidence-based business decisions.
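To make “raw data in its native form” more concrete, here is a minimal sketch of the schema-on-read idea commonly associated with data lakes. The record contents and the `read_with_schema` helper are hypothetical illustrations: records from different sources are kept exactly as they arrived, and a schema (the set of fields a given query needs) is applied only when the data is read.

```python
import json
from typing import Any, Dict, Iterator, List

# Hypothetical raw records, kept exactly as each source emitted them (no upfront schema).
RAW_LAKE: List[str] = [
    '{"source": "acd", "contact_id": "c1", "channel": "voice", "wait_seconds": 12}',
    '{"source": "chat", "contact_id": "c2", "transcript": "Hi, I need help", "wait_seconds": 45}',
    '{"source": "iot", "device_id": "d7", "temperature_c": 21.5}',
]


def read_with_schema(raw: List[str], fields: List[str]) -> Iterator[Dict[str, Any]]:
    """Apply a schema at read time: parse each raw record and project the requested fields.

    Records that lack any requested field are skipped here, rather than rejected at write time.
    """
    for line in raw:
        record = json.loads(line)
        if all(field in record for field in fields):
            yield {field: record[field] for field in fields}


# Two different views over the same raw data, each with its own read-time schema.
wait_times = list(read_with_schema(RAW_LAKE, ["contact_id", "wait_seconds"]))
sensor_readings = list(read_with_schema(RAW_LAKE, ["device_id", "temperature_c"]))
print(wait_times)       # [{'contact_id': 'c1', 'wait_seconds': 12}, {'contact_id': 'c2', 'wait_seconds': 45}]
print(sensor_readings)  # [{'device_id': 'd7', 'temperature_c': 21.5}]
```

Neither query required the data to be restructured when it was stored; each one simply reads the fields it cares about.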

The Old Way: Databases and Data Warehouses

Whether in file cabinets or on-premises servers, companies have always valued and stored data in some form. It was common for companies to have a rack of on-premises servers supporting a corporate database where files, CRM data and contact center interactions were accessible only over the internal LAN.

But analyzing and triangulating that data was difficult because of the variety of file types being stored. Call recordings, video, images, text files, spreadsheets, web data, and digital interactions from channels such as web chat, email, messaging and social media were all stored in predefined database structures based on file type, which made it difficult for the analytics systems in place to read across the different formats.

The data management industry adapted quickly, and data warehouses were introduced. Data warehouses not only store data but also enable better analytics, because the warehouse can be queried for insights and trends. Within each warehouse, information is separated into data marts, each with its own “schema” (a table-and-row structure for organizing the data).

However, despite the promise of better analytics and querying, data warehouses, like most on-premises software systems, are complex to implement, operate, and maintain. And because they are hardware-based, their scalability is limited by how much physical space and bandwidth a company has. Clusters must be provisioned to handle peak processing loads, even if they sit idle most of the time, which raises costs further.

As expected, leading technology providers Google, Microsoft and Amazon have used their cloud market share to introduce what is now an interim generation of data storage products: Google Cloud Storage, Azure Blob Storage and Amazon S3, respectively. With these options, you can offload your data storage to their cloud infrastructure, but the IT team still has many management tasks to handle.

In addition, while it is easier to put data in the cloud, analyzing that data remains a challenge. On-premises or in the cloud, companies need to hire skilled data engineers to write custom code to query across the data warehouses. Because their skill set is so specialized, these engineers are hard to find, expensive to onboard and in demand from competitors.

Why use a cloud data lake?

Imagine a vast repository where structured and semi-structured data can be kept in its raw form and blended together, so it can be easily stored, loaded, integrated, and analyzed, creating a single source of truth even for multi-cloud environments. The cloud data lake addresses the common issues with both on-premises and cloud data warehousing, and there is no longer a need to keep disparate data warehouses in sync with each other.

The transition from data warehousing to a modern data lake is relatively easy because, in most cases, the data remains housed in Amazon S3, Azure Blob Storage, or Google Cloud Storage. What changes is the addition of a cloud analytics layer that can produce insights, trends, patterns and, ultimately, business value from all that data.

The Cloud Advantage

By storing data in the cloud and using a layer of cloud services to organize, read and analyze it, companies can eliminate much of the management work that traditional on-premises storage requires. The benefits of cloud-based data lakes read much like those of any cloud service: near-infinite elasticity, the ability to easily scale up and down to handle fluctuations in usage, and a model where you pay only for the services you use.

Let’s take a closer look at some of the business cases for using a cloud data lake, or for working with a service provider that employs data lake technology:

Store all forms of data in a single repository:

All file types, in structured, semi-structured, and unstructured forms.

Instant, automatic elasticity:

Data lakes allow processing and computing power to be distributed dynamically to the users or workloads that need it most, without affecting the performance of queries already running. You can also add compute clusters when workloads grow, and with autoscaling this happens automatically: the system determines that a new cluster is needed, creates it, and balances the load across clusters. When the load decreases and the query backlog clears, the extra cluster spins itself down without manual intervention.
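The scale-out and scale-in behavior described above can be sketched as a simple control loop. This is a minimal illustration under stated assumptions, not any vendor’s algorithm: the queued-query metric, the thresholds in the hypothetical `AutoscalePolicy`, and the one-cluster-at-a-time step are all illustrative choices.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AutoscalePolicy:
    # All thresholds are hypothetical illustration values, not any vendor's defaults.
    min_clusters: int = 1
    max_clusters: int = 4
    max_queued_per_cluster: int = 10  # scale out when the backlog per cluster exceeds this
    min_queued_per_cluster: int = 2   # scale in when the backlog per cluster drops below this


def next_cluster_count(current: int, queued_queries: int, policy: AutoscalePolicy) -> int:
    """Return the cluster count for the next interval, adding or removing one cluster at a time."""
    per_cluster = queued_queries / current
    if per_cluster > policy.max_queued_per_cluster and current < policy.max_clusters:
        return current + 1  # load is high: spin up another cluster and rebalance
    if per_cluster < policy.min_queued_per_cluster and current > policy.min_clusters:
        return current - 1  # backlog has cleared: spin an extra cluster down
    return current


policy = AutoscalePolicy()
clusters = policy.min_clusters
history: List[int] = []
for queued in [5, 25, 40, 30, 8, 3, 0]:  # hypothetical queued-query counts per interval
    clusters = next_cluster_count(clusters, queued, policy)
    history.append(clusters)
print(history)  # [1, 2, 3, 3, 3, 2, 1]
```

Using separate scale-out and scale-in thresholds gives the policy some hysteresis, so the cluster count does not flap when the backlog hovers near a single cutoff.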

One set of data, many users:

The cloud allows an almost infinite number of users to access a single repository of data via the data lake, meaning that there can now be a single source of truth across the entire organization.

All the perks of as-a-service:

Delivered via a hosted model, a cloud data lake removes the work an engineer would normally have to do to manage a repository, including provisioning, data security, backups and tuning. Human capital can be reallocated to tasks that deliver more value to the business, such as data analysis.

By turning storage costs into a monthly operating expense rather than a large capital expenditure for hardware provisioning and maintenance, you can make smarter budgeting decisions and support big bets on business growth.

Data Lakes for Cloud Contact Centers

Up to this point, we’ve focused mainly on what data lakes are and how they came to be. But it is even more important to understand what data lakes can enable your contact center to do. And if you are evaluating cloud service providers such as cloud contact centers, you need to understand why those employing modern data lake technology are the best choice for managing, protecting and analyzing your data.

In the case of the cloud contact center, transactional log data is constantly streaming in from the Automatic Contact Distributor (ACD). Contact centers are no longer just call centers: these customer engagement hubs receive contacts from 30+ digital channels, including chat, email, social messaging and social media, along with CRM queries that pop customer cards to give agents customer data as interactions occur.

In the case of NICE’s CXone platform, we analyze that data to calculate SLAs, determine customer sentiment, assign customer support to boost first-call resolution, and manage escalations so that issues are solved as quickly as possible. On the agent side, data streams from Workforce Engagement Management and Quality Management apps feed KPI metrics, agent scheduling requests and gamification dashboard data into the data lake.
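As one example of turning ACD log data into an SLA figure, the sketch below computes a service level: the percentage of contacts answered within a target wait time. The `ContactRecord` fields are hypothetical, and the code uses only one common definition of service level. Contact centers differ on details such as how abandoned contacts are counted, so this is not necessarily how CXone calculates it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ContactRecord:
    # Hypothetical shape of an ACD log entry; real field names will differ.
    contact_id: str
    wait_seconds: float
    answered: bool


def service_level(records: List[ContactRecord], threshold_seconds: float) -> Optional[float]:
    """Percentage of offered contacts answered within threshold_seconds.

    This definition divides by all offered contacts, so abandoned contacts count
    against the SLA; some centers exclude abandons instead.
    """
    if not records:
        return None  # no contacts offered, so the service level is undefined
    within = sum(1 for r in records if r.answered and r.wait_seconds <= threshold_seconds)
    return 100.0 * within / len(records)


log = [
    ContactRecord("c1", 8, True),
    ContactRecord("c2", 35, True),
    ContactRecord("c3", 15, True),
    ContactRecord("c4", 50, False),  # abandoned while waiting
    ContactRecord("c5", 19, True),
]
print(service_level(log, threshold_seconds=20))  # 60.0
```

With a 20-second target, three of the five offered contacts (c1, c3 and c5) meet the threshold, giving a 60% service level.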

This multitenant cloud approach to data storage and sharing means contact centers can also receive data from shared sources, such as partner applications, without having to move the data from an outside source into the data lake. CXone integrates with hundreds of DEVone partners via APIs, and those applications all generate data that CXone leverages to help agents perform better and ensure an exceptional customer experience. Using our data lake, we can send and receive shared data securely and triangulate partner data with information stored by the cloud contact center platform.

Choosing a cloud contact center: The data lake discussion

Now that you’re armed with the knowledge of what a cloud data lake is and how it applies to the contact center, you can work that information into your research for a cloud contact center platform. Because cloud data lakes are a modern technology, chances are you will run into providers that still use on-premises storage, or that claim to be cloud-based when in fact they have simply copied data warehouses from on-premises servers into the cloud without adding the cloud analytics layer that is so critical to realizing the benefits of the data lake. To properly operate a data lake, the entire platform must be cloud-native: storage, processing and applications must all reside in the cloud.

Consider asking the following data lake-related questions when speaking with potential cloud contact center providers:

Where will my data live?

The answer will reveal whether the provider uses on-premises or cloud storage, whether it has simply copied data warehouses to cloud storage, or whether it uses data lake technology with a cloud analytics layer. Chances are the provider will be using one of the big three storage solutions (Amazon, Google or Microsoft).

NICE leverages AWS to support our data lake strategy. AWS’s CloudFormation template offers security, durability, and scalability using Amazon S3 for storage, and supports our strategy of using microservices (via AWS Lambda) to perform data transport, analytics and management within the data lake.
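For a sense of what a Lambda-based data-transport microservice can look like, here is a hedged sketch of a Python Lambda handler that reacts to Amazon S3 “object created” event notifications and decides which downstream pipeline each new object should go to. The event fields it reads (`Records`, `eventName`, `s3.bucket.name`, `s3.object.key`) follow the S3 event notification format. The bucket name, key prefixes and pipeline names in `ROUTES` are hypothetical and do not describe NICE’s actual implementation.

```python
import urllib.parse
from typing import Any, Dict, List

# Hypothetical mapping from object-key prefixes to downstream pipelines (illustration only).
ROUTES: Dict[str, str] = {
    "recordings/": "speech-analytics",
    "chat/": "text-analytics",
    "acd-logs/": "reporting",
}


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, List[Dict[str, str]]]:
    """Route newly created S3 objects to a pipeline based on their key prefix."""
    routed: List[Dict[str, str]] = []
    for record in event.get("Records", []):
        # Only act on object-creation events (e.g. "ObjectCreated:Put"); ignore deletes, etc.
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded (a space arrives as "+").
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        pipeline = next(
            (name for prefix, name in ROUTES.items() if key.startswith(prefix)),
            "unclassified",
        )
        routed.append({"bucket": bucket, "key": key, "pipeline": pipeline})
    # A production function would hand these off (e.g. to a queue); this sketch just returns them.
    return {"routed": routed}


# Local usage with a hand-built sample event (no AWS account needed).
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-contact-center-lake"},
                "object": {"key": "recordings/2024/call+001.wav"},
            },
        },
        {
            "eventName": "ObjectRemoved:Delete",
            "s3": {
                "bucket": {"name": "example-contact-center-lake"},
                "object": {"key": "chat/old-transcript.json"},
            },
        },
    ]
}
print(lambda_handler(sample_event, None))
```

Keeping the routing decision in a small, stateless function like this is what makes it a natural fit for Lambda: each new object triggers one short invocation, and no servers sit idle between uploads.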

Will my costs go up as I create more data?

Remember when a 50 GB external hard drive cost hundreds of dollars? Seems like forever ago. Storage pricing has dropped thanks to cloud solutions from the big players, but it’s important to know whether your provider is still employing a strategy that can support only finite amounts of object storage. Ask how they can help control the cost of storing and analyzing petabytes of data at scale, including compute costs such as autoscaling.

Will my data be secure?

Securing vast amounts of data on the move is one of the biggest challenges facing most organizations. Data lake providers like AWS and Snowflake offer features that help control access and prevent configuration errors, but the cloud contact center provider should also have a robust, multilayer security approach in place. Ask about encryption at every stage: is data protected with AES 256-bit encryption when it is stored, when it is being staged for the data lake, and when it sits in database objects in the data lake itself?

How does the company handle authentication? Are connections secured with Transport Layer Security (TLS) 1.2? The data lake and the platform itself should support authentication standards such as SAML 2.0 or OpenID Connect, as well as role-based access control (RBAC) so each user is authorized to access only the data that he or she is explicitly permitted to see. These are necessary security measures for the cloud platform, but be sure to ask whether they extend to database objects in the data lake, such as tables, schemas, and any virtual extensions.
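To illustrate what RBAC extended to data lake objects means in practice, here is a minimal deny-by-default permission check over schemas and tables. The roles, users, grants and the `schema.*` wildcard convention are hypothetical. A real platform would typically derive a user’s roles from its identity provider (for example, from SAML 2.0 or OpenID Connect claims) and enforce grants inside the query engine rather than in application code.

```python
from typing import Dict, Set, Tuple

# A grant is (action, object); objects are "schema.table", or "schema.*" for a whole schema.
Grant = Tuple[str, str]

# Hypothetical roles and grants; a real platform would load these from its catalog.
ROLE_GRANTS: Dict[str, Set[Grant]] = {
    "supervisor": {("read", "analytics.contacts"), ("read", "wfm.schedules")},
    "agent": {("read", "wfm.schedules")},
    "data_engineer": {("read", "analytics.*"), ("write", "analytics.*")},
}

# Hypothetical user-to-role assignments, e.g. derived from identity-provider group claims.
USER_ROLES: Dict[str, Set[str]] = {
    "alice": {"supervisor"},
    "bob": {"agent"},
    "carol": {"data_engineer"},
}


def _covers(granted: str, obj: str) -> bool:
    """A "schema.*" grant covers every table in that schema; otherwise require an exact match."""
    if granted.endswith(".*"):
        # Keep the trailing "." so a grant on "analytics" does not leak into "analytics2".
        return obj.startswith(granted[:-1])
    return granted == obj


def is_allowed(user: str, action: str, obj: str) -> bool:
    """Deny by default: allow only if one of the user's roles explicitly grants the action."""
    return any(
        granted_action == action and _covers(granted_obj, obj)
        for role in USER_ROLES.get(user, set())
        for granted_action, granted_obj in ROLE_GRANTS.get(role, set())
    )


print(is_allowed("alice", "read", "analytics.contacts"))    # True
print(is_allowed("bob", "read", "analytics.contacts"))      # False
print(is_allowed("carol", "write", "analytics.sentiment"))  # True
```

Unknown users and ungranted actions fall through to `False`, which matches the “explicitly permitted” wording above: access is never assumed.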

In conclusion, if you’re considering a cloud contact center for your organization, be sure to ask “behind the scenes” questions about the platform’s architecture and how data is stored, moved and secured, in addition to common questions about call routing, recording and workforce management capabilities. Cloud contact center providers that employ a data lake strategy can make your business more agile by providing business insights and analytics across all types of data from a “single source of truth” repository for your entire organization.

To talk to an expert about how NICE can put our data lake strategy to work for you, contact us here.