The Ecosystem (and Future) of the Modern Data Infrastructure

Written by Jeremy Levy


A simplified look at the current data landscape shows an emerging architecture in which companies own and control their own data. In that architecture, the data warehouse is the central hub that connects to all other tools and acts as a gravity well for all business data.

That architecture represents a major shift in how data is ingested, stored, and analyzed by companies of all sizes—and key players in the data analytics industry aren’t keeping up.

The Traditional Approach to Data Architecture

When I started my first company twenty years ago, understanding customer behavior meant buying servers, racking them, and using software engineers to gain insights. Data and the infrastructure around it were expensive and resource-intensive. Only the largest enterprises could afford to invest the money and labor required for data ingestion and storage, let alone a data warehouse and a team of analysts to unlock that data for the rest of the company.

That reality meant that many data analytics companies built their products as all-in-one tools, centralizing data collection, storage, and analysis under one roof. Vendors jumped at the chance to lock customers into a proprietary ecosystem where those customers couldn’t own or control their own data.

The Traditional Approach Doesn’t Work for Modern Businesses

With the dramatic transformation of the data industry over the past few years, major players in the analytics space are failing to keep up.

As the market continues to move toward company-owned data stored in the data warehouse, many of these tools aren’t responding to the changes, especially when it comes to data warehouse integrations. They simply don’t account for these critical shifts in how customers want to store their data.

The future is about having control of your data and where it is stored, but these vendors aren’t building for that future. Their business strategies are predicated on locking you into an ecosystem that they own, and that’s neither customer-friendly nor aligned with the market’s future.

Data Warehouses as the Gravity Well for Data

The market has moved to a place where every company, big and small, can afford its own data warehouse in the cloud. Cloud data warehouses (CDWs) broke through to the mainstream in 2020, most significantly with Snowflake’s IPO in September of that year. Today, any entrepreneur can sign up for Google BigQuery or Snowflake and, in a matter of hours, have a data solution that can scale with their business.

With the emergence of SaaS products and the widespread use of cloud storage, the cost of storing and synthesizing data has never been lower. The barriers that once kept a company from owning its customer data are insignificant today.

As Andreessen Horowitz notes, “Effective data capabilities are now table stakes for companies across all sectors—and winning at data can deliver durable competitive advantage.”

The emergence of CDWs underpins every use case of user data, from analytics to productization. With each passing year, CDWs become cheaper and easier to operate, accelerating the value a marketer can get from their data. The insights gleaned from the analysis of that data will become a standard necessity for marketers.

So data warehouses are now the centerpiece of the data infrastructure, and they’ll play a pivotal role in how businesses structure their data stacks and in which data solutions flourish moving forward. But many of the data platforms operating today aren’t set up for that future. They aren’t set up to put customers first, to allow companies to own their own data, or to contribute to a single source of truth.

Analytics Tools Need to Connect With the Data Warehouse

Kleiner Perkins writes, “Now that all critical business data is centralized and readily accessible via SQL, the logical next step is to use this foundation to build full-featured applications that both read and write to the warehouse. This pattern is already playing out within the most forward-leaning companies, and I suspect it will accelerate into a more common architectural pattern over the next 12 months.”

As businesses increasingly look for more flexibility and control over their data, firms vying for a spot in the modern data landscape will be at a distinct disadvantage if they don’t connect directly with data warehouses.

Product Analytics platforms that connect directly to a CDW and integrate easily with Snowflake, Google BigQuery, and others will provide marketers visibility into the customer journey at every touchpoint. In the coming year, we’ll see significant movement toward those sorts of integrations, and within the next five years, we will have moved past questions of access and onto greater, more specific, and actionable insights.

Movers and Shakers in the Future of the Modern Data Infrastructure

To get a sense of the modern data infrastructure and the solutions leading the charge, we put together the infographic below. We’ve only included tools that are truly the best in their class—solutions we believe will play a big role in the future of data.

[Infographic: visualization of today's modern data ecosystem]

Ingestion and CDP

Everything that flows into and out of the data warehouse starts with collecting and ingesting the data that matters for your business. Selecting the right ingestion or CDP solution for your needs is the key to understanding who’s frequenting your website or product and what they’re doing there. A product team, for example, might collect event data on repeat usage or churn, while a marketing team might collect data on various conversion points throughout the website.
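To make the idea of event data a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the Event fields, the event names, and the conversion_rate helper are hypothetical and do not reflect any particular ingestion or CDP product.

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone


@dataclass
class Event:
    """A hypothetical event record, roughly what an ingestion tool might collect."""
    user_id: str
    name: str
    properties: dict = field(default_factory=dict)
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_record(event: Event) -> str:
    """Serialize an event as a JSON line, ready to be loaded into a warehouse."""
    return json.dumps(asdict(event), sort_keys=True)


def conversion_rate(events: list, start: str, goal: str) -> float:
    """Share of users who fired the `start` event and also fired the `goal` event."""
    started = {e.user_id for e in events if e.name == start}
    converted = {e.user_id for e in events if e.name == goal} & started
    return len(converted) / len(started) if started else 0.0


# Illustrative data: two users view pricing, one of them signs up.
sample_events = [
    Event("u1", "viewed_pricing"),
    Event("u1", "signed_up", {"plan": "free"}),
    Event("u2", "viewed_pricing"),
]
```

With this sample, conversion_rate(sample_events, "viewed_pricing", "signed_up") gives the share of pricing-page visitors who went on to sign up, which is the kind of conversion point a marketing team might track.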

Data Modeling and Transformation

Data modeling and transformation tools transform raw information into clean, accessible data. Often using SQL, these tools enable engineers and data analysts to write, test, and deploy code for data transformation and modeling—ensuring the data that makes its way into the data warehouse and downstream from there is trustworthy and error-free.
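As a rough illustration of the kind of SQL transformation described here, the sketch below uses Python’s built-in sqlite3 module as a stand-in for a cloud data warehouse. The table names (raw_events, clean_events), the columns, and the cleaning rules are all hypothetical. Real transformation tools layer versioning, testing, and deployment on top of this basic idea.

```python
import sqlite3


def build_clean_events(conn: sqlite3.Connection) -> None:
    """Build a cleaned table: normalize event names, drop bad rows and duplicates."""
    conn.executescript(
        """
        DROP TABLE IF EXISTS clean_events;
        CREATE TABLE clean_events AS
        SELECT DISTINCT
            user_id,
            LOWER(TRIM(event_name)) AS event_name,
            event_date
        FROM raw_events
        WHERE user_id IS NOT NULL
          AND TRIM(event_name) <> '';
        """
    )


def check_clean_events(conn: sqlite3.Connection) -> bool:
    """A simple data test: no missing user IDs and no duplicate rows."""
    nulls = conn.execute(
        "SELECT COUNT(*) FROM clean_events WHERE user_id IS NULL"
    ).fetchone()[0]
    dupes = conn.execute(
        """
        SELECT COUNT(*) FROM (
            SELECT 1 FROM clean_events
            GROUP BY user_id, event_name, event_date
            HAVING COUNT(*) > 1
        )
        """
    ).fetchone()[0]
    return nulls == 0 and dupes == 0


def demo() -> list:
    """Load a few messy rows, transform them, test them, and return the result."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE raw_events (user_id TEXT, event_name TEXT, event_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO raw_events VALUES (?, ?, ?)",
        [
            ("u1", " Signup ", "2021-01-01"),
            ("u1", "signup", "2021-01-01"),    # duplicate once normalized
            (None, "purchase", "2021-01-02"),  # missing user ID
            ("u2", "Purchase", "2021-01-03"),
            ("u3", "   ", "2021-01-03"),       # blank event name
        ],
    )
    build_clean_events(conn)
    assert check_clean_events(conn)
    return conn.execute(
        "SELECT user_id, event_name, event_date FROM clean_events ORDER BY user_id"
    ).fetchall()
```

The point of pairing the transformation with a check is that the check runs every time the table is rebuilt, so bad rows are caught before they reach anything downstream.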

Operational Analytics and Reverse ETL

As the data landscape evolves, the next big step is to move beyond the dashboard and reporting use case—to broaden data usage from long-term strategy to operational, day-to-day decision-making. That’s where operational analytics comes into play, piping data directly from your data warehouse into other tools that various teams throughout the org use every day (like CRMs, help desk and chat software, email marketing platforms, and more).
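The pattern of piping warehouse data into everyday tools can be sketched in a few lines of Python. This is only an illustration under stated assumptions: sqlite3 stands in for the data warehouse, the events table and the payload shape are hypothetical, and the send callback is a placeholder for whatever API a real CRM or email platform exposes.

```python
import sqlite3
from typing import Callable


def sync_purchase_summary(
    conn: sqlite3.Connection, send: Callable[[dict], None]
) -> int:
    """Read one summary row per purchasing user and push it to a downstream tool."""
    rows = conn.execute(
        """
        SELECT user_id, COUNT(*) AS purchases, MAX(event_date) AS last_purchase
        FROM events
        WHERE event_name = 'purchase'
        GROUP BY user_id
        ORDER BY user_id
        """
    ).fetchall()
    for user_id, purchases, last_purchase in rows:
        # Hypothetical payload shape; a real tool defines its own schema.
        send(
            {
                "external_id": user_id,
                "traits": {"purchases": purchases, "last_purchase": last_purchase},
            }
        )
    return len(rows)


def demo() -> list:
    """Populate a tiny warehouse table and collect the payloads that would be sent."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (user_id TEXT, event_name TEXT, event_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO events VALUES (?, ?, ?)",
        [
            ("u1", "purchase", "2021-01-01"),
            ("u1", "purchase", "2021-02-01"),
            ("u2", "signup", "2021-01-05"),
            ("u3", "purchase", "2021-01-09"),
        ],
    )
    outbox = []  # stands in for the downstream tool
    sync_purchase_summary(conn, outbox.append)
    return outbox
```

Passing the sender in as a callback keeps the warehouse query separate from the destination, so the same summary could feed a CRM, a help desk, or an email platform.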

Applied Analytics

Where data modeling and transformation tools take raw data and clean it up for data analysts and engineers, Applied Analytics tools are built to help the rest of the organization access meaningful data and use it to inform decisions about the business. Tools in this category fall broadly into two main buckets:

  • Business Intelligence (BI) and visualization
  • Product Analytics

Visualization and BI tools are helpful in turning numbers into dashboards and visual representations of the underlying data.

BI tools still require technical resources and expertise in order to pull insights out of them, though. Most BI tools run on SQL, and they require data analysts and engineers to code and run analyses for the rest of the organization. 

That process is slow, and it creates bottlenecks and strain on technical teams. It is neither a scalable nor a modern way to democratize data access.

Product Analytics tools pick up where BI tools leave off, allowing anyone in the organization, from product to marketing to sales, to access and query data in real time without technical resources. There’s no requirement that users know and deploy SQL or have extensive programming experience.

In that way, these tools are eliminating the last of the barriers that stand between data and the majority of the organization. They democratize data to the point that anyone in the company can access, easily and in real time, the data they need to make intelligent day-to-day decisions.

That democratization and ease of access to data represent the last frontier for the data landscape, and the tools to make it happen are already here.