Behavioral Data and the Modern Data Stack: Privacy, Building Your Stack, and What’s Next

Written by Indicative Team


Behavioral data enables companies to build better products and experiences for their customers.

But collecting and harnessing that data means having the right stack in place to turn that data into decisions—to collect, model, and analyze it. That starts with the cloud data warehouse (CDW), which serves as the center of the Modern Data Ecosystem and opens up a broad range of use cases, including Product Analytics.

Indicative CEO Jeremy Levy and Snowplow cofounder and CEO Alex Dean took on this topic in a recent webinar. They discussed the current state of the data ecosystem and what it means for data teams today and in the future.

You can watch the webinar below, or read on for details on:

  • Behavioral data and the growing push for more privacy and ownership for data subjects
  • How to build a Modern Data Stack for your business
  • Where the industry is headed

Note: Before we get too far, let’s get on the same page about a few things:

When we talk about behavioral data, we’re talking about the signals your customers give off when interacting with your digital properties (your website, mobile app, back-end and server-side systems, etc.).

Snowplow is a platform that helps companies generate, enrich, improve the quality of, deliver, and analyze that data.
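To make the definition above concrete, here is a minimal sketch of what a single behavioral event might look like as a record. Every field name here is an illustrative assumption; this is not Snowplow's or Indicative's actual event schema.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import json


@dataclass
class BehavioralEvent:
    """One signal from a user interaction. All field names are hypothetical."""
    user_id: str
    event_name: str      # e.g. "page_view" or "add_to_cart"
    source: str          # e.g. "web", "mobile", or "server"
    occurred_at: str     # ISO 8601 timestamp in UTC
    properties: dict = field(default_factory=dict)


def make_event(user_id, event_name, source, properties=None, now=None):
    """Build an event, stamping it with the current UTC time unless one is given."""
    ts = (now or datetime.now(timezone.utc)).isoformat()
    return BehavioralEvent(user_id, event_name, source, ts, properties or {})


# A fixed timestamp keeps the example reproducible.
event = make_event(
    "user-123", "page_view", "web",
    properties={"path": "/pricing"},
    now=datetime(2021, 1, 1, tzinfo=timezone.utc),
)
print(json.dumps(asdict(event), sort_keys=True))
```

Keeping the free-form details in a `properties` dictionary is one common way to let the same record shape cover websites, mobile apps, and server-side systems.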

Behavioral Data and Privacy

Between GDPR, CCPA, and the ongoing battle between Apple and Facebook, it’s clear that privacy issues are coming to the fore in the data world.

After years of growth in the data industry, the pendulum is beginning to swing back toward privacy—with organizations, governments, and everyday consumers growing more concerned about the rights and needs of data subjects.

Most of that concern is directed toward third-party data collection, where third parties collect big, broad roll-ups of customer data (think Facebook’s third-party cookies), and it’s been a long time coming.

That’s why Alex and cofounder Yali built Snowplow around first-party data collection using an open-source platform. Companies use Snowplow to understand their own customers’ behavior across their own digital properties—no third parties control or access the data.

They built the company on that model because they believe data collection should work in a reciprocal way. Companies use their behavioral data to build better products and experiences for customers. It’s a positive feedback loop where data collection actively improves the experience of its subjects.

In Alex’s view, legislation on data privacy is needed—it’s time for organizations to take the rights of consumers more seriously, he argues. But, as Jeremy points out in the webinar, the execution of that legislation isn’t quite there yet.

That’s something both Alex and Jeremy expect to evolve in the coming years, ultimately striking the right balance between privacy and legitimate business use of behavioral data.

The Modern Data Stack

In part because of those concerns around privacy and third-party data collection, there’s a shift moving the data industry toward a new kind of infrastructure—one where companies own and control their own data in the cloud.

The biggest change enabling that shift comes down to cost. Data warehousing costs have dropped precipitously, to the point where most companies can now afford to implement a data warehouse and to build and scale an ecosystem around it.

That’s created what Alex calls a “Cambrian explosion” of different tools, vendors, processes, and philosophies centered around collecting high-quality data and bringing in additional tools to analyze and operationalize that data as needed.

Companies today can solve really interesting problems that simply weren’t solvable before.

How to Build a Modern Data Stack in 3 Steps

So how can you build a data stack that helps you solve new problems, create more value for customers, and turn that value into revenue?

Step 1: The data warehouse

Both Alex and Jeremy agree: Selecting a data warehouse (or lake) is an important first step. 

The cloud data warehouse serves as a center of gravity for your data, consolidating everything into one central source of truth, and serving as a connection point for other data and analytics tools.

Step 2: What data do you need?

With data storage sorted, the next step is to think about the different pools of data that are important for your business. What kind of data do you track or need to collect? Is it structured data stored in Excel files and databases, or unstructured data such as email, social media, or sensor data?

Step 3: What are your use cases?

The last step is to think about how you’ll use all that data. What use cases do you want to start building for? Are they analytical, for example, or are they data products? Are you looking to improve internal efficiency? To analyze customer journeys and improve conversion?

Now, you can build out your stack by following a checklist—choosing a best-in-class tool from each bucket within the data ecosystem:

  • Ingestion
  • Storage
  • Orchestration
  • Transformation
  • Data quality monitoring
  • Data catalog and governance
  • Operational and applied analytics

But, Alex argues, it’s often better and more meaningful for your organization to begin with the data warehouse—the storage bucket—and build out spokes based on your organizational needs and use cases.

That way, you can select the tools and processes that are best-in-class for your organization and build a data stack that works for your use case(s).
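The warehouse-first approach can be sketched as a small planning function: start from storage, then add only the buckets your use cases call for. The bucket names come from the checklist above; the use-case names and the mapping between use cases and buckets are hypothetical and only there for illustration.

```python
# Buckets from the checklist above, in the same order.
BUCKETS = [
    "Ingestion",
    "Storage",
    "Orchestration",
    "Transformation",
    "Data quality monitoring",
    "Data catalog and governance",
    "Operational and applied analytics",
]

# Hypothetical mapping of use cases to the buckets they depend on.
USE_CASE_NEEDS = {
    "customer_journey_analysis": [
        "Ingestion", "Transformation", "Operational and applied analytics",
    ],
    "internal_reporting": ["Orchestration", "Transformation"],
}


def plan_stack(use_cases):
    """Return the buckets to invest in, with the warehouse (Storage) first."""
    needed = {"Storage"}
    for use_case in use_cases:
        needed.update(USE_CASE_NEEDS[use_case])
    # Storage is the hub; the spokes follow in checklist order.
    spokes = [b for b in BUCKETS if b in needed and b != "Storage"]
    return ["Storage"] + spokes


print(plan_stack(["customer_journey_analysis"]))
```

With no use cases at all, the plan is just the warehouse, which matches the idea of treating storage as the starting point rather than filling every bucket up front.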

What’s Next for the Modern Data Stack?

For now, the modern data stack is still evolving, and the impact of data on today’s companies will continue to evolve with it. Ultimately, both Jeremy and Alex agree, the industry will continue to break down barriers, democratize who has access to data, and lower the technical bar to use it.

That will likely impact how teams and organizations are structured around data. 

Traditionally, companies needed data analysts and scientists to manage their data, both because it was really hard to do and because deep insight and analysis offered an additional layer of value that companies couldn’t access without data specialists.

Now, Alex and the team at Snowplow see a lot of diversity in how teams are structured, but a few trends are starting to shake out among their customers:

  • Mid-market and smaller companies are building central data teams that own the whole stack and work as a service desk for other teams.
  • Larger organizations are developing data platform teams, whose focus is on making sure the data is high-quality. Then, individual business teams and users have their own embedded data capabilities, built for their specific use cases.

In both cases, the ownership of data is moving away from technical teams and toward product, marketing, and other business teams.

As for the future of the data industry itself, Jeremy and Alex both note things tend to “move slowly until they move really fast.”

They expect the next few years to continue that trend, with cloud data warehouses moving further away from the legacy data storage solutions of the past, and some jockeying between data warehouses and data lakes.

In the longer term, the industry will look to find a balance between the current decentralized ecosystem and consolidation.

The ubiquity of the cloud data warehouse and the status of SQL as the universal language of data have made it easier and more cost-effective for tools to integrate and work together without consolidating the industry.
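As a rough illustration of why SQL works as a shared language, the sketch below runs a simple conversion query over behavioral events. It uses Python's built-in `sqlite3` module as a stand-in for a cloud data warehouse; the `events` table, its columns, and the sample rows are all assumptions made up for this example.

```python
import sqlite3

# In-memory SQLite database standing in for a warehouse (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, event_name TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("u1", "page_view"), ("u1", "signup"),
        ("u2", "page_view"),
        ("u3", "page_view"), ("u3", "signup"),
        ("u4", "page_view"),
    ],
)

# Count distinct users per step. This ignores event order, so it is a
# simple conversion count rather than a strict funnel.
CONVERSION_SQL = """
SELECT
    COUNT(DISTINCT CASE WHEN event_name = 'page_view' THEN user_id END) AS viewers,
    COUNT(DISTINCT CASE WHEN event_name = 'signup' THEN user_id END) AS signups
FROM events
"""

viewers, signups = conn.execute(CONVERSION_SQL).fetchone()
conversion = signups / viewers
print(viewers, signups, conversion)
```

Because the query is plain SQL, broadly the same statement could be pointed at a warehouse table by different tools, which is the kind of interoperability described above.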

While Jeremy believes some consolidation is inevitable, the current best-in-class model creates space for new companies that solve really important problems. It is also great for businesses, which avoid being locked into a proprietary ecosystem that can be expensive and slow to evolve.

If nothing else, one thing is for certain: the need for behavioral data and talented people who can leverage it is on the rise and will continue to grow for years to come.