6 Factors You Need to Consider When Selecting a Data Warehouse

At Indicative, we pride ourselves on our data warehouse integration, a feature that is completely unique to our product. For those new to data analytics, this may be the first time you’re exploring data warehouse options. We sat down with our engineering team and discussed what you should be looking for when selecting a data warehouse.

Before we get started, we’ve added a quick refresher on what a data warehouse is.

What is a data warehouse?

A data warehouse is what makes data analytics possible for business users. Data warehouses store data from different sources so that product managers, data scientists, and marketers, like yourself, can then access that information through SaaS tools, like Indicative. Overall, data warehouses help companies make better, more accurate decisions and eliminates human error by compiling data from different sources. Without a data warehouse, it’s impossible to have the actionable insight you need to examine your customer journey.

Who are the major players in the data warehouse space?

There are two main players in the data warehouse space: Google BigQuery and Amazon Redshift. What these services have in common is that they are run on the back of two of the most robust data storage services in the world: Amazon Web Services and Google Cloud Platform. These data warehouses are the most commonly known and used for two main reasons:

  1. They’re easy to integrate with because many people already use their respective cloud storage. For both BigQuery and Redshift, it’s as simple as signing up for an account, verifying your login, and then instantly having a data warehouse. 
  2. No engineering resources are required to use these products. Waiting for your company’s data team can be cumbersome, often requiring weeks of waiting before you receive results. With a data warehouse, anyone looking to do an analysis can access the data needed. 

Given the popularity of these data warehouses, we wanted to take a closer look at what makes them crowd favorites outside of these reasons and how to choose between them.

BigQuery Vs. Redshift

There are six main factors to consider when choosing between these two data warehouses:

Your Current Cloud Platform

It’s a no brainer. If you use Google Cloud Platform, setting up BigQuery is easier and if you use Amazon Web Services, setting up Redshift is easier. By continuing to use their respective products, synchronization and set-up are faster and easier. With that being said, starting from scratch with BigQuery or Redshift is still pretty simple. Both data warehouses require only a few clicks from your logged-in account to sign up and begin sending or analyzing your data.

Price

Both Redshift and BigQuery offer pay as you go or flat rate monthly plans. This flexibility in pricing accommodates business users who know exactly how much storage and computing they will be doing, and those who are uncertain about their monthly usage.

A selling point of BigQuery is that it doesn’t charge customers to load or export data, no matter the size, it only charges for data storage, streaming inserts, and querying data. The big takeaways for BigQuery is that it’s free to analyze 1 TB of data and it’s free to store 10 GB of data/month, but querying costs extra.

Redshift offers several pricing options that allow you to pay as you go, eliminating upfront costs. As a business just getting started, this may alleviate stress in terms of choosing the perfect plan without spending too much or running out of storage space. Starting from the bottom, Redshift can be as low cost as $0.25/hour and scale to $250/TB/year. The big takeaway for Redshift is that it primarily charges for the volume of storage, not the volume of queries run.

If you are only going to be storing data and not querying data, BigQuery is a much cheaper choice because you are really only being charged per query, whereas with Redshift, you are paying a querying fee whether you run a query or not. 

Execution Speed

How quickly a data warehouse can load and export data is an important factor to take into consideration. The faster the execution speed is, the faster data can be queried for critical business insights. 

BigQuery and Redshift take seconds to set up and allow you to begin analyzing your data immediately.

Ease of Use

BigQuery and Redshift both offer the ability to run queries within their platform. The only catch is that both require SQL. For some running SQL based queries may not be daunting, but for the average business user, SQL is not easy to use. Engineering resources may be needed in order to craft in-depth, specific queries, which some companies simply do not have the bandwidth for. However, by using Indicative, no one on your team needs to know SQL in order to build queries! Indicative’s easy to use drop down query builder was designed for non-technical users.  

Focusing only on the data warehouse aspects of both companies, BigQuery is easier to use because it doesn’t require knowledge of underlying hardware, databases, or other configurations, plus it can scale better if your data volume increases. In order to easily use Redshift, it would require engineers and users with a specific skill set, like knowing how to distribute data across nodes, who can manage your companies data scaling and querying.

If you are prepared to dedicate outside resources to setting up and managing your data warehouse, Redshift isn’t a bad selection. If, however, you are looking for a data warehouse that doesn’t require any fine tuning, BigQuery is easier. 

Ease of Access

Both BigQuery and Redshift work extremely well within their own environments. If you already use use Amazon Web Service it’s easier to load your data into Redshift, and if you use Google Analytics it’s easier to load your data from BigQuery. 

As an AWS user, a few clicks in your console allow you to set up your data warehouse. Most of the administration tasks (backups and replication) are automatically taken care of behind the scenes so you can focus solely on your data. 

Security

Security is a top priority for most, if not all, companies that are looking to store and analyze their data. As we’ve seen from major data breaches like Capital One, Equifax, and Target, there are serious consequences and repercussions for companies whose data is released. 

BigQuery’s security is supported by Cloud Identity and Access Manager through Google. Amazon Redshift uses AWS’s Identity and Access Management (IAM). 

Redshift allows you to secure your data a few different ways:

  • Virtual Private Cloud (VPC): To protect access to your cluster by using a virtual networking environment, you can launch your cluster in an Amazon VPC.
  • SSL connections: To encrypt the connection between your SQL client and your cluster, you can use SSL encryption.
  • Cluster encryption: To encrypt data in all your user-created tables, you can enable cluster encryption when you launch the cluster.

BigQuery also automatically encrypts all resting data as a default. By encrypting the data at rest, the chances of an attack are minimized and a BI tool can use the data without providing access to the content. 

Both Redshift and BigQuery can be trusted with securing, storing, and transporting the most sensitive of data. 

Which is the best data warehouse?

Which one is the best? That’s entirely up to you. Every business has unique needs and specifications that may relate better to one than the other. Either way, you can’t go wrong because they both pair well with Indicative! While this list is tailored to BigQuery and Redshift, these selling points can be applied to other data warehouses like Snowflake, Snowplow, and Segment.