What is Exploratory Data Analysis?

Written by Caitlin Davidson

Exploratory Data Analysis Defined

Exploratory data analysis (EDA) is the critical process of performing initial investigations on data to discover patterns, spot anomalies, test hypotheses, and check assumptions, using summary statistics and graphical representations.

The objectives of EDA include:

  • Suggesting hypotheses about the causes of observed anomalies
  • Assessing assumptions on which statistical inference will be based
  • Supporting the selection of appropriate statistical tools and techniques

Exploratory data analysis can also help with:

  • Generating questions about a user's data
  • Searching for answers by visualizing, transforming, and modelling a user's data
  • Refining questions and/or generating new questions
  • Detecting mistakes or anomalies
  • Allowing for preliminary selection of appropriate models
  • Determining relationships among the explanatory variables
  • Assessing the direction and size of relationships between explanatory and outcome variables
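
As a rough illustration of detecting mistakes or anomalies with summary statistics, here is a minimal Python sketch. The function names (`summarize`, `iqr_outliers`), the 1.5 × IQR threshold, and the `ages` sample are assumptions made for this example and are not part of any particular EDA tool.

```python
import statistics

def summarize(values):
    """Return basic summary statistics for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": median,
        "stdev": statistics.stdev(values),
        "q1": q1,
        "q3": q3,
    }

def iqr_outliers(values, k=1.5):
    """Flag values lying more than k * IQR outside the quartiles.

    k=1.5 is a common rule of thumb, assumed here for illustration.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical sample: ages with one likely data-entry mistake.
ages = [23, 25, 27, 29, 31, 33, 35, 120]

print(summarize(ages))
print(iqr_outliers(ages))
```

A flagged value is only a prompt to look closer. Whether it is a mistake or a genuine extreme observation is a question for the analyst, not for the rule.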

Common EDA techniques include:

  • Clustering and dimension reduction techniques
  • Univariate visualization of each field in the raw dataset
  • Bivariate visualizations and summary statistics
  • Multivariate visualizations and mapping
  • K-means clustering
  • Predictive models, such as linear regression
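
Two of the techniques above, bivariate summary statistics and linear regression, can be sketched in a few lines of Python. This is a minimal example under stated assumptions: the helper names (`pearson_r`, `fit_line`) and the `hours` and `scores` data are hypothetical, and the fit is ordinary least squares on a single explanatory variable.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: direction and strength of a linear relationship."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: hours studied (explanatory) and test score (outcome).
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

r = pearson_r(hours, scores)
slope, intercept = fit_line(hours, scores)
print(f"r = {r:.3f}, slope = {slope:.2f}, intercept = {intercept:.2f}")
```

The sign of `r` and of the slope gives the direction of the relationship, and the slope estimates its size in outcome units per unit of the explanatory variable. In EDA such a fit is a first look at the data, not a final model.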

In Data Defined, we help make the complex world of data more accessible by explaining some of the most complex aspects of the field.