Glossary of statistical definitions - making sense of popular statistical terms

Tags: Analytics

Published: 9th November 2016

Author: Tatiana Kim
Email: tatiana.kim@iridium-insights.com

At Iridium Insights we are all about making sense of data. We recognise that a lot of technical stats terms can be confusing, so we’ve created this quick look-up glossary of statistical definitions.

We’ve also added some links to articles that show how these statistical methods are used in the real world.

Quick look-up list

Bayesian modelling
CHAID analysis
Cluster analysis
Correlation analysis
Decision tree analysis
Machine learning
Marketing mix modelling
Multivariate regression
Prediction interval/confidence interval
(Multivariable) Regression analysis
R

Full glossary

Bayesian modelling

There are two different statistical approaches to gaining insights from data: frequentist (or classical) and Bayesian. The frequentist approach builds a model based only on the observed data, while the Bayesian approach also incorporates prior beliefs about the model, which may be subjective, and updates them with the observations.
For a more in-depth description, see the second part of our blog explanation How effective is your marketing strategy? – The maths behind the measurement.
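
As a rough illustration of the difference, the sketch below estimates a campaign response rate both ways. The numbers, the function names and the choice of a Beta prior are all assumptions made for this example; they are not taken from our models.

```python
# Minimal sketch: frequentist vs Bayesian estimate of a response rate.
# All figures are hypothetical and chosen only for illustration.

def beta_binomial_update(prior_a, prior_b, successes, trials):
    """Combine a Beta(prior_a, prior_b) prior with observed counts.

    Returns the parameters of the posterior Beta distribution.
    """
    return prior_a + successes, prior_b + (trials - successes)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Prior belief (assumption): response rate of about 10%, i.e. Beta(2, 18).
prior_a, prior_b = 2, 18

# Observed data (hypothetical): 30 responses from 100 customers.
successes, trials = 30, 100

# Frequentist estimate: uses the observed data only.
frequentist_estimate = successes / trials

# Bayesian estimate: the prior belief is updated with the observations.
post_a, post_b = beta_binomial_update(prior_a, prior_b, successes, trials)
bayesian_estimate = beta_mean(post_a, post_b)

print(f"Frequentist estimate: {frequentist_estimate:.3f}")
print(f"Bayesian estimate:    {bayesian_estimate:.3f}")
```

The Bayesian estimate sits between the prior belief and the observed rate, and moves closer to the observed rate as more data arrives.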

 

CHAID analysis (Chi squared automatic interaction detector)

CHAID is a type of decision tree algorithm that determines relationships between the variable of interest (for example, the number of purchases of a particular product) and the independent variables (for example, customer characteristics such as age, gender and socioeconomic status). CHAID automatically builds the decision tree from the trends and patterns within the data, using chi-squared tests to choose the splits. It can then help to understand a customer’s response to a marketing campaign and is often used for customer segmentation.
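
The core idea of choosing a split with a chi-squared test can be sketched as follows. This is a simplified illustration, not a full CHAID implementation: real CHAID also merges similar categories and adjusts p-values for multiple comparisons. The counts and variable names are hypothetical.

```python
# Minimal sketch: pick the candidate split most strongly associated
# with the variable of interest, using Pearson's chi-squared statistic.

def chi_squared(table):
    """Pearson chi-squared statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts. Rows are customer groups,
# columns are (purchased, did not purchase).
candidates = {
    "age_band": [[60, 40], [30, 70]],  # under 35, 35 and over
    "gender": [[46, 54], [44, 56]],    # female, male
}

scores = {name: chi_squared(table) for name, table in candidates.items()}

# Both tables have the same shape, so the larger statistic corresponds
# to the smaller p-value and therefore the stronger candidate split.
best_split = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: chi-squared = {score:.2f}")
print("First split on:", best_split)
```

In this made-up data, purchasing differs far more between age bands than between genders, so the tree would split on age band first.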

 

Cluster analysis

Cluster analysis is an exploratory data analysis method that helps identify meaningful structures within data. It identifies groups (or segments) of data points that are similar across several measures. In the marketing industry, cluster analysis is often used to identify customer segments.

CHAID is also often used for customer segmentation, but is a very different algorithm to cluster analysis. Cluster analysis treats all the variables in the data uniformly, while CHAID analysis treats the variable of interest and the independent variables separately.
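
To show how grouping by similarity works, here is a minimal sketch using k-means, one common clustering algorithm. It is an assumption for illustration only and differs from model-based clustering; the customer data and starting centroids are hypothetical.

```python
# Minimal k-means sketch: group customers by (age, monthly spend).
import math


def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(p, centroids[i]),
            )
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical customers: (age in years, monthly spend).
customers = [(22, 15), (25, 18), (24, 20), (51, 60), (55, 64), (53, 58)]

# Starting centroids are chosen by hand here; in practice they are
# usually picked at random and the variables are scaled first.
centroids, clusters = k_means(customers, centroids=[(22, 15), (53, 58)])

for centroid, cluster in zip(centroids, clusters):
    print("Segment centre:", centroid, "members:", cluster)
```

Note that no variable is singled out as the variable of interest: age and spend are treated in exactly the same way.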

To see how we used the model-based clustering method to gain strategic insights from beer consumption data, see our blog on How we used cluster analysis to drive beer company strategy.

 

Correlation analysis

Correlation analysis studies the relationship between a variable of interest and an explanatory variable. For example, a variable of interest could be premium juice consumption while an explanatory variable could be GDP per capita. If the relationship proves to be statistically significant, the explanatory variable is said to be related to, or associated with, the variable of interest. The correlation coefficient (r), or its square (r-squared), measures the strength of the relationship, while the p-value indicates whether the relationship is statistically significant.
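
A minimal sketch of how the correlation coefficient is calculated is shown below. The figures for GDP per capita and juice consumption are hypothetical, and the p-value is left out because it needs a statistical distribution that is not covered here.

```python
# Minimal sketch: Pearson correlation coefficient (r) and r-squared.
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / math.sqrt(spread_x * spread_y)


# Hypothetical data for five markets.
gdp_per_capita = [10, 20, 30, 40, 50]          # explanatory variable
juice_consumption = [2.0, 2.9, 4.1, 4.8, 6.2]  # variable of interest

r = pearson_r(gdp_per_capita, juice_consumption)
r_squared = r ** 2

print(f"r = {r:.3f}, r-squared = {r_squared:.3f}")
```

A value of r close to 1 or -1 indicates a strong linear relationship, while a value close to 0 indicates a weak one.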

To find out more, read our blog: Correlation doesn’t imply causation but it still matters

 

Decision tree analysis

Decision analysis is a general name given to techniques that analyse every possible outcome of a decision. A decision tree is a graph that visualises these outcomes and is easy to interpret. Decision trees help to understand and evaluate risks and uncertainties. They can also help answer questions such as: What are the factors that affect the sales of a product the most? Can we predict a consumer group’s response to a marketing campaign?

 

Machine learning

Machine learning is a method of data analysis that iteratively “learns” from data as it arrives, with little or no human intervention. Machine learning can analyse large amounts of data quickly, enabling businesses to make decisions about their marketing campaigns in real time and delivering insights into complex consumer behaviours.

For more details, see our blog How Machine Learning is advancing customer marketing strategies.

 

Marketing mix modelling (MM modelling)

Marketing mix modelling is a method of data analysis used to quantify the impact of marketing activities on product sales. In the simplest terms, MM modelling gives weights to the different factors that affect product sales. The weights can be determined using, for example, multivariable regression modelling.

For more details, see the first part of our blog How effective is your marketing strategy? – The maths behind the measurement.

 

Multivariate regression

Multivariate regression analysis studies the relationships between several variables of interest and several explanatory variables. For example, the variables of interest could be consumption of beer, cider and wine, while the explanatory variables could be GDP per capita, commodity prices, new product launches, population demographics and so on. Multivariate regression analysis helps to understand how changes in the explanatory variables affect each of the variables of interest differently.

 

Prediction interval/confidence interval

A confidence interval is a range of values that is likely to contain an unknown quantity, such as the true average of a variable. A prediction interval is a similar range for an individual value that is yet to be observed.

For example, let the local train delay in minutes be the variable of interest. If we know from experience that the train is never exactly on time, but that 95% of the time it arrives no more than 15 minutes early or late, then we would say we are 95% confident that the next train will arrive between 15 minutes before and 15 minutes after its scheduled time.
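
The two kinds of interval can be compared with a short sketch. The train delays below are hypothetical, and the calculation assumes the delays are roughly normally distributed and uses a normal approximation; for a small sample like this one, a t-distribution would give slightly wider intervals.

```python
# Minimal sketch: 95% confidence interval for the average delay versus
# 95% prediction interval for the delay of the next train.
import math
from statistics import NormalDist, mean, stdev

# Hypothetical delays in minutes (negative means the train was early).
delays = [-12, 5, 9, -3, 14, -8, 2, 7, -6, 11, -1, 4, -10, 8, 0, 6]

n = len(delays)
mean_delay = mean(delays)
spread = stdev(delays)           # sample standard deviation
z = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval

# Confidence interval: uncertainty about the true average delay.
conf_half_width = z * spread / math.sqrt(n)
conf_int = (mean_delay - conf_half_width, mean_delay + conf_half_width)

# Prediction interval: uncertainty about one future delay, which adds
# the train-to-train variation on top of the uncertainty in the average.
pred_half_width = z * spread * math.sqrt(1 + 1 / n)
pred_int = (mean_delay - pred_half_width, mean_delay + pred_half_width)

print(f"Average delay: {mean_delay:.2f} minutes")
print(f"95% confidence interval: {conf_int[0]:.1f} to {conf_int[1]:.1f}")
print(f"95% prediction interval: {pred_int[0]:.1f} to {pred_int[1]:.1f}")
```

The prediction interval is always the wider of the two, because a single future observation varies more than an average does.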

 

(Multivariable) Regression analysis

Regression analysis is a more general form of correlation analysis that measures the relationships between one variable of interest and several explanatory variables. For example, the variable of interest could be premium beer consumption, while the explanatory variables could be GDP per capita, commodity prices, new product launches and so on. Regression analysis helps to understand how changes in the explanatory variables affect the variable of interest. It is widely used for predictions and forecasts.
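
As a rough sketch of what happens under the hood, the example below fits a regression with two explanatory variables using ordinary least squares. The data is hypothetical and deliberately noise-free, and the function names are our own for this illustration; in practice a statistics package would be used.

```python
# Minimal sketch: multivariable linear regression by ordinary least squares.

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry into place.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_linear_regression(rows, y):
    """Return [intercept, weight_1, weight_2, ...] from the normal equations."""
    x = [[1.0] + list(row) for row in rows]  # add a column for the intercept
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data: (GDP per capita, price index) for five markets,
# with consumption generated as 2 + 0.5 * GDP - 1.0 * price.
explanatory = [(10, 5), (20, 4), (30, 6), (40, 3), (50, 5)]
consumption = [2, 8, 11, 19, 22]

intercept, gdp_weight, price_weight = fit_linear_regression(explanatory, consumption)

# Forecast for a new market with GDP per capita 60 and price index 4.
forecast = intercept + gdp_weight * 60 + price_weight * 4

print(f"intercept = {intercept:.2f}, GDP weight = {gdp_weight:.2f}, "
      f"price weight = {price_weight:.2f}, forecast = {forecast:.2f}")
```

The fitted weights show how much the variable of interest changes when each explanatory variable changes by one unit, which is also what makes the model usable for forecasts.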

 

R

R is a programming language for statistical computing and data visualisation. It is widely used by data scientists and is available for free under the GNU General Public License, although Microsoft has a proprietary version. R can be extended by adding packages that provide particular statistical methods or other additional functionality.

If you are interested in what Iridium Insights could do with your data, please get in contact: info@iridium-insights.com
