
Scatter Plot

Definition, types, and examples


What is a Scatter Plot?

A scatter plot, also known as a scatter diagram or scattergram, is a type of data visualization that displays the values of (typically) two variables for each observation in a dataset. The data appear as a collection of points: the value of one variable determines each point's position on the horizontal axis, and the value of the other determines its position on the vertical axis. Scatter plots are widely used in various fields, from scientific research to business analytics, to visualize relationships between variables and identify patterns in data.

Definition

A scatter plot is a graph in which the values of two variables are plotted along two axes, so that the pattern of the resulting points reveals any correlation present. Each point on the plot represents one observation with two values, one for each variable being studied. A point's position is set by that pair of values: one gives its horizontal coordinate and the other its vertical coordinate.

Key characteristics of scatter plots include:

1. Two-dimensional representation: Typically uses an x-axis (horizontal) and y-axis (vertical) to plot data points.


2. Individual data points: Each point represents a single observation or instance in the dataset.


3. Correlation visualization: The overall pattern of points can reveal relationships between variables.


4. Scales: Both axes usually have numerical scales, though one or both axes can sometimes carry categorical data.


5. Customization options:  Points can be colored, sized, or shaped differently to represent additional variables or categories.
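
The characteristics above can be sketched with a short plotting script. This is a minimal example, assuming Matplotlib is installed; the data (`hours_studied`, `exam_score`, `group`) are hypothetical values made up for illustration, and color is used to encode a category.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script runs without a display
import matplotlib.pyplot as plt

# Hypothetical data: each list index is one observation.
hours_studied = [1, 2, 3, 4, 5, 6, 7, 8]
exam_score = [52, 55, 61, 64, 70, 74, 79, 85]
group = ["A", "A", "B", "B", "A", "B", "A", "B"]

# Encode the category as point color.
colors = ["tab:blue" if g == "A" else "tab:orange" for g in group]

fig, ax = plt.subplots()
# One point per observation: x position from the first variable, y from the second.
points = ax.scatter(hours_studied, exam_score, c=colors, s=60)
ax.set_xlabel("Hours studied")
ax.set_ylabel("Exam score")
ax.set_title("Exam score vs. hours studied")
# fig.savefig("scatter_basic.png")  # uncomment to write the image to disk
```

Each observation becomes exactly one point, and both axes carry numerical scales; the color choice is only one of several possible encodings.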

Types

Scatter plots come in various forms, each suited to different types of data and analytical needs:

1. Basic Scatter Plot: The standard form, showing the relationship between two continuous variables.


2. Bubble Chart:  A variation where the size of each point represents a third variable, adding an extra dimension to the data visualization.


3. 3D Scatter Plot: Introduces a third axis, allowing for the visualization of relationships among three variables simultaneously.

4. Scatter Plot Matrix: A grid of scatter plots showing pairwise relationships among multiple variables.


5. Connected Scatter Plot: Points are connected by lines, often used to show how relationships change over time.


6. Jitter Plot: Adds small random variation to point positions, useful for visualizing overlapping data points.

7. Heat Scatter Plot: Combines elements of a scatter plot and a heat map, using color intensity to represent point density.
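
For the heat scatter plot in particular, one way to obtain the density that gets mapped to color is to count points per grid cell. The following is a minimal sketch using only the Python standard library, assuming square bins; `bin_counts`, `bin_width`, and the sample data are hypothetical names and values chosen for illustration, not part of any plotting library.

```python
from collections import Counter

def bin_counts(xs, ys, bin_width):
    """Count how many points fall in each square bin of side bin_width."""
    counts = Counter()
    for x, y in zip(xs, ys):
        # Floor division assigns every point to exactly one grid cell.
        cell = (int(x // bin_width), int(y // bin_width))
        counts[cell] += 1
    return counts

# Hypothetical data points.
xs = [0.1, 0.4, 0.6, 1.2, 1.3, 1.4, 2.7]
ys = [0.2, 0.3, 0.9, 1.1, 1.8, 1.5, 0.4]

density = bin_counts(xs, ys, bin_width=1.0)

# Per-point density: the count of the cell each point belongs to.
# A plotting library could map these numbers to color intensity.
point_density = [density[(int(x // 1.0), int(y // 1.0))] for x, y in zip(xs, ys)]
```

Real tools often use smoother density estimates; fixed-size bins are simply the easiest version to reason about.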

History

The development of scatter plots is intertwined with the history of statistics and data visualization:

1833: The concept of plotting data points on a graph to show relationships is attributed to John Frederick W. Herschel.


1880s: Francis Galton introduces the concepts of correlation and regression, laying the groundwork for more sophisticated use of scatter plots.


1920s: Scatter diagrams become more widely used in scientific and statistical literature.


1930s: The term "scatter diagram" enters common usage among statisticians.


1950s-1960s: With the advent of computers, scatter plots become easier to generate and more widely used in various fields.


1970s-1980s: Development of interactive computer graphics allows for dynamic scatter plot creation and manipulation.


1990s-2000s: Advanced statistical software packages make sophisticated scatter plot analysis accessible to a broader audience.


2010s-Present: Interactive and web-based scatter plots become common, with real-time data updates and user interaction features.

Examples of Scatter Plots

Scatter plots are used across various disciplines to visualize relationships and patterns:

1. Economics: Economists use scatter plots to visualize the relationship between variables such as GDP per capita and life expectancy across countries. For instance, the well-known Gapminder visualizations popularized by Hans Rosling use animated bubble charts, a scatter plot variant, to show how these relationships change over time.


2. Environmental Science: Climate scientists employ scatter plots to examine correlations between CO2 emissions and global temperature changes, helping to visualize the impact of human activities on climate. 


3. Healthcare: In epidemiology, scatter plots are used to visualize the relationship between factors like body mass index (BMI) and blood pressure, aiding in the identification of health risk factors. 


4. Marketing:  Digital marketers use scatter plots to analyze the relationship between ad spend and conversion rates across different campaigns, informing budget allocation decisions.


5. Sports Analytics: In basketball, scatter plots are used to visualize shot accuracy from different court positions, informing both player development and team strategy. 

Tools and Websites

Numerous tools and platforms facilitate the creation and analysis of scatter plots:

1. Julius: Offers intuitive tools to plot and explore relationships between variables, allowing users to visually assess correlations and trends in their data. 


2. Microsoft Excel: Offers basic scatter plot functionality accessible to many users.


3. R (with ggplot2): Widely used in academic and scientific contexts for creating customizable scatter plots. 


4. Python (with libraries like Matplotlib and Seaborn): Popular for data scientists and analysts for creating complex scatter plots.


5. Tableau: Provides robust scatter plot capabilities with interactive features for business intelligence. 


6. D3.js: A JavaScript library for creating interactive and dynamic scatter plots for web applications. 


7. Google Charts: Offers easy-to-use scatter plot tools for web developers. 


8. Plotly: Provides interactive scatter plot creation with both Python and R interfaces. 

In the Workforce

The use of scatter plots has impacted various professions and created new opportunities:

1. Data Scientists: Regularly use scatter plots to explore relationships between variables and communicate findings. 


2. Business Analysts: Employ scatter plots to visualize market trends, customer behavior, and performance metrics. 


3. Financial Analysts: Utilize scatter plots to analyze risk and return relationships in investment portfolios. 


4. Quality Control Specialists: Use scatter plots to monitor and improve manufacturing processes by visualizing relationships between different production variables.


5. Researchers: Across various fields, researchers use scatter plots to visualize experimental results and identify patterns in data. 


6. Marketing Analysts: Leverage scatter plots to understand customer segmentation and the effectiveness of marketing campaigns.


7. Environmental Scientists: Use scatter plots to analyze relationships between various ecological factors and climate variables. 

Frequently Asked Questions

How do you interpret a scatter plot?

Interpretation involves examining the overall pattern of points: identify any trend (linear or curved), assess how tightly the points follow it, and look for outliers or clusters. A well-constructed scatter plot shows the direction, form, and strength of a relationship.
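
Direction and strength of a linear relationship are commonly summarized with the Pearson correlation coefficient, which ranges from -1 to 1. The following is a minimal sketch using only the Python standard library; `pearson_r` is a hypothetical helper name, and the `ad_spend` and `conversions` values are invented for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: points that rise together give r close to +1.
ad_spend = [10, 20, 30, 40, 50]
conversions = [12, 25, 31, 46, 52]
r = pearson_r(ad_spend, conversions)
```

The sign of `r` gives the direction and its magnitude the strength of the linear trend. It says nothing about curved relationships, which is one reason to look at the plot itself rather than rely on the number alone.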

What's the difference between correlation and causation in a scatter plot?

A scatter plot can show correlation (a relationship between variables) but doesn't prove causation. Just because two variables appear related doesn't mean one causes the other: a third, confounding variable may drive both, or the association may be coincidental.

Can scatter plots be used with categorical data?

While scatter plots are typically used for continuous data, they can be adapted for categorical data using techniques like jittering or creating categorical scales on one or both axes.
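
Jittering on a categorical axis can be sketched as follows: each category is mapped to an integer position, and a small random offset is added so that overlapping points spread out. This is an illustrative example using only the Python standard library; `jitter_positions`, its `width` and `seed` parameters, and the `plans` data are hypothetical.

```python
import random

def jitter_positions(categories, width=0.2, seed=0):
    """Map categories to integer axis positions plus uniform noise in [-width, width]."""
    rng = random.Random(seed)  # fixed seed so the layout is reproducible
    order = sorted(set(categories))
    index = {c: i for i, c in enumerate(order)}
    positions = [index[c] + rng.uniform(-width, width) for c in categories]
    return positions, order

# Hypothetical categorical variable.
plans = ["basic", "pro", "basic", "enterprise", "pro", "basic"]
x_positions, tick_labels = jitter_positions(plans)
```

The returned `x_positions` can be used as x coordinates in a plot, with `tick_labels` placed at 0, 1, 2, and so on. Keeping `width` well below 0.5 stops neighboring categories from overlapping.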

How many variables can a scatter plot display?

A basic scatter plot displays two variables, one on each axis. However, variations like bubble charts can incorporate a third variable through point size, and color can be used to represent a fourth variable.
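
Encoding a third variable as point size requires mapping its values onto a range of marker areas. A minimal sketch of a linear mapping, using only the Python standard library, is shown below; `bubble_areas`, `min_area`, `max_area`, and the `population_millions` values are assumptions made for illustration.

```python
def bubble_areas(values, min_area=20.0, max_area=400.0):
    """Linearly rescale values to marker areas between min_area and max_area."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values equal: use a single mid-sized marker for every point.
        return [(min_area + max_area) / 2 for _ in values]
    scale = (max_area - min_area) / (hi - lo)
    return [min_area + (v - lo) * scale for v in values]

# Hypothetical third variable for four points.
population_millions = [5, 50, 120, 330]
areas = bubble_areas(population_millions)
```

The resulting list could be passed to a plotting call that accepts per-point marker areas, such as the `s` parameter of Matplotlib's `scatter`. Scaling area rather than radius is a common choice because readers tend to compare bubbles by area.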

What are some common mistakes in creating or interpreting scatter plots?

Common mistakes include not considering the scale of axes (which can distort the visual relationship), overlooking outliers, assuming correlation implies causation, and not considering other variables that might influence the relationship.

How are scatter plots used in machine learning?

In machine learning, scatter plots are often used in exploratory data analysis to visualize relationships between features, identify clusters for unsupervised learning problems, and visualize the results of dimensionality reduction techniques like PCA.
