A scatterplot is used to visualize the relationship between two variables in a dataset, enabling you to see whether or not there is a trend in your data. A scatterplot is also known as a scatter diagram.
Each observation in a scatterplot has two coordinates: the value of the independent variable, displayed on the x-axis of the graph, and the value of the dependent variable, displayed on the y-axis.
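As a minimal sketch of how such a plot could be drawn, the snippet below uses Matplotlib, which is an assumption here since this tutorial does not name a plotting library. The `hours` and `scores` values are made-up sample data: `hours` plays the role of the independent variable on the x-axis, and `scores` the dependent variable on the y-axis.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show() instead
import matplotlib.pyplot as plt

# Made-up sample data (not from a real study):
# hours studied is the independent variable, exam score the dependent one.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [52, 55, 61, 60, 68, 72, 75, 80]

fig, ax = plt.subplots()
ax.scatter(hours, scores)  # one dot per (x, y) observation
ax.set_xlabel("Hours studied (independent variable)")
ax.set_ylabel("Exam score (dependent variable)")
ax.set_title("Scatterplot of made-up study data")

# Write the figure to an image file in the current directory.
fig.savefig("scatterplot.png")
```

Each pair of values, such as (1, 52), becomes a single dot in the saved image.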
Depending on the pattern that shows up in the plot, you might be able to determine whether a relationship or correlation exists between the two variables.
If the data points form a straight line when plotted, then the relationship between the variables is strong. Consider the examples below:
We can interpret the graph by looking at the trends from left to right. In the left diagram, the plot shows a perfect positive correlation because the value of the dependent variable y goes up as the value of the independent variable x increases.
On the other hand, the right diagram shows a perfect negative correlation because the value of the dependent variable y goes down as the value of x increases.
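One way to put a number on "perfect positive" and "perfect negative" is the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below is only an illustration: the helper name `pearson_r` and the two straight-line datasets are made up for this example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

x = [1, 2, 3, 4, 5]
y_up = [2 * v + 1 for v in x]     # made-up line that rises as x increases
y_down = [10 - 2 * v for v in x]  # made-up line that falls as x increases

r_up = pearson_r(x, y_up)
r_down = pearson_r(x, y_down)
print(round(r_up, 6), round(r_down, 6))
```

Points that lie exactly on a rising line give a coefficient of 1, and points that lie exactly on a falling line give -1.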
But perfect correlations like these rarely occur in real datasets. You are far more likely to find a strong or weak correlation, as shown below:
When the data doesn't follow any pattern at all, there's no correlation between the variables.
Scatterplots are commonly used in data analysis and visualization to display the relationship between variables in the dataset.
They are particularly useful for identifying patterns and trends in the dataset. The visual insights allow you to easily see outliers, clusters, and the data distribution.
I hope this tutorial is useful. See you in other tutorials!