When you’re working with pandas DataFrames, you might get the following error:
ValueError: Can only compare identically-labeled DataFrame objects
This error occurs when you compare two DataFrame objects with a comparison operator such as == and their row or column labels are not identical, meaning they don’t have the same labels in the same order.
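To make "identically-labeled" concrete, here is a small sketch that assumes only pandas. The helper compare_or_report() is hypothetical, written just for this illustration. It suggests that pandas checks both axes and the order of the labels, so the same row labels in a different order, or different column labels, raise the same ValueError.

```python
import pandas as pd


def compare_or_report(left, right):
    """Hypothetical helper: return left == right, or the error text if pandas refuses."""
    try:
        return left == right
    except ValueError as exc:
        return f"ValueError: {exc}"


base = pd.DataFrame({'gold': [10, 11], 'silver': [4, 5]}, index=[0, 1])

# Same labels in the same order: the element-wise comparison works
print(compare_or_report(base, base.copy()))

# Same row labels, but in a different order: raises the error
print(compare_or_report(base, base.loc[[1, 0]]))

# Same row labels, different column labels: also raises the error
print(compare_or_report(base, base.rename(columns={'silver': 'bronze'})))
```

Depending on your pandas version, the message may be worded slightly differently, but it should still mention identically-labeled DataFrame objects.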
This tutorial will show you an example that causes this error and how to fix it.
How to reproduce this error
Imagine you have a DataFrame with two columns called gold and silver, as follows:
import pandas as pd
df1 = pd.DataFrame({'gold': [10, 11, 12, 13],
                    'silver': [4, 5, 6, 7]},
                   index=[0, 1, 2, 3])
The df1 object uses integers as its row index. Now suppose you create another DataFrame object that uses letters for the index instead:
df2 = pd.DataFrame({'gold': [10, 11, 12, 13],
                    'silver': [4, 5, 6, 8]},
                   index=['a', 'b', 'c', 'd'])
Then you try to compare the two DataFrames with the equality operator == as follows:
res = df1 == df2
The output is an error:
Traceback (most recent call last):
  File "main.py", line 14, in <module>
    res = df1 == df2
ValueError: Can only compare identically-labeled DataFrame objects
This error occurs because the two DataFrame objects have different row labels, so pandas can’t match their rows and refuses to compare them with the == operator.
You can verify this by printing the DataFrames:
print(df1)
print(df2)
Output:
   gold  silver
0    10       4
1    11       5
2    12       6
3    13       7
   gold  silver
a    10       4
b    11       5
c    12       6
d    13       8
As you can see, df1 uses 0-3 for the row index, while df2 uses a-d. Because the row labels don’t match, the comparison raises the error.
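You can also check the labels directly before comparing. A minimal sketch, reusing the df1 and df2 from above: Index.equals() compares the label values in order, which is essentially the condition the == operator needs on both the row index and the columns.

```python
import pandas as pd

df1 = pd.DataFrame({'gold': [10, 11, 12, 13],
                    'silver': [4, 5, 6, 7]},
                   index=[0, 1, 2, 3])
df2 = pd.DataFrame({'gold': [10, 11, 12, 13],
                    'silver': [4, 5, 6, 8]},
                   index=['a', 'b', 'c', 'd'])

# Compare the row labels and the column labels separately
same_index = df1.index.equals(df2.index)
same_columns = df1.columns.equals(df2.columns)

print(same_index)    # False: 0-3 versus a-d
print(same_columns)  # True: both have gold and silver
```

If either check prints False, comparing the two DataFrames with == should raise the error.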
How to fix this error
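To fix this error, call the reset_index() method on both DataFrame objects so they share the same default integer index. Pass the drop=True argument so the old index is discarded instead of being inserted as a new column. Note that this compares rows by position, so it only makes sense when the rows of both DataFrames are already in matching order: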
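To see why drop=True matters, here is a short sketch using df2 from the example: without drop=True, reset_index() keeps the old labels by inserting them as a regular column named index.

```python
import pandas as pd

df2 = pd.DataFrame({'gold': [10, 11, 12, 13],
                    'silver': [4, 5, 6, 8]},
                   index=['a', 'b', 'c', 'd'])

# Default: the old labels become a regular column called 'index'
kept = df2.reset_index()
print(list(kept.columns))     # ['index', 'gold', 'silver']

# drop=True: the old labels are discarded
dropped = df2.reset_index(drop=True)
print(list(dropped.columns))  # ['gold', 'silver']
print(list(dropped.index))    # [0, 1, 2, 3]
```

Without drop=True, the old labels travel along as data, which would either change the column labels or add an extra column of mismatched values to the comparison.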
import pandas as pd
df1 = pd.DataFrame({'gold': [10, 11, 12, 13],
                    'silver': [4, 5, 6, 7]},
                   index=[0, 1, 2, 3])
df2 = pd.DataFrame({'gold': [10, 11, 12, 13],
                    'silver': [4, 5, 6, 8]},
                   index=['a', 'b', 'c', 'd'])
res = df1.reset_index(drop=True) == df2.reset_index(drop=True)
print(res)
Output:
   gold  silver
0  True    True
1  True    True
2  True    True
3  True   False
As you can see, once both DataFrames use the same sequential integer index, pandas compares them cell by cell without raising the error. The only False value is in row 3 of the silver column, where df1 has 7 and df2 has 8.
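The boolean result is most useful when you pull out the cells that are False. One possible sketch, assuming NumPy is available (pandas already depends on it): np.where() on the inverted mask returns the row and column positions of every mismatch, which can then be mapped back to labels.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'gold': [10, 11, 12, 13],
                    'silver': [4, 5, 6, 7]},
                   index=[0, 1, 2, 3])
df2 = pd.DataFrame({'gold': [10, 11, 12, 13],
                    'silver': [4, 5, 6, 8]},
                   index=['a', 'b', 'c', 'd'])

res = df1.reset_index(drop=True) == df2.reset_index(drop=True)

# Row and column positions where the comparison is False
rows, cols = np.where(~res.to_numpy())

# Map the positions back to (row label, column label) pairs
mismatches = [(res.index[r], res.columns[c]) for r, c in zip(rows, cols)]
print(mismatches)
```

For the example data, this should report a single mismatch: row 3 in the silver column.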
Alternative: Use the equals() method
The == operator compares the DataFrames element by element and returns a DataFrame of boolean values. If you want a single boolean value that tells you whether the two DataFrames are equal, use the equals() method instead.
Call the equals() method on the first DataFrame and pass the second one as the argument:
res = df1.equals(df2)
print(res) # False
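The equals() method returns True only when both objects have the same shape, the same row and column labels, the same elements, and matching column dtypes (NaN values in the same positions count as equal). If any of these differ, it returns False. Unlike ==, it doesn’t raise the labeling error; mismatched labels simply produce False.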
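A couple of these details are easy to miss. The sketch below uses small made-up DataFrames, not the example data. It suggests that equals() checks column dtypes, so an int column and a float column holding the same numbers are not equal, and that it treats NaN values in the same position as equal, which == does not.

```python
import numpy as np
import pandas as pd

ints = pd.DataFrame({'gold': [10, 11]})        # int64 column
floats = pd.DataFrame({'gold': [10.0, 11.0]})  # float64 column

print((ints == floats).to_numpy().all())  # True: the values compare equal
print(ints.equals(floats))                # False: the dtypes differ

with_nan = pd.DataFrame({'gold': [10.0, np.nan]})

print(with_nan.equals(with_nan.copy()))                # True: NaNs in the same place match
print((with_nan == with_nan.copy()).to_numpy().all())  # False: NaN == NaN is False
```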
You can also call the reset_index() method on both objects to ignore differences in the row index, as shown below:
import pandas as pd
# Same values in both DataFrames, but different row labels
df1 = pd.DataFrame({'gold': [10, 11, 12, 13],
                    'silver': [4, 5, 6, 7]},
                   index=[0, 1, 2, 3])
df2 = pd.DataFrame({'gold': [10, 11, 12, 13],
                    'silver': [4, 5, 6, 7]},
                   index=['a', 'b', 'c', 'd'])
# The row labels differ, so equals() returns False
res = df1.equals(df2)
print(res)  # False
# After resetting both indexes, the labels and values all match
res = df1.reset_index(drop=True).equals(df2.reset_index(drop=True))
print(res)  # True
As you can see, equals() returns False when the values are equal but the row labels are not. Once you reset the index, it returns True.
To sum up, the == operator tells you which values differ (once the labels match), while the equals() method gives you a single True or False answer.
I hope this tutorial is helpful. Until next time! 🙌