One error that you might see when working with a pandas DataFrame is:
TypeError: cannot convert the series to <class 'int'>
This error usually occurs when you try to convert a whole series object into an integer with a function that accepts only a single value, such as int().
The following tutorial shows an example that causes this error and explains how to fix it.
How this error can happen
A series object is a one-dimensional array that pandas commonly uses for its DataFrame columns.
Suppose you’re working with the pandas library and created a DataFrame as shown below:
import pandas as pd
df = pd.DataFrame({"distance": [3.6, 18.3, 21.5, 25.2]})
print(df)
Output:
   distance
0       3.6
1      18.3
2      21.5
3      25.2
This DataFrame has one column with float values. Now let’s say you want to convert those float values into int.
You then create a new DataFrame column and call the int() function to convert the distance column values like this:
import pandas as pd
df = pd.DataFrame({"distance": [3.6, 18.3, 21.5, 25.2]})
df['int_distance'] = int(df['distance'])
Output:
Traceback (most recent call last):
  File "main.py", line 5, in <module>
    df['int_distance'] = int(df['distance'])
  File "/opt/homebrew/lib/python3.10/site-packages/pandas/core/series.py", line 185, in wrapper
    raise TypeError(f"cannot convert the series to {converter}")
TypeError: cannot convert the series to <class 'int'>
Oops! The int() function in Python only knows how to convert a single value into its integer equivalent.
The error happens because we’re asking the function to convert a series, which has multiple float values.
It doesn’t matter whether you have a series of floats, strings, or any other type. The int() function can’t work with a series.
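To make the difference concrete, here's a small sketch that calls int() on one value and then on the whole column. The try/except block is only there so the script keeps running, and the variable names are illustrative:

```python
import pandas as pd

df = pd.DataFrame({"distance": [3.6, 18.3, 21.5, 25.2]})

# int() works on one value taken out of the series
first_value = int(df["distance"].iloc[0])
print(first_value)  # 3

# int() fails when it receives the series as a whole
try:
    int(df["distance"])
    error_message = None
except TypeError as err:
    error_message = str(err)

print(error_message)
```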
The same happens when you call other functions, such as the datetime.fromtimestamp() function.
Suppose you have a series of timestamps that you want to convert to human-readable format:
import pandas as pd
from datetime import datetime
df = pd.DataFrame(
{"created_ts": [1674195300, 1674196140, 1674196620, 1674195420]}
)
df["date"] = datetime.fromtimestamp(df["created_ts"])
The code above triggers a TypeError as well because datetime.fromtimestamp() requires you to pass a single timestamp directly like this:
datetime.fromtimestamp(1674195300)
To fix this error, you need to use a function that can convert a series. Let’s see how to do it next.
How to fix this error
There are three easy ways you can fix this error:
- Use the astype() function
- Use the apply() function
- Use the list comprehension syntax
pandas comes with the astype() function, which converts a series into another type.
You can use this function as follows:
import pandas as pd
df = pd.DataFrame({"distance": [3.6, 18.3, 21.5, 25.2]})
df['integer_distance'] = df['distance'].astype(int)
print(df)
And you’ll get the following output:
   distance  integer_distance
0       3.6                 3
1      18.3                18
2      21.5                21
3      25.2                25
By calling the .astype(int) function, you are able to convert the distance column values into integers without any error. The decimal part is dropped rather than rounded, so 3.6 becomes 3.
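If you'd rather get the nearest whole number, one option is to call round() before astype(int). Here's a minimal sketch, with column names chosen only for illustration:

```python
import pandas as pd

df = pd.DataFrame({"distance": [3.6, 18.3, 21.5, 25.2]})

# astype(int) alone drops the decimal part
df["truncated"] = df["distance"].astype(int)

# round() first if you want the nearest whole number instead
df["rounded"] = df["distance"].round().astype(int)

print(df)
```

Keep in mind that round() sends exact halves to the nearest even number, which is why 21.5 becomes 22 here.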
Next, you can also use the pandas apply() function to fix this error.
Here’s the code:
import pandas as pd
df = pd.DataFrame({"distance": [3.6, 18.3, 21.5, 25.2]})
df['integer_distance'] = df['distance'].apply(int)
print(df)
The apply() function iterates over the column you call it on and applies the function you pass as its argument to each value. It then returns a new series with the results.
By passing the function int to apply(), pandas will call int() on each value in the series.
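Because apply() accepts any function, it also helps when each value needs more than a plain int() call. As a sketch, assume a hypothetical column where the numbers were loaded as text. The helper text_to_int below is made up for this example:

```python
import pandas as pd

# Hypothetical column where the numbers were loaded as text
df = pd.DataFrame({"distance": ["3.6", "18.3", "21.5", "25.2"]})


def text_to_int(value):
    # int("3.6") would raise ValueError, so go through float first
    return int(float(value))


df["integer_distance"] = df["distance"].apply(text_to_int)
print(df)
```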
You can use apply() to convert a series of timestamps too:
import pandas as pd
from datetime import datetime
df = pd.DataFrame(
{"created_ts": [1674195300, 1674196140, 1674196620, 1674195420]}
)
df['date'] = df['created_ts'].apply(datetime.fromtimestamp)
print(df)
Output:
   created_ts                date
0  1674195300 2023-01-20 13:15:00
1  1674196140 2023-01-20 13:29:00
2  1674196620 2023-01-20 13:37:00
3  1674195420 2023-01-20 13:17:00
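Note that datetime.fromtimestamp() returns local time, so the hours you see depend on your machine's time zone. As an alternative to apply(), pandas also has the to_datetime() function, which converts the whole series in one call. This is a different technique from the one above, sketched here with a column name picked for illustration. The resulting values are in UTC:

```python
import pandas as pd

df = pd.DataFrame(
    {"created_ts": [1674195300, 1674196140, 1674196620, 1674195420]}
)

# unit="s" tells pandas the integers are seconds since the Unix epoch
df["date_utc"] = pd.to_datetime(df["created_ts"], unit="s")
print(df)
```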
The apply() function here works like a list comprehension, which you can also use to convert values in a series one by one as follows:
import pandas as pd
df = pd.DataFrame({"distance": [3.6, 18.3, 21.5, 25.2]})
df['integer_distance'] = [int(v) for v in df['distance']]
The list comprehension syntax above will call the int() function for each value in the distance column, similar to the apply() function.
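As a quick sanity check, you can compare the three approaches side by side. This sketch converts each result to a plain Python list so that only the values are compared. The variable names are illustrative:

```python
import pandas as pd

df = pd.DataFrame({"distance": [3.6, 18.3, 21.5, 25.2]})

with_astype = df["distance"].astype(int).tolist()
with_apply = df["distance"].apply(int).tolist()
with_comprehension = [int(v) for v in df["distance"]]

# All three produce the same integers for this data
print(with_astype == with_apply == with_comprehension)  # True
```

For plain type conversion, astype() is usually the most direct choice, since it converts the whole column in one call instead of calling a Python function for each value.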
Now you know how to convert a series into integer type. I’ve also written other tutorials that deal with errors when working with pandas.
I hope this tutorial is helpful! See you in other articles! 👍