When working with pandas and .xls
files, you might encounter the following error:
xlrd.biffh.XLRDError: Excel xlsx file; not supported
This error occurs when you try to read a .xlsx
or .xlsm
file using pandas. The latest version of pandas has solved this error.
Let’s see an example that causes this error and how you can fix it in practice.
How to reproduce this error
Suppose you used the read_excel()
method from pandas to read an excel file as follows:
import pandas as pd
df = pd.read_excel('data.xlsx')
print(df)
Output:
Traceback (most recent call last):
File "/Users/nsebhastian/Desktop/DEV/python/main.py", line 3, in <module>
df = pd.read_excel('data.xlsx')
^^^^^^^^^^^^^^^^^^^^^^^^^^
File ...
raise XLRDError(FILE_FORMAT_DESCRIPTIONS[file_format]+'; not supported')
xlrd.biffh.XLRDError: Excel xlsx file; not supported
This error occurs because pandas used the xlrd
library to read excel files.
The xlrd
library stopped supporting anything other than the .xls
format in its latest version, so this limitation impacts the ability of pandas to read .xlsx
files.
How to fix this error
To resolve this error, you need to use the latest pandas version and install the openpyxl
library, which can be used to read Excel xlsx/xlsm/xltx/xltm formats.
First, upgrade pandas using one of the following commands:
# For pip or pip3:
pip install pandas --upgrade
pip3 install pandas --upgrade
# If you don't have pip in PATH:
python -m pip install pandas --upgrade
python3 -m pip install pandas --upgrade
# For Jupyter Notebook
!pip install pandas --upgrade
Next, install the openpyxl
library using pip
as well:
# For pip or pip3:
pip install openpyxl
pip3 install openpyxl
# If you don't have pip in PATH:
python -m pip install openpyxl
python3 -m pip install openpyxl
# For Jupyter Notebook
!pip install openpyxl
Once installed, edit the code that calls the read_excel()
method by adding the engine
argument as follows:
import pandas as pd
df = pd.read_excel('data.xlsx', engine='openpyxl')
print(df)
Now pandas will be able to read the .xlsx
file without any errors.
Note that the openpyxl
library doesn’t support reading the old .xls
format.
To read a .xls
file, you need to change the engine back to xlrd
as follows:
import pandas as pd
df = pd.read_excel('data.xls', engine='xlrd')
print(df)
You can also choose to downgrade xlrd
to version 1.2 if you don’t want to use openpyxl
, but I don’t recommend it because an older xlrd
version isn’t compatible with the latest pandas version.
It’s better to use openpyxl
for newer Excel file formats and keep xlrd
for the old .xls
format.
I hope this tutorial is helpful. Happy coding and see you in other tutorials! 👋