How to fix xlrd.biffh.XLRDError: Excel xlsx file; not supported

When working with pandas and .xls files, you might encounter the following error:

xlrd.biffh.XLRDError: Excel xlsx file; not supported

This error occurs when you try to read a .xlsx or .xlsm file using pandas. The latest version of pandas has solved this error.

Let’s see an example that causes this error and how you can fix it in practice.

How to reproduce this error

Suppose you used the read_excel() method from pandas to read an excel file as follows:

import pandas as pd

df = pd.read_excel('data.xlsx')

print(df)

Output:

Traceback (most recent call last):
  File "/Users/nsebhastian/Desktop/DEV/python/main.py", line 3, in <module>
    df = pd.read_excel('data.xlsx')
         ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ...
    raise XLRDError(FILE_FORMAT_DESCRIPTIONS[file_format]+'; not supported')
xlrd.biffh.XLRDError: Excel xlsx file; not supported

This error occurs because pandas used the xlrd library to read excel files.

The xlrd library stopped supporting anything other than the .xls format in its latest version, so this limitation impacts the ability of pandas to read .xlsx files.

How to fix this error

To resolve this error, you need to use the latest pandas version and install the openpyxl library, which can be used to read Excel xlsx/xlsm/xltx/xltm formats.

First, upgrade pandas using one of the following commands:

# For pip or pip3:
pip install pandas --upgrade
pip3 install pandas --upgrade

# If you don't have pip in PATH:
python -m pip install pandas --upgrade
python3 -m pip install pandas --upgrade

# For Jupyter Notebook
!pip install pandas --upgrade

Next, install the openpyxl library using pip as well:

# For pip or pip3:
pip install openpyxl
pip3 install openpyxl

# If you don't have pip in PATH:
python -m pip install openpyxl
python3 -m pip install openpyxl

# For Jupyter Notebook
!pip install openpyxl

Once installed, edit the code that calls the read_excel() method by adding the engine argument as follows:

import pandas as pd

df = pd.read_excel('data.xlsx', engine='openpyxl')

print(df)

Now pandas will be able to read the .xlsx file without any errors.

Note that the openpyxl library doesn’t support reading the old .xls format.

To read a .xls file, you need to change the engine back to xlrd as follows:

import pandas as pd

df = pd.read_excel('data.xls', engine='xlrd')

print(df)

You can also choose to downgrade xlrd to version 1.2 if you don’t want to use openpyxl, but I don’t recommend it because an older xlrd version isn’t compatible with the latest pandas version.

It’s better to use openpyxl for newer Excel file formats and keep xlrd for the old .xls format.

I hope this tutorial is helpful. Happy coding and see you in other tutorials! 👋

Take your skills to the next level ⚡️

I'm sending out an occasional email with the latest tutorials on programming, web development, and statistics. Drop your email in the box below and I'll send new stuff straight into your inbox!

No spam. Unsubscribe anytime.