How to fix UnicodeDecodeError: invalid continuation byte

One error that you might encounter when working with Python is:

UnicodeDecodeError: invalid continuation byte

This error occurs when you decode a bytes object with an encoding (usually UTF-8) in which those bytes do not form a valid sequence. In most cases, this means the data was written using a different encoding.

This tutorial shows an example that causes this error and how to fix it.

How to reproduce this error

Suppose you have a bytes object in your Python code as follows:

bytes_obj = b"\xe1 b c"

Next, you want to decode the bytes object using the utf-8 encoding like this:

str_obj = bytes_obj.decode('utf-8')

Output:

Traceback (most recent call last):
  File "main.py", line 3, in <module>
    str_obj = bytes_obj.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe1 in position 0: invalid continuation byte

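You get an error because the byte \xe1 in the bytes object is the á character encoded using the latin-1 encoding, not UTF-8. In UTF-8, the byte 0xe1 marks the start of a three-byte sequence, so the decoder expects a continuation byte to follow. It finds a space (0x20) instead, which is why the message says invalid continuation byte.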
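To make the byte-level picture concrete, here is a small sketch that prints the bit patterns involved and compares how á is encoded in UTF-8 and in latin-1. The variable names are only for illustration.

```python
data = b"\xe1 b c"

# 0xE1 starts with the bits 1110, which in UTF-8 announces a three-byte sequence.
print(f"{data[0]:08b}")  # 11100001

# The next byte is a space. A UTF-8 continuation byte must start with the bits 10.
print(f"{data[1]:08b}")  # 00100000

# The same character is encoded differently by the two encodings.
utf8_bytes = "á".encode("utf-8")
latin1_bytes = "á".encode("latin-1")
print(utf8_bytes)    # b'\xc3\xa1'
print(latin1_bytes)  # b'\xe1'
```

Because the two encodings produce different bytes for á, bytes written with one of them cannot be reliably decoded with the other.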

How to fix this error

To resolve this error, decode the bytes with the encoding they were actually written in. In this example that encoding is latin-1, so you change the encoding passed to the decode() method as follows:

bytes_obj = b"\xe1 b c"

str_obj = bytes_obj.decode('latin-1')

print(str_obj)  # á b c

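Note that this time the decode() method runs without any error. Keep in mind that latin-1 maps every possible byte to a character, so decoding with it never raises an error, even when it is the wrong encoding. Always check that the output looks right.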
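If you don’t know the encoding in advance, one possible approach is to try a short list of candidates in order. The sketch below is only an illustration: decode_with_fallback() is a hypothetical helper, not a built-in function, and the candidate list is an assumption that you should adjust to your own data.

```python
def decode_with_fallback(raw, encodings=("utf-8", "latin-1")):
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for name in encodings:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            # This candidate cannot decode the bytes, so try the next one.
            continue
    raise ValueError("none of the candidate encodings could decode the data")


text, used = decode_with_fallback(b"\xe1 b c")
print(text, used)  # á b c latin-1
```

Put utf-8 first because it rejects most data that is not really UTF-8, and keep latin-1 last because it accepts any byte sequence.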

You can also get this error when calling functions that decode data for you, such as the pandas read_csv() function.

In that case, pass the encoding of the file through the encoding parameter as follows:

import pandas as pd
pd.read_csv('example.csv', encoding='latin-1')

The same also works when you use the open() function to work with files:

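csv_file = open('example.csv', encoding='latin-1')
content = csv_file.read()
csv_file.close()

# or, so that the file is closed automatically:
with open('example.csv', encoding='latin-1') as file:
    content = file.read()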
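To see the effect of the encoding argument end to end, here is a self-contained sketch that writes a small latin-1 file to a temporary directory and then reads it back. The file contents are made up for the demonstration.

```python
import os
import tempfile

# Create a small CSV file encoded as latin-1 (sample data, made up for this demo).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "example.csv")
with open(path, "w", encoding="latin-1") as f:
    f.write("name,city\nJosé,Málaga\n")

# Reading the file as UTF-8 raises UnicodeDecodeError.
try:
    with open(path, encoding="utf-8") as f:
        f.read()
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    print("Could not decode as utf-8:", exc.reason)

# Reading with the encoding the file was written in works.
with open(path, encoding="latin-1") as f:
    content = f.read()
print(content)
```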

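If you want to read the raw bytes of a file without decoding them, you can use the open() function in rb (read binary) mode. The content is then returned as bytes, so no decoding error can occur while reading.

Here’s an example when you parse an HTML file using Beautiful Soup, which accepts bytes and detects the encoding on its own: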

from bs4 import BeautifulSoup

with open('index.html', 'rb') as html_file:
    soup = BeautifulSoup(html_file, 'html.parser')

print(soup.get_text())

When you decode a bytes object, you need to use the same encoding that was used to encode it.

If you don’t want to decode the content when opening a file, specify the open mode as rb or wb to read and write in binary mode.
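As a minimal sketch of binary mode, the example below writes raw bytes to a temporary file and reads them back unchanged. The file name is made up for the demonstration.

```python
import os
import tempfile

# Write raw bytes to a temporary file (the file name is arbitrary).
path = os.path.join(tempfile.mkdtemp(), "data.bin")
with open(path, "wb") as f:
    f.write(b"\xe1 b c")

# In rb mode, read() returns bytes and performs no decoding.
with open(path, "rb") as f:
    raw = f.read()
print(raw)  # b'\xe1 b c'

# Decode later, once you know which encoding the data uses.
text = raw.decode("latin-1")
print(text)  # á b c
```

Reading in binary mode only postpones the decoding step, so you still need the right encoding once you turn the bytes into a string.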

I hope this tutorial helps. See you in other tutorials! 👍
