One error that you might encounter when working with Python is:
UnicodeDecodeError: invalid continuation byte
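This error occurs when you try to decode a bytes object with an encoding that does not match the one the bytes were originally encoded with, so the decoder runs into a byte sequence that is invalid in the chosen encoding.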
This tutorial shows an example that causes this error and how to fix it.
How to reproduce this error
Suppose you have a bytes object in your Python code as follows:
bytes_obj = b"\xe1 b c"
Next, you want to decode the bytes object using the utf-8 encoding like this:
str_obj = bytes_obj.decode('utf-8')
Output:
Traceback (most recent call last):
  File "main.py", line 3, in <module>
    str_obj = bytes_obj.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe1 in position 0: invalid continuation byte
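You get an error because the byte \xe1 in the bytes object is the á character encoded using the latin-1 encoding. In UTF-8, the byte 0xE1 marks the start of a three-byte sequence and must be followed by two continuation bytes in the range 0x80 to 0xBF. The next byte here is a space (0x20), so the decoder reports an invalid continuation byte.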
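To see why the byte is rejected, it can help to compare how the two encodings store á. The following is a minimal sketch that uses only built-in str and bytes methods and the attributes of the raised exception:

```python
# Compare how the same character is stored under each encoding.
char = "á"

latin1_bytes = char.encode("latin-1")  # a single byte
utf8_bytes = char.encode("utf-8")      # a lead byte plus one continuation byte

print(latin1_bytes)  # b'\xe1'
print(utf8_bytes)    # b'\xc3\xa1'

# In UTF-8, 0xE1 announces a three-byte sequence, so the next two bytes
# must fall in the range 0x80-0xBF. Here the next byte is a space (0x20).
bytes_obj = b"\xe1 b c"
try:
    bytes_obj.decode("utf-8")
except UnicodeDecodeError as err:
    reason = err.reason
    position = err.start
    print(reason)    # invalid continuation byte
    print(position)  # 0
```

The same character is one byte in latin-1 and two bytes in UTF-8, which is why bytes written with one encoding usually cannot be read back with the other.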
How to fix this error
To resolve this error, you need to change the encoding used in the decode() method to the one the bytes were actually encoded with, which is latin-1 in this case:
bytes_obj = b"\xe1 b c"
str_obj = bytes_obj.decode('latin-1')
print(str_obj) # á b c
Note that this time the decode() method runs without any error.
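When the real encoding of the data is not known in advance, one option is to try UTF-8 first and fall back to latin-1 only if that fails. The sketch below assumes those two candidates are enough. `decode_with_fallback` is a hypothetical helper name, not part of the standard library. Keep in mind that latin-1 maps every possible byte to a character, so the fallback never raises, but the text is only correct if the data really was latin-1.

```python
def decode_with_fallback(data: bytes, encodings=("utf-8", "latin-1")) -> str:
    """Try each candidate encoding in order and return the first success.

    Hypothetical helper for illustration; the candidate list is an assumption.
    """
    for encoding in encodings:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Reached only if no candidate worked (cannot happen when latin-1 is listed).
    raise ValueError("none of the candidate encodings could decode the data")


from_latin1 = decode_with_fallback(b"\xe1 b c")      # falls back to latin-1
from_utf8 = decode_with_fallback(b"\xc3\xa1 b c")    # valid UTF-8, decoded directly
print(from_latin1 == from_utf8)  # True

# Alternatively, keep UTF-8 and substitute undecodable bytes with U+FFFD.
replaced = b"\xe1 b c".decode("utf-8", errors="replace")
print(ascii(replaced))  # '\ufffd b c'
```

The errors="replace" variant never fails either, but it discards the original byte, so it suits logging or display better than data you intend to write back.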
You can also get this error when calling other functions that decode data for you, such as the pandas read_csv() function.
In that case, pass the encoding of the file through the encoding parameter as follows:
import pandas as pd
df = pd.read_csv('example.csv', encoding='latin-1')
The same also works when you use the open() function to work with files:
csv_file = open('example.csv', encoding='latin-1')
# or, so that the file is closed automatically:
with open('example.csv', encoding='latin-1') as file:
    content = file.read()
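To try this without an existing file, the sketch below writes a small latin-1 encoded CSV into a temporary directory and reads it back with both encodings. The file contents are made up for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.csv")

    # Create a throwaway CSV file encoded as latin-1 (illustrative data).
    with open(path, "w", encoding="latin-1") as file:
        file.write("name,city\nJosé,Málaga\n")

    # Reading with the wrong encoding raises UnicodeDecodeError.
    try:
        with open(path, encoding="utf-8") as file:
            file.read()
        utf8_failed = False
    except UnicodeDecodeError:
        utf8_failed = True

    # Reading with the encoding the file was written in works.
    with open(path, encoding="latin-1") as file:
        content = file.read()

print(utf8_failed)  # True
```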
If you only need the raw bytes and do not want Python to decode the file for you, you can use the open() function in rb (read binary) mode.
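Here’s an example when you parse an HTML file using Beautiful Soup, which accepts the bytes and works out the encoding itself:
from bs4 import BeautifulSoup
with open('index.html', 'rb') as file:
    soup = BeautifulSoup(file, 'html.parser')
print(soup.get_text())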
When you decode a bytes object, you need to use the same encoding that was used to produce it.
If you don’t want the data to be decoded or encoded when working with a file, you need to specify the open mode as rb or wb to read and write in binary mode.
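As a short sketch of the binary-mode round trip: the bytes are written and read back unchanged, and nothing is decoded until decode() is called explicitly. The temporary file name is an assumption for illustration.

```python
import os
import tempfile

data = b"\xe1 b c"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data.bin")

    # 'wb' writes the bytes as-is; no encoding is applied.
    with open(path, "wb") as file:
        file.write(data)

    # 'rb' returns a bytes object; no decoding is attempted.
    with open(path, "rb") as file:
        raw = file.read()

print(type(raw).__name__)  # bytes
print(raw == data)         # True

# Decoding is a separate, explicit step.
text = raw.decode("latin-1")
```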
I hope this tutorial helps. See you in other tutorials! 👍