When you’re using pandas to read a JSON file, you might get an error as follows:
ValueError: Trailing data
This error occurs when pandas finds more data after the first complete JSON value in your file, so it can't parse the file as a single JSON document.
There are two possible scenarios that may trigger this error:
- You have more than one top-level JSON object
- There are newline characters \n inside your data
This tutorial will show you examples that cause this error and how to fix it.
1. You have more than one top-level JSON object in your file
Suppose you have a file named data.json with the following content:
{ "name": "Nathan", "about": "29 years old. A programmer." }
{ "name": "John", "about": "32 years old. A designer." }
{ "name": "Susan", "about": "25 years old. A writer." }
Next, you tried to use pandas to read the data in the JSON file with the code below:
import pandas as pd
data = pd.read_json('data.json')
print(data)
When you run the code, you get the following error:
Traceback (most recent call last):
File ...
loads(json, precise_float=self.precise_float), dtype=None
ValueError: Trailing data
This error occurs because the file has three JSON objects at the top level. A JSON document can only have one top-level value, and that value is usually an object or a list (array).
Trailing data means there is still data left to read at the point where pandas expected the end of the file.
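To see the same rule outside pandas, here is a minimal sketch using Python's standard json module. The two-line sample text is made up for illustration; it only mimics the layout of data.json. The standard library reports the leftover input as "Extra data", which is the same situation pandas calls "Trailing data".

```python
import json

# Two top-level objects in one document, mimicking the data.json layout.
text = '{ "name": "Nathan" }\n{ "name": "John" }'

try:
    json.loads(text)
    error_message = None
except json.JSONDecodeError as exc:
    # The parser finished the first object and then found more input.
    error_message = exc.msg

print(error_message)
```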
If you have access to the JSON file, then you need to combine the objects in one list as follows:
[
{ "name": "Nathan", "about": "29 years old. A programmer." },
{ "name": "John", "about": "32 years old. A designer." },
{ "name": "Susan", "about": "25 years old. A writer." }
]
The three objects are now combined in one list, separated by commas.
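If the file is too large to edit by hand, the same conversion can be scripted. The following is only a sketch: it assumes every non-empty line holds exactly one complete JSON object, and the helper name lines_to_array is hypothetical, not part of pandas or the json module.

```python
import json

def lines_to_array(text):
    """Parse one JSON object per non-empty line and return them as a list."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

# Same content as the original data.json, held in a string for the demo.
raw = """{ "name": "Nathan", "about": "29 years old. A programmer." }
{ "name": "John", "about": "32 years old. A designer." }
{ "name": "Susan", "about": "25 years old. A writer." }"""

records = lines_to_array(raw)

# Serialize the list as a single top-level JSON array.
combined = json.dumps(records, indent=2)
print(combined)
```

Writing the combined string back to a file gives a document with a single top-level list, which read_json() can load without extra arguments.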
Run the same code again, and you’ll get this result:
name about
0 Nathan 29 years old. A programmer.
1 John 32 years old. A designer.
2 Susan 25 years old. A writer.
If you can't change the file, then you can resolve this error by adding the lines=True argument when calling the read_json() method as follows:
import pandas as pd
data = pd.read_json('data.json', lines=True)
print(data)
The lines argument tells read_json() to read the file as one JSON object per line.
When you set this argument to True, pandas can read each line in the file without you having to combine the objects into one list.
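To try the lines=True behavior without creating a file on disk, you can feed the same content to read_json() from memory. This is a sketch that assumes a reasonably recent pandas version; the sample text is wrapped in io.StringIO so that read_json() treats it like an open file.

```python
import io

import pandas as pd

# One JSON object per line, the same layout as data.json in this tutorial.
text = """{ "name": "Nathan", "about": "29 years old. A programmer." }
{ "name": "John", "about": "32 years old. A designer." }
{ "name": "Susan", "about": "25 years old. A writer." }"""

# Each line is parsed as its own JSON object and becomes one row.
data = pd.read_json(io.StringIO(text), lines=True)
print(data)
```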
2. There are newline characters \n inside your data
Sometimes, you might have newline characters \n in your data as shown below:
{ "name": "Nathan", "about": "29 years old.\n A programmer." }
{ "name": "John", "about": "32 years old.\n A designer." }
{ "name": "Susan", "about": "25 years old.\n A writer." }
Suppose you read the file using the pandas read_json() method as follows:
import pandas as pd
data = pd.read_json('data.json')
print(data)
You’ll get the same error:
Traceback (most recent call last):
File ...
ValueError: Trailing data
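As in the first example, the file above has more than one top-level value, and that is what triggers the trailing data error. The \n escape sequences inside the strings are valid JSON on their own, but they remain in the data after it is read.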
To resolve this error, you need to pass the lines=True argument to the read_json() method as follows:
import pandas as pd
data = pd.read_json('data.json', lines=True)
print(data)
This time, you won’t receive an error. The data will be printed as follows:
name about
0 Nathan 29 years old.\n A programmer.
1 John 32 years old.\n A designer.
2 Susan 25 years old.\n A writer.
If you don't want the \n characters to appear in the about column data, you can call the str.replace() method on the column to remove the characters as shown below:
import pandas as pd
data = pd.read_json('data.json', lines=True)
# Replace \n with an empty string (regex=False treats '\n' as a literal)
data['about'] = data['about'].str.replace('\n', '', regex=False)
print(data)
Output:
name about
0 Nathan 29 years old. A programmer.
1 John 32 years old. A designer.
2 Susan 25 years old. A writer.
As you can see, this time the \n characters are removed from the about column.
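To make clear what is being replaced, here is a small sketch for a single record using only the standard json module. It assumes the file stores the two-character escape \n inside the string, as in the example above. Once parsed, that escape becomes a real newline character, and that character is what the replacement targets.

```python
import json

# A raw string keeps the backslash and the n as two separate characters,
# exactly as they are stored in the JSON file.
line = r'{ "name": "Nathan", "about": "29 years old.\n A programmer." }'

record = json.loads(line)
about = record["about"]

# After parsing, the escape has become one real newline character.
has_newline = "\n" in about

# Plain string replacement, the per-value equivalent of the column call.
cleaned = about.replace("\n", "")
print(cleaned)
```

The str.replace() call on the about column applies the same substitution to every value in the column.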
Conclusion
The ValueError: Trailing data error occurs in Python when you use the pandas library to read a JSON file that has more than one top-level value.
To resolve this error, you can adjust the JSON file to include only one top-level value, or you can pass the lines=True argument to the read_json() method.
I hope this tutorial is helpful. See you again in other tutorials! 👋