How to fix ValueError: Trailing data in Python

When you’re using pandas to read a JSON file, you might get an error as follows:

ValueError: Trailing data

This error occurs when pandas finishes parsing one complete JSON value and then finds more data after it, which usually means the file isn't a single valid JSON document.

There are two possible scenarios that may trigger this error:

  1. You have more than one top-level JSON object
  2. There are newline characters \n inside your data

This tutorial will show you examples that cause this error and how to fix it.

1. You have more than one top-level JSON object in your file

Suppose you have a file named data.json with the following content:

{ "name": "Nathan", "about": "29 years old. A programmer." }
{ "name": "John", "about": "32 years old. A designer." }
{ "name": "Susan", "about": "25 years old. A writer." }

Next, you tried to use pandas to read the data in the JSON file with the code below:

import pandas as pd

data = pd.read_json('data.json')

print(data)

When you run the code, you get the following error:

Traceback (most recent call last):
  File ...
    loads(json, precise_float=self.precise_float), dtype=None
ValueError: Trailing data

This error occurs because you have three JSON objects at the same level. A JSON document can only have one top-level value, such as an object or a list (array).

Trailing data means there is still data left to read at the point where pandas expected the end of the file.
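To see what leftover data looks like at the parser level, here is a minimal sketch that uses Python's standard json module instead of pandas's internal parser (the two report the problem with different messages). The sample text is the first two lines of data.json.

```python
import json

# Two top-level objects back to back, as in data.json
text = (
    '{ "name": "Nathan", "about": "29 years old. A programmer." }\n'
    '{ "name": "John", "about": "32 years old. A designer." }\n'
)

# raw_decode() parses one JSON value and reports the index where it stopped
first, end = json.JSONDecoder().raw_decode(text)
leftover = text[end:].strip()

print(first["name"])   # the first object parses fine
print(bool(leftover))  # True: there is still data after it

# json.loads() expects exactly one value, so the leftover is an error
try:
    json.loads(text)
    failed = False
except json.JSONDecodeError as err:
    failed = True
    print("json.loads failed:", err.msg)
```

The first object parses without trouble. The failure comes from whatever follows it.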

If you have access to the JSON file, then you need to combine the objects in one list as follows:

[
  { "name": "Nathan", "about": "29 years old. A programmer." },
  { "name": "John", "about": "32 years old. A designer." },
  { "name": "Susan", "about": "25 years old. A writer." }
]

The three objects are now combined in one list, separated by commas.

Run the same code again, and you’ll get this result:

     name                        about
0  Nathan  29 years old. A programmer.
1    John    32 years old. A designer.
2   Susan      25 years old. A writer.
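If the file is too long to edit by hand, the same conversion can be scripted. The sketch below uses only the standard library and assumes every non-blank line holds exactly one JSON object. The helper name jsonl_to_array and the output file name data_fixed.json are made up for this example.

```python
import json
import os
import tempfile

def jsonl_to_array(src_path, dst_path):
    """Rewrite a one-object-per-line file as a single JSON array."""
    with open(src_path, encoding="utf-8") as src:
        # Skip blank lines; parse every other line as its own JSON value
        records = [json.loads(line) for line in src if line.strip()]
    with open(dst_path, "w", encoding="utf-8") as dst:
        json.dump(records, dst, indent=2)
    return records

# Demo in a temporary directory so no real file is touched
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "data.json")
    dst = os.path.join(tmp, "data_fixed.json")
    with open(src, "w", encoding="utf-8") as f:
        f.write('{ "name": "Nathan", "about": "29 years old. A programmer." }\n')
        f.write('{ "name": "John", "about": "32 years old. A designer." }\n')
        f.write('{ "name": "Susan", "about": "25 years old. A writer." }\n')
    jsonl_to_array(src, dst)
    with open(dst, encoding="utf-8") as f:
        fixed = json.load(f)  # loads cleanly: one top-level list

print(len(fixed))  # number of objects in the combined list
```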

If you don’t have access to the file, then you can resolve this error by adding the lines=True argument when calling the read_json() method as follows:

import pandas as pd

data = pd.read_json('data.json', lines=True)

print(data)

The lines argument tells read_json() to read the file as one JSON object per line.

When you set this argument to True, pandas will be able to read each line in the file without having to combine them as one list.
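To try this without creating a file, the sketch below feeds the same three lines to read_json() through io.StringIO, which wraps a string in a file-like object. The in-memory text is a stand-in for data.json and is only there to keep the example self-contained.

```python
import io

import pandas as pd

# Same content as data.json: one JSON object per line
text = (
    '{ "name": "Nathan", "about": "29 years old. A programmer." }\n'
    '{ "name": "John", "about": "32 years old. A designer." }\n'
    '{ "name": "Susan", "about": "25 years old. A writer." }\n'
)

# lines=True makes pandas parse each line as a separate record
data = pd.read_json(io.StringIO(text), lines=True)

print(data.shape)          # rows and columns of the resulting DataFrame
print(list(data["name"]))  # one entry per line in the input
```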

2. There are newline characters \n inside your data

Sometimes, you might have newline characters \n in your data as shown below:

{ "name": "Nathan", "about": "29 years old.\n A programmer." }
{ "name": "John", "about": "32 years old.\n A designer." }
{ "name": "Susan", "about": "25 years old.\n A writer." }

Suppose you read the file using pandas read_json() method as follows:

import pandas as pd

data = pd.read_json('data.json')

print(data)

You’ll get the same error:

Traceback (most recent call last):
  File ...
ValueError: Trailing data

Besides the \n characters, the JSON file above still has more than one top-level value, and that is what triggers the trailing data error. The escaped \n sequences inside the strings are valid JSON on their own.

To resolve this error, you need to specify the lines=True argument to the read_json() method as follows:

import pandas as pd

data = pd.read_json('data.json', lines=True)

print(data)

This time, you won’t receive an error. The data will be printed as follows:

     name                          about
0  Nathan  29 years old.\n A programmer.
1    John    32 years old.\n A designer.
2   Susan      25 years old.\n A writer.
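The \n shown in the table appears to be an escaped display of a real newline character stored in each string. A quick way to check is to parse one line with the standard json module (a stand-in here, not pandas itself) and inspect the value:

```python
import json

# One line from the file; the raw string keeps the backslash and n as two characters
line = r'{ "name": "Nathan", "about": "29 years old.\n A programmer." }'

record = json.loads(line)
about = record["about"]

print("\n" in about)  # True: the parsed value holds a real newline character
print(repr(about))    # repr() shows the newline in escaped form
```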

If you don’t want the \n characters to appear in the about column data, you can call the str.replace() method on the column to remove the characters as shown below:

import pandas as pd

data = pd.read_json('data.json', lines=True)

# Replace \n with empty string
data['about'] = data['about'].str.replace('\n', '')

print(data)

Output:

     name                        about
0  Nathan  29 years old. A programmer.
1    John    32 years old. A designer.
2   Susan      25 years old. A writer.

As you can see, this time the \n characters are removed from the about column.

Conclusion

The ValueError: Trailing data occurs in Python when you use the pandas library to read a JSON file that has an invalid format.

To resolve this error, you can adjust the JSON file to include only one top-level value, or you can pass the lines=True argument to read_json().

I hope this tutorial is helpful. See you again in other tutorials! 👋
