How to fix ModuleNotFoundError: No module named 'sklearn.cross_validation'

If you use the scikit-learn package in your Python project, you may get the following error:

ModuleNotFoundError: No module named 'sklearn.cross_validation'

This error occurs because the cross_validation module no longer exists in recent scikit-learn releases. Its contents were moved to the model_selection module.

This tutorial shows you an example that causes this error and how to fix it.

How this error happens

Suppose you want to split a pandas DataFrame or NumPy array dataset into random train and test subsets.

Older tutorials and code samples import the train_test_split function from the cross_validation module as follows:

from sklearn.cross_validation import train_test_split

But the cross_validation module was deprecated in scikit-learn version 0.18 and removed in version 0.20. The train_test_split function is now available from the model_selection module.
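If you are not sure which situation applies to your environment, a quick check along these lines may help. This is only a sketch: the helper names module_available and installed_version are made up for illustration and are not part of scikit-learn. It relies on the standard library alone, so it also runs when scikit-learn is not installed.

```python
import importlib.util
from importlib import metadata


def module_available(name):
    """Return True if the module `name` can be found in this environment.

    Hypothetical helper for illustration, not a scikit-learn function.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when the parent package itself is not installed.
        return False


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


print("scikit-learn version:", installed_version("scikit-learn"))
for name in ("sklearn.model_selection", "sklearn.cross_validation"):
    print(name, "available:", module_available(name))
```

If sklearn.cross_validation is reported as unavailable while sklearn.model_selection is available, the old import statement is the cause of the error.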

To fix this error you only need to change the import statement to:

from sklearn.model_selection import train_test_split
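If the same script has to run on machines with very old scikit-learn installations as well as current ones, one possible approach is a fallback import. This is a sketch under that assumption. When you control the environment, changing the import as shown above is simpler.

```python
try:
    # Available in scikit-learn 0.18 and later.
    from sklearn.model_selection import train_test_split
except ImportError:
    # Fallback for installations older than 0.18, where only the
    # legacy cross_validation module exists.
    from sklearn.cross_validation import train_test_split

print(train_test_split.__module__)
```

ModuleNotFoundError is a subclass of ImportError, so the except clause catches the error discussed in this tutorial.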

Now you can split a dataset into the training set and test set:

from sklearn.model_selection import train_test_split
import numpy as np

data = np.arange(10)  # [0 1 2 3 4 5 6 7 8 9]

print(data)

train_set, test_set = train_test_split(data, test_size=0.4, random_state=0)

print(train_set)  # [1 6 7 3 0 5]
print(test_set)  # [2 8 4 9]

The train_test_split function in model_selection works the same way as the one in cross_validation. The main difference is a shuffle parameter, True by default, that controls whether the data is shuffled before splitting.

You can learn more about it in the train_test_split documentation.
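To see what the shuffle parameter does, here is a small sketch that reuses the same ten-element array and turns shuffling off. With shuffle=False the split simply cuts the data in order, so no random_state is needed:

```python
import numpy as np
from sklearn.model_selection import train_test_split

data = np.arange(10)

# shuffle=False keeps the original order: the first 60% of the rows
# become the training set and the remaining 40% the test set.
train_set, test_set = train_test_split(data, test_size=0.4, shuffle=False)

print(train_set)  # [0 1 2 3 4 5]
print(test_set)  # [6 7 8 9]
```

An ordered split like this can be useful when the row order matters, for example with time-ordered data.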

Note that this time you can use the train_test_split function without causing the error. Nice! 👍
