How to fix ModuleNotFoundError: No module named 'sklearn.cross_validation'

If you use the scikit-learn package in your Python project, you may get the following error:

ModuleNotFoundError: No module named 'sklearn.cross_validation'

This error occurs because the cross_validation module was removed from scikit-learn in version 0.20. Its functions and classes now live in the model_selection module.

This tutorial shows you an example that causes this error and how to fix it.

How this error happens

Suppose you want to split a pandas DataFrame or NumPy array dataset into random train and test subsets.

Older code and tutorials import the train_test_split function from the cross_validation module as follows:

from sklearn.cross_validation import train_test_split

But the cross_validation module was deprecated in scikit-learn version 0.18 and removed in version 0.20. The train_test_split function is now available from the model_selection module.
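To see which side of that cutoff your environment is on, you can check the installed scikit-learn version. The snippet below is a minimal sketch using only the standard library (Python 3.8 or later); the helper names installed_sklearn_version and has_cross_validation are made up for this example and are not part of scikit-learn.

```python
import re
from importlib import metadata


def installed_sklearn_version():
    """Return the installed scikit-learn version string, or None if it is absent."""
    try:
        return metadata.version("scikit-learn")
    except metadata.PackageNotFoundError:
        return None


def has_cross_validation(version):
    """Return True if this version still ships sklearn.cross_validation.

    Assumes the version string starts with "major.minor", e.g. "0.19.2".
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # The module was removed in 0.20, so anything older still has it.
    return (major, minor) < (0, 20)


version = installed_sklearn_version()
if version is None:
    print("scikit-learn is not installed")
elif has_cross_validation(version):
    print(f"scikit-learn {version}: sklearn.cross_validation is still importable")
else:
    print(f"scikit-learn {version}: use sklearn.model_selection")
```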

To fix this error, you only need to change the import statement to:

from sklearn.model_selection import train_test_split
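If the old import appears in many files, the same change can be scripted. Here is a rough sketch that rewrites the module path in a string of source code; rewrite_imports is a hypothetical helper written for this example, and you would still need to read and write the files yourself.

```python
import re

# Matches the old module path as a whole dotted name, e.g. in
# "from sklearn.cross_validation import ..." or "import sklearn.cross_validation".
OLD_MODULE = re.compile(r"\bsklearn\.cross_validation\b")


def rewrite_imports(source):
    """Return the source text with the old module path replaced by the new one."""
    return OLD_MODULE.sub("sklearn.model_selection", source)


old_code = "from sklearn.cross_validation import train_test_split\n"
print(rewrite_imports(old_code), end="")
```

A plain text replacement like this is enough for train_test_split. Some classes, such as KFold, changed their constructor arguments when they moved to model_selection, so review those calls by hand after renaming.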

Now you can split a dataset into a training set and a test set:

from sklearn.model_selection import train_test_split
import numpy as np

data = np.arange(10)  # [0 1 2 3 4 5 6 7 8 9]

print(data)

train_set, test_set = train_test_split(data, test_size=0.4, random_state=0)

print(train_set)  # [1 6 7 3 0 5]
print(test_set)  # [2 8 4 9]

The train_test_split function in model_selection works the same way as the one in cross_validation. One notable addition is the shuffle parameter, which controls whether the data is shuffled before splitting. It defaults to True, so you only need to set it when you want to keep the original order.
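To illustrate the shuffle parameter, here is a small sketch that reuses the same array and test_size as the example above but passes shuffle=False. It assumes a scikit-learn version recent enough to accept that parameter. With shuffling turned off, the split simply cuts the data in order, so no random_state is needed.

```python
from sklearn.model_selection import train_test_split
import numpy as np

data = np.arange(10)  # [0 1 2 3 4 5 6 7 8 9]

# shuffle=False keeps the original order: the first 60% becomes the
# training set and the remaining 40% becomes the test set.
train_set, test_set = train_test_split(data, test_size=0.4, shuffle=False)

print(train_set)  # [0 1 2 3 4 5]
print(test_set)  # [6 7 8 9]
```

Keeping the order can be useful for data where position matters, such as a time series, where the test set should come after the training set.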

You can learn more about it in the train_test_split documentation.

Note that this time you can use the train_test_split function without causing the error. Nice! 👍
