If you use the scikit-learn package in your Python project, you may get the following error:
ModuleNotFoundError: No module named 'sklearn.cross_validation'
This error occurs because the cross_validation module has been renamed to model_selection, and the old name no longer exists in current scikit-learn releases.
This tutorial shows you an example that causes this error and how to fix it.
How this error happens
Suppose you want to split a pandas DataFrame or NumPy array dataset into random train and test subsets.
In older code, the train_test_split function was imported from the cross_validation module as follows:
from sklearn.cross_validation import train_test_split
But the cross_validation module was deprecated in scikit-learn version 0.18 and removed in version 0.20, and the train_test_split function is now available from the model_selection module.
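To see which side of that change an environment is on, it can help to check the installed version. The sketch below is one way to do that. Note that split_module_for is a hypothetical helper name, not part of scikit-learn, and it assumes the version string starts with major.minor, as in 0.17.1 or 1.4.2.

```python
import re
from importlib import metadata


def split_module_for(version):
    """Return the module that provides train_test_split for a version string.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # model_selection first shipped in 0.18; cross_validation was removed in 0.20.
    if (major, minor) >= (0, 18):
        return "sklearn.model_selection"
    return "sklearn.cross_validation"


try:
    installed = metadata.version("scikit-learn")
except metadata.PackageNotFoundError:
    print("scikit-learn is not installed in this environment")
else:
    print(installed, "->", split_module_for(installed))
```

On any release from 0.18 onward the helper points at model_selection, so the old import is only correct on very old installations.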
To fix this error, you only need to change the import statement to:
from sklearn.model_selection import train_test_split
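If the old import appears in many files, the same edit can be applied mechanically. The following is a minimal sketch that uses only the standard library. The name update_import_lines is hypothetical, and the pattern deliberately matches only the exact train_test_split import. Other names that lived in cross_validation, such as KFold, changed their constructor arguments when they moved, so they are better reviewed by hand than renamed blindly.

```python
import re

# Matches only: from sklearn.cross_validation import train_test_split
OLD_IMPORT = re.compile(
    r"^(\s*from\s+sklearn\.)cross_validation(\s+import\s+train_test_split\s*)$"
)


def update_import_lines(source):
    """Rewrite the old train_test_split import in a source string.

    Hypothetical helper for illustration. Lines that do not match the
    exact import are returned unchanged.
    """
    fixed = []
    for line in source.splitlines(keepends=True):
        fixed.append(OLD_IMPORT.sub(r"\g<1>model_selection\g<2>", line))
    return "".join(fixed)


old_code = (
    "from sklearn.cross_validation import train_test_split\n"
    "import numpy as np\n"
)
print(update_import_lines(old_code))
```

Running it on the two-line sample prints the same code with sklearn.model_selection in the import, while the NumPy import is left as it was.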
Now you can split a dataset into the training set and test set:
from sklearn.model_selection import train_test_split
import numpy as np
data = np.arange(10) # [0 1 2 3 4 5 6 7 8 9]
print(data)
train_set, test_set = train_test_split(data, test_size=0.4, random_state=0)
print(train_set) # [1 6 7 3 0 5]
print(test_set) # [2 8 4 9]
The train_test_split function in model_selection works the same way as the one in cross_validation. The main difference is a shuffle parameter that controls whether the data is shuffled before splitting. It defaults to True, so the data is shuffled unless you pass shuffle=False.
You can learn more about it in the train_test_split documentation.
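To illustrate the shuffle parameter, here is a short sketch that reuses the same ten-element array from the earlier example and turns shuffling off. It assumes a scikit-learn release recent enough to accept shuffle.

```python
import numpy as np
from sklearn.model_selection import train_test_split

data = np.arange(10)

# shuffle=False keeps the original order: the first 60% becomes the
# training set and the last 40% becomes the test set.
train_set, test_set = train_test_split(data, test_size=0.4, shuffle=False)
print(train_set)  # [0 1 2 3 4 5]
print(test_set)   # [6 7 8 9]
```

Because nothing is shuffled, random_state is not needed here. This kind of ordered split is useful when the row order carries meaning, for example in time series data.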
Note that this time you can use the train_test_split function without causing the error. Nice! 👍