1. Import of related libraries and preliminary preparation for drawing a binary tree
Calling the make_blobs function generates datasets for classification or clustering experiments. Its main parameters: n_samples is the number of samples; n_features is the number of features each sample has; centers is the number of cluster centers, which can be understood as the number of label types; cluster_std sets the standard deviation of each cluster (default is 1.0); shuffle randomly permutes the samples; random_state is a random seed that makes the generated data reproducible.

#Import dataset generator
from sklearn.datasets import make_blobs
help(make_blobs)

Help on function make_blobs in module sklearn.datasets._samples_generator:

make_blobs(n_samples=100, n_features=2, *, centers=None, cluster_std=1.0, center_box=(-10.0, 10.0), shuffle=True, random_state=None, return_centers=False)
    Generate isotropic Gaussian blobs for clustering.

    Read more in the :ref:`User Guide <sample_generators>`.

    Parameters
    ----------
    n_samples : int or array-like, default=100
        If int, it is the total number of points equally divided among clusters.
        If array-like, each element of the sequence indicates the number of samples per cluster.

        .. versionchanged:: v0.20
            one can now pass an array-like to the ``n_samples`` parameter
    n_features : int, default=2
        The number of features for each sample.
    centers : int or ndarray of shape (n_centers, n_features), default=None
        The number of centers to generate, or the fixed center locations.
        If n_samples is an int and centers is None, 3 centers are generated.
        If n_samples is array-like, centers must be either None or an array of
        length equal to the length of n_samples.
    cluster_std : float or array-like of float, default=1.0
        The standard deviation of the clusters.
    center_box : tuple of float (min, max), default=(-10.0, 10.0)
        The bounding box for each cluster center when centers are generated at random.
    shuffle : bool, default=True
        Shuffle the samples.
    random_state : int, RandomState instance or None, default=None
        Determines random number generation for dataset creation. Pass an int
        for reproducible output across multiple function calls.
        See :term:`Glossary <random_state>`.
    return_centers : bool, default=False
        If True, then return the centers of each cluster.

        .. versionadded:: 0.23

    Returns
    -------
    X : ndarray of shape (n_samples, n_features)
        The generated samples.
    y : ndarray of shape (n_samples,)
        The integer labels for cluster membership of each sample.
    centers : ndarray of shape (n_centers, n_features)
        The centers of each cluster. Only returned if ``return_centers=True``.

    Examples
    --------
    >>> from sklearn.datasets import make_blobs
    >>> X, y = make_blobs(n_samples=10, centers=3, n_features=2,
    ...                   random_state=0)
    >>> print(X.shape)
    (10, 2)
    >>> y
    array([0, 0, 1, 0, 2, 2, 2, 1, 1, 0])
    >>> X, y = make_blobs(n_samples=[3, 3, 4], centers=None, n_features=2,
    ...                   random_state=0)
    >>> print(X.shape)
    (10, 2)
    >>> y
    array([0, 1, 2, 0, 2, 2, 2, 1, 1, 0])

    See Also
    --------
    make_classification : A more intricate variant.
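The return_centers flag mentioned in the signature above is not used in this walkthrough; a minimal sketch of it (assuming scikit-learn >= 0.23, where the flag was added):

from sklearn.datasets import make_blobs

#return_centers=True additionally returns the true cluster centers
X, y, centers = make_blobs(n_samples=200, centers=2,
                           random_state=8, return_centers=True)
print(centers.shape)  #(2, 2): two centers in a two-feature space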
data = make_blobs(n_samples = 200, centers = 2, random_state = 8)
print(data)

(array([[ 6.75445054,  9.74531933],
        [ 6.80526026, -0.2909292 ],
        [ 7.07978644,  7.81427747],
        ...,
        [ 7.24211001,  7.48506871],
        [ 8.2634157 , 10.34723435],
        [ 8.39800148,  2.8397151 ]]),
 array([0, 1, 0, 1, 0, 0, 1, 0, 0, 1, ..., 0, 0, 0, 0, 0, 1]))

X, y = data #Separate independent variables (X) and dependent variables (y)

import matplotlib.pyplot as plt
%matplotlib inline
plt.scatter(X[:, 0], X[:, 1], c = y, cmap = plt.cm.spring, edgecolors = 'k')

<matplotlib.collections.PathCollection at 0xaf15f70>
[Scatter plot: the 200 samples form two well-separated clusters, colored by label]

#Import iris data set
from sklearn.datasets import load_iris
iris = load_iris()

#Import boston data set
from sklearn.datasets import load_boston
boston = load_boston()

class sklearn.preprocessing.MinMaxScaler(feature_range=(0, 1), copy=True)
    Scale data to a specified range.
class sklearn.preprocessing.MaxAbsScaler(copy=True)
    Scale each feature by its maximum absolute value, so the result lies in [-1, 1].

#Transform boston data to the range (10, 100)
from sklearn.preprocessing import MinMaxScaler
mms = MinMaxScaler(feature_range=(10,100)) #instantiation
mms.fit(boston.data) #fit only computes each feature's min and max (preparatory work)

MinMaxScaler(feature_range=(10, 100))

boston_mms = mms.transform(boston.data)
mms2 = MinMaxScaler(feature_range=(10,100), copy = False)
mms2.fit_transform(boston.data)

array([[ 10.        ,  26.2       ,  16.10337243, ...,  35.85106383,
        100.        ,  18.07119205],
       [ 10.02123303,  10.        ,  31.80718475, ...,  59.78723404,
        100.        ,  28.40231788],
       [ 10.0212128 ,  10.        ,  31.80718475, ...,  59.78723404,
         99.07635282,  15.71192053],
       ...,
       [ 10.05507032,  10.        ,  47.84090909, ...,  90.42553191,
        100.        ,  19.7102649 ],
       [ 10.10446569,  10.        ,  47.84090909, ...,  90.42553191,
         99.21705583,  21.79635762],
       [ 10.04156575,  10.        ,  47.84090909, ...,  90.42553191,
        100.        ,  25.27317881]])
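Min-max scaling maps each feature linearly: X_scaled = (X - X_min) / (X_max - X_min) * (hi - lo) + lo. A minimal sketch verifying this against the transformer (the toy array and the names lo/hi are illustrative):

import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
lo, hi = 10, 100  #target feature_range, as used above

#apply the min-max formula by hand, column by column
manual = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0)) * (hi - lo) + lo
auto = MinMaxScaler(feature_range=(lo, hi)).fit_transform(X)
print(np.allclose(manual, auto))  #True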
from sklearn.preprocessing import MaxAbsScaler
mas = MaxAbsScaler() #scales each feature by its maximum absolute value
mas.fit_transform(boston.data) #the original data is unchanged because copy = True

array([[0.1       , 0.262     , 0.16103372, ..., 0.35851064, 1.        ,
        0.18071192],
       [0.10021233, 0.1       , 0.31807185, ..., 0.59787234, 1.        ,
        0.28402318],
       [0.10021213, 0.1       , 0.31807185, ..., 0.59787234, 0.99076353,
        0.15711921],
       ...,
       [0.1005507 , 0.1       , 0.47840909, ..., 0.90425532, 1.        ,
        0.19710265],
       [0.10104466, 0.1       , 0.47840909, ..., 0.90425532, 0.99217056,
        0.21796358],
       [0.10041566, 0.1       , 0.47840909, ..., 0.90425532, 1.        ,
        0.25273179]])

Normalization of data -- vector unitization:

sklearn.preprocessing.normalize(
    X,
    norm = 'l2',        #'l1', 'l2', or 'max': the specific norm used
    axis = 1,
    copy = True,
    return_norm = False #whether to return the norms used
)

from sklearn.preprocessing import normalize
help(normalize)

Help on function normalize in module sklearn.preprocessing._data:

normalize(X, norm='l2', *, axis=1, copy=True, return_norm=False)
    Scale input vectors individually to unit norm (vector length).

    Read more in the :ref:`User Guide <preprocessing_normalization>`.

    Parameters
    ----------
    X : {array-like, sparse matrix} of shape (n_samples, n_features)
        The data to normalize, element by element.
        scipy.sparse matrices should be in CSR format to avoid an un-necessary copy.
    norm : {'l1', 'l2', 'max'}, default='l2'
        The norm to use to normalize each non zero sample (or each non-zero
        feature if axis is 0).
    axis : {0, 1}, default=1
        axis used to normalize the data along. If 1, independently normalize
        each sample, otherwise (if 0) normalize each feature.
    copy : bool, default=True
        set to False to perform inplace row normalization and avoid a copy (if
        the input is already a numpy array or a scipy.sparse CSR matrix and if
        axis is 1).
    return_norm : bool, default=False
        whether to return the computed norms

    Returns
    -------
    X : {ndarray, sparse matrix} of shape (n_samples, n_features)
        Normalized input X.
    norms : ndarray of shape (n_samples, ) if axis=1 else (n_features, )
        An array of norms along given axis for X. When X is sparse, a
        NotImplementedError will be raised for norm 'l1' or 'l2'.

    See Also
    --------
    Normalizer : Performs normalization using the Transformer API
        (e.g. as part of a preprocessing :class:`~sklearn.pipeline.Pipeline`).

    Notes
    -----
    For a comparison of the different scalers, transformers, and normalizers,
    see :ref:`examples/preprocessing/plot_all_scaling.py
    <sphx_glr_auto_examples_preprocessing_plot_all_scaling.py>`.
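As the See Also entry notes, the same row-wise scaling is available as a transformer, which is the form to use inside a Pipeline; a minimal sketch:

from sklearn.preprocessing import Normalizer

nz = Normalizer(norm = 'l2') #same row-wise unit-norm scaling, Transformer API
nz.fit_transform([[1, 1, 2], [2, 2, 4]])
#same result as normalize(X1, norm = 'l2') in the example below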
X1 = [[1,1,2],[2,2,4]]
normalize(X1,
          norm = 'l2',      #select norm type
          return_norm = True #return the norm of each vector
         )

(array([[0.40824829, 0.40824829, 0.81649658],
        [0.40824829, 0.40824829, 0.81649658]]),
 array([2.44948975, 4.89897949]))

normalize(X1,
          norm = 'l1',      #select norm type
          return_norm = True #return the norm of each vector
         )

(array([[0.25, 0.25, 0.5 ],
        [0.25, 0.25, 0.5 ]]),
 array([4., 8.]))

Standardization that takes outliers into account: robust standardization. It uses the median and a percentile range (the interquartile range by default) in place of the mean and standard deviation, which makes it better suited to data known to contain outliers.

sklearn.preprocessing.robust_scale(
    X,
    axis = 0,
    with_centering = True,
    with_scaling = True,
    quantile_range = (25.0, 75.0), #percentiles used to measure the degree of dispersion
    copy = True
)

class sklearn.preprocessing.RobustScaler(
    with_centering = True,
    with_scaling = True,
    quantile_range = (25.0, 75.0),
    copy = True
)

#robust standardization
from sklearn.preprocessing import robust_scale
from sklearn.preprocessing import RobustScaler
robust_scale(boston.data)

array([[-0.06959315,  1.44      , -0.57164988, ..., -1.33928571,
         0.26190191, -0.63768116],
       [-0.06375455,  0.        , -0.20294345, ..., -0.44642857,
         0.26190191, -0.22188906],
       [-0.06376011,  0.        , -0.20294345, ..., -0.44642857,
         0.06667466, -0.73263368],
       ...,
       [-0.05445006,  0.        ,  0.17350891, ...,  0.69642857,
         0.26190191, -0.57171414],
       [-0.04086745,  0.        ,  0.17350891, ...,  0.69642857,
         0.09641444, -0.48775612],
       [-0.05816351,  0.        ,  0.17350891, ...,  0.69642857,
         0.26190191, -0.34782609]])

rs = RobustScaler() #instantiation
rs.fit_transform(boston.data)

(same output as robust_scale above)

S-fold cross validation, abbreviated as CV. S is a hyperparameter: the data is divided into S folds, and the model is trained once per fold, which is fairer than simple (hold-out) cross-validation. The extreme case is leave-one-out cross validation (LOOCV), which holds out a single data point as the validation set; holding out P data points instead is called LPOCV.
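A minimal sketch of these splitting strategies using sklearn's model_selection splitters (the toy array is illustrative):

import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut, LeavePOut

X_toy = np.arange(10).reshape(5, 2)
print(KFold(n_splits=5).get_n_splits(X_toy))  #5: S-fold CV with S = 5
print(LeaveOneOut().get_n_splits(X_toy))      #5: one validation point per round
print(LeavePOut(p=2).get_n_splits(X_toy))     #10: every subset of P = 2 points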
Split into training set and test set:

sklearn.model_selection.train_test_split(
    *arrays,             #data objects of equal length to be split; several can be split at once, but their lengths must match
    test_size = 0.25,    #float, int, or None; the proportion (0-1) or count of samples held out to validate the model; if None, it is set to the complement of train_size
    train_size = None,   #float, int, or None; the proportion (0-1) or count of samples used to train the model; if None, it is computed automatically from test_size
    random_state = None, #random seed
    shuffle = True,      #whether to shuffle the samples before splitting
    stratify = None      #array-like or None; whether to stratify the split according to the given class labels
)
returns: a list of the split input objects, of length 2 * len(arrays)

#Split into training set and test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size = 0.3)

len(X_train)
354
len(boston.data)
506
len(y_train)
354

Cross-validation combines splitting and evaluation. In sklearn.model_selection:
cross_val_score combines splitting and scoring (estimator: the estimator object used to fit the data; X: array-like, the data used to fit the model);
cross_validate evaluates multiple metrics at the same time;
cross_val_predict makes predictions with the cross-validated model.
(The latter two are sketched after the shuffling example below.)

from sklearn.model_selection import cross_val_score
from sklearn.model_selection import cross_validate
from sklearn.model_selection import cross_val_predict
from sklearn.linear_model import LinearRegression

reg = LinearRegression()
scores = cross_val_score(reg, boston.data, boston.target, cv = 10)
scores

array([ 0.73376082,  0.4730725 , -1.00631454,  0.64113984,  0.54766046,
        0.73640292,  0.37828386, -0.12922703, -0.76843243,  0.4189435 ])

scores.mean(), scores.std()
(0.20252899006055367, 0.5952960169512383)

The boston data set is ordered, so sequential folds are not representative: the scores are poor and vary widely.

#Randomly permute the data set so the folds are drawn uniformly
import numpy as np
X, y = boston.data, boston.target
indices = np.arange(y.shape[0])
np.random.shuffle(indices)
X, y = X[indices], y[indices]
reg = LinearRegression()
scores = cross_val_score(reg, X, y, cv = 10)
scores

array([0.77212498, 0.79470905, 0.59899391, 0.80717087, 0.76007414,
       0.75699564, 0.72688181, 0.24256808, 0.6518304 , 0.66100191])

scores.mean(), scores.std()
(0.6772350793373447, 0.1585378148669398)
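cross_validate and cross_val_predict were imported above but never shown; a minimal sketch using the shuffled X, y from the previous cell (the metric names are standard sklearn scorer strings):

#cross_validate can score several metrics per fold at once
cv_results = cross_validate(LinearRegression(), X, y, cv = 10,
                            scoring = ('r2', 'neg_mean_squared_error'))
print(cv_results['test_r2'].mean())

#cross_val_predict returns one out-of-fold prediction per sample
y_pred = cross_val_predict(LinearRegression(), X, y, cv = 10)
print(y_pred.shape)  #(506,)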
Create a decision tree using sklearn: class sklearn.tree.DecisionTreeClassifier

from sklearn.tree import DecisionTreeClassifier
ct = DecisionTreeClassifier() #instantiation
ct.fit(iris.data, iris.target) #model training

DecisionTreeClassifier()

ct.max_features_
4
ct.feature_importances_ #feature importance scores
array([0.01333333, 0.        , 0.06405596, 0.92261071])

ct.predict(iris.data)

array([0, 0, 0, 0, 0, ..., 1, 1, 1, 1, 1, ..., 2, 2, 2, 2, 2])

#Classification metrics report, used to compare the quality of different models
from sklearn.metrics import classification_report
print(classification_report(iris.target, ct.predict(iris.data)))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        50
           1       1.00      1.00      1.00        50
           2       1.00      1.00      1.00        50

    accuracy                           1.00       150
   macro avg       1.00      1.00      1.00       150
weighted avg       1.00      1.00      1.00       150

#Presentation of classification results: the confusion matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(iris.target, ct.predict(iris.data), labels = [2,1,0]) #customize the category order of the output matrix
cm

array([[50,  0,  0],
       [ 0, 50,  0],
       [ 0,  0, 50]], dtype=int64)

#Displayed in the form of a heat map
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
sns.heatmap(cm, cmap = sns.color_palette("Blues"), annot = True)

<AxesSubplot:>
[Heat map of the confusion matrix: all 50s on the diagonal]
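The perfect scores above come from evaluating the tree on the same data it was trained on; a minimal sketch of the same report on a held-out split (an illustrative addition, not part of the original walkthrough):

from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.tree import DecisionTreeClassifier

Xtr, Xte, ytr, yte = train_test_split(iris.data, iris.target,
                                      test_size = 0.3, stratify = iris.target)
ct2 = DecisionTreeClassifier().fit(Xtr, ytr) #fit on the training split only
print(classification_report(yte, ct2.predict(Xte))) #score on unseen samples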
2. Draw a binary tree using the iris dataset as an example
#Import iris data set
from sklearn.datasets import load_iris
iris = load_iris()

#Randomly permute the data
import numpy as np
X, y = iris.data, iris.target
indices = np.arange(y.shape[0])
np.random.shuffle(indices)
X, y = X[indices], y[indices]

#Split iris data into training set and test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size = 0.3)

#Create decision tree
from sklearn.tree import DecisionTreeClassifier
rt = DecisionTreeClassifier() #instantiation
rt.fit(iris.data, iris.target)

DecisionTreeClassifier()

from sklearn.model_selection import cross_val_score
from sklearn.model_selection import cross_validate
from sklearn.model_selection import cross_val_predict
from sklearn.linear_model import LinearRegression
import graphviz

reg = LinearRegression()
scores = cross_val_score(reg, X, y, cv = 10)
scores

array([0.86316316, 0.87764635, 0.90032253, 0.89369341, 0.94963924,
       0.96141896, 0.93654241, 0.93546444, 0.88819228, 0.95601217])

scores.mean(), scores.std()
(0.916209493818017, 0.03372720008929205)

rt.max_features_
4
rt.feature_importances_
array([0.        , 0.01333333, 0.56405596, 0.42261071])

rt.predict(iris.data)

array([0, 0, 0, 0, 0, ..., 1, 1, 1, 1, 1, ..., 2, 2, 2, 2, 2])

from sklearn.metrics import classification_report
print(classification_report(iris.target, rt.predict(iris.data)))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        50
           1       1.00      1.00      1.00        50
           2       1.00      1.00      1.00        50

    accuracy                           1.00       150
   macro avg       1.00      1.00      1.00       150
weighted avg       1.00      1.00      1.00       150

from sklearn.tree import export_graphviz
dot_data = export_graphviz(rt, feature_names = iris.feature_names, class_names = iris.target_names)
graph = graphviz.Source(dot_data)
graph
The following is the binary tree drawn:

[Rendered Graphviz image of the fitted decision tree]
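To save the rendered tree to disk instead of only displaying it inline, the graphviz Source object's render method can be used (the file name 'iris_tree' is illustrative):

graph.format = 'png'      #output format for the rendered file
graph.render('iris_tree') #writes iris_tree (DOT source) and iris_tree.png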