XGBoost in ML: a binary classification case study on the Titanic data set (missing value filling, label encoding, preprocessing of inference data) using the XGBoost algorithm, with the model exported to and loaded from a JSON file for inference
Contents
A binary classification case study on the Titanic data set (one-hot encoding/label encoding) using the XGBoost algorithm, with the model exported to and loaded from a JSON file for inference
# 1. Define the data set
# 2. Data preprocessing
# 2.1. Missing value filling
# 2.2. Feature construction
# 2.3. Feature encoding
# 2.4. Separate features and labels
# 3. Model training and evaluation
# 3.1. Split the data set into training and test sets
# 3.2. Model training and evaluation
# 3.3. Export the model as a JSON file
# Get the parameters of the model
# 4. Model inference
# 4.1. Load the model file
# 4.2. Create a model and load the JSON parameters
# 4.3. Model inference
# 4.3.1. Load a new sample
# 4.3.2. Preprocess the new sample
# 4.3.3. Retrain the model from the JSON parameters, then run inference
Related articles
XGBoost in ML: a binary classification case study on the Titanic data set (missing value filling, label encoding, preprocessing of inference data) using the XGBoost algorithm, with the model exported to and loaded from a JSON file for inference
XGBoost in ML: implementation code for the binary classification case study on the Titanic data set (missing value filling, label encoding, preprocessing of inference data) using the XGBoost algorithm, with the model exported to and loaded from a JSON file for inference
A binary classification case study on the Titanic data set (one-hot encoding/label encoding) using the XGBoost algorithm, with the model exported to and loaded from a JSON file for inference
# 1. Define the data set
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   PassengerId  891 non-null    int64
 1   Survived     891 non-null    int64
 2   Pclass       891 non-null    int64
 3   Name         891 non-null    object
 4   Sex          891 non-null    object
 5   Age          714 non-null    float64
 6   SibSp        891 non-null    int64
 7   Parch        891 non-null    int64
 8   Ticket       891 non-null    object
 9   Fare         891 non-null    float64
 10  Cabin        204 non-null    object
 11  Embarked     889 non-null    object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB
None
   PassengerId  Survived  Pclass  ...     Fare  Cabin  Embarked
0            1         0       3  ...   7.2500    NaN         S
1            2         1       1  ...  71.2833    C85         C
2            3         1       3  ...   7.9250    NaN         S
3            4         1       1  ...  53.1000   C123         S
4            5         0       3  ...   8.0500    NaN         S
[5 rows x 12 columns]
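Output of this shape is what `DataFrame.info()` followed by `DataFrame.head()` prints. A minimal sketch of that loading step follows; the three rows are inlined so the snippet runs without the CSV, and the file name `train.csv` mentioned in the comment is an assumption, not something the article states.

```python
import io

import pandas as pd

# Three rows in the Kaggle Titanic column layout, inlined for illustration.
# With the real file this would be something like pd.read_csv("train.csv")
# (the file name is an assumption).
csv_text = """PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
"""

df = pd.read_csv(io.StringIO(csv_text))

df.info()          # column names, non-null counts and dtypes
print(df.head())   # first rows of the frame
```

On the full file, the non-null counts are what reveal the columns that need filling later: `Age`, `Cabin` and `Embarked`.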
# 2. Data preprocessing
# 2.1. Missing value filling
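The article does not show which fill strategy it uses. A common choice, taken here purely as an assumption, is the median for the numeric `Age` column and the most frequent value for the categorical `Embarked` column. The small frame below is made-up illustration data.

```python
import numpy as np
import pandas as pd

# Toy frame with one missing value per column (illustrative values only).
df = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0],
    "Embarked": ["S", "C", None, "S"],
})

# Assumed strategy: median for Age, most frequent category for Embarked.
df["Age"] = df["Age"].fillna(df["Age"].median())
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])

print(df)
```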
# 2.2. Feature construction
after fillna and FE
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 10 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   Survived    891 non-null    int64
 1   Pclass      891 non-null    int64
 2   Sex         891 non-null    object
 3   Age         891 non-null    float64
 4   SibSp       891 non-null    int64
 5   Parch       891 non-null    int64
 6   Fare        891 non-null    float64
 7   Embarked    891 non-null    object
 8   FamilySize  891 non-null    int64
 9   IsAlone     891 non-null    int32
dtypes: float64(2), int32(1), int64(5), object(2)
memory usage: 66.3+ KB
None
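The listing shows two new columns, `FamilySize` and `IsAlone`, and no longer contains `PassengerId`, `Name`, `Ticket` or `Cabin`. A sketch of one way to get there, assuming `FamilySize = SibSp + Parch + 1` (consistent with the sample later in the article, where `SibSp=1`, `Parch=0` gives `FamilySize=2`) and `IsAlone = 1` when the family size is 1. The frame below holds placeholder values.

```python
import pandas as pd

# Placeholder rows; only the column names mirror the Titanic data.
df = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Name": ["a", "b", "c"],
    "Ticket": ["t1", "t2", "t3"],
    "Cabin": [None, "C85", None],
    "SibSp": [1, 1, 0],
    "Parch": [0, 0, 0],
})

# Assumed definitions of the two constructed features.
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1
df["IsAlone"] = (df["FamilySize"] == 1).astype(int)

# Drop identifier-like and mostly empty columns that are absent from the listing.
df = df.drop(columns=["PassengerId", "Name", "Ticket", "Cabin"])

print(df)
```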
# 2.3. Feature encoding
after LabelEncoder
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 10 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   Survived    891 non-null    int64
 1   Pclass      891 non-null    int64
 2   Sex         891 non-null    int32
 3   Age         891 non-null    float64
 4   SibSp       891 non-null    int64
 5   Parch       891 non-null    int64
 6   Fare        891 non-null    float64
 7   Embarked    891 non-null    int32
 8   FamilySize  891 non-null    int64
 9   IsAlone     891 non-null    int32
dtypes: float64(2), int32(3), int64(5)
memory usage: 59.3 KB
None
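The dtype change of `Sex` and `Embarked` from `object` to an integer type is what scikit-learn's `LabelEncoder` produces. A minimal sketch on toy values follows; keeping the fitted encoders in a dictionary (the name `encoders` is hypothetical) is a design choice made here so the same mapping can be reused on new samples at inference time.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy categorical columns with the same categories as the Titanic data.
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Embarked": ["S", "C", "S", "Q"],
})

encoders = {}  # hypothetical container: column name -> fitted LabelEncoder
for col in ["Sex", "Embarked"]:
    le = LabelEncoder()
    df[col] = le.fit_transform(df[col])
    encoders[col] = le

print(df)
# LabelEncoder sorts its classes, so the integer code is the index in classes_.
print({col: list(le.classes_) for col, le in encoders.items()})
```

Because the classes are sorted, `female` maps to 0 and `male` to 1, while `C`, `Q`, `S` map to 0, 1, 2.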
# 2.4. Separate features and labels
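Separating features and labels usually means taking `Survived` as the target and everything else as the feature matrix. A short sketch on three illustrative rows:

```python
import pandas as pd

# Illustrative rows; Survived is the label column in the Titanic data.
df = pd.DataFrame({
    "Survived": [0, 1, 1],
    "Pclass": [3, 1, 3],
    "Fare": [7.25, 71.2833, 7.925],
})

X = df.drop(columns=["Survived"])  # feature matrix
y = df["Survived"]                 # binary label

print(X.shape, y.shape)
```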
# 3. Model training and evaluation
# 3.1. Split the data set into training and test sets
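The split ratio is not stated in the article. The reported accuracy of 0.8435754189944135 equals 151/179, which is consistent with a 20% test split of 891 rows (179 test samples), so `test_size=0.2` is assumed below. The `random_state` value and the dummy data are assumptions as well.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Dummy stand-in with the same number of rows as the Titanic training file.
X = pd.DataFrame({"Pclass": np.arange(891) % 3 + 1})
y = pd.Series(np.arange(891) % 2, name="Survived")

# Assumed settings: 80/20 split with a fixed seed for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(len(X_train), len(X_test))  # 712 179
```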
# 3.2. Model training and evaluation
XGBoost
Accuracy: 0.8435754189944135
F1: 0.7812500000000001
AUC: 0.8275978407557355
XGBoost 0.8435754189944135 0.7812500000000001 0.8275978407557355
Model | ACC | F1 | AUC |
XGBoost | 0.832402235 | 0.765625 | 0.815519568 |
XGBoost + FamilySize | 0.843575419 | 0.78125 | 0.827597841 |
XGBoost + FamilySize + IsAlone | 0.843575419 | 0.78125 | 0.827597841 |
# 3.3. Export the model as a JSON file
# Get the parameters of the model
model.json
{'objective': 'binary:logistic', 'use_label_encoder': None, 'base_score': None, 'booster': None,
 'callbacks': None, 'colsample_bylevel': None, 'colsample_bynode': None, 'colsample_bytree': None,
 'early_stopping_rounds': None, 'enable_categorical': False, 'eval_metric': None, 'feature_types': None,
 'gamma': None, 'gpu_id': None, 'grow_policy': None, 'importance_type': None,
 'interaction_constraints': None, 'learning_rate': None, 'max_bin': None, 'max_cat_threshold': None,
 'max_cat_to_onehot': None, 'max_delta_step': None, 'max_depth': None, 'max_leaves': None,
 'min_child_weight': None, 'missing': nan, 'monotone_constraints': None, 'n_estimators': 100,
 'n_jobs': None, 'num_parallel_tree': None, 'predictor': None, 'random_state': None,
 'reg_alpha': None, 'reg_lambda': None, 'sampling_method': None, 'scale_pos_weight': None,
 'subsample': None, 'tree_method': None, 'validate_parameters': None, 'verbosity': None}
# 4. Model inference
# 4.1. Load the model file
# 4.2. Create a model and load the JSON parameters
# 4.3. Model inference
# 4.3.1. Load a new sample
# 4.3.2. Preprocess the new sample
raw test data
   Pclass   Sex  Age  SibSp  Parch  Fare Embarked  FamilySize  IsAlone
0       3  male   25      1      0  7.25        S           2        0
test data after LabelEncoder
   Pclass  Sex  Age  SibSp  Parch  Fare  Embarked  FamilySize  IsAlone
0       3    0   25      1      0  7.25         0           2        0
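In the output above, `male` and `S` are both encoded as 0, which is what a `LabelEncoder` fitted on the single new row would produce. An encoder fitted on training columns containing `female`/`male` and `C`/`Q`/`S` maps `male` to 1 and `S` to 2, so reusing the training encoders keeps inference consistent with training. The sketch below illustrates the difference on toy data; it is not the article's code.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy training columns containing every category seen in the Titanic data.
train = pd.DataFrame({
    "Sex": ["male", "female", "female"],
    "Embarked": ["S", "C", "Q"],
})
encoders = {col: LabelEncoder().fit(train[col]) for col in ["Sex", "Embarked"]}

# The new passenger from the article, before encoding.
new = pd.DataFrame([{
    "Pclass": 3, "Sex": "male", "Age": 25, "SibSp": 1,
    "Parch": 0, "Fare": 7.25, "Embarked": "S",
}])
new["FamilySize"] = new["SibSp"] + new["Parch"] + 1
new["IsAlone"] = (new["FamilySize"] == 1).astype(int)

# Encoder fitted on the single row alone: every value becomes 0.
refit_code = int(LabelEncoder().fit_transform(new["Sex"])[0])

# Reuse the encoders fitted on the training data instead.
for col, le in encoders.items():
    new[col] = le.transform(new[col])

print(new)
print("Sex code when refitting on the new row alone:", refit_code)
```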
# 4.3.3. Retrain the model from the JSON parameters, then run inference
Model inference
   Pclass  Sex  Age  SibSp  Parch  Fare  Embarked  FamilySize  IsAlone
0       3    0   25      1      0  7.25         0           2        0
Inference result: [0]