XGBoost in ML: A binary-classification application case on the Titanic dataset using the XGBoost algorithm (missing-value filling / label encoding / inference-data preprocessing; model export to a JSON file and loading it for inference)

Contents

A binary-classification application case on the Titanic dataset using the XGBoost algorithm (one-hot encoding / label encoding; model export to a JSON file and loading it for inference)

# 1. Define the data set

# 2. Data preprocessing

# 2.1 Missing-value filling

# 2.2 Construct features

# 2.3 Feature encoding

# 2.4 Separate features and labels

# 3. Model training and evaluation

# 3.1 Split the data set into training and test sets

# 3.2 Model training and evaluation

# 3.3 Export the model as a JSON file

# Get the parameters of the model

# 4. Model inference

# 4.1 Load the model file

# 4.2 Create a model and load the JSON parameters

# 4.3 Model inference

# 4.3.1 Load a new sample

# 4.3.2 Preprocess the new sample data

# 4.3.3 Retrain the model from the JSON parameters, then run inference



A binary-classification application case on the Titanic dataset using the XGBoost algorithm (one-hot encoding / label encoding; model export to a JSON file and loading it for inference)

# 1. Define the data set

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 # Column Non-Null Count Dtype
--- ------ -------------- -----
 0 PassengerId 891 non-null int64
 1 Survived 891 non-null int64
 2 Pclass 891 non-null int64
 3 Name 891 non-null object
 4 Sex 891 non-null object
 5 Age 714 non-null float64
 6 SibSp 891 non-null int64
 7 Parch 891 non-null int64
 8 Ticket 891 non-null object
 9 Fare 891 non-null float64
 10 Cabin 204 non-null object
 11 Embarked 889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB
None
   PassengerId  Survived  Pclass  ...     Fare Cabin Embarked
0            1         0       3  ...   7.2500   NaN        S
1            2         1       1  ...  71.2833   C85        C
2            3         1       3  ...   7.9250   NaN        S
3            4         1       1  ...  53.1000  C123        S
4            5         0       3  ...   8.0500   NaN        S

[5 rows x 12 columns]
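The output above comes from loading the Kaggle Titanic training CSV and printing `info()` and `head()`. A minimal sketch of step 1; the file name `train.csv` is an assumption, and a tiny in-memory sample stands in for the full 891-row file to keep the example self-contained:

```python
import io
import pandas as pd

# In practice: df = pd.read_csv("train.csv") on the Kaggle Titanic data
# ("train.csv" is an assumed path). A three-row in-memory sample with the
# same 12 columns keeps the sketch self-contained.
csv = io.StringIO(
    "PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked\n"
    '1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S\n'
    '2,1,1,"Cumings, Mrs. John Bradley",female,38,1,0,PC 17599,71.2833,C85,C\n'
    '3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S\n'
)
df = pd.read_csv(csv)

df.info()        # on the full data: 891 rows, nulls in Age/Cabin/Embarked
print(df.head())
```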

# 2. Data preprocessing

# 2.1 Missing-value filling

# 2.2 Construct features

after fillna and FE
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 10 columns):
 # Column Non-Null Count Dtype
--- ------ -------------- -----
 0 Survived 891 non-null int64
 1 Pclass 891 non-null int64
 2 Sex 891 non-null object
 3 Age 891 non-null float64
 4 SibSp 891 non-null int64
 5 Parch 891 non-null int64
 6 Fare 891 non-null float64
 7 Embarked 891 non-null object
 8 FamilySize 891 non-null int64
 9 IsAlone 891 non-null int32
dtypes: float64(2), int32(1), int64(5), object(2)
memory usage: 66.3+ KB
None
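Steps 2.1 and 2.2 together produce the 10-column frame whose `info()` is printed above. A hedged sketch, assuming median/mode filling and the FamilySize/IsAlone constructions implied by the column list (the small stand-in frame is illustrative only):

```python
import pandas as pd

# Small stand-in for the raw frame loaded in step 1.
df = pd.DataFrame({
    "PassengerId": [1, 2, 3], "Survived": [0, 1, 1], "Pclass": [3, 1, 3],
    "Name": ["A", "B", "C"], "Sex": ["male", "female", "female"],
    "Age": [22.0, None, 26.0], "SibSp": [1, 1, 0], "Parch": [0, 0, 0],
    "Ticket": ["t1", "t2", "t3"], "Fare": [7.25, 71.2833, 7.925],
    "Cabin": [None, "C85", None], "Embarked": ["S", "C", None],
})

# 2.1 Missing-value filling: median for Age, mode for Embarked.
df["Age"] = df["Age"].fillna(df["Age"].median())
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])

# 2.2 Constructed features matching the column list above.
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1
df["IsAlone"] = (df["FamilySize"] == 1).astype("int32")

# Drop identifier-like columns, leaving the 10 columns shown above.
df = df.drop(columns=["PassengerId", "Name", "Ticket", "Cabin"])
print("after fillna and FE")
print(df.info())
```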

# 2.3 Feature encoding

after LabelEncoder
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 10 columns):
 # Column Non-Null Count Dtype
--- ------ -------------- -----
 0 Survived 891 non-null int64
 1 Pclass 891 non-null int64
 2 Sex 891 non-null int32
 3 Age 891 non-null float64
 4 SibSp 891 non-null int64
 5 Parch 891 non-null int64
 6 Fare 891 non-null float64
 7 Embarked 891 non-null int32
 8 FamilySize 891 non-null int64
 9 IsAlone 891 non-null int32
dtypes: float64(2), int32(3), int64(5)
memory usage: 59.3 KB
None
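Step 2.3 presumably applies scikit-learn's `LabelEncoder` to the two remaining object columns, which matches the `int32` dtypes for Sex and Embarked in the output above. A sketch:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# The two remaining object columns after step 2.2.
df = pd.DataFrame({"Sex": ["male", "female", "female"],
                   "Embarked": ["S", "C", "S"]})

# Keep the fitted encoders so the same mapping can be reused on new
# samples at inference time (see section 4.3.2).
encoders = {}
for col in ["Sex", "Embarked"]:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col]).astype("int32")

print(df)          # Sex: female=0, male=1; Embarked: C=0, S=1
print(df.dtypes)
```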

# 2.4 Separate features and labels
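Step 2.4 can be sketched as follows, with Survived as the label and every other column as a feature:

```python
import pandas as pd

# Stand-in for the preprocessed frame: Survived is the label.
df = pd.DataFrame({"Survived": [0, 1, 1], "Pclass": [3, 1, 3],
                   "Age": [22.0, 38.0, 26.0]})

X = df.drop(columns=["Survived"])   # features
y = df["Survived"]                  # label
print(X.shape, y.shape)  # (3, 2) (3,)
```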

# 3. Model training and evaluation

# 3.1 Split the data set into training and test sets
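A sketch of step 3.1, assuming scikit-learn's `train_test_split` with `test_size=0.2` (179 of 891 rows held out, consistent with the accuracy denominator of 179 in the metrics below); the `random_state` value is an assumption:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in features/labels (10 rows instead of 891).
X = pd.DataFrame({"f": range(10)})
y = pd.Series([0, 1] * 5)

# 80/20 split; stratify keeps the class balance in both halves.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)
print(len(X_train), len(X_test))  # 8 2
```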

# 3.2 Model training and evaluation

XGBoost
Accuracy: 0.8435754189944135
F1: 0.7812500000000001
AUC: 0.8275978407557355
XGBoost 0.8435754189944135 0.7812500000000001 0.8275978407557355
                                ACC          F1        AUC
XGBoost                         0.832402235  0.765625  0.815519568
XGBoost + FamilySize            0.843575419  0.78125   0.827597841
XGBoost + FamilySize + IsAlone  0.843575419  0.78125   0.827597841

# 3.3 Export the model as a JSON file

# Get the parameters of the model

model.json {'objective': 'binary:logistic', 'use_label_encoder': None, 'base_score': None, 'booster': None, 'callbacks': None, 'colsample_bylevel': None, 'colsample_bynode': None, 'colsample_bytree': None, 'early_stopping_rounds': None, 'enable_categorical': False, 'eval_metric': None, 'feature_types': None, 'gamma': None, 'gpu_id': None, 'grow_policy': None, 'importance_type': None, 'interaction_constraints': None, 'learning_rate': None, 'max_bin': None, 'max_cat_threshold': None, 'max_cat_to_onehot': None, 'max_delta_step': None, 'max_depth': None, 'max_leaves': None, 'min_child_weight': None, 'missing': nan, 'monotone_constraints': None, 'n_estimators': 100, 'n_jobs': None, 'num_parallel_tree': None, 'predictor': None, 'random_state': None, 'reg_alpha': None, 'reg_lambda': None, 'sampling_method': None, 'scale_pos_weight': None, 'subsample': None, 'tree_method': None, 'validate_parameters': None, 'verbosity': None}

# 4. Model inference

# 4.1 Load the model file

# 4.2 Create a model and load the JSON parameters

# 4.3 Model inference

# 4.3.1 Load a new sample

# 4.3.2 Preprocess the new sample data

raw test data
   Pclass Sex Age SibSp Parch Fare Embarked FamilySize IsAlone
0 3 male 25 1 0 7.25 S 2 0
test data after LabelEncoder
   Pclass Sex Age SibSp Parch Fare Embarked FamilySize IsAlone
0 3 0 25 1 0 7.25 0 2 0
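Steps 4.3.1 and 4.3.2 build and encode a single new passenger. Note a pitfall visible in the output above: refitting a `LabelEncoder` on the lone sample maps "male" to 0, whereas the training mapping was female=0, male=1; reusing the encoders fitted on the training data avoids this. A sketch reproducing the printed behavior:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# 4.3.1 A single new passenger (values taken from the output above).
new = pd.DataFrame([{
    "Pclass": 3, "Sex": "male", "Age": 25, "SibSp": 1, "Parch": 0,
    "Fare": 7.25, "Embarked": "S", "FamilySize": 2, "IsAlone": 0,
}])
print("raw test data")
print(new)

# 4.3.2 Encoding. Refitting on one row collapses every category to 0,
# which is why Sex prints as 0 here; in practice, reuse the encoders
# fitted on the training data instead.
for col in ["Sex", "Embarked"]:
    new[col] = LabelEncoder().fit_transform(new[col])
print("test data after LabelEncoder")
print(new)
```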

# 4.3.3 Retrain the model from the JSON parameters, then run inference

Model inference
    Pclass Sex Age SibSp Parch Fare Embarked FamilySize IsAlone
0 3 0 25 1 0 7.25 0 2 0
Inference result: [0]
