Model Reproduction

Reproduction Steps

Step 1: Download model

Click the "Download model" button for the corresponding task in the modeling task column to download the model trained by that task. This produces a compressed file; after decompression, its contents look roughly like this:

Pasted Image
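If you prefer to decompress the file from Python, the following is a minimal sketch using only the standard library. The archive name model.zip is a hypothetical placeholder, and extracting into a directory named result is an assumption chosen to match the res_path used in Step 4; adjust both to your actual download.

```python
import shutil
from pathlib import Path


def unpack_model(archive_path, target_dir="result"):
    """Extract a downloaded model archive and return the sorted relative paths it contained."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    # shutil.unpack_archive infers the format (zip, tar, tar.gz, ...) from the file extension
    shutil.unpack_archive(str(archive_path), str(target))
    return sorted(p.relative_to(target).as_posix() for p in target.rglob("*"))


if __name__ == "__main__":
    archive = Path("model.zip")  # hypothetical file name, replace with your download
    if archive.exists():
        for name in unpack_model(archive):
            print(name)
    else:
        print(f"{archive} not found; download the model first")
```

Printing the extracted paths makes it easy to compare the result against the expected contents shown above.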

Step 2: Install prediction environment

This step assumes that you have already set up Anaconda, a tool for quickly installing Python environments. Run the following commands to set up the prediction environment:

# Create a Python virtual environment
conda create -n changtian python==3.10 -y
# Activate the virtual environment
conda activate changtian
# Install the prediction framework dependencies
pip install changtianml==0.2.11 -i https://pypi.tuna.tsinghua.edu.cn/simple/

Info: The platform has published its prediction framework library, changtianml, on PyPI, the public Python package index. Depending on this library makes prediction and related work more portable and convenient.

Info: To run the model, refer to the model_reproduction_example.py sample file provided by the platform. The comments in that file explain the functions and basic usage of the prediction framework library in detail, so please read them carefully. The code in Step 4 is provided for reference only.

Well done! The prediction environment is ready.
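To confirm that the installation succeeded, you can query the installed package version from inside the activated environment. This is a minimal sketch that relies only on the standard library's importlib.metadata; the helper name installed_version is hypothetical and not part of changtianml.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("changtianml")
    if found is None:
        print("changtianml is not installed in this environment")
    else:
        print(f"changtianml {found} is installed (this guide pins 0.2.11)")
```

If the package is reported as missing, check that the changtian environment is active before re-running the pip command.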

Step 3: Download model training set and test set

Click the Advanced button to the right of the task. In the drop-down box, click Reproduction to download the training set and test set, and click Down model to download the trained model.

Pasted Image
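Before running the prediction, it can help to check that the downloaded file actually contains the target column you plan to evaluate against. The sketch below reads only the CSV header with the standard library. The file name validation.csv is taken from the Step 4 script, while the column name "target" and the helper missing_columns are hypothetical placeholders.

```python
import csv
import os


def missing_columns(csv_path, required):
    """Return the required column names that are absent from the CSV header."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        # Only the first row is read, so this stays fast for large files
        header = next(csv.reader(f), [])
    return [name for name in required if name not in header]


if __name__ == "__main__":
    path = "validation.csv"  # file name used in Step 4
    if os.path.exists(path):
        missing = missing_columns(path, ["target"])  # "target" is a placeholder name
        print("missing columns:", missing if missing else "none")
    else:
        print(f"{path} not found; download the datasets first")
```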

Step 4: Model prediction

```
# Import platform-related dependencies
import numpy as np
import pandas as pd
from changtianml import tabular_incrml


# Model validation function
def eval_score(task_type, y_val, y_pred, y_proba=None):
    """
    Args:
        task_type (str): task type, 'classification' or 'regression'
        y_val (pd.Series): true labels of the validation set
        y_pred (pd.Series): labels predicted by the model
        y_proba (array-like, optional): probabilities predicted by the model, only required for classification tasks, default is None.

    Returns:
        dict: calculated values of common metrics
    """
    if task_type == 'classification':
        from sklearn.metrics import classification_report
        from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score, log_loss

        # Convert to a NumPy array so the column slicing below also works for DataFrames
        y_proba = np.asarray(y_proba)

        val_loss = log_loss(y_val, y_proba)
        val_f1 = f1_score(y_val, y_pred, average='weighted')
        val_accuracy = accuracy_score(y_val, y_pred)
        val_precision = precision_score(y_val, y_pred, average='weighted')
        val_recall = recall_score(y_val, y_pred, average='weighted')

        if y_proba.ndim > 1 and y_proba.shape[1] > 1:
            # One-vs-rest AUC averaged over classes;
            # assumes labels are encoded as integers 0..num_classes-1
            num_classes = y_proba.shape[1]
            val_auc_roc = 0.0
            for class_idx in range(num_classes):
                val_auc_roc += roc_auc_score((y_val == class_idx).astype(int), y_proba[:, class_idx])
            val_auc_roc /= num_classes
        else:
            val_auc_roc = roc_auc_score(y_val, y_proba)

        print(classification_report(y_val, y_pred))

        return {
            "val_log_loss": val_loss,
            "val_f1_weighted": val_f1,
            "val_accuracy": val_accuracy,
            "val_precision_weighted": val_precision,
            "val_recall_weighted": val_recall,
            "val_auc": val_auc_roc,
        }
    else:
        from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

        val_mse = mean_squared_error(y_val, y_pred)
        val_rmse = np.sqrt(val_mse)
        val_mae = mean_absolute_error(y_val, y_pred)
        val_r_squared = r2_score(y_val, y_pred)
        # r2_score can be NaN (for example, with fewer than two samples); report -1 in that case
        if np.isnan(val_r_squared):
            val_r_squared = -1

        return {
            'val_MSE': val_mse,
            'val_RMSE': val_rmse,
            'val_MAE': val_mae,
            'val_R-squared': val_r_squared,
        }


# Hyperparameter settings
target_name = ""  ### Please change to the target column name of your task
res_path = "result"
task_type = 'classification'  ### Set to 'classification' for a classification task or 'regression' for a regression task
test_path = "validation.csv"

# Specify the model directory to load the model
ti = tabular_incrml(res_path)

# View model parameters
print("model params:", ti.model.best_config, "\n")

# Load the true labels of the dataset to be predicted
y_val = pd.read_csv(test_path)[target_name]

# Model prediction results
pred = ti.predict(test_path)
proba = ti.predict_proba(test_path) if task_type == 'classification' else None

# Model validation
model_res = eval_score(task_type, y_val, pred, proba)

# View output
for k, v in model_res.items():
    print(f"{k}: {v}")
```
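The val_auc value in the script above is a one-vs-rest AUC averaged over classes. To make that computation easier to follow, here is a dependency-free sketch of the same idea. It is an illustration, not the platform's implementation: the function names binary_auc and macro_ovr_auc are hypothetical, the toy data is made up, and it assumes, as the script does, that labels are integers 0..k-1 matching the probability columns. It uses the rank interpretation of AUC (the probability that a random positive sample scores higher than a random negative one, with ties counted as one half).

```python
def binary_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true, proba):
    """Average the one-vs-rest AUC over classes 0..k-1, where proba[i][c] is P(class c) for sample i."""
    num_classes = len(proba[0])
    total = 0.0
    for c in range(num_classes):
        # Treat class c as positive and every other class as negative
        labels = [1 if y == c else 0 for y in y_true]
        scores = [row[c] for row in proba]
        total += binary_auc(labels, scores)
    return total / num_classes


if __name__ == "__main__":
    # Toy data for illustration only
    y_true = [0, 1, 2, 1]
    proba = [
        [0.7, 0.2, 0.1],
        [0.2, 0.6, 0.2],
        [0.1, 0.3, 0.6],
        [0.3, 0.2, 0.5],
    ]
    print(macro_ovr_auc(y_true, proba))  # 0.875
```

In the toy data, classes 0 and 2 are ranked perfectly (AUC 1.0) while class 1 scores 0.625, giving the average of 0.875. The pairwise loop is quadratic in the number of samples, so for real datasets the roc_auc_score call used in the script remains the practical choice.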