Documentation for model Directory of DataAnalysisToolkit

The model directory in the DataAnalysisToolkit contains tools for feature engineering and model evaluation, essential for the preparation and assessment of machine learning models.

Feature Engineer (feature_engineer.py)

Overview

The FeatureEngineer class is designed to perform various feature engineering tasks on datasets. Feature engineering enhances the dataset by creating new features from existing ones, improving model performance and providing deeper insights.

Usage

engineer = FeatureEngineer(df)
engineer.binning('age', bins=[0, 18, 65, 100], labels=['child', 'adult', 'senior'])
engineer.create_interaction('height', 'weight')

Methods

  • __init__(self, data): Initialize with a dataset for feature engineering.

  • binning(self, column, bins, labels): Perform binning on a specified column, transforming continuous data into categorical bins.

  • create_interaction(self, column1, column2): Create an interaction feature by multiplying two columns.

Example

Binning a Numeric Column and Creating an Interaction Feature:

feature_engineer = FeatureEngineer(df)
feature_engineer.binning('income', bins=[0, 50000, 100000, 150000], labels=['low', 'medium', 'high'])
feature_engineer.create_interaction('age', 'income')

Model Evaluator (model_evaluator.py)

Overview Model Evaluator

The ModelEvaluator class leverages scikit-learn’s metrics to compute key evaluation metrics for machine learning models. This includes metrics like confusion matrix, precision, and recall, essential for understanding a model’s performance.

Usage Model Evaluator

evaluator = ModelEvaluator(model, X_test, y_test)
conf_matrix = evaluator.get_confusion_matrix()
precision = evaluator.get_precision()
recall = evaluator.get_recall()

Methods Model Evaluator

  • __init__(self, model, X_test, y_test): Initialize the evaluator with a trained model and test data.

  • get_confusion_matrix(self): Compute the confusion matrix for the model’s predictions.

  • get_precision(self): Compute the precision score for the model’s predictions.

  • get_recall(self): Compute the recall score for the model’s predictions.

Example Model Evaluator

Evaluating a Classification Model:

model_evaluator = ModelEvaluator(trained_model, X_test, y_test)
print("Confusion Matrix:", model_evaluator.get_confusion_matrix())
print("Precision:", model_evaluator.get_precision())
print("Recall:", model_evaluator.get_recall())

These tools in the model directory facilitate the preparation and evaluation of machine learning models, providing functionality to enhance datasets through feature engineering and to assess model performance with key metrics.