Documentation for formatters Module of DataAnalysisToolkit
The formatters module in DataAnalysisToolkit offers tools for transforming and standardizing data in a DataFrame. It’s designed to prepare your data for analysis, ensuring consistency and quality.
Data Formatter (data_formatter.py)
Overview
The DataFormatter class is a versatile tool for performing various data formatting tasks on a pandas DataFrame. It can standardize date formats, normalize numeric data, categorize columns, and more.
Usage
formatter = DataFormatter(df)
formatter.standardize_dates('date_column')
formatter.categorize_columns(['category_column1', 'category_column2'])
formatter.normalize_numeric(['numeric_column1', 'numeric_column2'])
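As a rough illustration of what converting a column to a categorical type involves, the sketch below uses plain pandas rather than DataFormatter itself. The sample data is hypothetical, and whether categorize_columns uses astype("category") internally is an assumption; the point is only to show why the conversion tends to save memory for columns with few distinct values.

```python
import pandas as pd

# Hypothetical sample data: a text column with only three distinct values.
df = pd.DataFrame({"category_column1": ["a", "b", "a", "c"] * 1000})

# Memory used by the column before conversion.
before = df["category_column1"].memory_usage(deep=True)

# Plain-pandas conversion to a categorical dtype (assumed to be similar
# in effect to categorize_columns; the toolkit's internals are not shown).
df["category_column1"] = df["category_column1"].astype("category")

# Memory used after conversion: small integer codes plus one copy of each label.
after = df["category_column1"].memory_usage(deep=True)

print(f"before: {before} bytes, after: {after} bytes")
```

The saving shrinks as the number of distinct values approaches the number of rows, so the conversion is most useful for low-cardinality columns.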
Methods
__init__(self, data): Initialize the formatter with a DataFrame.
standardize_dates(self, date_column, date_format='%Y-%m-%d'): Standardize the format of a date column.
categorize_columns(self, columns): Convert specified columns to categorical data types for efficiency.
normalize_numeric(self, numeric_columns): Normalize numeric columns by scaling to a mean of 0 and standard deviation of 1.
fill_missing_values(self, column, fill_value=None, method=None): Fill missing values in a column either with a specified value or using a method like forward-fill or backward-fill.
encode_categorical_variables(self, columns): Perform one-hot encoding on categorical variables to transform them into a format suitable for machine learning models.
custom_transform(self, column, transform_func): Apply a custom transformation function to a specified column, allowing for flexible data transformations.
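The fill strategies mentioned for fill_missing_values can be sketched with plain pandas. This is not the toolkit's implementation, and the exact strings the method parameter accepts are not documented here; the snippet only shows how a constant fill, a forward-fill, and a backward-fill differ on the same data.

```python
import numpy as np
import pandas as pd

# Hypothetical series with a gap of two missing values.
s = pd.Series([1.0, np.nan, np.nan, 4.0])

# Constant fill: every missing value becomes the given value.
filled_const = s.fillna(0.0)

# Forward-fill: each gap takes the last valid value before it.
filled_ffill = s.ffill()

# Backward-fill: each gap takes the next valid value after it.
filled_bfill = s.bfill()

print(filled_const.tolist())
print(filled_ffill.tolist())
print(filled_bfill.tolist())
```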
Examples
Here are some examples demonstrating how to use the DataFormatter class:
Standardizing a Date Column:
formatter = DataFormatter(df)
formatter.standardize_dates('date_column')
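For readers who want to see what date standardization amounts to, here is a minimal plain-pandas sketch. It assumes the input dates are strings in a known format (here '%Y/%m/%d', chosen for illustration) and that the target is the default '%Y-%m-%d'; how standardize_dates itself parses input or handles invalid values is not documented above.

```python
import pandas as pd

# Hypothetical input: slash-separated dates plus one invalid entry.
df = pd.DataFrame({"date_column": ["2023/01/05", "2023/02/17", "not a date"]})

# Parse with an explicit input format; unparseable values become NaT.
parsed = pd.to_datetime(df["date_column"], format="%Y/%m/%d", errors="coerce")

# Render back to strings in the target format; NaT entries become missing values.
df["date_column"] = parsed.dt.strftime("%Y-%m-%d")

print(df)
```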
Normalizing Numeric Data:
formatter.normalize_numeric(['age', 'income'])
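Scaling to a mean of 0 and a standard deviation of 1 is commonly called z-score standardization. The sketch below reproduces it in plain pandas on hypothetical data. It uses the population standard deviation (ddof=0); whether normalize_numeric uses that or the sample standard deviation (ddof=1, the pandas default) is not stated, so treat the choice as an assumption.

```python
import pandas as pd

# Hypothetical numeric data.
df = pd.DataFrame({
    "age": [20.0, 30.0, 40.0],
    "income": [1000.0, 2000.0, 6000.0],
})
cols = ["age", "income"]

# z = (x - mean) / std, computed per column.
# ddof=0 is an assumption; the toolkit may use a different convention.
df[cols] = (df[cols] - df[cols].mean()) / df[cols].std(ddof=0)

print(df)
```

A column with a single repeated value has a standard deviation of 0, so this formula would produce NaN for it; such columns need separate handling.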
Encoding Categorical Variables:
formatter.encode_categorical_variables(['gender', 'occupation'])
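One-hot encoding replaces each categorical column with one indicator column per distinct value. A minimal sketch using pandas' get_dummies on hypothetical data follows; the column naming and output dtype produced by encode_categorical_variables may differ from what is shown.

```python
import pandas as pd

# Hypothetical data with two categorical columns and one numeric column.
df = pd.DataFrame({
    "gender": ["F", "M", "F"],
    "occupation": ["nurse", "pilot", "pilot"],
    "age": [31, 45, 28],
})

# Each listed column is replaced by indicator columns named <column>_<value>;
# columns not listed (here, age) are kept unchanged.
encoded = pd.get_dummies(df, columns=["gender", "occupation"])

print(encoded.columns.tolist())
```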
Custom Transformations:
formatter.custom_transform('price', lambda x: x * 1.2)
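The custom transformation above can be pictured as applying a function to every value in the column. The plain-pandas sketch below assumes that behavior; whether custom_transform calls Series.apply, and whether it modifies the DataFrame in place, is not documented here.

```python
import pandas as pd

# Hypothetical price data.
df = pd.DataFrame({"price": [10.0, 20.0, 50.0]})

# Apply the function element by element (assumed to mirror custom_transform).
df["price"] = df["price"].apply(lambda x: x * 1.2)

print(df["price"].tolist())
```

For simple arithmetic like this, the vectorized form df["price"] * 1.2 gives the same result and is usually faster on large columns.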
The formatters module is essential for ensuring data consistency and quality, making it easier to perform reliable analysis. By providing a range of methods for data transformation, this module helps streamline the preprocessing stage of your data analysis projects.