Documentation for the generators Module of DataAnalysisToolkit

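The generators module in the DataAnalysisToolkit provides three tools: CSVDataGenerator, which writes randomly generated data to a CSV file; generate_data, which returns randomized data as a pandas DataFrame; and ReportGenerator, which builds HTML reports from a DataFrame. They are mainly useful for testing, data processing, and creating visual reports from datasets.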

CSV Data Generator (csv_data_generator.py)

Overview

The CSVDataGenerator class writes CSV files of randomly generated data. The output is useful as a test dataset for data processing and machine learning model validation.

Usage

generator = CSVDataGenerator("data/gen_test.csv", num_rows=500)
generator.generate_csv()

Features

  • Generates data with a mix of random integers, floats, dates, times, monetary values, and text.

  • Customizable number of rows.

  • Randomly includes null values to simulate real-world data scenarios.

  • Outputs generated data to a specified CSV file.
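
The features above can be illustrated with a minimal standard-library sketch. This is not the toolkit's implementation: the function name sketch_generate_csv, the column names, the value ranges, and the null_probability and seed parameters are all assumptions made for illustration.

```python
import csv
import random
from datetime import date, time, timedelta


def sketch_generate_csv(path, num_rows=100, null_probability=0.1, seed=None):
    """Hypothetical stand-in for CSVDataGenerator.generate_csv(); not the toolkit's code."""
    rng = random.Random(seed)
    start = date(2020, 1, 1)

    # One factory per column; each call returns a single random value.
    # Column names and ranges are assumed, not taken from the toolkit.
    columns = {
        "integer": lambda: rng.randint(0, 1000),
        "float": lambda: round(rng.uniform(0, 1000), 3),
        "date": lambda: (start + timedelta(days=rng.randrange(1461))).isoformat(),
        "time": lambda: time(rng.randrange(24), rng.randrange(60)).isoformat(),
        "money": lambda: f"${rng.uniform(1, 5000):.2f}",
        "text": lambda: "".join(rng.choices("abcdefghijklmnopqrstuvwxyz", k=8)),
    }

    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(list(columns))  # header row: the column names
        for _ in range(num_rows):
            # Each cell is left empty with probability null_probability,
            # which simulates missing values in real-world data.
            row = [
                "" if rng.random() < null_probability else make()
                for make in columns.values()
            ]
            writer.writerow(row)
```

Passing a fixed seed makes the output reproducible, which helps when the generated file is used as a test fixture.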

Example

Creating a CSV file with 100 rows of random data:

csv_generator = CSVDataGenerator("output.csv", num_rows=100)
csv_generator.generate_csv()

Generate Data (generate_data.py)

Overview Generate Data

The generate_data module provides functionality to create a pandas DataFrame with a variety of randomized data types, useful for testing and prototyping.

Features Generate Data

  • Generates random integers, floats, categorical data, and more.

  • Includes options for introducing missing values in the data.

  • Outputs the generated data as a pandas DataFrame.
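
As a rough illustration of the features above, the following sketch builds a DataFrame with mixed column types and blanks out a fraction of the cells. It is not the toolkit's generate_data: the function name sketch_generate_data, the column names, and the missing_fraction and seed parameters are assumptions.

```python
import numpy as np
import pandas as pd


def sketch_generate_data(n=100, missing_fraction=0.05, seed=0):
    """Hypothetical stand-in for generate_data(); columns and defaults are assumed."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        # Stored as float so that NaN can mark missing integers.
        "int_col": rng.integers(0, 100, size=n).astype(float),
        "float_col": rng.normal(loc=0.0, scale=1.0, size=n),
        "category_col": rng.choice(["A", "B", "C"], size=n),
    })
    # Each cell is independently marked missing with probability missing_fraction.
    missing = rng.random(df.shape) < missing_fraction
    return df.mask(missing)
```

Calling sketch_generate_data(n=200).isna().mean() shows the share of missing values per column, which should be close to missing_fraction for large n.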

Example Generate Data

Generating a DataFrame with random data and saving it to a CSV file:

df = generate_data(n=200)
df.to_csv("generated_data.csv", index=False)

Report Generator (report_generator.py)

Overview Report Generator

The ReportGenerator class allows for the generation of detailed HTML reports from pandas DataFrames. These reports include statistical summaries, visualizations, and custom text sections.

Usage Report Generator

import pandas as pd

# Load the dataset to report on, then write the report to an HTML file.
data = pd.read_csv('your_data.csv')
report_gen = ReportGenerator(data)
report_gen.generate_html_report('data_report.html', custom_text='Your custom analysis here.')

Features Report Generator

  • Creates reports with statistical summaries such as mean, median, mode, and standard deviation.

  • Generates histograms, scatter plots, and box plots for data visualization.

  • Allows inclusion of custom text or analysis.
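
To make the summary and custom-text features concrete, here is a minimal sketch that computes the four statistics for the numeric columns and writes them into an HTML page. It is not how ReportGenerator is implemented: the function name sketch_html_report and the page layout are assumptions, and the plots are left out to keep the example short.

```python
import html

import pandas as pd


def sketch_html_report(df, path, custom_text=""):
    """Hypothetical stand-in for generate_html_report(); plots are omitted."""
    # Assumes df has at least one numeric column with non-missing values.
    numeric = df.select_dtypes(include="number")
    summary = pd.DataFrame({
        "mean": numeric.mean(),
        "median": numeric.median(),
        "mode": numeric.mode().iloc[0],  # smallest mode when several values tie
        "std": numeric.std(),
    })
    table_html = summary.to_html(float_format="{:.3f}".format)
    page = (
        "<html><body>"
        "<h1>Data Report</h1>"
        f"<p>{html.escape(custom_text)}</p>"  # escape so user text cannot break the markup
        f"{table_html}"
        "</body></html>"
    )
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(page)
    return summary
```

Returning the summary table alongside writing the file makes the statistics easy to check in a test without parsing the HTML.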

Example Report Generator

Generating an HTML report from a DataFrame:

# df is any pandas DataFrame, for example the one returned by generate_data(n=200).
report_generator = ReportGenerator(df)
report_generator.generate_html_report("report.html", custom_text="Analysis Summary")

Together, the tools in the generators module cover synthetic data creation and HTML reporting, supporting data analysis, testing, and presentation tasks.