Documentation for generators Module of DataAnalysisToolkit¶
The generators module in the DataAnalysisToolkit offers a collection of tools for generating synthetic data and comprehensive reports. These tools are especially useful for testing, data processing, and creating visual reports from datasets.
CSV Data Generator (csv_data_generator.py)¶
Overview¶
The CSVDataGenerator class creates CSV files filled with randomly generated data. It is mainly useful for producing test datasets for data processing and for validating machine learning models.
Usage¶
generator = CSVDataGenerator("data/gen_test.csv", num_rows=500)
generator.generate_csv()
Features¶
Generates data with a mix of random integers, floats, dates, times, monetary values, and text.
Customizable number of rows.
Randomly includes null values to simulate real-world data scenarios.
Outputs generated data to a specified CSV file.
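The features listed above can be illustrated with a standard-library sketch. This is not the toolkit's implementation: the function names write_random_csv and random_row, the column names, the null_rate parameter, and the value ranges are all assumptions chosen for illustration.

```python
import csv
import random
import string
from datetime import date, time, timedelta


def random_row(rng):
    """Build one row mixing the data types listed above (hypothetical columns)."""
    day = date(2020, 1, 1) + timedelta(days=rng.randrange(365))
    clock = time(rng.randrange(24), rng.randrange(60), rng.randrange(60))
    return {
        "integer": rng.randint(0, 1000),
        "float": round(rng.uniform(0.0, 100.0), 3),
        "date": day.isoformat(),
        "time": clock.isoformat(),
        "amount": f"${rng.uniform(1, 5000):.2f}",  # monetary value as text
        "text": "".join(rng.choices(string.ascii_lowercase, k=8)),
    }


def write_random_csv(path, num_rows=100, null_rate=0.1, seed=None):
    """Write num_rows random rows to path, blanking cells with probability null_rate."""
    rng = random.Random(seed)
    fieldnames = ["integer", "float", "date", "time", "amount", "text"]
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        for _ in range(num_rows):
            row = random_row(rng)
            for key in fieldnames:
                if rng.random() < null_rate:
                    row[key] = ""  # an empty cell stands in for a null value
            writer.writerow(row)
```

Passing a fixed seed makes the output reproducible, which helps when the generated file feeds a regression test.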
Example¶
Creating a CSV file with 100 rows of random data:
csv_generator = CSVDataGenerator("output.csv", num_rows=100)
csv_generator.generate_csv()
Generate Data (generate_data.py)¶
Overview Generate Data¶
The generate_data module creates a pandas DataFrame populated with a variety of randomized data types, which is useful for testing and prototyping.
Features Generate Data¶
Generates random integers, floats, categorical data, and more.
Includes options for introducing missing values in the data.
Outputs the generated data as a pandas DataFrame.
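As a rough illustration of the features above, the sketch below builds randomized columns with optional missing values using only the standard library. It does not reproduce generate_data itself: the function name make_columns, the missing_rate parameter, the column names, and the category labels are assumptions made for this example.

```python
import random


def make_columns(n=100, missing_rate=0.05, seed=None):
    """Return a dict of equal-length columns with None marking missing values."""
    rng = random.Random(seed)
    categories = ["A", "B", "C"]  # hypothetical category labels

    def maybe_missing(value):
        # Replace a value with None with probability missing_rate.
        return None if rng.random() < missing_rate else value

    return {
        "int_col": [maybe_missing(rng.randint(0, 100)) for _ in range(n)],
        "float_col": [maybe_missing(rng.gauss(0.0, 1.0)) for _ in range(n)],
        "category_col": [maybe_missing(rng.choice(categories)) for _ in range(n)],
    }
```

Passing the returned dictionary to pandas.DataFrame gives a DataFrame in which the None entries are treated as missing values.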
Example Generate Data¶
Generating a DataFrame with random data and saving it to a CSV file:
df = generate_data(n=200)
df.to_csv("generated_data.csv", index=False)
Report Generator (report_generator.py)¶
Overview Report Generator¶
The ReportGenerator class generates detailed HTML reports from pandas DataFrames. Each report can include statistical summaries, visualizations, and custom text sections.
Usage Report Generator¶
import pandas as pd

data = pd.read_csv('your_data.csv')
report_gen = ReportGenerator(data)
report_gen.generate_html_report('data_report.html', custom_text='Your custom analysis here.')
Features Report Generator¶
Creates reports with statistical summaries like mean, median, mode, and standard deviation.
Generates histograms, scatter plots, and box plots for data visualization.
Allows inclusion of custom text or analysis.
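To make the statistical part of the features above concrete, here is a minimal standard-library sketch that computes the four summaries for one numeric column and renders them as an HTML fragment with optional custom text. It is an illustration under stated assumptions, not how ReportGenerator works internally: the helper names summarize and summary_to_html are hypothetical, and plots are left out.

```python
import statistics
from html import escape


def summarize(values):
    """Return mean, median, mode, and sample standard deviation of a numeric list."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "std": statistics.stdev(values),  # needs at least two values
    }


def summary_to_html(name, stats, custom_text=""):
    """Render one column's summary as a small HTML fragment."""
    rows = "".join(
        f"<tr><th>{escape(key)}</th><td>{value:.3f}</td></tr>"
        for key, value in stats.items()
    )
    # Escape user-supplied text so it cannot inject markup into the report.
    note = f"<p>{escape(custom_text)}</p>" if custom_text else ""
    return f"<h2>{escape(name)}</h2><table>{rows}</table>{note}"
```

A call such as summary_to_html("price", summarize(prices), custom_text="Analysis Summary") returns a string that can be written into a larger HTML page; here prices is a placeholder for any list of numbers.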
Example Report Generator¶
Generating an HTML report from a DataFrame:
report_generator = ReportGenerator(df)
report_generator.generate_html_report("report.html", custom_text="Analysis Summary")
Together, the tools in the generators module cover synthetic data creation and report generation, supporting data analysis, testing, and presentation tasks.