Documentation for integrators Module of DataAnalysisToolkit
The integrators module in DataAnalysisToolkit provides tools for combining data from multiple sources into a unified format. This is particularly useful for building a single comprehensive dataset from sources such as SQL databases, Excel files, and APIs.
Data Integrator (data_integrator.py)
Overview
The DataIntegrator class combines multiple pandas DataFrames. It supports several integration methods: concatenation, merging on a key column, joining on multiple columns, and time-series integration.
Usage
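# df1 and df2 are existing pandas DataFrames
integrator = DataIntegrator()
integrator.add_data(df1)
integrator.add_data(df2)
# Combine every added DataFrame into a single DataFrame
combined_data = integrator.concatenate_data()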
Methods
__init__(self): Initialize the Data Integrator.
add_data(self, data_frame): Add a DataFrame to be integrated.
concatenate_data(self): Concatenate all added DataFrames into a single DataFrame.
merge_data(self, on, how="inner"): Merge DataFrames based on a key column.
join_on_multiple_columns(self, columns, how="inner"): Join DataFrames on multiple columns.
integrate_time_series(self, time_column, method="nearest"): Integrate time-series data based on a time column.
integrate_from_different_sources(self, source_data, integration_method="concat"): Integrate data from different sources.
Examples
Concatenating DataFrames:
integrator = DataIntegrator()
integrator.add_data(df1)
integrator.add_data(df2)
concatenated_df = integrator.concatenate_data()
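For readers who want to see what concatenation does to the rows, here is a minimal plain-pandas sketch. It assumes concatenate_data() behaves like pd.concat over the added frames; the sample data is hypothetical, and whether DataIntegrator renumbers the index is not stated on this page.

```python
import pandas as pd

# Hypothetical sample data standing in for df1 and df2 above.
df1 = pd.DataFrame({"id": [1, 2], "value": [10.0, 20.0]})
df2 = pd.DataFrame({"id": [3, 4], "value": [30.0, 40.0]})

# Stack the rows of both frames. ignore_index=True renumbers the rows
# 0..n-1 (an assumption about the desired result, not a documented
# behavior of DataIntegrator).
concatenated_df = pd.concat([df1, df2], ignore_index=True)
print(concatenated_df)
```

Concatenation works best when the frames share the same columns; pandas fills columns missing from one frame with NaN.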
Merging DataFrames on a Common Key:
integrator = DataIntegrator()
integrator.add_data(df1)
integrator.add_data(df2)
# how defaults to "inner", so only rows whose key appears in both frames are kept
merged_df = integrator.merge_data(on='common_key')
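The effect of the how argument, and of joining on more than one column, may be easier to see in plain pandas. The sketch below assumes merge_data and join_on_multiple_columns follow the semantics of pd.merge; all frame and column names other than common_key are hypothetical.

```python
import pandas as pd

# Hypothetical frames that share a single key column.
sales = pd.DataFrame({"common_key": [1, 2, 3], "sales": [100, 200, 300]})
costs = pd.DataFrame({"common_key": [2, 3, 4], "cost": [50, 60, 70]})

# Inner merge keeps only keys present in both frames (2 and 3).
inner_df = pd.merge(sales, costs, on="common_key", how="inner")

# Outer merge keeps every key and fills the gaps with NaN.
outer_df = pd.merge(sales, costs, on="common_key", how="outer")

# Joining on several columns: a row matches only if all keys agree.
left = pd.DataFrame(
    {"region": ["N", "N", "S"], "year": [2023, 2024, 2023], "sales": [1, 2, 3]}
)
right = pd.DataFrame(
    {"region": ["N", "S", "S"], "year": [2024, 2023, 2024], "cost": [5, 6, 7]}
)
multi_df = pd.merge(left, right, on=["region", "year"], how="inner")

print(inner_df)
print(outer_df)
print(multi_df)
```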
Time-Series Integration:
integrator = DataIntegrator()
integrator.add_data(time_series_df1)
integrator.add_data(time_series_df2)
time_series_combined = integrator.integrate_time_series('timestamp', method='nearest')
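What "nearest" matching means for timestamps that do not line up exactly can be illustrated with pandas' merge_asof. This is a sketch under the assumption that method='nearest' pairs each row with the closest timestamp in the other frame; it is not taken from DataIntegrator's source, and the sample readings are hypothetical.

```python
import pandas as pd

# Hypothetical sensor readings sampled at different times.
time_series_df1 = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2024-01-01 00:00:00",
        "2024-01-01 00:01:00",
        "2024-01-01 00:02:00",
    ]),
    "temperature": [20.1, 20.4, 20.9],
})
time_series_df2 = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2024-01-01 00:00:10",
        "2024-01-01 00:01:55",
    ]),
    "humidity": [41, 43],
})

# merge_asof requires both frames to be sorted by the key column.
# direction="nearest" picks the closest right-hand timestamp,
# whether it falls before or after the left-hand one.
combined = pd.merge_asof(
    time_series_df1.sort_values("timestamp"),
    time_series_df2.sort_values("timestamp"),
    on="timestamp",
    direction="nearest",
)
print(combined)
```

Each temperature row keeps its own timestamp and picks up the humidity reading closest in time. pd.merge_asof also accepts a tolerance argument to reject matches that are too far apart.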
Integrating Data from Different Sources:
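# Map a label for each source to the DataFrame loaded from it
source_data = {'source1': df1, 'source2': df2}
integrator = DataIntegrator()
# integration_method defaults to "concat"
combined_source_data = integrator.integrate_from_different_sources(source_data)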
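As a rough picture of the default "concat" method applied to a dictionary of sources, the following plain-pandas sketch stacks the frames and records where each row came from. The source column is an illustrative addition, not a documented feature of integrate_from_different_sources, and the in-memory frames stand in for data loaded from real sources.

```python
import pandas as pd

# Hypothetical frames standing in for data loaded from two sources.
source_data = {
    "source1": pd.DataFrame({"id": [1, 2], "value": [10, 20]}),
    "source2": pd.DataFrame({"id": [3, 4], "value": [30, 40]}),
}

# Tag each frame with its source label, then stack them into one frame.
frames = [df.assign(source=name) for name, df in source_data.items()]
combined_source_data = pd.concat(frames, ignore_index=True)
print(combined_source_data)
```

Keeping a provenance column like this makes it possible to trace a row back to its origin after the sources have been combined.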
The DataIntegrator class combines data by concatenation, key-based merging, multi-column joins, and time-series integration, which makes it easier to prepare comprehensive datasets for analysis. This flexibility matters most when the data comes from multiple sources or formats.