socio4health.Harmonizer.s4h_vertical_merge
- Harmonizer.s4h_vertical_merge(ddfs: List[DataFrame], overlap_threshold: float = 1, method: str = 'union') → List[DataFrame]
Merge a list of Dask DataFrames vertically using instance parameters.
- Parameters:
ddfs (list of dask.dataframe.DataFrame) –
List of Dask DataFrames to be merged.
overlap_threshold (float, optional) – Overlap coefficient (Szymkiewicz–Simpson coefficient) threshold to consider for vertical merge (default is 1).
method (str, optional) –
Method to use for merging (default is “union”).
- “union”: Merge all columns from all DataFrames, filling missing values with NaN.
- “intersection”: Merge only columns that are common to all DataFrames.
- Returns:
List of merged Dask DataFrames, where each group contains DataFrames with sufficient column overlap and compatible data types.
- Return type:
list of dask.dataframe.DataFrame
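The two `method` options differ in which columns survive the merge. The following pure-Python sketch illustrates that column selection only; it is not the library's implementation. `merged_columns` is a hypothetical helper, and the column names in the usage note are made up.

```python
from typing import List


def merged_columns(column_lists: List[List[str]], method: str = "union") -> List[str]:
    """Return the columns a vertical merge would keep (illustrative only)."""
    if not column_lists:
        return []
    if method == "union":
        # Keep every column seen in any frame, in first-seen order.
        seen: List[str] = []
        for cols in column_lists:
            for col in cols:
                if col not in seen:
                    seen.append(col)
        return seen
    if method == "intersection":
        # Keep only columns present in every frame, in the first frame's order.
        common = set(column_lists[0]).intersection(*column_lists[1:])
        return [col for col in column_lists[0] if col in common]
    raise ValueError(f"unknown method: {method!r}")
```

For two frames with columns `["id", "age"]` and `["id", "sex"]`, the union keeps `id`, `age`, and `sex` (rows from a frame lacking a column would hold NaN there), while the intersection keeps only `id`.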
Notes
DataFrames are grouped and merged if they share at least min_common_columns columns and their column overlap coefficient is above overlap_threshold. Only columns with matching data types are considered compatible for merging.
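For reference, the Szymkiewicz–Simpson overlap coefficient of two sets A and B is |A ∩ B| / min(|A|, |B|). The sketch below computes it for two sets of column names. `overlap_coefficient` is a hypothetical helper written for illustration, and returning 0.0 for an empty set is an assumption; how the library itself computes the value and compares it to the threshold is not shown here.

```python
from typing import Iterable


def overlap_coefficient(cols_a: Iterable[str], cols_b: Iterable[str]) -> float:
    """Szymkiewicz-Simpson overlap coefficient of two column-name collections."""
    a, b = set(cols_a), set(cols_b)
    if not a or not b:
        # Assumption: treat an empty column set as having no overlap.
        return 0.0
    # Size of the intersection divided by the size of the smaller set.
    return len(a & b) / min(len(a), len(b))
```

The coefficient reaches 1 whenever one column set is a subset of the other, for example `["id", "age"]` against `["id", "age", "sex"]`, and drops to 0 when no column names are shared.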