socio4health.Harmonizer.s4h_vertical_merge

Harmonizer.s4h_vertical_merge(ddfs: List[DataFrame], overlap_threshold: float = 1, method: str = 'union') → List[DataFrame]

Merge a list of Dask DataFrames vertically (stacking their rows), using the instance's parameters.

Parameters:
  • ddfs (list of dask.dataframe.DataFrame) –

    List of Dask DataFrames to be merged.

  • overlap_threshold (float, optional) – Minimum overlap coefficient (Szymkiewicz–Simpson coefficient) between two DataFrames' column sets required to merge them vertically (default is 1, which requires one column set to be a subset of the other).

  • method (str, optional) –

    Method to use for merging (default is “union”).
    • “union”: Keep all columns from all DataFrames, filling missing values with NaN.

    • “intersection”: Keep only the columns common to all DataFrames.
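The overlap coefficient and the two method options can be illustrated on plain lists of column names. The sketch below is a minimal, stdlib-only illustration and not socio4health's implementation: the helpers overlap_coefficient and merged_columns are hypothetical, they look at column names only (dtypes are ignored), and returning 0.0 for an empty column set is a convention chosen here.

```python
def overlap_coefficient(cols_a, cols_b):
    """Szymkiewicz–Simpson coefficient: |A ∩ B| / min(|A|, |B|)."""
    a, b = set(cols_a), set(cols_b)
    if not a or not b:
        return 0.0  # convention chosen here for empty column sets
    return len(a & b) / min(len(a), len(b))


def merged_columns(column_lists, method="union"):
    """Columns kept after a vertical merge under 'union' or 'intersection'."""
    sets = [set(cols) for cols in column_lists]
    if method == "union":
        keep = set().union(*sets)
    elif method == "intersection":
        keep = set.intersection(*sets)
    else:
        raise ValueError(f"unknown method: {method!r}")
    # Preserve first-seen column order across the input lists
    ordered = []
    for cols in column_lists:
        for c in cols:
            if c in keep and c not in ordered:
                ordered.append(c)
    return ordered


a = ["id", "age", "sex"]
b = ["id", "age", "sex", "income"]
print(overlap_coefficient(a, b))               # 1.0
print(merged_columns([a, b], "union"))         # ['id', 'age', 'sex', 'income']
print(merged_columns([a, b], "intersection"))  # ['id', 'age', 'sex']
```

With the default overlap_threshold of 1, these two column sets would qualify, because every column of a also appears in b. For real Dask DataFrames, dask.dataframe.concat with join="outer" or join="inner" behaves analogously on columns; this is an alternative for comparison, not necessarily what s4h_vertical_merge uses.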

Returns:

List of merged Dask DataFrames, one per group; each group combines input DataFrames that have sufficient column overlap and compatible data types.

Return type:

list of dask.dataframe.DataFrame

Notes

  • DataFrames are grouped and merged if they share at least min_common_columns columns (set on the Harmonizer instance, not passed to this method) and their column overlap coefficient is at least overlap_threshold.

  • Only columns with matching data types are considered compatible for merging.
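A greedy version of the grouping described in these notes might look like the following. This is a hypothetical sketch, not the library's algorithm: each DataFrame is represented by a dict mapping column names to dtype names, a column counts as shared only when both its name and dtype match (one reading of the note above), each candidate is compared only with the first member of an existing group, and the names group_schemas, matching_columns and typed_overlap, as well as the default min_common_columns of 1, are illustrative assumptions.

```python
from typing import Dict, List

# Hypothetical representation: column name -> dtype name, e.g. {"age": "int64"}
Schema = Dict[str, str]


def matching_columns(s1: Schema, s2: Schema) -> set:
    """Columns present in both schemas with the same dtype."""
    return {c for c in s1.keys() & s2.keys() if s1[c] == s2[c]}


def typed_overlap(s1: Schema, s2: Schema) -> float:
    """Overlap coefficient counting only dtype-compatible shared columns."""
    if not s1 or not s2:
        return 0.0
    return len(matching_columns(s1, s2)) / min(len(s1), len(s2))


def group_schemas(schemas: List[Schema],
                  min_common_columns: int = 1,
                  overlap_threshold: float = 1.0) -> List[List[int]]:
    """Greedily place each schema (by index) into the first group it fits."""
    groups: List[List[int]] = []
    for i, schema in enumerate(schemas):
        for group in groups:
            rep = schemas[group[0]]  # compare with the group's first member only
            if (len(matching_columns(schema, rep)) >= min_common_columns
                    and typed_overlap(schema, rep) >= overlap_threshold):
                group.append(i)
                break
        else:
            # No existing group fits: start a new one
            groups.append([i])
    return groups


schemas = [
    {"id": "int64", "age": "int64", "sex": "object"},
    {"id": "int64", "age": "int64", "sex": "object", "income": "float64"},
    {"id": "int64", "age": "float64", "sex": "object"},
    {"region": "object", "year": "int64"},
]
print(group_schemas(schemas, min_common_columns=2, overlap_threshold=1.0))
# [[0, 1], [2], [3]]
print(group_schemas(schemas, min_common_columns=2, overlap_threshold=0.5))
# [[0, 1, 2], [3]]
```

At threshold 1.0 the third schema stays in its own group: its age column is float64 rather than int64, so only two of its three columns match the first schema (overlap 2/3). At threshold 0.5 it joins the first group. Each resulting group would then be concatenated using the chosen method.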