socio4health.Harmonizer.vertical_merge

Harmonizer.vertical_merge(ddfs: List[DataFrame]) → List[DataFrame]

Merge a list of Dask DataFrames vertically using instance parameters.

Parameters:

ddfs (list of dask.dataframe.DataFrame) – List of Dask DataFrames to be merged.

Returns:

List of merged Dask (https://docs.dask.org) DataFrames, one per group of input DataFrames with sufficient column overlap and compatible data types.

Return type:

list of dask.dataframe.DataFrame

Important

  • DataFrames are grouped and merged if they share at least min_common_columns columns and their column similarity is above similarity_threshold; both values are read from the Harmonizer instance.

  • Only columns with matching data types are considered compatible for merging.
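
The grouping rule above can be illustrated with a minimal, library-independent sketch. It is not the socio4health implementation: the helper names (`compatible_columns`, `group_schemas`), the similarity measure (compatible columns divided by the union of column names), the use of `>=` for the threshold comparison, and the greedy first-match strategy are all assumptions made for illustration. Schemas are modeled as plain dictionaries mapping column names to dtype names so the example runs without Dask.

```python
from typing import Dict, List

Schema = Dict[str, str]  # column name -> dtype name


def compatible_columns(a: Schema, b: Schema) -> set:
    """Return columns present in both schemas with the same dtype."""
    return {col for col in a.keys() & b.keys() if a[col] == b[col]}


def group_schemas(
    schemas: List[Schema],
    min_common_columns: int,
    similarity_threshold: float,
) -> List[List[int]]:
    """Greedily group schema indices (hypothetical strategy).

    Each schema joins the first group whose first member it is
    compatible with; otherwise it starts a new group.
    """
    groups: List[List[int]] = []
    for idx, schema in enumerate(schemas):
        for group in groups:
            ref = schemas[group[0]]
            common = compatible_columns(ref, schema)
            union = ref.keys() | schema.keys()
            # Assumed similarity measure: compatible columns / all columns.
            similarity = len(common) / len(union) if union else 0.0
            # Assumption: the threshold is treated as inclusive here.
            if len(common) >= min_common_columns and similarity >= similarity_threshold:
                group.append(idx)
                break
        else:
            # No existing group matched, so start a new one.
            groups.append([idx])
    return groups


schemas = [
    {"id": "int64", "age": "int64", "sex": "object"},
    {"id": "int64", "age": "int64", "sex": "object", "region": "object"},
    {"id": "int64", "income": "float64"},
    {"id": "object", "age": "int64", "sex": "object"},  # "id" dtype differs
]
groups = group_schemas(schemas, min_common_columns=2, similarity_threshold=0.7)
print(groups)
```

In this sketch the first two schemas land in one group, the third shares too few columns, and the fourth is kept apart because the dtype mismatch on `id` lowers its similarity below the threshold. With real Dask DataFrames, each resulting group could then be stacked row-wise, for example with `dask.dataframe.concat`; whether `vertical_merge` does exactly that is not stated in this reference.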