socio4health.Harmonizer.drop_nan_columns

Harmonizer.drop_nan_columns(ddf_or_ddfs: DataFrame | List[DataFrame]) → DataFrame | List[DataFrame]

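Drop columns whose proportion of NaN values is greater than nan_threshold. The method takes its settings (nan_threshold and sample_frac) from the Harmonizer instance rather than from its arguments.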

Parameters:

ddf_or_ddfs (dask.dataframe.DataFrame or list of dask.dataframe.DataFrame) – The Dask DataFrame or list of Dask DataFrames to process.

Returns:

The DataFrame(s) with columns dropped where the proportion of NaN values is greater than nan_threshold.

Return type:

dask.dataframe.DataFrame or list of dask.dataframe.DataFrame

Raises:

ValueError – If nan_threshold is not between 0 and 1, or if sample_frac is neither None nor a float between 0 and 1.
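The documented ValueError conditions can be expressed as a small check. This is a sketch, not the library's code: the helper name validate_nan_params is hypothetical, and whether the bounds are inclusive (here 0 ≤ nan_threshold ≤ 1 and 0 < sample_frac ≤ 1) is an assumption, since the reference above only says "between 0 and 1".

```python
def validate_nan_params(nan_threshold, sample_frac):
    """Hypothetical check mirroring the documented ValueError conditions."""
    # Assumption: nan_threshold is numeric with inclusive bounds [0, 1].
    if (
        isinstance(nan_threshold, bool)
        or not isinstance(nan_threshold, (int, float))
        or not 0 <= nan_threshold <= 1
    ):
        raise ValueError(f"nan_threshold must be between 0 and 1, got {nan_threshold!r}")
    # Assumption: sample_frac must be None or a float in (0, 1]; 0 would sample no rows.
    if sample_frac is not None and not (isinstance(sample_frac, float) and 0 < sample_frac <= 1):
        raise ValueError(
            f"sample_frac must be None or a float between 0 and 1, got {sample_frac!r}"
        )


validate_nan_params(0.5, None)   # accepted
validate_nan_params(0.3, 0.25)   # accepted
try:
    validate_nan_params(0.5, 2.0)
except ValueError as exc:
    print(exc)  # sample_frac must be None or a float between 0 and 1, got 2.0
```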