socio4health.Harmonizer

Initialize the Harmonizer class for harmonizing and processing Dask DataFrames in health data integration.

min_common_columns

Minimum number of common columns required for vertical merge (default is 1).

Type: int

similarity_threshold

Similarity threshold to consider for vertical merge (documented default is 0.8; note that the `__init__` signature on this page shows a default of 1).

Type: float
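This page does not state how similarity between DataFrames is measured. Purely as an illustration of how `min_common_columns` and `similarity_threshold` could interact, the sketch below assumes Jaccard similarity over column-name sets. The functions `column_similarity` and `can_merge_vertically` are hypothetical and are not part of socio4health.

```python
def column_similarity(cols_a, cols_b):
    """Jaccard similarity between two collections of column names.

    Assumption: the library's actual similarity measure is not documented
    on this page; Jaccard is used here only for illustration.
    """
    a, b = set(cols_a), set(cols_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def can_merge_vertically(cols_a, cols_b, min_common_columns, similarity_threshold):
    """Return True when both merge conditions hold (hypothetical helper)."""
    common = set(cols_a) & set(cols_b)
    return (
        len(common) >= min_common_columns
        and column_similarity(cols_a, cols_b) >= similarity_threshold
    )


# Hypothetical column sets from two survey waves.
survey_2020 = ["id", "age", "sex", "region", "income"]
survey_2021 = ["id", "age", "sex", "region", "education"]

# Four shared columns out of six distinct ones.
similarity = column_similarity(survey_2020, survey_2021)
strict = can_merge_vertically(survey_2020, survey_2021, 1, 0.8)
lenient = can_merge_vertically(survey_2020, survey_2021, 1, 0.5)
```

Under this assumed metric, a higher threshold makes vertical merging more conservative: only DataFrames with nearly identical column sets are stacked.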

nan_threshold

Threshold, expressed as a fraction of NaN values, used to decide which columns to drop (default is 1.0).

Type: float

sample_frac

Fraction of rows to sample for NaN detection (default is None).

Type: float or None
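To make the roles of `nan_threshold` and `sample_frac` concrete, here is a minimal pure-Python sketch. It models a table as a dict of column name to list of values, and it assumes a column is dropped when its NaN fraction is greater than or equal to the threshold; the exact comparison used by the library is not stated on this page. The helpers `nan_fraction` and `drop_sparse_columns` are hypothetical.

```python
import math
import random


def nan_fraction(values):
    """Fraction of entries that are None or float NaN."""
    if not values:
        return 0.0
    missing = sum(
        1 for v in values
        if v is None or (isinstance(v, float) and math.isnan(v))
    )
    return missing / len(values)


def drop_sparse_columns(table, nan_threshold=1.0, sample_frac=None, seed=0):
    """Keep columns whose NaN fraction is below nan_threshold.

    Assumption: columns with NaN fraction >= nan_threshold are dropped.
    When sample_frac is given, the fraction is estimated on a random
    subset of rows instead of the full column.
    """
    n_rows = len(next(iter(table.values()), []))
    row_ids = list(range(n_rows))
    if sample_frac is not None:
        k = min(n_rows, max(1, int(n_rows * sample_frac)))
        row_ids = random.Random(seed).sample(row_ids, k)
    kept = {}
    for name, values in table.items():
        sampled = [values[i] for i in row_ids]
        if nan_fraction(sampled) < nan_threshold:
            kept[name] = values
    return kept


nan = float("nan")
table = {
    "id": [1, 2, 3, 4],
    "weight": [nan, nan, nan, nan],   # entirely missing
    "height": [1.7, nan, nan, nan],   # 75% missing
}

default_result = drop_sparse_columns(table)
stricter_result = drop_sparse_columns(table, nan_threshold=0.5)
sampled_result = drop_sparse_columns(table, sample_frac=0.5)
```

With a threshold of 1.0, only the entirely missing column is removed in this sketch; lowering the threshold also removes the mostly missing one. Sampling trades accuracy of the NaN estimate for speed on large inputs.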

column_mapping

Column mapping configuration (default is None).

Type: Enum, dict, str or Path

value_mappings

Categorical value mapping configuration (default is None).

Type: Enum, dict, str or Path
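When passed as dicts, the `__init__` signature types `column_mapping` as `Dict[str, Dict[str, str]]` and `value_mappings` as `Dict[str, Dict[str, Dict[str, str]]]`. The meaning of each nesting level is not spelled out on this page. The sketch below assumes country, then original column, then harmonized column for the first, and country, then harmonized column, then raw value, then harmonized value for the second. All keys, column names, and the `harmonize_rows` helper are hypothetical.

```python
from typing import Dict

# Assumed layout: country -> original column name -> harmonized column name.
column_mapping: Dict[str, Dict[str, str]] = {
    "CO": {"SEXO": "sex", "EDAD": "age"},
    "BR": {"SEX_CODE": "sex", "AGE_YEARS": "age"},
}

# Assumed layout: country -> harmonized column -> raw value -> harmonized value.
value_mappings: Dict[str, Dict[str, Dict[str, str]]] = {
    "CO": {"sex": {"1": "male", "2": "female"}},
    "BR": {"sex": {"1": "male", "2": "female"}},
}


def harmonize_rows(rows, country):
    """Rename columns, then recode categorical values, for one country."""
    col_map = column_mapping.get(country, {})
    val_map = value_mappings.get(country, {})
    harmonized = []
    for row in rows:
        new_row = {}
        for col, value in row.items():
            new_col = col_map.get(col, col)  # unmapped columns keep their name
            recode = val_map.get(new_col, {})
            new_row[new_col] = recode.get(str(value), value)  # unmapped values pass through
        harmonized.append(new_row)
    return harmonized


result = harmonize_rows([{"SEXO": 2, "EDAD": 34}], "CO")
```

In this sketch unmapped columns and values pass through unchanged; what `strict_mapping=True` does in that situation is not described here.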

theme_info

Theme/category information (default is None).

Type: dict, str or Path

default_country

Default country for mapping (default is None).

Type: str

strict_mapping

Whether to enforce strict mapping of columns and values (default is False).

Type: bool

dict_df

DataFrame with variable dictionary (default is None).

Type: pandas.DataFrame

categories

Categories for data selection (default is an empty list).

Type: list of str

key_col

Key column for data selection (default is None).

Type: str

key_val

Key values for data selection (default is an empty list).

Type: list of str, int or float

extra_cols

Extra columns for data selection (default is an empty list).

Type: list of str

join_key

Key column for joining DataFrames (default is None).

Type: str

aux_key

Auxiliary key column for joining DataFrames (default is None).

Type: str

__init__(min_common_columns: int = 1, similarity_threshold: float = 1, nan_threshold: float = 1.0, sample_frac: float | None = None, column_mapping: Type[Enum] | Dict[str, Dict[str, str]] | str | Path | None = None, value_mappings: Type[Enum] | Dict[str, Dict[str, Dict[str, str]]] | str | Path | None = None, theme_info: Dict[str, List[str]] | str | Path | None = None, default_country: str | None = None, strict_mapping: bool = False, dict_df: DataFrame | None = None, categories: List[str] | None = None, key_col: str | None = None, key_val: List[str | int | float] | None = None, extra_cols: List[str] | None = None, join_key: str = None, aux_key: str | None = None)

Initialize the Harmonizer class with default parameters.
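A minimal construction sketch based on the signature above. The keyword names come from that signature; the specific values are arbitrary illustrations, not recommendations. The import is guarded so the snippet still runs where socio4health is not installed.

```python
# Keyword arguments taken from the documented __init__ signature.
# The values are arbitrary examples chosen for illustration.
harmonizer_kwargs = {
    "min_common_columns": 2,
    "similarity_threshold": 0.9,
    "nan_threshold": 0.95,
    "sample_frac": 0.1,
    "strict_mapping": False,
}

try:
    from socio4health import Harmonizer
except ImportError:
    # The package is optional for this sketch.
    Harmonizer = None

# Every parameter has a default, so any subset of keywords can be passed.
harmonizer = Harmonizer(**harmonizer_kwargs) if Harmonizer is not None else None
```

Since all parameters are optional, `Harmonizer()` with no arguments should also be valid according to the signature.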

Methods

`__init__`([min_common_columns, ...])	Initialize the Harmonizer class with default parameters.
`drop_nan_columns`(ddf_or_ddfs)	Drop columns where the majority of values are `NaN` using instance parameters.
`s4h_compare_with_dict`(ddfs)	Compare the columns available in the DataFrames with the variables in the dictionary and return a DataFrame with the columns that do not match in both directions.
`s4h_data_selector`(ddfs)
`s4h_get_available_columns`(df_or_dfs)	Get a list of unique column names from a single DataFrame or a list of DataFrames.
`s4h_harmonize_dataframes`(country_dfs)
`s4h_join_data`(ddfs)
`s4h_vertical_merge`(ddfs)
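The descriptions of `s4h_get_available_columns` and `s4h_compare_with_dict` suggest set operations over column names. The following pure-Python sketch illustrates that idea only. It models each table as a dict of column name to values, returns plain Python structures rather than a DataFrame, and uses hypothetical helper names and result keys that are not taken from the library.

```python
def available_columns(tables):
    """Unique column names across one table or a list of tables, in first-seen order."""
    if isinstance(tables, dict):
        tables = [tables]
    seen = []
    for table in tables:
        for name in table:
            if name not in seen:
                seen.append(name)
    return seen


def compare_with_dictionary(tables, dictionary_variables):
    """Report mismatches in both directions between data columns and dictionary variables."""
    columns = set(available_columns(tables))
    variables = set(dictionary_variables)
    return {
        "in_data_not_in_dict": sorted(columns - variables),
        "in_dict_not_in_data": sorted(variables - columns),
    }


# Hypothetical inputs.
tables = [
    {"id": [1, 2], "age": [30, 41]},
    {"id": [3], "region": ["north"]},
]
dictionary_variables = ["id", "age", "income"]

columns = available_columns(tables)
mismatches = compare_with_dictionary(tables, dictionary_variables)
```

A check like this is useful before harmonization, since columns missing from the variable dictionary cannot be mapped by it.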

