socio4health.utils.harmonizer_utils.standardize_dict

socio4health.utils.harmonizer_utils.standardize_dict(raw_dict: DataFrame) → DataFrame

Cleans and structures a dictionary-like DataFrame of variables by standardizing text fields, grouping possible answers, and removing duplicates.

Parameters:

raw_dict (pd.DataFrame) – DataFrame with the required columns question, variable_name, description, and value. A subquestion column is optional.

Returns:

A cleaned DataFrame grouped by question and variable_name, with an additional column, possible_answers, containing the concatenated descriptions.

Return type:

pd.DataFrame
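
Example:

The grouping behavior described above can be approximated in plain pandas. The sketch below is not the socio4health implementation: the function name standardize_dict_sketch, the whitespace-only text cleanup, and the "; " separator are all assumptions made for illustration, and the optional subquestion column is not handled.

```python
import pandas as pd


def standardize_dict_sketch(raw_dict: pd.DataFrame) -> pd.DataFrame:
    """Illustrative approximation only; NOT the library's own code."""
    required = {"question", "variable_name", "description", "value"}
    missing = required - set(raw_dict.columns)
    if missing:
        raise ValueError(f"Missing required columns: {sorted(missing)}")

    df = raw_dict.copy()

    # Assumed standardization: collapse internal whitespace and trim the ends.
    for col in ("question", "variable_name", "description"):
        df[col] = (
            df[col].astype(str).str.replace(r"\s+", " ", regex=True).str.strip()
        )

    # Remove rows that are exact duplicates after cleaning.
    df = df.drop_duplicates()

    # One row per (question, variable_name); descriptions are joined
    # with "; " (the separator is an assumption).
    return (
        df.groupby(["question", "variable_name"], sort=False)["description"]
        .agg("; ".join)
        .reset_index()
        .rename(columns={"description": "possible_answers"})
    )


# Hypothetical input: the third row duplicates the second.
raw = pd.DataFrame(
    {
        "question": ["Sex", "Sex", "Sex", "Age"],
        "variable_name": ["P6020", "P6020", "P6020", "P6040"],
        "description": [" Male", "Female", "Female", "Years"],
        "value": ["1", "2", "2", ""],
    }
)
result = standardize_dict_sketch(raw)
```

With the real library, the equivalent call would be standardize_dict(raw) after importing it from socio4health.utils.harmonizer_utils; the exact cleaning rules and output format may differ from this sketch.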