pd_extras.check package
Submodules
pd_extras.check.sanitize module
Check sanity of dataframes
- pd_extras.check.sanitize.check_if_column_exists(column: str, data: DataFrame)
Check that a column exists in the dataframe. Typically used to validate a column name against a dataframe before using it.
- Parameters:
column (str) – Name of the column.
data (pd.DataFrame) – Dataframe object to check against.
- Raises:
ValueError – If the column is not found in the dataframe.
>>> from pd_extras.check.sanitize import check_if_column_exists
>>> check_if_column_exists(column="random_column", data=data)
- pd_extras.check.sanitize.check_if_columns_exist(columns: list, data: DataFrame)
Check that all of the given columns exist in the dataframe. Typically used to validate a list of column names against a dataframe before using them.
- Parameters:
columns (list) – List of column names.
data (pd.DataFrame) – Dataframe object to check against.
- Raises:
ValueError – If any of the columns is not found in the dataframe.
>>> from pd_extras.check.sanitize import check_if_columns_exist
>>> check_if_columns_exist(columns=["col_rand1", "col_ran2"], data=data)
- pd_extras.check.sanitize.clean_column(column: str, is_lower: bool = True, default_char: str = '') -> str
Clean a column name.
- Parameters:
column (str) – Name of the column.
is_lower (bool, optional) – Whether the cleaned column name should be lowercased, defaults to True.
default_char (str, optional) – String that replaces illegal characters, defaults to “”. Another common choice is “_”.
- Returns:
Cleaned column name.
- Return type:
str
>>> from pd_extras.check.sanitize import clean_column
>>> column_new = clean_column(
...     column=column,
...     is_lower=is_lower,
...     default_char=default_char,
... )
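To illustrate how `is_lower` and `default_char` interact, here is a minimal sketch using only the standard library. The function `clean_column_sketch` is hypothetical, and its definition of an illegal character (anything other than ASCII letters, digits, and underscores) is an assumption; this reference does not say which characters the library treats as illegal.

```python
import re


def clean_column_sketch(column: str, is_lower: bool = True, default_char: str = "") -> str:
    """Hypothetical sketch; the character rule below is an assumption."""
    # Assumption: anything outside ASCII letters, digits and "_" is illegal.
    cleaned = re.sub(r"[^0-9a-zA-Z_]", default_char, column)
    return cleaned.lower() if is_lower else cleaned


a = clean_column_sketch("Unit Price ($)")                    # "unitprice"
b = clean_column_sketch("Unit Price ($)", default_char="_")  # "unit_price____"
c = clean_column_sketch("Unit Price ($)", is_lower=False)    # "UnitPrice"
```

With the empty default, words run together; passing “_” keeps word boundaries visible at the cost of repeated underscores where several illegal characters are adjacent.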
- pd_extras.check.sanitize.clean_column_names(data: DataFrame, is_lower: bool = True, default_char: str = '') -> DataFrame
Clean columns of a dataframe.
- Parameters:
data (pd.DataFrame) – Dataframe object.
is_lower (bool, optional) – Whether the cleaned column names should be lowercased, defaults to True.
default_char (str, optional) – String that replaces illegal characters, defaults to “”. Another common choice is “_”.
- Returns:
Dataframe with clean column names.
- Return type:
pd.DataFrame
>>> from pd_extras.check.sanitize import clean_column_names
>>> res: pd.DataFrame = clean_column_names(data=data)
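One plausible way to apply a per-name cleaner to a whole dataframe is `DataFrame.rename` with a callable, sketched below. The function `clean_column_names_sketch` and its character rule are hypothetical, and whether the library returns a copy or modifies the input in place is not stated here; this sketch returns a new dataframe.

```python
import re

import pandas as pd


def clean_column_names_sketch(
    data: pd.DataFrame, is_lower: bool = True, default_char: str = ""
) -> pd.DataFrame:
    """Hypothetical sketch; not the library's implementation."""

    def _clean(name: str) -> str:
        # Assumption: anything outside ASCII letters, digits and "_" is illegal.
        cleaned = re.sub(r"[^0-9a-zA-Z_]", default_char, str(name))
        return cleaned.lower() if is_lower else cleaned

    # rename() with a callable returns a new dataframe; the input is untouched.
    return data.rename(columns=_clean)


data = pd.DataFrame({"Unit Price": [1.0], "Qty#": [2]})
res = clean_column_names_sketch(data, default_char="_")
```

Note that two different original names can collapse to the same cleaned name, so checking the result for duplicate columns may be worthwhile.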