pd_extras.check package

Submodules

pd_extras.check.sanitize module

Check the sanity of dataframes.

pd_extras.check.sanitize.check_if_column_exists(column: str, data: DataFrame)

Check that the column exists in the dataframe, e.g. to validate a column name before using it.

Parameters:
  • column (str) – Name of the column

  • data (pd.DataFrame) – Dataframe object to check against.

Raises:

ValueError – If the column is not found in the dataframe.

>>> import pandas as pd
>>> from pd_extras.check.sanitize import check_if_column_exists
>>> data = pd.DataFrame({"random_column": [1, 2, 3]})
>>> check_if_column_exists(column="random_column", data=data)

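The check amounts to a membership test on `data.columns`. The sketch below is a minimal, hypothetical stand-in that mirrors the documented contract (return nothing, raise `ValueError` when the column is missing); it is not the library's source, and the error message text is an assumption.

```python
import pandas as pd


def check_if_column_exists(column: str, data: pd.DataFrame) -> None:
    """Hypothetical stand-in for pd_extras.check.sanitize.check_if_column_exists."""
    # `in data.columns` tests membership against the column labels (the Index),
    # not against the values stored in the dataframe.
    if column not in data.columns:
        # Message wording is an assumption; only the exception type is documented.
        raise ValueError(f"Column {column!r} not found in dataframe.")


data = pd.DataFrame({"price": [1.5, 2.0], "qty": [3, 4]})
check_if_column_exists(column="price", data=data)  # present: returns None silently

try:
    check_if_column_exists(column="random_column", data=data)
except ValueError as err:
    print(err)  # Column 'random_column' not found in dataframe.
```
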
pd_extras.check.sanitize.check_if_columns_exist(columns: list, data: DataFrame)

Check that all of the columns exist in the dataframe, e.g. to validate column names before using them.

Parameters:
  • columns (list) – List of column names

  • data (pd.DataFrame) – Dataframe object to check against.

Raises:

ValueError – If any of the columns is not found in the dataframe.

>>> import pandas as pd
>>> from pd_extras.check.sanitize import check_if_columns_exist
>>> data = pd.DataFrame({"col_rand1": [1], "col_rand2": [2]})
>>> check_if_columns_exist(columns=["col_rand1", "col_rand2"], data=data)

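A minimal, hypothetical sketch of the multi-column check, assuming it should report every missing column in a single `ValueError`. The documentation only says the error is raised if any column is missing, so whether the library stops at the first missing name, and its exact message, are not specified.

```python
import pandas as pd


def check_if_columns_exist(columns: list, data: pd.DataFrame) -> None:
    """Hypothetical stand-in for pd_extras.check.sanitize.check_if_columns_exist."""
    # Collect every missing name first so a single error reports all of them.
    missing = [col for col in columns if col not in data.columns]
    if missing:
        # Message wording is an assumption; only the exception type is documented.
        raise ValueError(f"Columns not found in dataframe: {missing}")


data = pd.DataFrame({"col_rand1": [1], "col_rand2": [2]})
check_if_columns_exist(columns=["col_rand1", "col_rand2"], data=data)  # passes

try:
    check_if_columns_exist(columns=["col_rand1", "col_x", "col_y"], data=data)
except ValueError as err:
    print(err)  # Columns not found in dataframe: ['col_x', 'col_y']
```

A list comprehension is used instead of `set(columns) - set(data.columns)` so that the missing names are reported in the order they were requested.
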
pd_extras.check.sanitize.clean_column(column: str, is_lower: bool = True, default_char: str = '') -> str

Clean a column name by replacing illegal characters and, optionally, lowercasing it.

Parameters:
  • column (str) – Name of the column.

  • is_lower (bool, optional) – Whether the cleaned column name should be lowercase, defaults to True

  • default_char (str, optional) – Replacement for illegal characters, defaults to “” (illegal characters are removed). “_” is another common choice.

Returns:

The cleaned column name.

Return type:

str

>>> from pd_extras.check.sanitize import clean_column
>>> column_new = clean_column(
...     column="Unit Price",
...     is_lower=True,
...     default_char="_",
... )

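The documentation does not define which characters are illegal. The sketch below assumes anything other than an ASCII letter, digit, or underscore is illegal, and that surrounding whitespace is trimmed first; treat it as an illustration of how `is_lower` and `default_char` interact, not as the library's implementation.

```python
import re

# Assumption: "illegal" means any character other than an ASCII letter,
# digit, or underscore. The library's actual rule is not documented.
_ILLEGAL_CHARS = re.compile(r"[^0-9A-Za-z_]")


def clean_column(column: str, is_lower: bool = True, default_char: str = "") -> str:
    """Hypothetical stand-in for pd_extras.check.sanitize.clean_column."""
    # Assumption: surrounding whitespace is trimmed before replacement.
    # A function replacement inserts default_char literally (no backslash escapes).
    cleaned = _ILLEGAL_CHARS.sub(lambda _match: default_char, column.strip())
    return cleaned.lower() if is_lower else cleaned


print(clean_column("Unit Price"))                                  # unitprice
print(clean_column("Unit Price", default_char="_"))                # unit_price
print(clean_column("Qty-Sold", is_lower=False, default_char="_"))  # Qty_Sold
```
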
pd_extras.check.sanitize.clean_column_names(data: DataFrame, is_lower: bool = True, default_char: str = '') -> DataFrame

Clean the column names of a dataframe.

Parameters:
  • data (pd.DataFrame) – Dataframe object.

  • is_lower (bool, optional) – Whether the cleaned column names should be lowercase, defaults to True

  • default_char (str, optional) – Replacement for illegal characters, defaults to “” (illegal characters are removed). “_” is another common choice.

Returns:

Dataframe with clean column names.

Return type:

pd.DataFrame

>>> import pandas as pd
>>> from pd_extras.check.sanitize import clean_column_names
>>> data = pd.DataFrame({"Unit Price": [9.99], "Qty Sold": [3]})
>>> res: pd.DataFrame = clean_column_names(data=data)

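`clean_column_names` presumably applies the single-name cleaning to every label in `data.columns`. The sketch below assumes that, uses an assumed cleaning rule (keep only ASCII letters, digits, and underscores; defined inline so the block runs on its own), and returns a new dataframe via `DataFrame.rename`. Whether the library returns a copy or modifies `data` in place is not documented.

```python
import re

import pandas as pd

# Assumption: the same "illegal character" rule is used for every label.
_ILLEGAL_CHARS = re.compile(r"[^0-9A-Za-z_]")


def clean_column(column: str, is_lower: bool = True, default_char: str = "") -> str:
    """Hypothetical single-label cleaner (assumed rule, not the library's)."""
    cleaned = _ILLEGAL_CHARS.sub(lambda _match: default_char, column.strip())
    return cleaned.lower() if is_lower else cleaned


def clean_column_names(
    data: pd.DataFrame, is_lower: bool = True, default_char: str = ""
) -> pd.DataFrame:
    """Hypothetical stand-in for pd_extras.check.sanitize.clean_column_names."""
    # rename() with a callable maps every column label and returns a new
    # dataframe, so the caller's `data` keeps its original labels.
    # str(col) guards against non-string labels such as integers.
    return data.rename(
        columns=lambda col: clean_column(
            str(col), is_lower=is_lower, default_char=default_char
        )
    )


data = pd.DataFrame({"Unit Price": [9.99], "Qty Sold": [3]})
res = clean_column_names(data=data, default_char="_")
print(list(res.columns))   # ['unit_price', 'qty_sold']
print(list(data.columns))  # ['Unit Price', 'Qty Sold']
```

Cleaning can map distinct labels to the same name (under this assumed rule with `default_char=""`, both “Unit Price” and “UnitPrice” become “unitprice”), so checking `res.columns.is_unique` afterwards may be worthwhile.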

Module contents