pd_extras.extra package

Submodules

pd_extras.extra.flattener module

Flatten dataframes

class pd_extras.extra.flattener.Flattener(num_rows_to_check: int, depth: int = 1, sep: str = '.')

Bases: object

Class to flatten dataframes.

>>> from pd_extras.extra.flattener import Flattener
>>> flattener = Flattener(num_rows_to_check=num_rows_to_check, depth=depth)
>>> # Flatten the dataframe
>>> flat_data = flattener.flatten(data=data)
>>> # Check whether a column has nested data or not
>>> column_info = flattener.get_column_info(data=data)
depth: int = 1
flatten(data: dict | DataFrame) DataFrame

Return a normalized dataframe.

Parameters:

data (dict | pd.DataFrame) – Data to normalize, given as a dict or a pandas dataframe.

Returns:

Normalized dataframe.

Return type:

pd.DataFrame
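
For orientation, the kind of normalization described above can be sketched with plain pandas. This is not the Flattener implementation: it uses pandas.json_normalize, and the sample dataframe as well as the mapping of depth to max_level and of sep to the column separator are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical dataframe with a nested "user" column (not from pd_extras).
data = pd.DataFrame(
    {
        "id": [1, 2],
        "user": [
            {"name": "Ada", "address": {"city": "London"}},
            {"name": "Alan", "address": {"city": "Wilmslow"}},
        ],
    }
)

# Convert rows to dicts so json_normalize can expand the nested values.
records = data.to_dict(orient="records")

# Expand one level of nesting only; deeper dicts stay as dict values.
flat_one_level = pd.json_normalize(records, max_level=1, sep=".")

# Expand every level of nesting.
flat_all_levels = pd.json_normalize(records, sep=".")

print(list(flat_one_level.columns))
print(list(flat_all_levels.columns))
```

With one level expanded, the address dictionary is kept intact in a single column; expanding all levels turns it into a separate city column.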

get_column_info(data: DataFrame) list

Check, for each column, whether it holds nested data.

Parameters:

data (pd.DataFrame) – Dataframe to check.

Returns:

List of booleans. True if the corresponding column has nested data.

Return type:

list
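
A check of this kind could look roughly like the sketch below. The helper has_nested_values is hypothetical, and treating both dicts and lists as nested, as well as inspecting only the first num_rows_to_check rows, are assumptions rather than documented behaviour of get_column_info.

```python
import pandas as pd


def has_nested_values(column: pd.Series, num_rows_to_check: int) -> bool:
    """Return True if any of the first rows holds a dict or a list."""
    sample = column.head(num_rows_to_check)
    return any(isinstance(value, (dict, list)) for value in sample)


# Hypothetical dataframe: one flat column and two nested ones.
data = pd.DataFrame(
    {
        "id": [1, 2],
        "user": [{"name": "Ada"}, {"name": "Alan"}],
        "tags": [["a", "b"], []],
    }
)

# One boolean per column, in column order.
column_info = [
    has_nested_values(data[name], num_rows_to_check=2) for name in data.columns
]
print(column_info)  # [False, True, True]
```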

num_rows_to_check: int
sep: str = '.'

pd_extras.extra.operations module

Some extra operations

pd_extras.extra.operations.auto_join(left: DataFrame, right: DataFrame, how: str = 'inner') DataFrame

Automatically join two dataframes based on common columns.

Parameters:
  • left (pd.DataFrame) – Left dataframe.

  • right (pd.DataFrame) – Right dataframe.

  • how (str, optional) – How to join the dataframes, defaults to “inner”.

Raises:

ValueError – If no common column is found.

Returns:

Dataframe with the join output.

Return type:

pd.DataFrame

>>> from pd_extras.extra.operations import auto_join
>>> joined_df = auto_join(left=left, right=right)
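
The behaviour described above can be approximated in plain pandas as follows. The helper join_on_common_columns is hypothetical, and joining on every shared column name is an assumption; auto_join may select its join keys differently.

```python
import pandas as pd


def join_on_common_columns(
    left: pd.DataFrame, right: pd.DataFrame, how: str = "inner"
) -> pd.DataFrame:
    """Merge two dataframes on all column names they share."""
    common = [name for name in left.columns if name in right.columns]
    if not common:
        raise ValueError("No common column found.")
    return left.merge(right, on=common, how=how)


# Hypothetical inputs sharing only the "id" column.
left = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
right = pd.DataFrame({"id": [2, 3, 4], "score": [0.5, 0.7, 0.9]})

# Inner join keeps only the ids present in both dataframes.
joined = join_on_common_columns(left=left, right=right)
print(joined)
```

Passing how="left" or how="outer" to the sketch changes which unmatched rows are kept, following the usual DataFrame.merge semantics.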
pd_extras.extra.operations.generate_random_dataframe(num_int_cols: int, num_float_cols: int, size: int, low_int: int = 1, high_int: int = 100, low_float: float = 0, high_float: float = 10) DataFrame

Generate a dataframe with random data.

Parameters:
  • num_int_cols (int) – Number of integer columns.

  • num_float_cols (int) – Number of float columns.

  • size (int) – Number of rows.

  • low_int (int, optional) – Lower bound for int columns, defaults to 1.

  • high_int (int, optional) – Upper bound for int columns, defaults to 100.

  • low_float (float, optional) – Lower bound for float columns, defaults to 0.

  • high_float (float, optional) – Upper bound for float columns, defaults to 10.

Returns:

Dataframe with num_int_cols int columns and num_float_cols float columns.

Return type:

pd.DataFrame

>>> from pd_extras.extra.operations import generate_random_dataframe
>>> size = 100_000
>>> data = generate_random_dataframe(num_int_cols=2, num_float_cols=3, size=size)
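
To illustrate what such a dataframe can look like, here is a stand-alone sketch built on NumPy. The function random_dataframe_sketch, its seed parameter, the column names, and the treatment of high_int as an exclusive bound are all assumptions; none of them are taken from generate_random_dataframe itself.

```python
import numpy as np
import pandas as pd


def random_dataframe_sketch(
    num_int_cols: int,
    num_float_cols: int,
    size: int,
    low_int: int = 1,
    high_int: int = 100,
    low_float: float = 0.0,
    high_float: float = 10.0,
    seed=None,
) -> pd.DataFrame:
    """Build a dataframe of random int and float columns."""
    rng = np.random.default_rng(seed)
    columns = {}
    for i in range(num_int_cols):
        # Generator.integers excludes the upper bound by default.
        columns[f"int_{i}"] = rng.integers(low_int, high_int, size=size)
    for i in range(num_float_cols):
        # Uniform samples from the half-open interval [low_float, high_float).
        columns[f"float_{i}"] = rng.uniform(low_float, high_float, size=size)
    return pd.DataFrame(columns)


sample = random_dataframe_sketch(num_int_cols=2, num_float_cols=3, size=1000, seed=0)
print(sample.shape)  # (1000, 5)
print(sample.dtypes)
```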

Module contents