pd_extras.extra package
Submodules
pd_extras.extra.flattener module
Flatten dataframes
- class pd_extras.extra.flattener.Flattener(num_rows_to_check: int, depth: int = 1, sep: str = '.')
Bases: object
Class to flatten dataframes.
>>> from pd_extras.extra.flattener import Flattener
>>> # `data` is an existing pandas DataFrame (or dict) containing nested values
>>> flattener = Flattener(num_rows_to_check=10, depth=1)
>>> # Flatten the dataframe
>>> flat_data = flattener.flatten(data=data)
>>> # Check whether each column has nested data or not
>>> column_info = flattener.get_column_info(data=data)
- depth: int = 1
- flatten(data: dict | DataFrame) → DataFrame
Return a normalized dataframe.
- Parameters:
data (dict | pd.DataFrame) – Dict or pandas dataframe to normalize.
- Returns:
Normalized dataframe.
- Return type:
pd.DataFrame
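This page does not show how `flatten` works internally. As a rough illustration only, the same idea can be sketched with `pandas.json_normalize`. The function name `flatten_sketch` is hypothetical, and treating `depth` as `json_normalize`'s `max_level` and `sep` as its key separator is an assumption about how the `Flattener` attributes behave, not a description of the library.

```python
import pandas as pd


def flatten_sketch(data, depth=1, sep="."):
    """Hypothetical stand-in for Flattener.flatten, built on pandas.json_normalize."""
    # Accept either a dict or a DataFrame, mirroring the documented
    # `dict | DataFrame` signature.
    if isinstance(data, pd.DataFrame):
        records = data.to_dict(orient="records")
    else:
        records = data
    # max_level limits how many nesting levels are expanded;
    # sep joins the parent and child keys into one column name.
    return pd.json_normalize(records, max_level=depth, sep=sep)


data = pd.DataFrame(
    {
        "id": [1, 2],
        "user": [{"name": "a", "age": 30}, {"name": "b", "age": 40}],
    }
)
flat = flatten_sketch(data)
print(list(flat.columns))  # ['id', 'user.name', 'user.age']
```

Nested dict keys become dotted column names, while columns that hold plain scalars (here `id`) pass through unchanged.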
- get_column_info(data: DataFrame) → list
Check whether each column is nested or not.
- Parameters:
data (pd.DataFrame) – Dataframe to check.
- Returns:
List of booleans, one per column. True if the corresponding column has nested data.
- Return type:
list
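A minimal sketch of a per-column nesting check, assuming that only the first `num_rows_to_check` rows are inspected and that a value counts as nested when it is a dict or a list. Both are assumptions; the library's actual rule is not documented on this page, and `get_column_info_sketch` is a hypothetical name.

```python
import pandas as pd


def get_column_info_sketch(data: pd.DataFrame, num_rows_to_check: int = 10) -> list:
    """Hypothetical stand-in for Flattener.get_column_info."""
    # Inspect only a sample of rows to keep the check cheap on large frames.
    sample = data.head(num_rows_to_check)
    # A column is flagged as nested if any sampled value is a dict or a list.
    return [
        bool(sample[col].map(lambda v: isinstance(v, (dict, list))).any())
        for col in sample.columns
    ]


data = pd.DataFrame({"id": [1, 2], "user": [{"name": "a"}, {"name": "b"}]})
print(get_column_info_sketch(data))  # [False, True]
```

Sampling a fixed number of rows trades accuracy for speed: a column whose nested values only appear after the sampled rows would be reported as flat.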
- num_rows_to_check: int
- sep: str = '.'
pd_extras.extra.operations module
Some extra operations
- pd_extras.extra.operations.auto_join(left: DataFrame, right: DataFrame, how: str = 'inner') → DataFrame
Automatically join two dataframes based on common columns.
- Parameters:
left (pd.DataFrame) – Left dataframe.
right (pd.DataFrame) – Right dataframe.
how (str, optional) – How to join the dataframes, defaults to “inner”.
- Raises:
ValueError – If no common column is found.
- Returns:
Dataframe with the join output.
- Return type:
pd.DataFrame
>>> from pd_extras.extra.operations import auto_join
>>> # `left` and `right` are existing DataFrames sharing at least one column name
>>> joined_df = auto_join(left=left, right=right)
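A minimal sketch of the behavior described for `auto_join`, assuming that "common columns" means every column name present in both dataframes and that the join is delegated to `pandas.DataFrame.merge`. The name `auto_join_sketch` is hypothetical, and the library's actual key-selection rule may differ.

```python
import pandas as pd


def auto_join_sketch(left: pd.DataFrame, right: pd.DataFrame, how: str = "inner") -> pd.DataFrame:
    """Hypothetical stand-in for auto_join: join on every shared column name."""
    # Keep the left dataframe's column order for the join keys.
    common = [col for col in left.columns if col in right.columns]
    if not common:
        raise ValueError("No common column found between the two dataframes.")
    return left.merge(right, on=common, how=how)


left = pd.DataFrame({"id": [1, 2, 3], "x": [10, 20, 30]})
right = pd.DataFrame({"id": [2, 3, 4], "y": ["b", "c", "d"]})
joined = auto_join_sketch(left, right)
print(joined["id"].tolist())  # [2, 3]
```

Joining on every shared name is convenient but brittle: an unrelated column that happens to appear in both frames (for example a generic `name`) silently becomes part of the key.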
- pd_extras.extra.operations.generate_random_dataframe(num_int_cols: int, num_float_cols: int, size: int, low_int: int = 1, high_int: int = 100, low_float: float = 0, high_float: float = 10) → DataFrame
Generate a dataframe with random data.
- Parameters:
num_int_cols (int) – Number of integer columns.
num_float_cols (int) – Number of float columns.
size (int) – Number of rows.
low_int (int, optional) – Lower bound for int columns, defaults to 1.
high_int (int, optional) – Upper bound for int columns, defaults to 100.
low_float (float, optional) – Lower bound for float columns, defaults to 0.
high_float (float, optional) – Upper bound for float columns, defaults to 10.
- Returns:
Dataframe with num_int_cols int columns and num_float_cols float columns.
- Return type:
pd.DataFrame
>>> from pd_extras.extra.operations import generate_random_dataframe
>>> size = 100_000
>>> data = generate_random_dataframe(num_int_cols=2, num_float_cols=3, size=size)
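A minimal sketch of a generator matching the documented signature, using NumPy's `Generator` API. The name `generate_random_dataframe_sketch`, the extra `seed` parameter, the `int_<i>` / `float_<i>` column names, and the choice to make `high_int` inclusive are all assumptions, not documented behavior of `generate_random_dataframe`.

```python
import numpy as np
import pandas as pd


def generate_random_dataframe_sketch(
    num_int_cols: int,
    num_float_cols: int,
    size: int,
    low_int: int = 1,
    high_int: int = 100,
    low_float: float = 0,
    high_float: float = 10,
    seed=None,  # hypothetical: not part of the documented signature
) -> pd.DataFrame:
    """Hypothetical stand-in for generate_random_dataframe."""
    rng = np.random.default_rng(seed)
    columns = {}
    for i in range(num_int_cols):
        # integers() excludes `high` by default; endpoint=True makes it inclusive.
        columns[f"int_{i}"] = rng.integers(low_int, high_int, size=size, endpoint=True)
    for i in range(num_float_cols):
        # uniform() samples from the half-open interval [low, high).
        columns[f"float_{i}"] = rng.uniform(low_float, high_float, size=size)
    return pd.DataFrame(columns)


data = generate_random_dataframe_sketch(num_int_cols=2, num_float_cols=3, size=100_000, seed=0)
print(data.shape)  # (100000, 5)
```

Passing a fixed `seed` makes the output reproducible, which is useful when the generated frame feeds benchmarks or tests.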