pd_extras.optimize package

Submodules

pd_extras.optimize.df_ops module

Optimize dataframe operations

pd_extras.optimize.df_ops.get_rows(data: DataFrame, columns: list) → ndarray

Get iterable rows for the selected columns.

Parameters:
  • data (pd.DataFrame) – Pandas dataframe to select columns from.

  • columns (list) – List of column names to select.

Returns:

NumPy array whose rows can be iterated over.

Return type:

np.ndarray

>>> from pd_extras.optimize.df_ops import get_rows
>>> rows = get_rows(data=data, columns=list(columns))
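
To show what consuming the returned array might look like end to end, here is a minimal sketch. The local get_rows_sketch function is a hypothetical stand-in that assumes the behaviour is equivalent to data[columns].to_numpy(); the actual implementation in pd_extras is not shown in this reference and may differ.

```python
import numpy as np
import pandas as pd


def get_rows_sketch(data: pd.DataFrame, columns: list) -> np.ndarray:
    """Hypothetical stand-in for get_rows; NOT the library's implementation."""
    # Select the requested columns, then convert them to a 2-D NumPy array
    # so that each row can be iterated over without pandas overhead.
    return data[columns].to_numpy()


data = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})
rows = get_rows_sketch(data=data, columns=["a", "c"])

# Each row is a 1-D array holding the values of the selected columns,
# in the order given by `columns`.
totals = [int(a + c) for a, c in rows]
```

Iterating over a plain array like this is typically cheaper than DataFrame.iterrows(), which may be the motivation for the helper, although the reference does not state this.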

pd_extras.optimize.df_ops.select_columns_from_dataframe(data: DataFrame, columns: list) → DataFrame

Get a dataframe consisting of the selected columns of a dataframe. In some tests conducted, this was the fastest approach; df.iloc[indices, column_indices] was a close second.

Parameters:
  • data (pd.DataFrame) – Dataframe to take columns from.

  • columns (list) – List of column names to select.

Returns:

Dataframe with the selected columns.

Return type:

pd.DataFrame

>>> from pd_extras.optimize.df_ops import select_columns_from_dataframe
>>> res = select_columns_from_dataframe(data=data, columns=list(columns))
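
The speed claim above can be checked on your own data with a small benchmark. The sketch below compares three standard pandas ways of selecting columns; it does not assume which one select_columns_from_dataframe uses internally, and the dataframe shape, column names and repetition count are arbitrary choices made for illustration.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data; the shape and column names are arbitrary.
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.random((10_000, 6)), columns=list("abcdef"))
columns = ["b", "d", "e"]

# Positional indices needed by the iloc variant.
column_indices = [data.columns.get_loc(c) for c in columns]

candidates = {
    "data[columns]": lambda: data[columns],
    "data.loc[:, columns]": lambda: data.loc[:, columns],
    "data.iloc[:, column_indices]": lambda: data.iloc[:, column_indices],
}

# All candidates should return the same dataframe.
reference = data[columns]
for name, select in candidates.items():
    assert select().equals(reference), name

# Timings depend on the pandas version, dtypes and data shape,
# so treat the ranking as indicative only.
timings = {
    name: timeit.timeit(select, number=200)
    for name, select in candidates.items()
}
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name}: {seconds:.4f} s")
```

Because the ranking can change between pandas releases and dataframe layouts, it is worth rerunning a comparison like this before relying on any one approach being the fastest.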

Module contents