pd_extras.write package

Submodules

pd_extras.write.common module

Common variables for dataframe to database module

pd_extras.write.driver module

class pd_extras.write.driver.NoSQLDatabaseType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: str, Enum

MONGO = 'mongo'
class pd_extras.write.driver.SQLDatabaseType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: str, Enum

MYSQL = 'mysql'
POSTGRES = 'postgresql'
SQLSERVER = 'sqlserver'

pd_extras.write.nosql_writer module

Write a pandas dataframe to a NoSQL database collection

class pd_extras.write.nosql_writer.NoSQLDatabaseWriter(dbtype: str, host: str, dbname: str, user: str, password: str, port: int, dns_seed_list: bool = False)

Bases: object

Writer class for NoSQL Database

close_connection()

Close the current connection.

delete_collection(collection_name: str)

Delete collection collection_name.

Parameters:

collection_name (str) – Name of the collection.

delete_database()

Drop the current database.

get_document_count(collection_name: str)

Get number of documents in collection collection_name.

Parameters:

collection_name (str) – Name of the collection.

Returns:

Document count.

Return type:

int

get_list_of_collections()

List names of collections in the current database.

Returns:

Collection names.

Return type:

list[str]

get_list_of_databases()

List names of databses in this connection.

Returns:

Database names.

Return type:

list[str]

get_or_create_collection(collection_name: str)

Get object for the collection collection_name.

Parameters:

collection_name (str) – Name of the collection.

Returns:

Collection object.

Return type:

pymongo.collection.Collection

write_data_to_collection(collection_name: str, data: DataFrame)

Write dataframe data to the collection collection_name.

Parameters:
  • collection_name (str) – Name of the collection.

  • data (pd.DataFrame) – Dataframe to write.

Returns:

Object with ids of inserted documents.

Return type:

pymongo.results.InsertManyResult

pd_extras.write.sql_writer module

Write a pandas dataframe to a SQL database table

class pd_extras.write.sql_writer.SQLDatabaseWriter(dbtype: SQLDatabaseType, host: str, dbname: str, user: str, password: str, port: int)

Bases: object

Database connection object for SQL databases Only database name dbname is stored. Rest of the credentials are used only to retrieved the connection. Two connections are created: one for the specific database dbname and another generic connection with no database selected. Be sure to call connobj.close_connection() after you are done.

close_connection()

Close the current connection to the database

delete_table(table_name: str)

Drop table table_name from the current database if it exists.

Parameters:

table (Table) – Table to delete.

get_column_info(table_name: str)

Get table schema from database.

Parameters:

table_name (str) – Name of the table in database.

Returns:

Pandas dataframe of table schema information.

Return type:

pd.DataFrame

has_table(table_name: str)

Check if the current database has table table_name.

Parameters:

table_name (str) – Name of the table to check.

Returns:

True if table_name exists in current database.

Return type:

bool

write_df_to_db(data: DataFrame, table_name: str, id_col: str = 'id', drop_first: bool = False, clean_columns: bool = True, max_length: int = 100)

Write data to Table table_name

Parameters:
  • data (pd.DataFrame) – Pandas dataframe containing data to write.

  • dbname (str) – Name of the database.

  • table_name (str) – Name of table in the database.

  • id_col (str, optional) – Id column of table if exists, defaults to “id”. Should be set to None if not present in data.

  • drop_first (bool) – If True, table table_name in database will be attempted to drop first.

  • clean_columns (bool) – If True, trailing/leading whitespaces and “ will be stripped off column names, defaults to “True”.

  • max_length (int) – Maximum length of VARCHAR type columns, defaults to 100.

Returns:

Cursor with result of query execution.

Return type:

sqlalchemy.engine.cursor.CursorResult

Module contents