It takes about 21 seconds to insert 10,000 rows of data using the pd.DataFrame.to_sql method

to_sql is the method pandas provides for writing DataFrame data to a database. It converts a DataFrame into SQL INSERT statements, making it easy to store the data in a database for later use.

The to_sql method takes several parameters. The most commonly used are name (the table name), con (the database connection object), if_exists (what to do if the table already exists), and index (whether to write the DataFrame's index as a column), among others.

pandas.read_sql(sql, con, index_col=None, coerce_float=True, params=None, parse_dates=None, columns=None, chunksize=None, dtype_backend=_NoDefault.no_default, dtype=None)
Read SQL query or database table into a DataFrame.

pandas.read_sql – pandas 2.1.2 documentation
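As a quick sanity check of how to_sql and read_sql pair up, here is a minimal round-trip sketch against an in-memory SQLite database (the table and column names are made up for illustration; sqlite3 is one of the DBAPI2 connections both methods accept directly):

```python
import sqlite3

import pandas as pd

# In-memory SQLite database: no server needed for a quick test
conn = sqlite3.connect(":memory:")

df = pd.DataFrame({"id": [1, 2, 3], "price": [9.5, 12.0, 7.25]})

# Write the frame; since pandas 1.4, to_sql returns the number of rows affected
rows = df.to_sql("prices", con=conn, index=False, if_exists="replace")
print(rows)  # 3

# Read it back; with a plain DBAPI2 connection, read_sql expects an SQL query
out = pd.read_sql("SELECT * FROM prices", con=conn)
print(len(out))  # 3

conn.close()
```

Note that passing a bare table name to read_sql (as the connectivity test below does) only works with an SQLAlchemy connectable; with a raw sqlite3 connection you must pass a query.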

def to_sql(
    frame,
    name: str,
    con,
    schema: str | None = None,
    if_exists: Literal["fail", "replace", "append"] = "fail",
    index: bool = True,
    index_label: IndexLabel | None = None,
    chunksize: int | None = None,
    dtype: DtypeArg | None = None,
    method: Literal["multi"] | Callable | None = None,
    engine: str = "auto",
    **engine_kwargs,
) -> int | None:
    """
    Write records stored in a DataFrame to a SQL database.

    Parameters
    ----------
    frame: DataFrame, Series
    name : str
        Name of SQL table.
    con : SQLAlchemy connectable(engine/connection) or database string URI
        or sqlite3 DBAPI2 connection
        Using SQLAlchemy makes it possible to use any DB supported by that
        library.
        If a DBAPI2 object, only sqlite3 is supported.
    schema: str, optional
        Name of SQL schema in database to write to (if database flavor
        supports this). If None, use default schema (default).
    if_exists : {'fail', 'replace', 'append'}, default 'fail'
        - fail: If table exists, do nothing.
        - replace: If table exists, drop it, recreate it, and insert data.
        - append: If table exists, insert data. Create if does not exist.
    index : bool, default True
        Write DataFrame index as a column.
    index_label: str or sequence, optional
        Column label for index column(s). If None is given (default) and
        `index` is True, then the index names are used.
        A sequence should be given if the DataFrame uses MultiIndex.
    chunksize: int, optional
        Specify the number of rows in each batch to be written at a time.
        By default, all rows will be written at once.
    dtype: dict or scalar, optional
        Specifying the datatype for columns. If a dictionary is used, the
        keys should be the column names and the values should be the
        SQLAlchemy types or strings for the sqlite3 fallback mode. If a
        scalar is provided, it will be applied to all columns.
    method: {None, 'multi', callable}, optional
        Controls the SQL insertion clause used:

        - None : Uses standard SQL ``INSERT`` clause (one per row).
        - ``'multi'``: Pass multiple values in a single ``INSERT`` clause.
        - callable with signature ``(pd_table, conn, keys, data_iter) -> int | None``.

        Details and a sample callable implementation can be found in the
        section :ref:`insert method <io.sql.method>`.
    engine : {'auto', 'sqlalchemy'}, default 'auto'
        SQL engine library to use. If 'auto', then the option
        ``io.sql.engine`` is used. The default ``io.sql.engine``
        behavior is 'sqlalchemy'

        .. versionadded:: 1.3.0

    **engine_kwargs
        Any additional kwargs are passed to the engine.

    Returns
    -------
    None or int
        Number of rows affected by to_sql. None is returned if the callable
        passed into ``method`` does not return an integer number of rows.

        .. versionadded:: 1.4.0

    Notes
    -----
    The returned rows affected is the sum of the ``rowcount`` attribute of ``sqlite3.Cursor``
    or SQLAlchemy connectable. The returned value may not reflect the exact number of written
    rows as stipulated in the
    `sqlite3 <https://docs.python.org/3/library/sqlite3.html#sqlite3.Cursor.rowcount>`__ or
    `SQLAlchemy <https://docs.sqlalchemy.org/en/14/core/connections.html#sqlalchemy.engine.BaseCursorResult.rowcount>`__
    """ # noqa: E501
    if if_exists not in ("fail", "replace", "append"):
        raise ValueError(f"'{if_exists}' is not valid for if_exists")

    if isinstance(frame, Series):
        frame = frame.to_frame()
    elif not isinstance(frame, DataFrame):
        raise NotImplementedError(
            "'frame' argument should be either a Series or a DataFrame"
        )
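The if_exists behaviors documented in the docstring above can be exercised with a small sketch against in-memory SQLite (the table name "t" is arbitrary; note that the user-facing DataFrame.to_sql raises a ValueError on the default "fail" when the table exists):

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
df = pd.DataFrame({"x": [1, 2]})

# The first write creates the table; "append" then adds the same rows again
df.to_sql("t", con=conn, index=False)
df.to_sql("t", con=conn, index=False, if_exists="append")
n = pd.read_sql("SELECT COUNT(*) AS n FROM t", con=conn)["n"].iloc[0]
print(n)  # 4

# With the default if_exists="fail", a third write on the existing table
# raises ValueError instead of touching the data
try:
    df.to_sql("t", con=conn, index=False)
except ValueError as exc:
    print("refused:", exc)

conn.close()
```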
  • Read the table and run the insert test:

import pandas as pd
import pyodbc
import openpyxl
from sqlalchemy import create_engine

# Connection parameters
server = 'localhost'
database = 'tsl'
username = 'sa'
password = 'lqxxx'

# Create a SQLAlchemy engine. Note: no spaces in "mssql+pyodbc", and spaces
# in the ODBC driver name must be URL-encoded as "+" in the connection URL.
engine = create_engine(
    f"mssql+pyodbc://{username}:{password}@{server}/{database}"
    "?driver=ODBC+Driver+17+for+SQL+Server"
)

# Set the file path
filePath = r"C:\Users\Administrator\Documents\traindata20221231.xlsx"

# Read the "Sheet0" tab of the Excel file (openpyxl is the engine for .xlsx)
table = pd.read_excel(filePath, sheet_name="Sheet0")

# info() prints the summary and returns None (hence the "None" in the output)
print(table.info())

# Connection test: with an SQLAlchemy engine, read_sql accepts a table name
try:
    pd.read_sql('Employees', con=engine)
    print("connect successfully!")
except Exception as error:
    print("connect fail! because of:", error)

import time

T1 = time.time()
# Insert with to_sql(). if_exists="replace" drops and recreates the table if
# it already exists; if_exists="append" appends rows to an existing table.
try:
    table.to_sql("trading", con=engine, index=False, if_exists="replace")
    print("insert successfully!")
except Exception as error:
    print("insert fail! because of:", error)
print("data write complete!")
T2 = time.time()
print('Program running time: %s milliseconds' % ((T2 - T1) * 1000))

# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 10233 entries, 0 to 10232
# Data columns (total 11 columns):
#  #   Column          Non-Null Count  Dtype
# ---  ------          --------------  -----
#  0   tradingHours    10233 non-null  object
#  1   tradingChannel  10233 non-null  object
#  2   currencyType    10233 non-null  object
#  3   changeInto      10233 non-null  float64
#  4   changeOut       10233 non-null  float64
#  5   balance         10233 non-null  float64
#  6   tradingName     10141 non-null  object
#  7   tradingAccount  10153 non-null  object
#  8   paymentMethod   10233 non-null  object
#  9   postscript      8099 non-null   object
#  10  summary         916 non-null    object
# dtypes: float64(3), object(8)
# memory usage: 879.5+ KB
# None
# connect successfully!
# insert successfully!
# data write complete!
# Program running time: 20926.252126693726 milliseconds
# [Finished in 39.9s]

Note that the default value of if_exists is "fail": if the table already exists and if_exists is not specified, to_sql raises a ValueError rather than overwriting or appending. To add new rows to an existing table, pass if_exists="append" explicitly, and watch out for duplicate data when re-running the script.
When writing a large amount of data, to_sql may run out of memory building the insert all at once; use the chunksize parameter to write in batches.
By default (method=None), to_sql issues one INSERT statement per row, which is the main reason the write above is slow. Passing method="multi" packs many rows into each INSERT, and with pyodbc the engine can also be created with fast_executemany=True; both can shorten the write time considerably.
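To illustrate batching, here is a minimal sketch combining method="multi" with chunksize against in-memory SQLite (the table name and sizes are arbitrary; on SQL Server the analogous lever would be create_engine(..., fast_executemany=True)). The chunk size is kept small because SQLite limits the number of bound parameters per statement:

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
big = pd.DataFrame({"a": range(10_000), "b": range(10_000)})

# method="multi" packs many rows into one INSERT statement; chunksize bounds
# how many rows go into each statement (400 rows x 2 columns = 800 parameters,
# safely under SQLite's per-statement parameter limit).
big.to_sql("big", con=conn, index=False, if_exists="replace",
           method="multi", chunksize=400)

n = pd.read_sql("SELECT COUNT(*) AS n FROM big", con=conn)["n"].iloc[0]
print(n)  # 10000

conn.close()
```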