Adding a million rows from a CSV file to a MySQL database table using Python and Pandas!

Do you have a lot of CSV files, sometimes with many rows of data, that you need to import into a MySQL database?

You could go the LOAD DATA INFILE route that the various versions of MySQL support, but I always feel that Python is the most obvious road to take.
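For reference, the LOAD DATA INFILE route looks roughly like this. This is just a sketch: it assumes local_infile is enabled on both the server and the client, and the table name mytable is made up for illustration (the CSV columns must already line up with the table's columns).

import pymysql

conn = pymysql.connect(host="mydbhost", user="mydbuser", password="mydbpass",
                       database="mydatabase", local_infile=True)
with conn.cursor() as cur:
    # Stream the file from the client machine straight into the table,
    # skipping the header row
    cur.execute("""
        LOAD DATA LOCAL INFILE 'mydata.csv'
        INTO TABLE mytable
        FIELDS TERMINATED BY ','
        LINES TERMINATED BY '\\n'
        IGNORE 1 LINES
    """)
conn.commit()
conn.close()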

Look at this code:

import sys

from sqlalchemy import create_engine
import pandas as pd
import pymysql

csv_file = "mydata.csv"
db_table = "mytable"   # target table name, change to suit
chunksize = 10000

# Connect to the server
engine = create_engine('mysql+pymysql://mydbuser:mydbpass@mydbhost/mydatabase')
conn = engine.connect()

print("Processing", end='')
total_rows = 0

# If the CSV doesn't have column names you must give them as an argument, e.g.
# pd.read_csv(csv_file, chunksize=chunksize, names=["some-header-1", "some-header-2", "some-header-3"])
with pd.read_csv(csv_file, chunksize=chunksize) as reader:
    for chunk in reader:
        total_rows += len(chunk)
        print(".", end='')
        sys.stdout.flush()
        # Append this chunk to the table; index=False keeps the DataFrame
        # index from becoming an extra column
        chunk.to_sql(con=conn, name=db_table, if_exists="append", index=False)

print("")
print("* Total rows processed: " + str(total_rows))
conn.close()
engine.dispose()

Pretty neat, huh?
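If you want to sanity-check the import afterwards, you can compare the printed total against a row count taken from the table itself. A minimal sketch, reusing the connection string from the script above and the same placeholder table name:

from sqlalchemy import create_engine, text

engine = create_engine('mysql+pymysql://mydbuser:mydbpass@mydbhost/mydatabase')
with engine.connect() as conn:
    # This count should match the "Total rows processed" figure
    count = conn.execute(text("SELECT COUNT(*) FROM mytable")).scalar()
print("Rows in table:", count)
engine.dispose()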
