Setup Code (Click to Expand)
import pyodbc
import sqlalchemy as sa
import pandas as pd
from sqlalchemy import create_engine
import pyodbc
import sqlalchemy as sa
import pandas as pd
from sqlalchemy import create_engine
One of the first tasks you will need to do with any analysis is loading in the data. While you may have data from a variety of sources, the most common will be in a SQL database.
Here we will cover how to access a SQL database and extract data in Python.
The Python packages you need to interact with SQL are:
I won’t import either package here because they are only needed for a handful of functions.
In order to run a SQL query in Python, you need to set Python up so that it can interpret the specific SQL dialect and find the database that it is running the query on. The SQLAlchemy function create_engine()
will give Python everything it needs to do this, but you also have to feed in the following parameters as a string (example given in code):
= sa.create_engine('mssql+pyodbc://{server-and-db-address}?driver=SQL+Server',echo = True) engine
Having specified a SQL engine, you can establish a connection.
= engine.connect() conn
You have two ways of going about running a SQL query in a Python script. You can either write your query out explicitly in your Python script, or you can read in an external SQL query. If the query is particularly lengthy, it is better to store is as a .sql file and call it from Python, to make it easier to read your code, and to maintain both components.
# open and read sql query
= open('path-to-query\query.sql', 'r')
query
# read query in to pandas dataframe
= pd.read_sql_query(query.read(),conn)
df
# close sql query
query.close()
You can check that this process has worked as expected by inspecting your pandas dataframe.
df.head()
Finally, you can export your dataframe using the pandas function to_csv()
. If you want to retain the index column that pandas adds to the dataframe, simply change index=False
to True
.
'path-to-save-df\df.csv', index=False) df.to_csv(