I’ve fallen in love with doing Data Analysis using Python and Pandas. Here are some useful ways to get started:
It’s easy to read data from CSV files, Excel files, HDF5, SQL databases and lots of other data sources. Use the pd.read_* family of functions for this (read_csv, read_excel, read_hdf, read_sql and so on).
```python
import os
import pandas as pd
# expanduser turns "~" into the current user's home directory
df = pd.read_csv(os.path.expanduser("~/data/mydata.csv"))
print(df.head(5))  # print the first 5 rows
```
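To show that the different readers all hand back the same kind of DataFrame, here is a small sketch that reads the same two-row table from CSV text and from a SQLite database. The table, its columns and its values are made up for illustration, and both sources live in memory (io.StringIO and an in-memory sqlite3 connection) so the example runs without any files on disk.

```python
import io
import sqlite3

import pandas as pd

# read_csv accepts any file-like object, so an in-memory string stands in for a file
csv_text = "name,age\nAlice,34\nBob,29\n"
csv_df = pd.read_csv(io.StringIO(csv_text))

# read_sql accepts a sqlite3 connection; this hypothetical table is built in memory
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO people VALUES (?, ?)", [("Alice", 34), ("Bob", 29)])
sql_df = pd.read_sql("SELECT name, age FROM people", conn)
conn.close()  # the DataFrame is already fully loaded, so closing is safe

print(csv_df)
print(sql_df)
```

Swapping in a real file or database only changes the argument you pass to the reader; what comes back is an ordinary DataFrame in both cases.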
Think of a Pandas DataFrame as being like an Excel sheet, except that each column has its own data type, which you can inspect through the df.dtypes attribute.
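A quick sketch of inspecting column types, using a small hypothetical DataFrame built by hand. Note that dtypes is accessed without parentheses because it is an attribute, not a method.

```python
import pandas as pd

# Hypothetical data: one text column, one float column, one integer column
df = pd.DataFrame({
    "city": ["Oslo", "Lima"],
    "population_m": [0.7, 10.0],
    "founded": [1040, 1535],
})

# dtypes returns a Series mapping each column name to its data type
print(df.dtypes)
```

Text columns typically show up as object (or a dedicated string type in newer pandas versions), while numeric columns get float64 or an integer type.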
You can use the head() and tail() methods to glance at the first and last rows of the dataset (five rows each by default).
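Here is a minimal sketch of head() and tail() on a hypothetical ten-row DataFrame, passing an explicit row count to each.

```python
import pandas as pd

# Hypothetical data: ten rows of step numbers and their squares
df = pd.DataFrame({"step": range(10), "value": [x * x for x in range(10)]})

first_three = df.head(3)  # first 3 rows
last_two = df.tail(2)     # last 2 rows
default_head = df.head()  # no argument means 5 rows

print(first_three)
print(last_two)
```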
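df.describe() gives a quick statistical summary of the dataset: by default the count, mean, standard deviation, minimum, quartiles and maximum of each numeric column.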
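A small sketch of describe() on a hypothetical DataFrame with one numeric and one text column. Only the numeric column appears in the default summary; passing include="all" also summarizes the text column.

```python
import pandas as pd

# Hypothetical data: numeric scores and a text team label
df = pd.DataFrame({
    "score": [70, 80, 90, 100],
    "team": ["a", "b", "a", "b"],
})

summary = df.describe()                 # numeric columns only by default
full = df.describe(include="all")       # also covers the text column

print(summary)
```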
You can grab a single column of the dataset by name with df['Blah'], or iterate through the rows using the df.iterrows() method.
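The sketch below selects a column by name and then walks the rows with iterrows(). The DataFrame and its "Blah" and "Other" columns are hypothetical stand-ins.

```python
import pandas as pd

# Hypothetical data with a column named "Blah"
df = pd.DataFrame({"Blah": [1, 2, 3], "Other": ["x", "y", "z"]})

blah = df["Blah"]  # a single column comes back as a Series
print(type(blah).__name__)

# iterrows yields (index, row) pairs, where each row is itself a Series
labels = []
for idx, row in df.iterrows():
    labels.append(f"{idx}:{row['Other']}={row['Blah']}")
print(labels)

# For arithmetic, a whole-column operation avoids the Python-level loop
doubled = df["Blah"] * 2
```

Because each row is packed into a single Series, iterrows() does not preserve per-column dtypes, and it is slow on large frames; column-wise operations like the last line are usually the better choice when you only need to compute something.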
There is a quick 10-minute introduction over at pydata.org.