Here is a cheat sheet for fast Pandas methods over on GitHub. Each data analysis approach in it is benchmarked, which makes it a useful reference.
I’ve fallen in love with doing data analysis using Python and Pandas. Here are some useful ways to get started:
It’s easy to read data from CSV files, Excel files, HDF5, SQL databases and lots of other sources. Use the read_xxx functions for this (read_csv, read_excel, read_hdf, read_sql and so on).
import os

import pandas as pd

# Expand "~" to the home directory before handing the path to pandas
df = pd.read_csv(os.path.expanduser("~/data/mydata.csv"))
print(df.head(5))  # output the first 5 observations
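The other readers follow the same pattern. As a minimal sketch, here is read_sql pulling a query result straight into a DataFrame. The in-memory SQLite database and its "sales" table are hypothetical, made up just so the example runs without any external files.

```python
import sqlite3

import pandas as pd

# Build a throwaway in-memory SQLite database (hypothetical "sales" table)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 120.0), ("south", 80.5), ("north", 42.0)],
)
conn.commit()

# read_sql takes a query string plus a connection; sqlite3 connections are accepted directly
sales = pd.read_sql("SELECT region, amount FROM sales", conn)
conn.close()

print(sales)
```

For a real database you would swap the sqlite3 connection for one to your own server; the call to read_sql stays the same.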
Think of a Pandas DataFrame as being like an Excel sheet, where each column has its own data type, accessible through the dtypes attribute.
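A small sketch of what that looks like, using a hypothetical DataFrame with one column of each common kind. The exact names printed for the text and date columns can differ between Pandas versions, so treat the output as indicative.

```python
import pandas as pd

# Hypothetical data: one text, one integer, one float and one date column
df = pd.DataFrame(
    {
        "name": ["a", "b", "c"],
        "count": [1, 2, 3],
        "price": [9.99, 5.0, 12.5],
        "when": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
    }
)

# dtypes is a Series mapping each column name to its data type
print(df.dtypes)
```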
You can use the head() and tail() methods to glance at the first and last rows of the dataset.
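Both methods take the number of rows to return and default to 5. A quick sketch on a small made-up DataFrame:

```python
import pandas as pd

# Hypothetical data: ten rows of x and x squared
df = pd.DataFrame({"x": range(10), "y": [v * v for v in range(10)]})

first = df.head(3)  # first 3 rows
last = df.tail(2)   # last 2 rows

print(first)
print(last)
```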
df.describe() gives a quick statistical summary (count, mean, standard deviation, min, quartiles and max) of the dataset's numeric columns.
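A minimal sketch with a hypothetical mixed-type DataFrame. By default only the numeric column is summarised; passing include="all" adds count, unique, top and freq for the text column as well.

```python
import pandas as pd

# Hypothetical data: one numeric column and one text column
df = pd.DataFrame({"score": [1.0, 2.0, 3.0, 4.0], "label": ["a", "b", "a", "c"]})

summary = df.describe()                    # numeric columns only
full_summary = df.describe(include="all")  # every column, text included

print(summary)
print(full_summary)
```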
You can grab a single column of the dataset by name with df['Blah'], or iterate through the rows using the iterrows() method.
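Here is a sketch of both, reusing the 'Blah' column name from above on a made-up DataFrame. It also shows a vectorised column operation, which in general is much faster than looping over rows in Python and is the kind of trade-off a benchmarked cheat sheet helps you weigh.

```python
import pandas as pd

# Hypothetical data with the 'Blah' column used in the text
df = pd.DataFrame({"Blah": [10, 20, 30], "other": ["x", "y", "z"]})

# Selecting a single column by name returns a Series
blah = df["Blah"]
print(blah.sum())

# iterrows() yields (index, row) pairs, where each row is itself a Series
labels = []
for idx, row in df.iterrows():
    labels.append(f"{idx}:{row['other']}={row['Blah']}")
print(labels)

# Vectorised alternative: operate on the whole column at once
doubled = df["Blah"] * 2
print(doubled.tolist())
```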
There is a Quick 10 Minute Introduction over at pydata.org.
I have also fallen in love with running a Jupyter server on my laptop and connecting to it from Emacs using the EIN package. It is great having a proper editor, set up for Python coding, to work on my math models. I am starting to use it to create a Computable Document repository, and let’s face it, every document should be computable!