Uber has open sourced its Pyro probabilistic programming language. “Pyro is a tool for deep probabilistic modeling, unifying the best of modern deep learning and Bayesian modeling.” I’m excited about exploring this new language!
Logging in Large Math Models
Man AHL uses an interesting approach to logging: they store the inputs and outputs of their maths functions, serialised in HDF5 files. The HDF5 files live on a shared filesystem, so they are available to all developers.
In building algo trading models, we had to log all the decisions made by the algorithm. This meant logging all the Order Books the decision was made on, as well as other inputs to the decision making process. We built custom Java code to handle this process.
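As a rough illustration of the idea of logging a function's inputs and outputs to HDF5, here is a minimal sketch using h5py and NumPy. It is not Man AHL's implementation, which isn't described in detail here. The `logged` decorator, the group naming scheme, the file name and the `weighted_mean` example function are all hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np


def logged(path):
    """Hypothetical decorator: append each call's inputs and output to an HDF5 file."""
    def wrap(fn):
        def inner(*args):
            result = fn(*args)
            with h5py.File(path, "a") as f:
                # One top-level group per function, one sub-group per call.
                fn_group = f.require_group(fn.__name__)
                call = fn_group.create_group(f"call_{len(fn_group):06d}")
                for i, arg in enumerate(args):
                    call.create_dataset(f"input_{i}", data=np.asarray(arg))
                call.create_dataset("output", data=np.asarray(result))
            return result
        return inner
    return wrap


# A temporary directory stands in for the shared filesystem (assumption).
log_path = os.path.join(tempfile.mkdtemp(), "model_log.h5")


@logged(log_path)
def weighted_mean(values, weights):
    return np.sum(values * weights) / np.sum(weights)


weighted_mean(np.array([1.0, 2.0, 3.0]), np.array([1.0, 1.0, 2.0]))

# Read the logged output back, as another developer might.
with h5py.File(log_path, "r") as f:
    logged_output = float(f["weighted_mean/call_000000/output"][()])
print(logged_output)
```

Because every call lands in its own group, a developer can later reload the exact inputs and re-run the function to reproduce a result.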
Benchmarked Pandas Cheat Sheet
Here is a cheat sheet for fast Pandas methods over on GitHub. Each approach to data analysis is benchmarked. A useful reference.
Stephan’s Unusual Hobby
Stephan Boyer is an engineer at Airbnb, and when he gets home from work he likes to prove theorems in the Coq Proof Assistant. It seems like a great way to become intimately familiar with a mathematical proof!
What an interesting hobby!
Pandas Quick-Start
I’ve fallen in love with doing Data Analysis using Python and Pandas. Here are some useful ways to get started:
It’s easy to read data from CSV files, Excel files, HDF5, SQL and lots of other data sources. Use the read_xxx functions (for example read_csv or read_excel) for this.
import pandas as pd
import os
df = pd.read_csv(os.path.expanduser("~/data/mydata.csv"))
print(df.head(5))  # output the first 5 observations
Think of a Pandas DataFrame as being like an Excel sheet, where each column has its own data type, accessible through the df.dtypes attribute.
You can use the head() and tail() methods to glance at the first and last rows of the dataset.
df.describe() gives a quick statistical summary of the dataset.
You can grab a single column of the dataset by name with df['Blah'], or iterate through the rows using the df.iterrows() method.
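To tie these together, here is a small self-contained sketch. The CSV contents and the column names (symbol, price, volume) are made up for illustration and are read from an in-memory string, so no file is needed.

```python
import io

import pandas as pd

# Made-up sample data standing in for a real CSV file.
csv_text = """symbol,price,volume
AAA,10.5,100
BBB,20.0,250
CCC,15.25,175
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.dtypes)      # data type of each column
print(df.head(2))     # first two rows
print(df.tail(1))     # last row
print(df.describe())  # summary statistics for the numeric columns

prices = df["price"]  # a single column, returned as a Series

# Row-by-row iteration; fine for small data, slow for large frames.
total_notional = 0.0
for index, row in df.iterrows():
    total_notional += row["price"] * row["volume"]
print(total_notional)
```

For anything beyond a quick look, a vectorised expression such as (df["price"] * df["volume"]).sum() is usually much faster than iterrows().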
There is a Quick 10 Minute Introduction over at pydata.org.
Storing Financial Time Series Data
Before deciding on a storage solution for Financial Time Series Data, it’s worthwhile having a think about how you are going to use the data.
Mathpix
I have started to use the Mathpix app on my MacBook Pro to convert maths from PDFs to LaTeX. It works really well! I am super excited about this! I have wanted to build an app that does this for a while, but never got around to it.
Jupyter and EIN
I have fallen in love with running a Jupyter server on my notebook, and connecting to it using Emacs and the EIN package. It is great having a proper editor, set up for Python coding, to work on my Math models. I am starting to use it to create a Computable Document repository – and let’s face it – every document should be computable!
Quant forums
It’s hard to find good quality forums for Quantitative subjects. The best forum I’ve found is probably the Wilmott forum. I’ve heard that there is a fairly active group in the Thalesians’ Slack channel. The FT Alphaville Blog occasionally has some interesting stuff. There are Reddit groups – /r/quant and /r/quantfinance – that are OK at best.
Linear Regression by Hand
There is a post over at the Data Science Gazette on Linear Regression by Hand. It takes a fairly simple look at linear regression and Ordinary Least Squares, and demos the computation in R. It doesn’t go much into the nitty gritty, but it does show how the linear algebra relates to the statistical regression output.
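The same idea can be sketched in Python with NumPy rather than R. This is not the code from that post. It is a minimal illustration on made-up data, solving the normal equations for the coefficients and then deriving the standard errors and R-squared that a regression summary would report.

```python
import numpy as np

# Made-up data for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Design matrix with an intercept column.
X = np.column_stack([np.ones_like(x), x])

# Normal equations: (X'X) beta = X'y
beta = np.linalg.solve(X.T @ X, X.T @ y)

# Residual variance, using n - p degrees of freedom.
residuals = y - X @ beta
dof = len(y) - X.shape[1]
sigma2 = (residuals @ residuals) / dof

# Coefficient standard errors from the diagonal of sigma^2 (X'X)^-1.
std_err = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

# R-squared: share of the variance in y explained by the fit.
r_squared = 1.0 - (residuals @ residuals) / np.sum((y - y.mean()) ** 2)

print("intercept, slope:", beta)
print("standard errors:", std_err)
print("R-squared:", r_squared)
```

Solving the linear system with np.linalg.solve avoids forming the inverse just to get the coefficients. The explicit inverse is only used here for the standard errors.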