Storing Financial Time Series Data

Before deciding on a storage solution for Financial Time Series Data, it’s worthwhile having a think about how you are going to use the data.

Are your primary use cases algo back-testing? Do you need to support concurrent access? Do you want relational integrity? Do you have unstructured data you need to store (e.g. based on financial contract differences)?

The simplest method is to store flat files, and then potentially create separate indexes into them. The downsides are if you have many processes writing or reading from the files simultaneously.

You could use a column-oriented time series database. This is usually not so great for exploratory data analysis, where you are fully instantiating your records early. This traverses through the indices to instantiate each row resulting in a loss of performance. Also, solutions like KDB have a rather arcane language that need specialist programmer to exploit.

No-SQL databases could be used. Schema changes are a pain though. Relationships between data can be difficult to maintain.

Relational databases such as PostgreSQL can be used. This is my preferred solution depending on the use case. If you have a ton of FX streaming data coming in and you need to store tick data, PostgreSQL may be too slow. You can potentially write a coalescing process that stores tick data at 1 or 5 second intervals, along with statistics about the missed ticks.

The best solution definitely all depends on your particular use case.