MySQL DB Structure for Company Financials

October 14, 2013

I'm trying to build a database that stores custom calculations that I make for the stocks I research and want to update as the years go by. For example, I might want to store gross operating profit as a percentage of equity and improvements for a lodging REIT. I want to store the calculation for each of the last ten years and then add another column as each fiscal year goes by. I also want to store the same calculation for each of its competitors, along with, say, gross operating profit per available room.

I'm having trouble coming up with a structure that allows me to store multiple measures for multiple companies without having to create a separate table for each measure or for each company. Is there a more efficient way to do this than making dozens of tables?

mbharadwaj · October 14, 2013

Consider storing it as key-values in one table - essentially try to get out of relational thinking as there seems to be very little additional value you can get out of many tables and having them relational. The alternative would be to go with a noSQL solution.

A few references:

http://www.developerforce.com/media/ForcedotcomBookLibrary/Force.com_Multitenancy_WP_101508.pdf

http://www.dbms2.com/2011/09/15/database-architecture-salesforce-com-force-com-and-database/

Hope this helps.

no_free_lunch · October 14, 2013

Agree with mbharadwaj, use a key value system.

Just to flesh it out a bit more, and there are a million ways to do this, but one example:

columns: CompanyID, Date, MetricName, MetricValue

So you would have something like (with numbers totally made up):

IBM,2009-03-01,GrossOperProfit,5000

IBM,2009-03-01,GrossOperProfitPercentage,23.3

IBM,2009-06-01,GrossOperProfit,5500

IBM,2009-06-01,GrossOperProfitPercentage,24.1

You get the idea.. I would probably have another table with the company attributes. Just make sure you standardize on the metricname values.

October 14, 2013

Ok that makes sense. Thanks to you both!

mikazo · October 14, 2013

There is actually quite a bit of thought and math that should go into relational database design. The properties of the tables that you create directly affect computation and storage resources for the database. An inefficient table design can lead to much slower querying time later on, as the database grows. This likely isn't a problem for small databases, but it's something to take into consideration.

When designing tables, you should aim to have the tables satisfy the properties of 3rd normal form, at least. See the Wikipedia page on database normalization for details: http://en.wikipedia.org/wiki/Database_normalization

When I took a course on database design, I remember the professor saying that one easy indicator that there is a problem with your table design is if a lot of information is repeated. If every row in your table contains the same company name/stock ticker pair, for example, it would be more space efficient to have a separate table mapping company names to stock tickers, and your original table could just reference a specific row in that new table, instead of storing Microsoft/MSFT, Microsoft/MSFT, Microsoft/MSFT over and over in every row of the original table. This would result in a more space-efficient method of storing information.

For a quick tutorial, see this page: http://www.ntu.edu.sg/home/ehchua/programming/sql/Relational_Database_Design.html

oddballstocks · October 14, 2013

I completely agree with mikazo, as I read the above answers I cringed. Once the database grows how do you index those key-value tables, how do you do mass updates on them?

Really think through your structure, I'd create a table for each type of high level data structure, then possibly a calculation table for each. If you have the original data calculations are easy and can be tweaked real time and the results happen for everything.

If you are building this for financial data I'd advise you look at building a data warehouse with a star schema. They're optimized for reads not writes, but writing won't happen often. Everything is pulled together with joins which are much more efficient.

Third normal form can make sense, but for what you're doing I'd go with a warehouse. Information is repeated, but that's intentional, you're optimizing for reads. Now if you need to do a lot of updates then normalizing makes sense.

Databases are powerful, extremely powerful, but only if they're setup and used correctly. No one uses a screwdriver to hammer in a nail...

mbharadwaj · October 14, 2013

As no_free_lunch mentioned, many ways to solve this.

key-value is the simplest. 3rd normal form is an overkill - it can result in too many joins as well. the use case indicated calculated values, there would be no need to update them. if there is a need to update (say a calculation was wrong), a new table can be created.

the only field that would need to be indexed would be the company name (from no_free_lunch's example) or instead a company_id that would participate in a join.

i am not sure as to how may listed companies are there for US markets. including international assuming 100K, and assuming 100 values need to be tracked per company, we are looking at 10 million rows to be indexed for read, which isn't a big deal at all. if it becomes a big deal, data can be easily sharded with simple strategies (table name "company_i" for all companies starting with "i") and so on.

many ways to do this.

the links from salesforce architecture provide good details about choices. many large systems (facebook, twitter, linkedin, google) have gone with schema-less designs where appropriate for a good reason. key-value is a schema-less design.

no_free_lunch · October 14, 2013

Oddball,

The example I gave was the basis of a star-schema. Think about it, it's at the lowest level of granularity, the individual measures. Just replace companyid, date, metricname with integer id fields and create dimensions for companyid, date, metricname. Voila! Build some indexes on companyid, possibly metricname / date too and you're good to go.

October 15, 2013

Thanks guys. I've been making small databases here and there for years, but have never given any consideration to optimization because the scale of the projects just haven't required it. I just need to think it through and plan out what I really want to get out of this before deciding on a structure!

oddballstocks · October 15, 2013

Oddball,

The example I gave was the basis of a star-schema. Think about it, it's at the lowest level of granularity, the individual measures. Just replace companyid, date, metricname with integer id fields and create dimensions for companyid, date, metricname. Voila! Build some indexes on companyid, possibly metricname / date too and you're good to go.

Ahh, my mistake, I misread your comment. I read it as saying create a table with the following column headings instead of showing it as a template for table design.

After re-reading your comment I agree, that's the best way to design a database for this. You might think optimization doesn't matter, but a poor structure is a complete pain to maintain, trust me. Things throwing together have trouble growing, and eventually die because you'll need to spend too much time refactoring.

Build it correctly from the start.

Sign In

MySQL DB Structure for Company Financials

Recommended Posts

Guest deepValue

mbharadwaj

no_free_lunch

Guest deepValue

mikazo

oddballstocks

mbharadwaj

no_free_lunch

Guest deepValue

oddballstocks

Create an account or sign in to comment

Create an account

Sign in

Browse

Activity