MySQL DB Structure for Company Financials


Guest deepValue

I'm trying to build a database that stores custom calculations that I make for the stocks I research and want to update as the years go by. For example, I might want to store gross operating profit as a percentage of equity and improvements for a lodging REIT. I want to store the calculation for each of the last ten years and then add another column as each fiscal year goes by. I also want to store the same calculation for each of its competitors, along with, say, gross operating profit per available room.

 

I'm having trouble coming up with a structure that allows me to store multiple measures for multiple companies without having to create a separate table for each measure or for each company. Is there a more efficient way to do this than making dozens of tables?

Consider storing it as key-value pairs in one table - essentially, try to get out of relational thinking, since there seems to be very little additional value in splitting this across many related tables. The alternative would be to go with a NoSQL solution.

 

A few references:

http://www.developerforce.com/media/ForcedotcomBookLibrary/Force.com_Multitenancy_WP_101508.pdf

http://www.dbms2.com/2011/09/15/database-architecture-salesforce-com-force-com-and-database/

 

Hope this helps.

Agree with mbharadwaj: use a key-value system.

 

Just to flesh it out a bit more - there are a million ways to do this, but here's one example:

 

columns: CompanyID, Date, MetricName, MetricValue

 

So you would have something like (with numbers totally made up):

 

IBM,2009-03-01,GrossOperProfit,5000

IBM,2009-03-01,GrossOperProfitPercentage,23.3

IBM,2009-06-01,GrossOperProfit,5500

IBM,2009-06-01,GrossOperProfitPercentage,24.1

 

You get the idea. I would probably have another table with the company attributes. Just make sure you standardize the MetricName values.
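A minimal MySQL sketch of that layout might look something like this (table and column names are hypothetical; the numbers are the made-up ones above):

CREATE TABLE company_metric (
    company_id   VARCHAR(20)   NOT NULL,  -- ticker, or an id into a company attributes table
    metric_date  DATE          NOT NULL,  -- fiscal period end
    metric_name  VARCHAR(64)   NOT NULL,  -- standardized names, e.g. 'GrossOperProfit'
    metric_value DECIMAL(20,4) NOT NULL,
    PRIMARY KEY (company_id, metric_date, metric_name)
);

INSERT INTO company_metric VALUES
    ('IBM', '2009-03-01', 'GrossOperProfit',           5000),
    ('IBM', '2009-03-01', 'GrossOperProfitPercentage', 23.3),
    ('IBM', '2009-06-01', 'GrossOperProfit',           5500),
    ('IBM', '2009-06-01', 'GrossOperProfitPercentage', 24.1);

Adding another fiscal year, another metric, or another company is then just inserting more rows, rather than adding columns or tables.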

There is actually quite a bit of thought and math that should go into relational database design. The properties of the tables that you create directly affect computation and storage resources for the database. An inefficient table design can lead to much slower querying time later on, as the database grows. This likely isn't a problem for small databases, but it's something to take into consideration.

 

When designing tables, you should aim to have the tables satisfy the properties of 3rd normal form, at least. See the Wikipedia page on database normalization for details: http://en.wikipedia.org/wiki/Database_normalization

 

When I took a course on database design, I remember the professor saying that one easy indicator of a problem with your table design is a lot of repeated information. If every row in your table contains the same company name/stock ticker pair, for example, it would be more space-efficient to have a separate table mapping company names to stock tickers, and have your original table reference a specific row in that new table, instead of storing Microsoft/MSFT over and over in every row.
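A rough sketch of that idea in MySQL, with hypothetical names: the company name/ticker pair lives in one small table, and every metric row just points at it.

CREATE TABLE company (
    company_id INT AUTO_INCREMENT PRIMARY KEY,
    name       VARCHAR(100) NOT NULL,        -- 'Microsoft'
    ticker     VARCHAR(10)  NOT NULL UNIQUE  -- 'MSFT'
);

CREATE TABLE metric (
    company_id   INT NOT NULL,               -- references company instead of repeating name/ticker
    metric_date  DATE NOT NULL,
    metric_name  VARCHAR(64) NOT NULL,
    metric_value DECIMAL(20,4) NOT NULL,
    FOREIGN KEY (company_id) REFERENCES company (company_id)
);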

 

For a quick tutorial, see this page: http://www.ntu.edu.sg/home/ehchua/programming/sql/Relational_Database_Design.html

I completely agree with mikazo; as I read the above answers I cringed. Once the database grows, how do you index those key-value tables, and how do you do mass updates on them?

 

Really think through your structure. I'd create a table for each type of high-level data structure, then possibly a calculation table for each. If you store the original data, calculations are easy, can be tweaked in real time, and the results flow through to everything.

 

If you are building this for financial data, I'd advise you to look at building a data warehouse with a star schema. It's optimized for reads rather than writes, but writes won't happen often here. Everything is pulled together with joins, which are much more efficient.

 

Third normal form can make sense, but for what you're doing I'd go with a warehouse. Information is repeated, but that's intentional: you're optimizing for reads. If you need to do a lot of updates, then normalizing makes sense.
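As a rough sketch of the warehouse idea (hypothetical table names, assuming the same company/date/metric measures discussed above), a star schema would be one fact table of measures surrounded by small dimension tables:

-- Dimension tables: small lookup tables describing each axis of analysis
CREATE TABLE dim_company (
    company_key INT AUTO_INCREMENT PRIMARY KEY,
    name        VARCHAR(100) NOT NULL,
    ticker      VARCHAR(10)  NOT NULL,
    sector      VARCHAR(50)              -- e.g. 'Lodging REIT'
);

CREATE TABLE dim_date (
    date_key    INT PRIMARY KEY,         -- e.g. 20090301
    full_date   DATE NOT NULL,
    fiscal_year SMALLINT NOT NULL
);

CREATE TABLE dim_metric (
    metric_key  INT AUTO_INCREMENT PRIMARY KEY,
    metric_name VARCHAR(64) NOT NULL     -- e.g. 'GrossOperProfitPercentage'
);

-- Fact table: one row per company / period / metric, keyed by the dimensions
CREATE TABLE fact_metric (
    company_key  INT NOT NULL,
    date_key     INT NOT NULL,
    metric_key   INT NOT NULL,
    metric_value DECIMAL(20,4) NOT NULL,
    PRIMARY KEY (company_key, date_key, metric_key),
    FOREIGN KEY (company_key) REFERENCES dim_company (company_key),
    FOREIGN KEY (date_key)    REFERENCES dim_date (date_key),
    FOREIGN KEY (metric_key)  REFERENCES dim_metric (metric_key)
);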

 

Databases are powerful, extremely powerful, but only if they're set up and used correctly. No one uses a screwdriver to hammer in a nail...

As no_free_lunch mentioned, there are many ways to solve this.

 

Key-value is the simplest. Third normal form is overkill - it can result in too many joins as well. The use case involves calculated values, so there would be no need to update them. If there is a need to update (say a calculation was wrong), a new table can be created.

 

The only field that would need to be indexed would be the company name (from no_free_lunch's example), or instead a company_id that would participate in a join.

 

I am not sure how many listed companies there are in the US markets. Including international ones, assuming 100K companies and 100 values to be tracked per company, we are looking at 10 million rows to be indexed for reads, which isn't a big deal at all. If it does become a big deal, the data can easily be sharded with simple strategies (a table named "company_i" for all companies starting with "i", and so on).
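A sketch of both points, assuming the hypothetical company_metric table sketched earlier in the thread (whose composite primary key may already cover company lookups):

-- Explicit index on the company column, if the table was not already keyed on it
CREATE INDEX idx_metric_company ON company_metric (company_id);

-- Simple sharding strategy: one table per leading letter, same layout,
-- e.g. a table for all companies whose name starts with 'i'
CREATE TABLE company_metric_i LIKE company_metric;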

 

Many ways to do this.

 

The links on the Salesforce architecture provide good details about the choices. Many large systems (Facebook, Twitter, LinkedIn, Google) have gone with schema-less designs where appropriate, for good reason. Key-value is a schema-less design.

Oddball,

 

The example I gave was the basis of a star schema. Think about it: it's at the lowest level of granularity, the individual measures. Just replace companyid, date, and metricname with integer id fields and create dimensions for each. Voila! Build some indexes on companyid, possibly metricname / date too, and you're good to go.
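Continuing the hypothetical star schema sketched earlier in the thread, the indexes and a typical read might look like:

-- The fact table's composite primary key already covers lookups by company_key;
-- a secondary index helps queries driven by metric and date
CREATE INDEX idx_fact_metric_date ON fact_metric (metric_key, date_key);

-- A typical read: one metric for one company across the years, pulled together with joins
SELECT d.fiscal_year, f.metric_value
FROM fact_metric f
JOIN dim_company c ON c.company_key = f.company_key
JOIN dim_metric  m ON m.metric_key  = f.metric_key
JOIN dim_date    d ON d.date_key    = f.date_key
WHERE c.ticker = 'IBM'
  AND m.metric_name = 'GrossOperProfitPercentage'
ORDER BY d.fiscal_year;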

Guest deepValue

Thanks guys. I've been making small databases here and there for years, but have never given any consideration to optimization because the scale of the projects just hasn't required it. I just need to think it through and plan out what I really want to get out of this before deciding on a structure!

Ahh, my mistake, no_free_lunch - I misread your comment. I read it as saying to create a single table with those literal column headings, rather than showing a template for the table design.

 

After re-reading your comment I agree, that's the best way to design a database for this. You might think optimization doesn't matter, but a poor structure is a complete pain to maintain, trust me. Things thrown together have trouble growing and eventually die, because you end up spending too much time refactoring.

 

Build it correctly from the start.
