Sunday, October 1, 2017

Taking advantage of tax-advantaged accounts

Different investment accounts have various tax implications, which affect your returns in deceptively dramatic ways. We'll take a look at two types of assets:
  • Growth - e.g. stocks
  • Fixed-income - e.g. bonds and CDs
and how they perform in the following investment accounts:
  • "Regular" taxable accounts
  • HSA
  • Traditional and Roth 401k
  • Roth IRA
  • Traditional, deductible IRA
  • Traditional, non-deductible IRA
First, let's summarize the tax differences between the two types of assets and the different investment accounts (recommended read). Then using this information, we'll derive formulas for calculating the post-tax return for various combinations of assets and account types (if you want to check my work). Finally, we'll provide some concluding remarks on the results.

Growth assets, such as non-dividend-paying stocks, derive their value primarily from growth in price. For example, Tesla stock grew from $200/share to $350/share in the past year, for a gain of $150/share. We call these capital gains, and they are taxed at a special rate called the capital gains tax rate, which is typically much lower than your income tax rate.

Fixed-income assets, such as bonds, CDs, and dividend-paying stocks, derive their value primarily from regular payments. This means that when you buy and hold the asset, the issuing entity will pay you some amount of money at set points in time. This payment goes under various names, such as dividend, coupon, interest, etc. But it is normally counted as income and taxed at your current income tax rate.

Now let's go into some of the tax implications of the (non-exhaustive) list of investment accounts that you can buy these assets in.

What I'm going to call a "regular" taxable account is just a typical account that you would open with a brokerage and fund using after-tax money (e.g. from your bank account). Fixed-income is taxed the year it is received, and capital gains are taxed the year the asset is sold.

A health savings account (HSA) is a tax-advantaged account. You contribute with pre-tax money, pay no taxes on growth, and no taxes upon withdrawal when used for qualifying medical expenses.

A traditional 401k account is a retirement benefit optionally provided by your employer. It is funded by pre-tax money that is deducted from your paycheck. As part of the benefit, your employer may choose to additionally contribute to your 401k account, typically in the form of a percentage match (e.g. for every dollar you put into your 401k, your employer will put in 50 cents). Taxes are simple: you don't pay income tax on the money you put in, and you pay income tax when you withdraw. One non-obvious consideration is that your income tax rate in retirement will likely be different from your current income tax rate.

A Roth 401k is nearly identical to the traditional 401k, except you fund it using after-tax money and pay no tax at withdrawal. This means you pay taxes now at your current income tax rate instead of later at your retirement tax rate. There's an additional twist to the Roth 401k: employer matches are made with pre-tax money and placed into a traditional 401k.

A Roth IRA is like a Roth 401k, except there is no employer match. You put in after-tax money and withdraw without taxes.

A traditional, deductible IRA is like a traditional 401k, except there is no employer match. You put in pre-tax money and pay income tax when you withdraw. You can also think of it as a Roth IRA but tax-deferred.

A traditional, non-deductible IRA and a traditional, deductible IRA are really the same account (it's just called a traditional IRA). However, depending on your income level, your contributions may or may not be tax-deductible. This difference has big implications, so it's worth treating the two cases separately. So how does this differ from a Roth IRA, since both are funded using after-tax money? In a Roth IRA, you don't pay any taxes at withdrawal. However, in a traditional, non-deductible IRA, you pay income tax on the earnings.

Okay! Now we're ready to figure out how these factors impact our returns. Let's start by defining a few variables:
  • \( C \) - pre-tax contribution amount
  • \( PTM \) - present income-tax multiplier, equal to one minus your present tax rate
  • \( RTM \) - retirement income-tax multiplier, equal to one minus your retirement tax rate
  • \( CGR \) - capital gains tax rate
  • \( i \) - annual percentage yield, corresponds to the growth rate or fixed-income yield
  • \( T \) - number of years
  • \( EM \) - employer 401k match percentage
For a regular, taxable account, a pre-tax contribution of \( C \) gets income taxed, meaning that we really only start with \( C \times PTM \) dollars in our account. If we invest in growth stocks, due to compounding growth after \( T \) years, we will have \( C \times PTM \times (1+i)^T \) dollars worth of assets. However, when we sell the assets, the capital gains are taxed at \( CGR \). So we lose \( C \times PTM \times [(1+i)^T - 1] \times CGR \) to taxes. Thus we are left with
\[ \boxed{C \times PTM \times [(1-CGR) \times (1+i)^T + CGR]} \]
Now let's consider a fixed-income asset in our taxable account. In this situation, we still start with \( C \times PTM \) dollars, but our yield also goes down to \( i \times PTM \). This is because the yield is taxed as income every year, which lowers the compounding rate. This means after \( T \) years, we end up with
\[ \boxed{C \times PTM \times (1+ i \times PTM)^T} \]
In a tax-advantaged account, like a 401k, no taxes are paid on either growth or yield while the money stays in the account, so the calculations will be identical for growth and fixed-income assets.

In an HSA, things are quite simple since there are no taxes. Your pre-tax contribution \( C \) gets the full benefit of compound growth \( (1+i)^T \), and you get to spend the full resultant amount tax-free on qualifying health expenses
\[ \boxed{C \times (1+i)^T} \]
In a traditional 401k, we get our full pre-tax contribution of \( C \) plus the employer match, so we start off with \( C \times (1+EM) \). We then have compound growth of \( (1+i)^T \), and at withdrawal the whole balance is income taxed, i.e. multiplied by \( RTM \). This leaves us with
\[ \boxed{C \times (1+EM) \times RTM \times (1+i)^T} \]
A Roth 401k behaves similarly, except we pay taxes up front on our contributions. So from our contributions, we have \( C \times PTM \times (1+i)^T \). Our employer contributions are treated as a traditional 401k, so they give us \( C \times EM \times RTM \times (1+i)^T \). This gives us
\[ \boxed{C \times (PTM + EM \times RTM) \times (1+i)^T} \]
A Roth IRA behaves like a Roth 401k without the employer match portion, so we have
\[ \boxed{C \times PTM \times (1+i)^T} \]
A traditional, deductible IRA is like the Roth IRA but tax-deferred: the full contribution grows, and the withdrawal is income taxed.
\[ \boxed{C \times RTM \times (1+i)^T} \]
A traditional, non-deductible IRA behaves like a regular taxable account, but with income tax instead of capital gains tax. The account starts off with a taxed contribution of \( C \times PTM \) and grows tax-deferred, so we get a factor of \( (1+i)^T \). However, the earnings are taxed as ordinary income at withdrawal time, so we lose \( C \times PTM \times [(1+i)^T - 1] \times (1-RTM) \) to taxes. This leaves us with
\[ \boxed{C \times PTM \times [RTM \times (1+i)^T + (1-RTM)]} \]
I've summarized the results in the following table:
Account/Asset Type | Withdrawal Amount (\( \times C \))
Regular + Growth | \( PTM \times [(1-CGR) \times (1+i)^T + CGR] \)
Regular + Fixed-Income | \( PTM \times (1+ i \times PTM)^T \)
HSA | \( (1+i)^T \)
Traditional 401k | \( (1+EM) \times RTM \times (1+i)^T \)
Roth 401k | \( (PTM + EM \times RTM) \times (1+i)^T \)
Roth IRA | \( PTM \times (1+i)^T \)
Traditional, deductible IRA | \( RTM \times (1+i)^T \)
Traditional, non-deductible IRA | \( PTM \times [RTM \times (1+i)^T + (1-RTM)] \)

Just taking a glance at the formulas above, we can draw some obvious conclusions about the tax-effectiveness of various accounts. But to make it even more obvious, let's work out a numerical example based on some reasonable assumptions:
  • $1,000 pre-tax contribution amount
  • 28% income tax bracket
  • 28% retirement income tax bracket
  • 15% capital gains tax rate
  • 8% stock growth
  • 3% fixed-income yield
  • 20 year investment horizon
  • 50% employer 401k match
For stocks, we see the following results:
Account Type | Withdrawal Amount | Advantage vs. Regular
Regular | $2961 | (baseline)
HSA | $4661 | +57%
Roth and Traditional 401k | $5034 | +70%
Roth and Traditional+Deductible IRA | $3356 | +13%
Traditional, non-deductible IRA | $2618 | -12%

And for bonds:
Account Type | Withdrawal Amount | Advantage vs. Regular
Regular | $1104 | (baseline)
HSA | $1806 | +64%
Roth and Traditional 401k | $1951 | +77%
Roth and Traditional+Deductible IRA | $1300 | +18%
Traditional, non-deductible IRA | $1138 | +3%
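The boxed formulas are easy to check numerically. Below is a minimal Python sketch that evaluates each one with the example assumptions (\( C = 1000 \), \( PTM = RTM = 0.72 \), \( CGR = 0.15 \), \( T = 20 \), \( EM = 0.5 \), and \( i \) of 8% for stocks or 3% for bonds). The function name and dictionary labels are my own, and it models nothing beyond the simplified formulas (no contribution limits, penalties, or special dividend rules).

```python
def withdrawal_amounts(C, PTM, RTM, CGR, i, T, EM):
    """Evaluate the post-tax withdrawal formulas derived above.

    PTM and RTM are multipliers (one minus the tax rate); CGR is the capital
    gains tax *rate*. Returns a dict of account/asset type -> dollars.
    """
    growth = (1 + i) ** T  # compound growth factor (1+i)^T
    return {
        "Regular + Growth": C * PTM * ((1 - CGR) * growth + CGR),
        "Regular + Fixed-Income": C * PTM * (1 + i * PTM) ** T,
        "HSA": C * growth,
        "Traditional 401k": C * (1 + EM) * RTM * growth,
        "Roth 401k": C * (PTM + EM * RTM) * growth,
        "Roth IRA": C * PTM * growth,
        "Traditional, deductible IRA": C * RTM * growth,
        "Traditional, non-deductible IRA": C * PTM * (RTM * growth + (1 - RTM)),
    }


params = dict(C=1000, PTM=0.72, RTM=0.72, CGR=0.15, T=20, EM=0.5)
stocks = withdrawal_amounts(i=0.08, **params)
bonds = withdrawal_amounts(i=0.03, **params)

# The "Regular" baseline uses the growth formula for stocks and the
# fixed-income formula for bonds, matching the two tables above.
for label, results, baseline_key in [
    ("Stocks", stocks, "Regular + Growth"),
    ("Bonds", bonds, "Regular + Fixed-Income"),
]:
    baseline = results[baseline_key]
    print(f"{label}: regular ${baseline:.0f}")
    for name, amount in results.items():
        if name.startswith("Regular"):
            continue  # skip the taxable-account rows; they are the baseline
        print(f"  {name}: ${amount:.0f} ({amount / baseline - 1:+.0%})")
```

Changing `PTM` and `RTM` independently is a quick way to see the traditional-versus-Roth trade-off discussed in the conclusions below.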

With the formulas and numerical results in mind, we can draw the following conclusions:
  • There's a huge advantage in investing in your HSA and 401k accounts.
  • The HSA has the biggest tax advantage, but the employer match is usually enough to make up the difference or overtake the HSA.
  • There is a significant tax advantage to be had in your Roth and deductible IRA accounts, since you're not paying any tax on the growth or yield.
  • There is no difference between a traditional (deductible) and a Roth 401k/IRA if your income tax rate stays constant. However, a lower retirement tax rate will favor traditional, whereas a higher retirement tax rate will favor Roth.
  • A non-deductible traditional IRA performs worse than a regular, taxable account for stocks, since capital gains are taxed as ordinary income.
  • The tax advantage is greater for fixed-income assets than for growth assets, because in a taxable account the yield is taxed every year, which slows compounding, while a tax-advantaged account lets the full yield compound.
Of course, the real-world picture isn't quite so simple. There are restrictions, loopholes, and other considerations when using these tools. Hopefully this post convinces you of the benefits of thinking about tax-advantaged accounts and provides a basic framework for doing so. A deeper treatise on this will be the subject of a future post.

Monday, February 10, 2014

Front-loading your 401k

The most common way of contributing towards your 401k is by setting aside a percentage of each paycheck. With a bi-weekly paycheck (once every two weeks, i.e. 26 paychecks a year), to max out the annual contribution limit of $17,500 (as of 2014), you would put in $673.08 per paycheck. While this strategy has many benefits in its simplicity and amortization, it is not optimal in terms of maximizing the long-term value of your retirement account.

Time is your most valuable asset in both saving and investing. If you are certain about how much you will contribute this year, then it is better to make that contribution as early on in the year as possible. This will give you a little extra time to let that money grow.

How much growth? Let's compare the two extreme examples: loading your 401k at the beginning of the year versus loading it all at the end of the year. The difference between the two is a whole year of compounding. At a 10% growth rate, front-loading a $10,000 contribution would net you an extra $1,000 by the end of the year. Assuming a consistent growth rate, that extra $1,000 will grow to about $2,594 in 10 years and over $17,000 in 30 years. And not only that, but you'll be able to reap the same rewards each year.

Example graph of net 401k value using each of the three contribution strategies assuming the same total yearly contributions.

If you compare front-loading to an amortized contribution over the course of a year, the benefit is approximately half of the above - still a very significant amount.
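Here is a minimal Python sketch of that comparison. It assumes a constant 10% annual growth rate that compounds smoothly through the year (so money deposited at fraction \( t \) of the year grows by a factor of \( 1.1^{1-t} \) by year end) and a hypothetical bi-weekly schedule of 26 equal deposits, the last one on the final day of the year. It reproduces the $1,000 front-load versus back-load gap, shows the amortized schedule landing roughly halfway between, and compounds the extra $1,000 forward 10 and 30 years.

```python
def year_end_value(deposits, annual_rate):
    """Year-end value of deposits made during one year.

    deposits: list of (t, amount) pairs, where t in [0, 1] is the fraction of
    the year elapsed at deposit time. Each deposit compounds for the remaining
    (1 - t) of the year at annual_rate.
    """
    return sum(amount * (1 + annual_rate) ** (1 - t) for t, amount in deposits)


ANNUAL = 10_000
RATE = 0.10

front = year_end_value([(0.0, ANNUAL)], RATE)  # everything on day one
back = year_end_value([(1.0, ANNUAL)], RATE)   # everything on the last day
# Hypothetical bi-weekly schedule: 26 equal deposits, the last on the final day.
amortized = year_end_value([((k + 1) / 26, ANNUAL / 26) for k in range(26)], RATE)

print(round(front), round(amortized), round(back))  # 11000 10473 10000

# The one-year head start of front-loading, left to compound further.
extra = front - back
print(round(extra * (1 + RATE) ** 10))  # 2594
print(round(extra * (1 + RATE) ** 30))  # 17449
```

The amortized schedule ends up about $527 behind front-loading, consistent with the "approximately half" estimate above.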

However, there are a few drawbacks that come with this more aggressive strategy:
  1. You must know how much you will contribute ahead of time.
  2. You must have an adequate amount of money saved up at the beginning of the year since your paycheck will be significantly diminished.
  3. Negative growth (a market decline) will also be amplified, since more of your money is exposed for longer.

Tuesday, January 28, 2014

Optimizing the asset allocation of your portfolio (part 1)

Suppose you have \( n \) investment opportunities, each with its own rate of return distribution. How should you allocate your resources so that you maximize your long-term return?

At first glance, it seems optimal to put everything into the investment with the highest average ROI. It is the best performer after all, so we'd expect it to do just as well in the future. The issue with this allocation strategy is that it is highly susceptible to gambler's ruin. That is to say, one bad day or year in that particular investment can completely wipe out your whole portfolio. It is this multiplicative nature of the rate of return that makes investing both a highly lucrative and a highly volatile business.

So what is the correct allocation strategy so that you minimize your risk and maximize your overall return? The answer lies in a generalization of the Kelly criterion.

For this first part, let's restrict the problem to that of one investment opportunity. That is to say, you have the choice of what fraction \( f \) of your portfolio to put into this one investment (keeping the rest in cash). It turns out that the optimal solution is approximately
\[ f \approx \frac{\mu}{\sigma^2} \]
where \( \mu \) is the mean rate of return and \( \sigma^2 \) is the variance.

Suppose we start out with \( V \) dollars and this investment has a randomly distributed rate of return of \( R \) over a given time period. We wish to find the allocation fraction \( f \) that maximizes our expected long-run rate of return. Let \( r_1, r_2, \dots \) denote the portfolio return for each time period. Then our asset value after \( t \) periods is
\[ V_t = V \times (1 + r_1) \times (1 + r_2) \times \dots \times (1 + r_t) \]
Products are awkward to work with, so let's take logarithms and maximize the expected log value instead
\[ \log V_t = \log V + \sum_{i=1}^t \log(1 + r_i) \]
Taking the expectation of this (letting \( X \) be a random variable representing our portfolio return in a single period, with the same distribution as each \( r_i \)), we get
\[ \begin{align*} E[\log V_t] &= \log V + \sum_{i=1}^t E[\log(1+X)] \\ &= \log V + t \times E[\log(1+X)] \end{align*} \]
Since \( \log V \) and \( t \) are constant, we simply need to maximize \( E[\log(1+X)] \). Expressing \( X \) in terms of \( f \) and \( R \):
\[ \begin{align*} 1 + X &= (1-f) + (1 + R) \times f \\ &= 1 + fR \\ E[\log(1+X)] &= E[\log(1 + fR)] \end{align*} \]
To simplify this further, we will use the second-order Taylor expansion of the logarithm \( \log(1+x) = x - \frac{1}{2} x^2 + O(x^3) \). Thus we have that
\[ \begin{align*} E[\log(1 + fR)] &= E\left[ fR - \frac{1}{2} (fR)^2 + O((fR)^3) \right] \\ &= E[R] f - \frac{E[R^2]}{2} f^2 + O(f^3) \end{align*} \]
To maximize this, we take the derivative with respect to \( f \) and set it equal to 0
\[ \begin{align*} 0 &= \frac{\partial}{\partial f} E[\log(1 + fR)] \\ &= E[R] - E[R^2] f + O(f^2) \end{align*} \]
To a first-order approximation, we have that
\[ \boxed{f \approx \frac{E[R]}{E[R^2]}} \]
i.e. you should allocate according to the ratio of the first and second raw moments of the distribution of returns. Since \( E[R^2] = \sigma^2 + \mu^2 \), this is close to \( \mu / \sigma^2 \) whenever \( \mu^2 \) is small compared to \( \sigma^2 \). A quick sanity check supports this approximation, since a higher mean and a lower variance lead to a higher allocation fraction.

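If you know the third moment \( E[R^3] \), you can keep the cubic term of the Taylor expansion, which turns the first-order condition into a quadratic in \( f \), and solve it for a second-order approximation.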

Also note that the optimum may lie on a boundary rather than at a critical point, so the endpoints \( f=0 \) and \( f=1 \) should be checked as well; they may be the correct solutions for some extreme distributions.
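Because the boxed result is only a first-order approximation, it helps to compare it with a direct numerical maximization of \( E[\log(1 + fR)] \). The Python sketch below does this for a few hypothetical discrete return distributions (lists of probability/return pairs made up for illustration). Since \( \log(1 + fR) \) is concave in \( f \), a simple ternary search over \( [0, 1] \) finds the maximizer, including the boundary cases mentioned above; the search method is my choice, not part of the derivation.

```python
import math


def expected_log_growth(f, outcomes):
    """E[log(1 + f R)] for a discrete distribution given as (probability, return) pairs."""
    return sum(p * math.log(1 + f * r) for p, r in outcomes)


def approx_kelly(outcomes):
    """First-order approximation f ~= E[R] / E[R^2] from the derivation above."""
    first_moment = sum(p * r for p, r in outcomes)
    second_moment = sum(p * r * r for p, r in outcomes)
    return first_moment / second_moment


def exact_kelly(outcomes, lo=0.0, hi=1.0, iterations=200):
    """Maximize E[log(1 + f R)] over [lo, hi] by ternary search.

    The objective is concave in f, so the search converges to the maximizer,
    which can be an endpoint (f = 0 or f = 1) for extreme distributions.
    Assumes every return r > -1 so the logarithm stays defined on [0, 1].
    """
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if expected_log_growth(m1, outcomes) < expected_log_growth(m2, outcomes):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2


# Hypothetical return distributions, as (probability, return) pairs.
moderate = [(0.3, -0.30), (0.4, 0.05), (0.3, 0.40)]
losing = [(0.5, -0.20), (0.5, 0.10)]   # negative mean: optimum at f = 0
winning = [(0.5, 0.50), (0.5, 0.10)]   # never loses: optimum at f = 1

print(round(approx_kelly(moderate), 2), round(exact_kelly(moderate), 2))  # 0.66 0.69
print(round(exact_kelly(losing), 2))  # 0.0
print(round(approx_kelly(winning), 2), round(exact_kelly(winning), 2))  # 2.31 1.0
```

For the `winning` distribution the approximation asks for more than 100% of the portfolio, which is exactly the situation where the \( f=1 \) boundary becomes the answer.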

Monday, January 6, 2014

Dividend Discount Model

This is part of a series on valuation techniques.

The fundamental reason why stocks are a vehicle for investment is that they represent a fractional ownership of a company and thus allow you to partake in that fraction of the profits. The portion of these profits paid out to shareholders, called dividends, is typically distributed once per quarter (i.e. four times a year) and is directly proportional to the number of shares that you own. If we have perfect information of future dividends, then we can compute the present value of a share of the company via discounting.

Suppose I have a constant cost of capital (also called the discount rate) of \(r\), i.e. the opportunity cost of 1 dollar over one year is \(1+r\) dollars. And for simplicity, let's say dividends are distributed yearly, starting tomorrow, at \(D_0, D_1, D_2, \dots\) dollars per share. Then the value (to me) of a share is
\[ V = D_0 + \frac{D_1}{1+r} + \frac{D_2}{(1+r)^2} + \dots \]
If the dividends are constant at \(D\), then this reduces to a geometric series
\[ \begin{align*} V &= D \left(1 + \frac{1}{1+r} + \frac{1}{(1+r)^2} + \dots\right) \\ &= \left(\frac{1}{1 - \frac{1}{1+r}}\right) D \\ &= \boxed{\left(\frac{1+r}{r}\right) D} \end{align*} \]
If instead the dividends grow linearly by \(m\) dollars per year, then we have that
\[ V = D + \frac{D+m}{1+r} + \frac{D+2m}{(1+r)^2} + \dots \]
Then we use the standard technique for simplifying such expressions: multiply by \(\frac{1}{1+r}\), subtract the result from \(V\), and sum the geometric series of \(m\) terms that remains
\[ \begin{align*} \left(\frac{1}{1+r}\right) V &= \frac{D}{1+r} + \frac{D+m}{(1+r)^2} + \dots \\ \left(1 - \frac{1}{1+r}\right) V &= D + \frac{m}{1+r} + \frac{m}{(1+r)^2} + \dots \\ \left(\frac{r}{1+r}\right) V &= D + \frac{m}{r} \\ V &= \boxed{\left(\frac{1+r}{r}\right) \left(D + \frac{m}{r}\right)} \end{align*} \]
Finally, let's consider the case where the dividends grow exponentially at a rate of \(g\). The series is geometric with ratio \(\frac{1+g}{1+r}\), so it converges only when \(g < r\), in which case
\[ \begin{align*} V &= D + \frac{(1+g) D}{1+r} + \frac{(1+g)^2 D}{(1+r)^2} + \dots \\ &= D \left(1 + \frac{1+g}{1+r} + \frac{(1+g)^2}{(1+r)^2} + \dots \right) \\ &= \boxed{\left(\frac{1+r}{r-g}\right) D} \end{align*} \]
It is worth noting that these computations only reflect the value of a stock for a given person's or organization's discount rate. The actual price of a stock is a function of supply and demand, i.e. the distribution of values as computed by everyone in the market.
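The three closed forms can be sanity-checked against a long truncated version of the original series. The Python sketch below uses hypothetical inputs (\( r = 8\% \), \( D = \$2 \), \( m = \$0.10 \), \( g = 3\% \), chosen so that \( g < r \)) and sums 2,000 terms, by which point the discounted tail is negligible.

```python
def present_value(dividend, r, n_terms=2000):
    """Truncated sum of dividend(k) / (1+r)^k for k = 0, 1, ..., n_terms-1.

    k = 0 is the dividend paid tomorrow (undiscounted), matching the series above.
    """
    return sum(dividend(k) / (1 + r) ** k for k in range(n_terms))


r, D, m, g = 0.08, 2.00, 0.10, 0.03  # hypothetical inputs

# name -> (closed-form value, dividend schedule k -> D_k)
cases = {
    "constant": ((1 + r) / r * D, lambda k: D),
    "linear": ((1 + r) / r * (D + m / r), lambda k: D + k * m),
    "exponential": ((1 + r) / (r - g) * D, lambda k: D * (1 + g) ** k),
}

for name, (closed_form, dividend) in cases.items():
    series = present_value(dividend, r)
    print(f"{name}: closed form {closed_form:.4f}, truncated sum {series:.4f}")
```

With these inputs the closed forms give $27.00, $43.875, and $43.20 per share, and the truncated sums agree to well beyond the printed precision.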

Furthermore, having perfect knowledge of future dividend distributions is, of course, impossible. However, it can be reasonably approximated for certain classes of stocks, such as blue chips. For example, energy companies like Pepco (POM) and PG&E (PCG) have had very consistent dividends over the course of their lifetimes and can be expected to continue such trends in the future.

Perhaps also of interest, we assumed that the first dividend would be distributed the very next day. This reflects the maximum value of the stock to me. The minimum value is achieved the day after a dividend distribution. And the difference between these two values is given by \(D_i\) (i.e. the value will fall by \(D_i\) after the dividend is distributed). This can give rise to some arbitrage opportunities if the market is inefficient at such pricing.

Wednesday, February 13, 2013

Portfolio Update (6 months later)

My portfolio gains versus the S&P, Dow, and NASDAQ.
It's been about 6 months since I bought my first round of stocks. Overall, my portfolio has performed consistently well and netted a total return of 12% so far, which amounts to about $600 of passive, tax-free income. Below, I summarize the results with a bit of commentary.

First, let me go over my current portfolio as well as some positions that I've closed since my initial purchase.

Company | Ticker | Status | % Gain
Cisco Systems Inc. | CSCO | Closed | 16%
Citigroup Inc. | C | Closed | 14%
Hewlett-Packard Company | HPQ | Open | -3%
Intel Corporation | INTC | Open | -8%
JetBlue Airways Corporation | JBLU | Open | 21%
JPMorgan Chase & Co. | JPM | Closed | 8%
Knight Capital Group Inc. | KCG | Open | 24%
NRG Energy Inc. | NRG | Open | 17%
Office Depot Inc. | ODP | Closed | 14%
Pepco Holdings, Inc. | POM | Open | 2%
PG&E Corporation | PCG | Open | -7%
Safeway Inc. | SWY | Open | 32%
Staples, Inc. | SPLS | Closed | 9%
Xerox Corporation | XRX | Open | 16%

As you can see, I closed positions in Cisco, Citigroup, JPMorgan Chase, Office Depot, and Staples.

Tuesday, September 18, 2012

Modeling Price Fluctuations

The premise of this post is that the movements in price of a security (e.g. stocks, bonds) can be viewed as a random process. Whether or not this is a valid assumption is somewhat of a philosophical question. The price of a security depends entirely on the factors of supply and demand, which are in turn deterministically governed by a multitude of more subtle factors. But like the outcome of a coin flip, which is completely determined by the equations of physics and the parameters of the system, such processes are far too complex to analyze in full generality. As a result, we model it as a stochastic process whose variance comes from all of these latent factors.

An illustration of random walks

Problem Statement and Assumptions

We are given the initial price \(P_0\) and we want to make inferences about the future stock price \(P_T\). The random variables \(P_i\) must also be non-negative. The time scale here is arbitrary and can be made as large or small as necessary.

Our key assumption here is that the changes in price are independent and identically distributed (iid). We characterize the price change as the ratio \[C_i = \frac{P_i}{P_{i-1}}\] Note that we didn't use a straightforward difference (\(P_i-P_{i-1}\)). The reason is that the difference certainly isn't iid: starting from a price of $1, the difference has support on \([-1,\infty)\), whereas starting from a price of $2 it has support on \([-2,\infty)\). You'll notice that our characterization corresponds to a percentage difference (plus one).

The Normal Distribution

The normal distribution (also known as the bell curve, the Gaussian, etc.) is ubiquitous in modeling random variables. And so it would be reasonable to conjecture that \(P_T\) is normally distributed. \[ f_{\mu,\sigma^2}(x) = \frac{1}{\sqrt{2\pi \sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}} \]
The normal distribution

However, for the same reason that we didn't use the difference in price as our characterization of change, the normal distribution doesn't have the correct support: it is spread over the whole real line. If we had used it as our model, it would assign a positive probability to the future price being less than 0.

Logarithms to the Rescue

Okay, let's actually do the math without resorting to guessing. The price \(P_{1}\) can be expressed as \(C_1 \times P_0\), and \(P_{2}\) as \(C_2 \times P_1\), and so on. Inductively continuing this process yields
\[ P_T = C_T C_{T-1} \dots C_1 P_0 \]
Thus \(P_T\) is proportional to the product of \(T\) iid random variables. The trick is to turn this product into a sum so that we can apply the central limit theorem. We do this by taking the logarithm of both sides
\[ \begin{align*} \log P_T &= \log(C_T C_{T-1} \dots C_1 P_0) \\ &= \log C_T + \log C_{T-1} + \dots + \log C_1 + \log P_0 \\ &\thicksim N(\mu,\sigma^2) \end{align*} \]
Since the \(C_i\)s are iid, their logarithms are also iid. Now we can apply the central limit theorem (assuming \(\log C_i\) has finite variance) to see that \(\log P_T\) is approximately normally distributed for large \(T\)! A random variable whose logarithm is normally distributed is called log-normal, so \(P_T\) is log-normal, with density
\[ g_{\mu,\sigma^2}(x) = \frac{1}{x\sqrt{2\pi \sigma^2}}e^{-\frac{(\log x-\mu)^2}{2\sigma^2}} \]
The log-normal distribution

As a sanity check, we see that the support of the log-normal is \((0,\infty)\), as expected for a price.
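The argument can also be checked with a quick simulation. The Python sketch below is illustrative only: it assumes each ratio \( C_i \) is drawn iid from a uniform distribution on \([0.9, 1.1]\) (a hypothetical choice, picked because it is clearly not normal), compounds 250 such steps from \( P_0 = 100 \) over 2,000 independent paths, and compares the skewness of \( P_T \) with that of \( \log P_T \). If the derivation holds, every simulated price stays positive, \( \log P_T \) is nearly symmetric (skewness near 0), and \( P_T \) is visibly right-skewed, as a log-normal should be.

```python
import math
import random
import statistics


def skewness(xs):
    """Population skewness: mean of (x - mean)^3 divided by sd^3."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs, m)
    return statistics.fmean([(x - m) ** 3 for x in xs]) / sd ** 3


def simulate_final_prices(p0, steps, paths, seed=0):
    """Simulate P_T = C_T ... C_1 P_0 for many independent paths.

    Assumption for illustration only: each ratio C_i is drawn iid from
    Uniform(0.9, 1.1), i.e. every step moves the price by at most +/-10%.
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    finals = []
    for _ in range(paths):
        price = p0
        for _ in range(steps):
            price *= rng.uniform(0.9, 1.1)
        finals.append(price)
    return finals


prices = simulate_final_prices(p0=100.0, steps=250, paths=2000)
log_prices = [math.log(p) for p in prices]

print("all prices positive:", min(prices) > 0)
print("skewness of P_T:    ", round(skewness(prices), 2))
print("skewness of log P_T:", round(skewness(log_prices), 2))
```

A histogram of `log_prices` against a fitted normal curve gives the same picture visually.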

But wait, there's more!

In the beginning we noted that the choice of time scale is arbitrary. By considering smaller time scales, we can view each \(C_i\) as the product of finer-grained ratios. Thus by the same argument as above, each of the \(C_i\)s must also be approximately log-normally distributed.

Experimental Results

I took ~3200 closing stock prices of Microsoft Corporation (MSFT), courtesy of Yahoo! Finance from January 3, 2000 to today. I imported the data set into R and calculated the logarithms of the \(C_i\)s. I then plotted a normalized histogram of the results and overlaid the theoretical normal distribution on top of it. The plot is shown below:


As you can see, the theoretical distribution doesn't fit our data exactly. The overall shape is correct, but our derived distribution puts too little mass in the center and too little in the tails (and correspondingly too much in the shoulders between them).
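One way to put a number on "too little mass in the center and in the tails" is the excess kurtosis of the log-ratios, which is 0 for a normal distribution and positive for a more peaked, heavier-tailed one. The analysis above was done in R; the Python sketch below only illustrates the same computation, and `closes` stands for whatever list of closing prices you load yourself (no data is included here).

```python
import math
import statistics


def log_ratios(closes):
    """log C_i = log(P_i / P_{i-1}) for consecutive closing prices."""
    return [math.log(b / a) for a, b in zip(closes, closes[1:])]


def standardized(xs):
    """Rescale to mean 0 and standard deviation 1, ready to histogram against N(0, 1)."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs, m)
    return [(x - m) / sd for x in xs]


def excess_kurtosis(xs):
    """Population excess kurtosis m4 / m2^2 - 3 (0 for a normal distribution)."""
    m = statistics.fmean(xs)
    m2 = statistics.fmean([(x - m) ** 2 for x in xs])
    m4 = statistics.fmean([(x - m) ** 4 for x in xs])
    return m4 / m2 ** 2 - 3


# Usage with your own data (closing prices in chronological order):
# ratios = log_ratios(closes)
# print(excess_kurtosis(ratios))  # > 0: more mass in the center and tails than a normal
```

Because kurtosis is scale-free, it can be computed on the raw log-ratios; `standardized` is only needed for the histogram overlay.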

We now must go back to our assumptions for further scrutiny. Our main assumption was that the changes are independent and identically distributed. In fact, many research papers (e.g. Schwert 1989) have shown that the changes are not identically distributed; rather, their volatility varies over time. However, the central limit theorem is fairly robust in practice. Given a sufficiently large number of samples, each "new" distribution will eventually sum to normality (and the sum of independent normal random variables is normal).

I suspect that the deviation from normality is primarily caused by dependence between samples. The heavy tails can be explained by the fact that a large drop/rise in price today may be correlated with another drop/rise in the near future. This is particularly true during times of extreme depression or economic growth. A similar argument can be made about the excess of mass in the center of the distribution: it is conceivable that periods of low volatility tend to be followed by further periods of low volatility.


While our model might not be perfect in practice, it is a good first step to developing a better model. I think what you should take from this is that it is important to experimentally verify your models rather than blindly taking your assumptions as ground truths. I'll conclude this post with a few closing remarks:
  • Many people actually do use the normal distribution to model changes in prices despite the obvious objections stated above. One can justify this by noting that in practice \(\log C_i\) is usually close to 0 (i.e. \(C_i\) is close to 1). Thus the first-order approximation \(e^x \approx 1+x\) is fairly accurate, so \(C_i = e^{\log C_i}\) is itself approximately normal.
  • The histogram and fit shown above can be reproduced for almost any stock or index (e.g. S&P 500, DJIA, NASDAQ)
  • R is a great piece of software but has god-awful tutorials and documentation. I am not in a position to recommend it yet because of this.
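To put a number on the first remark, here is a quick Python check of how accurate \( e^x \approx 1 + x \) is at the scale of typical price moves. The sample values of \( x = \log C_i \) are arbitrary illustrative choices, not measurements.

```python
import math


def relative_error(x):
    """Relative error of the first-order approximation e^x ~= 1 + x."""
    exact = math.exp(x)
    return abs(exact - (1 + x)) / exact


for x in (0.001, 0.01, 0.02, 0.05, -0.05):
    print(f"x = {x:+.3f}: relative error {relative_error(x):.2e}")
```

The error grows roughly like \( x^2/2 \), so the approximation is excellent for small daily moves and degrades for large ones, which is exactly where the heavy tails matter.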

Friday, August 31, 2012

Valuation Techniques: Liquidation value

This is part of a series on valuation techniques.

When we talk about the value of a company, there are two fundamental components associated with it: assets and income. Very simplistically, we can view a company as a black box holding assets that grow over time in a stochastic manner.

I will define the liquidation value of a company as the net worth of a company's tangible assets in the event of bankruptcy.

How is it useful?

Unfortunately, liquidation value isn't an accurate measurement of the intrinsic value of a company. Then how is it at all useful to an investor?

Neither accuracy nor precision is necessary to make a profit in investing. The only necessary condition for successful investing is arbitrage: as long as we can buy a security for less than what it's worth, a profit can be made. Even if we don't know precisely what a security is worth, we need only establish a sufficiently tight lower bound on its value to determine whether it is a worthwhile investment.

That is exactly what the liquidation value is meant to provide. While it is difficult to predict the future earnings of a company, we still have a lower bound given by what the company currently holds. These figures are reported regularly on the balance sheet in financial statements.
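As a rough illustration of how such a lower bound might be computed from balance-sheet figures, here is a Python sketch. It reads "net worth of tangible assets" as tangible assets (total assets minus intangibles such as goodwill) minus total liabilities; that reading, the `BalanceSheet` fields, and the numbers in the example are all hypothetical, and the sketch takes book values at face value, which a real liquidation may not achieve.

```python
from dataclasses import dataclass


@dataclass
class BalanceSheet:
    """Hypothetical balance-sheet figures, all in the same currency units."""
    total_assets: float
    intangible_assets: float  # e.g. goodwill, patents, brand value
    total_liabilities: float
    shares_outstanding: float


def liquidation_value_per_share(bs):
    """(Tangible assets - total liabilities) / shares outstanding."""
    tangible_assets = bs.total_assets - bs.intangible_assets
    return (tangible_assets - bs.total_liabilities) / bs.shares_outstanding


def trades_below_liquidation_value(price, bs):
    """True if the market price is below the liquidation-value lower bound."""
    return price < liquidation_value_per_share(bs)


example = BalanceSheet(
    total_assets=500e6,
    intangible_assets=50e6,
    total_liabilities=300e6,
    shares_outstanding=10e6,
)
print(liquidation_value_per_share(example))  # 15.0
print(trades_below_liquidation_value(12.0, example))  # True
```

A share trading below this figure would, under these assumptions, be priced under the value of what the company already holds.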

Obligatory Disclaimer

The author is not qualified to give financial, tax, or legal advice and disclaims any and all liability for this information.