Monday, February 10, 2014

Front-loading your 401k

The most common way of contributing to your 401k is to set aside a percentage of each paycheck. With a bi-weekly paycheck (once every two weeks), maxing out the annual contribution limit of $17,500 (as of 2014) means putting in $673.08 per paycheck. While this strategy has the benefits of simplicity and amortization, it is not optimal in terms of maximizing the long-term value of your retirement account.

Time is your most valuable asset in both saving and investing. If you are certain about how much you will contribute this year, then it is better to make that contribution as early on in the year as possible. This will give you a little extra time to let that money grow.

How much growth? Let's compare the two extreme examples: loading your 401k at the beginning of the year versus loading it all at the end of the year. The difference between the two is a whole year of compounding. At a 10% growth rate, front-loading a $10,000 contribution would net you an extra $1,000 by the end of the year. Assuming a consistent growth rate, that extra $1,000 will become over $2,593 in 10 years and over $17,000 in 30 years. And not only that, but you'll be able to reap the same reward on each year's contribution.

Example graph of net 401k value using each of the three contribution strategies assuming the same total yearly contributions.

If you compare front-loading to an amortized contribution over the course of a year, the benefit is approximately half of the above - still a very significant amount.
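The back-of-envelope numbers above can be checked with a short simulation. This is a minimal sketch (not from the original post) comparing the three strategies, assuming a hypothetical constant 10% annual growth rate and the 2014 limit of $17,500:

```python
# Sketch comparing three contribution strategies over one year,
# assuming a constant 10% annual growth rate (a simplification).
ANNUAL_LIMIT = 17_500.0   # 2014 401k contribution limit
RATE = 0.10               # assumed annual growth rate
PERIODS = 26              # bi-weekly paychecks per year

def grow(value, years):
    """Compound `value` at the assumed annual rate for `years` years."""
    return value * (1 + RATE) ** years

# Front-load: the entire contribution compounds for the full year.
front = grow(ANNUAL_LIMIT, 1.0)

# Amortize: each bi-weekly contribution compounds for the rest of the year.
per_check = ANNUAL_LIMIT / PERIODS
amortized = sum(grow(per_check, (PERIODS - i) / PERIODS)
                for i in range(1, PERIODS + 1))

# Back-load: contribute everything at year's end, so no growth at all.
back = ANNUAL_LIMIT

print(front - back)       # ~$1,750 extra vs. loading at year's end
print(front - amortized)  # roughly half of that vs. amortizing
```

The front-load vs. amortized gap comes out to roughly half of the front-load vs. back-load gap, matching the claim above.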

However, there are a few drawbacks that come with this more aggressive strategy:
  1. You must know how much you will contribute ahead of time.
  2. You must have an adequate amount of money saved up at the beginning of the year since your paycheck will be significantly diminished.
  3. Negative returns are amplified in the same way as gains.

Tuesday, January 28, 2014

Optimizing the asset allocation of your portfolio (part 1)

Suppose you have \( n \) investment opportunities, each with its own rate of return distribution. How should you allocate your resources so that you maximize your long-term return?

At first glance, it seems optimal to put everything into the investment with the highest average ROI. It is the best performer after all, so we'd expect it to do just as well in the future. The issue with this allocation strategy is that it is highly susceptible to gambler's ruin. That is to say, one bad day or year in that particular investment can completely wipe out your whole portfolio. It is this multiplicative nature of the rate of return that makes investing both a highly lucrative and a highly volatile business.

So what is the correct allocation strategy so that you minimize your risk and maximize your overall return? The answer is in the generalization of the Kelly criterion.

For this first part, let's restrict the problem to that of one investment opportunity. That is to say, you have the choice of what fraction \( f \) of your portfolio to put into this one investment (keeping the rest in cash). It turns out that the optimal solution is approximately of the form \[ f = \frac{\mu}{\sigma^2} \] where \( \mu \) is the mean rate of return and \( \sigma^2 \) is its variance.

Suppose we start out with \( V \) dollars and this investment has a randomly distributed rate of return of \( R \) over a given time period. We wish to find the allocation fraction \( f \) that maximizes our expected long-run rate of return. Let \( r_1, r_2, \dots \) denote the portfolio return for each time period. Then our asset value after \( t \) periods is \[ V_t = V \times (1 + r_1) \times (1 + r_2) \times \dots \times (1 + r_t) \] As usual, multiplication is difficult, so let's maximize the expected log value \[ \log V_t = \log V + \sum_{i=1}^t \log(1 + r_i) \] Taking the expectation of this (letting \( X \) be a random variable representing our portfolio return in a single period), we get \[ \begin{align*} E[\log V_t] &= \log V + \sum_{i=1}^t E[\log(1+X)] \\ &= \log V + t \times E[\log(1+X)] \end{align*} \] Since \( \log V \) and \( t \) are constant, we simply need to maximize \( E[\log(1+X)] \). Expressing \( X \) in terms of \( f \) and \( R \): \[ \begin{align*} 1 + X &= (1-f) + (1 + R) \times f \\ &= 1 + fR \\ E[\log(1+X)] &= E[\log(1 + fR)] \end{align*} \] To simplify this further, we will use the second-order Taylor expansion of the logarithm \( \log(1+x) = x - \frac{1}{2} x^2 + O(x^3) \). Thus we have that \[ \begin{align*} E[\log(1 + fR)] &= E\left[ fR - \frac{1}{2} (fR)^2 + O((fR)^3) \right] \\ &= E[R] f - \frac{E[R^2]}{2} f^2 + O(f^3) \end{align*} \] To maximize this, we take the derivative with respect to \( f \) and set it equal to 0 \[ \begin{align*} 0 &= \frac{\partial}{\partial f} E[\log(1 + fR)] \\ &= E[R] - E[R^2] f + O(f^2) \end{align*} \] To a first-order approximation, we have that \[ \boxed{f \approx \frac{E[R]}{E[R^2]}} \] i.e. you should allocate according to the ratio of the first and second raw moments of the distribution of returns. Since \( E[R^2] = \sigma^2 + \mu^2 \approx \sigma^2 \) when \( \mu \) is small, this recovers the form stated above. A quick sanity check also supports the approximation: a higher mean and a lower variance both lead to a higher allocation fraction.

If you have the third moment, you can keep the quadratic term and solve for \( f \) to get a second-order approximation.

Also note that there are two other critical points for the boundaries: \( f=0 \) and \( f=1 \), which may be the correct solutions for some extreme distributions.
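As an illustration, the first-order rule \( f \approx E[R]/E[R^2] \) can be estimated from a sample of historical returns. A minimal sketch with made-up numbers (the clamping reflects the boundary critical points \( f=0 \) and \( f=1 \) noted above):

```python
# Sketch: estimate the optimal allocation fraction f = E[R] / E[R^2]
# from a sample of per-period returns (hypothetical numbers).
def kelly_fraction(returns):
    """First-order Kelly allocation: ratio of first to second raw moment,
    clamped to the boundary critical points f=0 and f=1."""
    n = len(returns)
    mean = sum(returns) / n                       # E[R]
    second_moment = sum(r * r for r in returns) / n  # E[R^2]
    f = mean / second_moment
    return max(0.0, min(1.0, f))

# Example: returns averaging 3% with substantial spread.
sample = [0.30, -0.25, 0.20, -0.15, 0.05]
print(kelly_fraction(sample))  # ≈ 0.69: allocate about 69%, keep the rest in cash
```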

Monday, January 6, 2014

Dividend Discount Model

This is part of a series on valuation techniques.

The fundamental reason why stocks are a vehicle for investment is that they represent fractional ownership of a company and thus allow you to partake in that fraction of the profits. The distributed portion of these profits, called dividends, is typically paid out once per quarter (i.e. four times a year) in direct proportion to the number of shares that you own. If we have perfect information about future dividends, then we can compute the present value of a share of the company via discounting.

Suppose I have a constant cost of capital (also called the discount rate) of \(r\), i.e. the opportunity cost of 1 dollar over one year is \(1+r\) dollars. And for simplicity, let's say dividends are distributed yearly, starting tomorrow, at \(D_0, D_1, D_2, \dots\) dollars per share. Then the value (to me) of a share is \[ V = D_0 + \frac{D_1}{1+r} + \frac{D_2}{(1+r)^2} + \dots \] If the dividends are constant at \(D\), then this simplifies to a simple geometric series \[ \begin{align*} V &= D \left(1 + \frac{1}{1+r} + \frac{1}{(1+r)^2} + \dots\right) \\ &= \left(\frac{1}{1 - \frac{1}{1+r}}\right) D \\ &= \boxed{\left(\frac{1+r}{r}\right) D} \end{align*} \] If instead the dividends grow linearly at a rate of \(m\), then we have that \[ V = D + \frac{D+m}{1+r} + \frac{D+2m}{(1+r)^2} + \dots \] Then we use the standard technique for simplifying such expressions \[ \begin{align*} \left(\frac{1}{1+r}\right) V &= \frac{D}{1+r} + \frac{D+m}{(1+r)^2} + \dots \\ \left(1 - \frac{1}{1+r}\right) V &= D + \frac{m}{1+r} + \frac{m}{(1+r)^2} + \dots \\ \left(\frac{r}{1+r}\right) V &= D + \frac{m}{r} \\ V &= \boxed{\left(\frac{1+r}{r}\right) \left(D + \frac{m}{r}\right)} \end{align*} \] Finally, let's consider the case where the dividends grow exponentially at a rate of \(g\) (assuming \(g < r\), so that the series converges) \[ \begin{align*} V &= D + \frac{(1+g) D}{1+r} + \frac{(1+g)^2 D}{(1+r)^2} + \dots \\ &= D \left(1 + \frac{1+g}{1+r} + \frac{(1+g)^2}{(1+r)^2} + \dots \right) \\ &= \boxed{\left(\frac{1+r}{r-g}\right) D} \end{align*} \] It is worth noting that these computations only reflect the value of a stock for a given person's or organization's discount rate. The actual price of a stock is a function of supply and demand, i.e. the distribution of values as computed by everyone in the market.
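The three boxed formulas can be sanity-checked against a brute-force discounted sum. A minimal sketch, using the same symbols as the text with hypothetical inputs:

```python
# Sketch of the three closed-form dividend valuations, checked against a
# brute-force discounted sum (symbols as in the derivation; inputs made up).
def value_constant(D, r):
    """Constant dividend D: V = D (1+r)/r."""
    return D * (1 + r) / r

def value_linear(D, m, r):
    """Linearly growing dividends D, D+m, D+2m, ...: V = (1+r)/r (D + m/r)."""
    return (1 + r) / r * (D + m / r)

def value_exponential(D, g, r):
    """Exponentially growing dividends (requires g < r): V = D (1+r)/(r-g)."""
    assert g < r, "series diverges unless g < r"
    return D * (1 + r) / (r - g)

def brute_force(dividends, r):
    """Directly discount an explicit (long) list of dividends."""
    return sum(d / (1 + r) ** t for t, d in enumerate(dividends))

r, D, m, g, N = 0.08, 2.0, 0.1, 0.03, 500
print(value_constant(D, r), brute_force([D] * N, r))
print(value_linear(D, m, r), brute_force([D + m * t for t in range(N)], r))
print(value_exponential(D, g, r),
      brute_force([D * (1 + g) ** t for t in range(N)], r))
```

Each pair of printed values should agree to several decimal places, since 500 terms is more than enough for the series to converge at these rates.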

Furthermore, having perfect knowledge of future dividend distributions is, of course, impossible. However, it can be reasonably approximated for certain classes of stocks, such as blue chips. For example, energy companies like Pepco (POM) and PG&E (PCG) have had very consistent dividends over the course of their lifetimes and can be expected to continue such trends in the future.

Perhaps also of interest, we assumed that the first dividend would be distributed the very next day. This reflects the maximum value of the stock to me. The minimum value is achieved the day after a dividend distribution. And the difference between these two values is given by \(D_i\) (i.e. the value will fall by \(D_i\) after the dividend is distributed). This can give rise to some arbitrage opportunities if the market is inefficient at such pricing.

Wednesday, February 13, 2013

Portfolio Update (6 months later)

My portfolio gains versus the S&P, Dow, and NASDAQ.
It's been about 6 months since I bought my first round of stocks. Overall, my portfolio has performed consistently well, netting a total return of 12% so far, which amounts to about $600 of passive, tax-free income. In this post, I summarize the results with a bit of commentary.

First, let me go over my current portfolio as well as some positions that I've closed since my initial purchase.

| Company | Ticker | Status | % Gain |
| --- | --- | --- | --- |
| Cisco Systems Inc. | CSCO | Closed | 16% |
| Citigroup Inc. | C | Closed | 14% |
| Hewlett-Packard Company | HPQ | Open | -3% |
| Intel Corporation | INTC | Open | -8% |
| JetBlue Airways Corporation | JBLU | Open | 21% |
| JPMorgan Chase & Co. | JPM | Closed | 8% |
| Knight Capital Group Inc. | KCG | Open | 24% |
| NRG Energy Inc. | NRG | Open | 17% |
| Office Depot Inc. | ODP | Closed | 14% |
| Pepco Holdings, Inc. | POM | Open | 2% |
| PG&E Corporation | PCG | Open | -7% |
| Safeway Inc. | SWY | Open | 32% |
| Staples, Inc. | SPLS | Closed | 9% |
| Xerox Corporation | XRX | Open | 16% |

As you can see, I closed positions in Cisco, Citigroup, JPMorgan Chase, Office Depot, and Staples.

Tuesday, September 18, 2012

Modeling Price Fluctuations

The premise of this post is that the movements in price of a security (e.g. stocks, bonds) can be viewed as a random process. Whether or not this is a valid assumption is somewhat of a philosophical question. The price of a security depends entirely on the factors of supply and demand, which are in turn deterministically governed by a multitude of subtler factors. But like the outcome of a coin flip, which is completely determined by the equations of physics and the parameters of the system, such processes are much too complex to analyze in full generality. As a result, we model price movement as a stochastic process whose variance comes from all of these latent factors.

An illustration of random walks

Problem Statement and Assumptions

We are given the initial price \(P_0\) and we want to make inferences about the future stock price \(P_T\). The random variables \(P_i\) must also be non-negative. The time scale here is arbitrary and can be made as large or small as necessary.

Our key assumption here is that the changes in price are independent and identically distributed (iid). We characterize the price change as the ratio \[C_i = \frac{P_i}{P_{i-1}}\] Note that we didn't use a straightforward difference (\(P_i-P_{i-1}\)). The reason is that the difference most certainly isn't iid (at a price of $1, the difference has support on \([-1,\infty)\), whereas at a price of $2 it has support on \([-2,\infty)\)). You'll notice that our characterization corresponds to a percentage difference (plus one).

The Normal Distribution

The normal distribution (also known as the bell curve, the Gaussian, etc.) is ubiquitous in modeling random variables. And so it would be reasonable to conjecture that \(P_T\) is normally distributed. \[ f_{\mu,\sigma^2}(x) = \frac{1}{\sqrt{2\pi \sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}} \]
The normal distribution

However, in a similar vein as to why we didn't use the difference in price as our characterization of change, the normal distribution doesn't have the correct support. If we had used it as our model, we would have found that it assigns a positive probability to the future price being less than 0.

Logarithms to the Rescue

Okay, let's actually do the math without resorting to guessing. The price \(P_{1}\) can be expressed as \(C_1 \times P_0\), and \(P_{2}\) as \(C_2 \times P_1\), and so on. Inductively continuing this process yields \[ P_T = C_T C_{T-1} \dots C_1 P_0 \] Thus we have that \(P_T\) is proportional to the product of \(T\) iid random variables. The trick is to turn this product into a sum so that we can apply the central limit theorem. We do this by taking the logarithm of both sides \[ \begin{align*} \log P_T &= \log(C_T C_{T-1} \dots C_1 P_0) \\ &= \log C_T + \log C_{T-1} + \dots + \log C_1 + \log P_0 \\ &\thicksim N(\mu,\sigma^2) \end{align*} \] Since the \(C_i\)s are iid, their logarithms must also be iid. Now we can apply the central limit theorem to see that \(\log P_T\) converges to a normal distribution! The exponential of a normal distribution is known as the log-normal distribution, so \(P_T\) is log-normal. \[ g_{\mu,\sigma^2}(x) = \frac{1}{x\sqrt{2\pi \sigma^2}}e^{-\frac{(\log x-\mu)^2}{2\sigma^2}} \]
The log-normal distribution

As a sanity check, we see that the support of the log-normal is \((0,\infty)\) as expected.

But wait there's more!

In the beginning we noted that the choice of time-scale is arbitrary. By considering smaller time scales, we can view our \(C_i\)s as the product of finer grained ratios. Thus by the same argument as above, each of the \(C_i\)s must also be log-normally distributed.
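To illustrate the argument, here's a minimal simulation sketch (hypothetical parameters, not from the post): each \(C_i\) is drawn from a uniform distribution, which is decidedly not normal, yet \(\log P_T\) still comes out approximately normal, as the near-zero sample skewness suggests.

```python
# Sketch: simulate multiplicative price changes with non-normal C_i and
# check that log(P_T) is approximately normal, per the CLT argument above.
import math
import random

random.seed(0)

def simulate_log_price(T, p0=100.0):
    """One path: log P_T = log P_0 + sum of log C_i,
    with each C_i uniform on [0.98, 1.03] (decidedly non-normal)."""
    log_p = math.log(p0)
    for _ in range(T):
        log_p += math.log(random.uniform(0.98, 1.03))
    return log_p

# Distribution of log(P_T) over many simulated one-year (250-day) paths.
samples = [simulate_log_price(250) for _ in range(2000)]
n = len(samples)
mean = sum(samples) / n
var = sum((x - mean) ** 2 for x in samples) / n
skew = sum((x - mean) ** 3 for x in samples) / (n * var ** 1.5)

# A normal distribution has zero skewness; the sum of 250 iid log-changes
# should come close.
print(mean, var, skew)
```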

Experimental Results

I took ~3200 closing stock prices of Microsoft Corporation (MSFT), courtesy of Yahoo! Finance from January 3, 2000 to today. I imported the data set into R and calculated the logarithms of the \(C_i\)s. I then plotted a normalized histogram of the results and overlaid the theoretical normal distribution on top of it. The plot is shown below:


As you can see, the theoretical distribution doesn't fit our data exactly. The overall shape is correct, but our derived distribution puts too little mass in the center and too little in the tails.

We now must go back to our assumptions for further scrutiny. Our main assumption was that the changes are independent and identically distributed. In fact, it has been shown in many research papers (e.g. Schwert 1989) that the changes are not identically distributed, but rather vary over time. However, the central limit theorem is fairly robust in practice: given a sufficiently large number of samples, the draws from each "new" distribution will still sum to something approximately normal (and the sum of normal distributions is normal).

I suspect that the deviation from normality is primarily caused by dependence between samples. The heavy tails can be explained by the fact that a large drop/rise in price today may be correlated to another drop/rise in the near future. This is particularly true during times of extreme depression or economic growth. A similar argument can be made about the excess of mass in the center of the distribution. It is conceivable that times of low volatility will be followed by another time of low volatility.


While our model might not be perfect in practice, it is a good first step to developing a better model. I think what you should take from this is that it is important to experimentally verify your models rather than blindly taking your assumptions as ground truths. I'll conclude this post with a few closing remarks:
  • Many people actually do use the normal distribution to model changes in prices despite the obvious objections stated above. One can justify this by noting that \(\log C_i\) in practice is usually close to 0, so the first-order approximation \(e^x \approx 1+x\) is fairly accurate.
  • The histogram and fit shown above can be reproduced for almost any stock or index (e.g. S&P 500, DJIA, NASDAQ)
  • R is a great piece of software but has god awful tutorials and documentation. I am not in a position to recommend it yet because of this.

Friday, August 31, 2012

Valuation Techniques: Liquidation value

This is part of a series on valuation techniques.

When we talk about the value of a company, there are two fundamental components associated with it: assets and income. Very simplistically, we can view a company as a black box holding assets that grow over time in a stochastic manner.

I will define the liquidation value of a company as the net worth of a company's tangible assets in event of a bankruptcy.

How is it useful?

Unfortunately, liquidation value isn't an accurate measurement of the intrinsic value of a company. Then how is it at all useful to an investor?

Neither accuracy nor precision is a necessary condition for making a profit in investing. The only necessary condition for successful investing is arbitrage. As long as we can buy a security for less than what it's worth, a profit can be made. Even if we don't know precisely what a security is worth, we need only establish sufficiently tight lower bounds on its value to determine whether it is a worthwhile investment.

That is exactly what the liquidation value is meant to provide. While it is difficult to predict the future earnings of a company, we still have a lower bound given by what the company currently holds. These figures are reported regularly on the balance sheet in financial statements.

Valuation Techniques

Investing ultimately comes down to the ability of an individual to value a company and the associated risks of such a valuation. There are hundreds of different models and techniques in use today, both simple ones devised by humans and enormously complex ones used in algorithmic trading.

I don't believe that there is one magic, all-encompassing, algorithm that will perform optimally in all (or even most) scenarios. The issue is that every company has a different business model, each of which would require a different model for valuation. This task is rather intractable, and so it is the job of the investor to be able to create robust models that hold under reasonable approximations. In addition, he must be able to understand which approximations hold for which businesses, and use the appropriate model.

Many present models involve looking at one aspect of a business, such as dividends, cash flow, earnings, etc. From these, you can derive the dividend discount model, discounted cash flow analysis, and P/E relative valuation, respectively. But naturally, these are rather crude in the sense that they don't look at all of the variables. And combining different valuation schemes is a non-trivial process.

In this series of articles, I will start with a set of assumptions and derive some models for valuation using these approximations. This post will be edited as articles are added.
  1. Liquidation value
  2. Dividend Discount Model

Obligatory Disclaimer

The author is not qualified to give financial, tax, or legal advice and disclaims any and all liability for this information.