mathstat

A while ago, we addressed the following question: A portfolio manager knows that his strategy can, on average, outperform the benchmark index by 3% annually. His portfolio has an annual volatility (standard deviation) of 25% against the index's 15%. Assuming that the correlation between the returns of the portfolio and the returns of the index is 0.9, how many years would it take to outperform the index with 90% probability?

The correct answer is a whopping 300 years! (apply the Itô-Döblin formula)
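
A rough check of that number, as a minimal sketch: it assumes that portfolio and index follow correlated geometric Brownian motions and that the 3% is the difference in arithmetic drifts (both are assumptions made here, not stated in the question). The function name `years_to_outperform` is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def years_to_outperform(excess_return, vol_portfolio, vol_index, correlation, probability):
    """Horizon T at which P(Portfolio > Index) reaches `probability` (GBM assumption)."""
    # Ito correction: each log-price drifts at (mu - sigma^2 / 2), so the drift of
    # log(Portfolio/Index) is the excess return minus half the variance difference.
    log_drift = excess_return - 0.5 * (vol_portfolio**2 - vol_index**2)
    # Volatility of log(Portfolio/Index), i.e. of the difference of the two log-returns.
    relative_vol = sqrt(
        vol_portfolio**2 + vol_index**2 - 2 * correlation * vol_portfolio * vol_index
    )
    # P(log ratio > 0 at T) = Phi(log_drift * sqrt(T) / relative_vol) = probability
    z = NormalDist().inv_cdf(probability)
    return (z * relative_vol / log_drift) ** 2


years = years_to_outperform(0.03, 0.25, 0.15, 0.9, 0.90)
print(round(years))
```

Under these assumptions the horizon comes out just under 300 years. Most of the 3% edge is eaten by the volatility drag: the drift of log(Portfolio/Index) is only 1% per year against a relative volatility of about 13%.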

Today somebody asked me if I could run a couple of simulations to get a better understanding of the result. What I did was plot 20 simulations of log(Portfolio/Index) for varying correlations. For ρ = 0.9, 2 out of 20 (10%) are, as expected, below zero:
[Figure: simulated paths of log(Portfolio/Index) for varying correlations]

By strolling through the library I figured out that the number of pages of my master's thesis is five standard deviations lower than the average number of pages my friends have written to graduate from the Vienna University of Economics and Business Administration:

| Author | Pages | Title |
|---|---|---|
| Michael Sigmund | 139 | Anwendungsgebiete der Spieltheorie in den Sozialwissenschaften |
| Robert Ferstl | 127 | Werkzeuge zur Analyse räumlicher Daten - eine Softwareimplementation in EViews und MATLAB |
| Karin Doppelbauer | 122 | Analiz rʹinka mjasa kur v Rossii - Eine Analyse des russischen Geflügelfleischmarktes |
| Markus Pock | 107 | Untersuchungen zu Wachstumseffekten der WWU mittels Zeitreihenanalyse |
| Christian Kraxner | 105 | Using credit derivatives for managing corporate bond portfolios |
| Christian Balbier | 104 | Föderale Strukturen in den neuen Mitgliedsstaaten der Europäischen Union |
| Stefan Woytech | 103 | Harmonisierung von internem und externem Rechnungswesen auf Basis der IAS/IFRS |
| Anton Burger | 92 | Reasons for the U.S. Growth Experience in the Nineties: Non Keynesian Effects, the Capital Market and Technology |
| Michael Stastny | 31 | Economic Growth and Output Variability: An Empirical Analysis (pdf) |
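
The "five standard deviations" claim can be checked directly from the page counts listed above. This is a small sketch that uses the sample standard deviation of the eight friends' theses; the variable names are mine.

```python
from statistics import mean, stdev

# Page counts of the friends' theses, taken from the list above.
friends_pages = [139, 127, 122, 107, 105, 104, 103, 92]
own_pages = 31

# Distance of the 31-page thesis from the friends' mean, in sample standard deviations.
z_score = (own_pages - mean(friends_pages)) / stdev(friends_pages)
print(round(z_score, 2))
```

The result is a bit more than five standard deviations below the mean, in line with the statement above.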

The outlier status of my thesis is confirmed by conventional tests for outliers:
[Figure: results of the outlier tests]
How cool is that?

The difference between an event being almost sure and sure is the same as the subtle difference between something happening with probability 1 and happening always.

If an event is sure, then it will always happen. No other event (even events with probability 0) can possibly occur. If an event is almost sure, then there are other events that could happen, but they happen almost never, that is with probability 0.

Cool Example: Throwing a dart
For example, imagine throwing a dart at a square, and imagine that this square is the only thing in the universe. There is physically nowhere else for the dart to land. Then, the event that "the dart hits the square" is a sure event. No other alternative is imaginable.

Next, consider the event that "the dart hits the diagonal of the square exactly". The probability that the dart lands on any subregion of the square is equal to the area of that subregion. But, since the area of the diagonal of the square is zero, the probability that the dart lands exactly on the diagonal is zero. So, the dart will almost surely not land on the diagonal, or indeed any other given line or point. Notice that even though there is zero probability that it will happen, it is still possible.

Source: Wikipedia

There is an old conundrum in queueing theory that goes like this:
  • A passenger arrives at a bus-stop at some arbitrary point in time
  • Buses arrive according to a Poisson process
  • The mean interval between the buses is 10 min.
What is the mean waiting time until the next bus?
[Figure: busline]
Answer: 10 min. This is an example of length-biased sampling. The paradox is explained by the fact that a passenger is more likely to arrive during a long interarrival interval than during a short one. [Here is a neat non-technical explanation (taken from this book).]

Given the interarrival interval, the arrival instant of the passenger is uniformly distributed within that interval, and the expected waiting time is one half of the total duration of the interval. The point is that when an interval is selected by a random instant, long intervals are represented more frequently than short ones (with a weight proportional to the length of the interval).

Consider a long period of time t. The waiting time to the next bus arrival W(τ) as a function of the arrival instant τ of the passenger is represented by:
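[Figure: sawtooth curve of the waiting time W(τ)]
where the $X_i$ are the interarrival intervals. The mean waiting time, $\bar{W}$, is the average value of this sawtooth curve:
$$\bar{W} = \frac{1}{t} \sum_{i=1}^{n} \frac{X_i^2}{2}$$
Note that long interarrival intervals contribute much more than short ones to the average waiting time. As t grows, $t/n \to \bar{X}$, hence,
$$\bar{W} = \frac{n}{t} \cdot \frac{1}{n} \sum_{i=1}^{n} \frac{X_i^2}{2} \to \frac{\overline{X^2}}{2\bar{X}}$$
For the exponential distribution (as the $X_i$ are distributed),
$$\sigma_X^2 = \bar{X}^2$$
Therefore,
$$\overline{X^2} = \sigma_X^2 + \bar{X}^2 = 2\bar{X}^2$$
Altogether,
$$\bar{W} = \frac{2\bar{X}^2}{2\bar{X}} = \bar{X} = 10 \text{ min}$$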
Q.E.D.
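
The result is easy to check by simulation. The following is a minimal sketch; the function name, sample sizes and seed are arbitrary choices of mine. Buses arrive with exponential gaps of mean 10 minutes, passengers show up at uniformly random instants, and we average the time until the next bus.

```python
import random
from bisect import bisect_left
from itertools import accumulate


def mean_wait(mean_interval=10.0, n_buses=200_000, n_passengers=100_000, seed=1):
    """Average waiting time of passengers arriving at random instants."""
    rng = random.Random(seed)
    # Poisson process: arrival times are cumulative sums of exponential gaps.
    gaps = (rng.expovariate(1.0 / mean_interval) for _ in range(n_buses))
    arrivals = list(accumulate(gaps))
    horizon = arrivals[-1]
    total_wait = 0.0
    for _ in range(n_passengers):
        tau = rng.uniform(0.0, horizon)
        # First bus arriving at or after the passenger's arrival instant.
        next_bus = arrivals[bisect_left(arrivals, tau)]
        total_wait += next_bus - tau
    return total_wait / n_passengers


average_wait = mean_wait()
print(round(average_wait, 2))
```

The average should come out close to 10 minutes, not the 5 minutes one might naively expect from "half the mean interval".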

Sources:
Advanced Course in Operating Systems (University of Haifa), Lecture 1 & 2
Of Buses and Bunching: Strangeness in the Queue, TeamQuest

New Economist writes: "In a new post on the Statistical Modeling, Causal Inference, and Social Science blog, Aleks Jakulin at Columbia University points us to a great online tool, ZunZun. It lets you use 2 and 3 dimensional 'Function Finders' to 'help determine the best curve fit for your data'."

On William Greene's site I found a neat data set (Data Tables :: Table F6.1) for estimating a Cobb-Douglas production function. ZunZun comes up with the following suggestion:

$$Y = \beta_1 \left( L^{0.5} K^{0.5} \right) + \beta_2 \left( \cos(L)\, K^{1.5} \right)$$

Surface Plot:
[Figure: cobb_surface]
Contour Plot:
[Figure: cobb_contour]
The R² reaches an unrealistic 0.968 (0.94 for the Cobb-Douglas specification). Textbook data... Here is a scatterplot of the log-transformed data (created with R):
[Figure: scatter3d]

[Figures: viewp01, viewp02]

The Viewpoints 2000 Group
Mathematics Magazine, Vol. 74, No. 4. (Oct., 2001), p. 320.

related items:
Neat Proofs, Mahalanobis

Taken from the bionet.info-theory FAQ: If someone says that information = uncertainty = entropy, then they are confused, or something was not stated that should have been. Those equalities lead to a contradiction, since entropy of a system increases as the system becomes more disordered. So information corresponds to disorder according to this confusion.

If you always take information to be a decrease in uncertainty at the receiver, you will get straightened out:
$$I = H_{\text{before}} - H_{\text{after}}$$
where H is the Shannon uncertainty:
$$H = -\sum_{i} p_i \log_2 p_i \quad \text{(bits per symbol)}$$
and $p_i$ is the probability of the ith symbol. If you don't understand this, read the short Information Theory Primer.

Imagine that we are in communication and that we have agreed on an alphabet. Before I send you a bunch of characters, you are uncertain ($H_{\text{before}}$) as to what I'm about to send. After you receive a character, your uncertainty goes down (to $H_{\text{after}}$). $H_{\text{after}}$ is never zero because of noise in the communication system. Your decrease in uncertainty is the information (I) that you gain.

Since $H_{\text{before}}$ and $H_{\text{after}}$ are state functions, this makes I a function of state. It allows you to lose information (it's called forgetting).

Many of the statements in the early literature assumed a noiseless channel, so the uncertainty after receipt is zero ($H_{\text{after}} = 0$). This leads to the SPECIAL CASE where $I = H_{\text{before}}$. But $H_{\text{before}}$ is NOT "the uncertainty", it is the uncertainty of the receiver BEFORE RECEIVING THE MESSAGE.
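
To make the bookkeeping concrete, here is a small sketch with made-up numbers: a four-letter alphabet with equally likely symbols before receipt, and a noisy channel that leaves the receiver 97% sure of the symbol afterwards. The probabilities are illustrative assumptions, not taken from the FAQ.

```python
from math import log2


def shannon_uncertainty(probabilities):
    """Shannon uncertainty H = -sum(p_i * log2(p_i)) in bits per symbol."""
    return -sum(p * log2(p) for p in probabilities if p > 0)


# Before receipt: four equally likely symbols (assumed for illustration).
h_before = shannon_uncertainty([0.25, 0.25, 0.25, 0.25])
# After receipt over a noisy channel: some residual doubt remains (assumed values).
h_after = shannon_uncertainty([0.97, 0.01, 0.01, 0.01])
# Information gained is the decrease in uncertainty at the receiver.
information = h_before - h_after
print(h_before, round(h_after, 3), round(information, 3))
```

With a noiseless channel the second distribution would be [1, 0, 0, 0], the uncertainty after receipt would be zero, and the information would equal the full 2 bits: the special case described above.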

MIT news office: An international team of 18 mathematicians has mapped one of the largest and most complicated structures in mathematics. If written out on paper, the calculation describing this structure, known as E8, would cover an area the size of Manhattan.

The work is important because it could lead to new discoveries in mathematics, physics and other fields. In addition, the innovative large-scale computing that was key to the work likely spells the future for how longstanding math problems will be solved in the 21st century. Click here (or here) to continue.

related items:
American Institute of Mathematics: E8
Lecture Slides: The Character Table for E8, or How We Wrote Down a 453,060 x 453,060 Matrix and Found Happiness, David Vogan, MIT

"Multivariate Adaptive Regression Splines (MARS) is an implementation of techniques popularized by Friedman (1991) for solving regression-type problems.

MARS is a nonparametric regression procedure that makes no assumption about the underlying functional relationship between the dependent and [explanatory] variables. Instead, MARS constructs this relation from a set of coefficients and basis functions that are entirely "driven" from the regression data. In a sense, the method is based on the "divide and conquer" strategy, which partitions the input space into regions, each with its own regression equation. This makes MARS particularly suitable for problems with higher input dimensions (i.e., with more than 2 variables), where the curse of dimensionality [see also blessing of dimensionality] would likely create problems for other techniques.

The MARSplines technique has become particularly popular in the area of data mining because it does not assume or impose any particular type or class of relationship (e.g., linear, logistic, etc.) between the predictor variables and the dependent (outcome) variable of interest. Instead, useful models (i.e., models that yield accurate predictions) can be derived even in situations where the relationship between the predictors and the dependent variables is non-monotone and difficult to approximate with parametric models." [Continue]
Note to self: Read. Hat tip to Diethelm Würtz.

Taylor Effect: It is by now well established in the financial econometrics literature that high-frequency time series of financial returns are often uncorrelated but not independent, because there are non-linear transformations of the returns which are positively correlated. In 1986 Taylor observed that the empirical sample autocorrelations of absolute returns, $|r|$, are usually larger than those of squared returns, $r^2$. A similar phenomenon is observed by Ding et al. (1993), who examined daily returns of the S&P 500 index and conclude that, for this particular series, the autocorrelations of absolute returns raised to the power of θ are maximized when θ is around 1, that is, the largest autocorrelations are found in the absolute returns. Granger and Ding (1995) denote this empirical property of financial returns as the Taylor effect. Therefore, if $r_t$, $t = 1, \dots, T$, is the series of returns and $\rho_\theta(k)$ denotes the sample autocorrelation of order k of $|r_t|^\theta$, $\theta > 0$, the Taylor effect can be defined as follows:

$$\rho_1(k) > \rho_\theta(k) \quad \text{for any } \theta \neq 1.$$

However, Granger and Ding (1994, 1996) analyze several series of daily exchange rates and individual stock prices, and conclude that the maximum autocorrelation is not always obtained when θ = 1 but for smaller values of θ. Nevertheless, they point out that the autocorrelations of absolute returns are always larger than the autocorrelations of squares. [1] This can also be observed when looking at USDCHF High Frequency FX rates (1996-04-01 00:00:00 to 2001-03-30 23:30:00; 62,496 observations):
[Figure: usdchfhist]
teffectPlot, k = 1,...,10:
[Figure: taylor_effect_rmetrics]
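
The quantity behind such a plot is easy to compute by hand. The sketch below is not the Rmetrics implementation: it defines a plain sample autocorrelation and evaluates $\rho_\theta(1)$ for a few values of θ on simulated GARCH(1,1) returns. The GARCH parameters, seed and sample size are arbitrary assumptions, and simulated data need not show the Taylor effect the way real FX returns do.

```python
import random
from math import sqrt


def sample_autocorr(x, k):
    """Sample autocorrelation of order k (lag-k autocovariance over lag-0)."""
    n = len(x)
    m = sum(x) / n
    denominator = sum((v - m) ** 2 for v in x)
    numerator = sum((x[t] - m) * (x[t + k] - m) for t in range(n - k))
    return numerator / denominator


def simulate_garch_returns(n, omega=0.05, alpha=0.10, beta=0.85, seed=0):
    """Synthetic returns with volatility clustering (assumed GARCH(1,1) parameters)."""
    rng = random.Random(seed)
    variance = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    returns = []
    for _ in range(n):
        r = sqrt(variance) * rng.gauss(0.0, 1.0)
        returns.append(r)
        variance = omega + alpha * r * r + beta * variance
    return returns


returns = simulate_garch_returns(20_000)
# rho[theta] is the lag-1 sample autocorrelation of |r_t|^theta.
rho = {
    theta: sample_autocorr([abs(r) ** theta for r in returns], 1)
    for theta in (0.5, 1.0, 1.5, 2.0)
}
for theta, value in rho.items():
    print(theta, round(value, 3))
```

To check the Taylor effect on real data, replace `returns` with an observed return series and compare `rho[1.0]` with the other entries, for several lags k.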
Scaling Law: Some financial time series show self-similar behavior under temporal aggregation. The 'empirical scaling law' relates the average of the unconditional volatility, measured as the absolute value of the return, $r(t_i)$, over a time interval to the size of the time interval, $\Delta t$:
$$\overline{|r(t_i)|} = \left( \frac{\Delta t}{\Delta T} \right)^{1/E}$$
where the drift exponent 1/E is an estimated constant that Müller et al. (1990) find to be similar across different currencies and ΔT is a time constant that depends on the currency [2]. The Wiener process, a continuous Gaussian random walk, exhibits a scaling law with a drift exponent of 0.5 (slope of green line). The estimated drift exponent for the USDCHF series is 0.52, which is not statistically different from 0.5:
[Figure: scalinglaw]
For more information see Fractals and Intrinsic Time - A Challenge to Econometricians.
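
The benchmark value of 0.5 for a Gaussian random walk can be reproduced with a short simulation. This is a sketch under my own assumptions (simulated unit-variance increments, non-overlapping aggregation, a plain least-squares slope on the log-log scale), not the Rmetrics routine.

```python
import random
from math import log


def drift_exponent(increments, aggregation_levels):
    """Slope of log(mean |r|) against log(interval size), fitted by least squares."""
    xs, ys = [], []
    for m in aggregation_levels:
        # Returns over non-overlapping intervals of m base periods.
        aggregated = [
            sum(increments[i:i + m]) for i in range(0, len(increments) - m + 1, m)
        ]
        xs.append(log(m))
        ys.append(log(sum(abs(r) for r in aggregated) / len(aggregated)))
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return covariance / sum((x - x_bar) ** 2 for x in xs)


rng = random.Random(42)
# Increments of a discretized Wiener process (Gaussian random walk).
increments = [rng.gauss(0.0, 1.0) for _ in range(2 ** 16)]
estimated_exponent = drift_exponent(increments, [1, 2, 4, 8, 16, 32, 64])
print(round(estimated_exponent, 3))
```

The estimate should land near 0.5, the slope of the reference line in the plot; feeding in observed returns instead of `increments` gives the empirical drift exponent.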

Here is the official Rmetrics site.

[1] See Stochastic Volatility Models and the Taylor Effect, Alberto Mora-Galán, Ana Pérez and Esther Ruiz
[2] See The Impact of News on Foreign Exchange Rates: Evidence from High Frequency Data, Dirk Eddelbuettel and Thomas H. McCurdy