R in the news

I’m sure everyone and his brother will be posting a link to this article from Steve McNally’s blog on Forbes. The post contains a lot of interesting links, as do the comments.


Strange random number generator

A random number generator outputs a sequence of n random numbers {r1, r2, … , rn} from a uniform distribution on (0,1). This sequence can be used to define an integer p with n-1 binary digits according to the following algorithm:

     for (i in 1:(n-1))
        if (r[i+1] > r[i])
           then set bit i of p equal to 1
           else set bit i of p equal to 0

Let’s call p the pattern of the sequence. For example, the sequence {0.5703, 0.1617, 0.2629, 0.5404, 0.3860} has pattern 0110; the sequence {0.6441, 0.6715, 0.0802, 0.8980, 0.5447, 0.4748, 0.5214, 0.0110} has pattern 1010010. Since a sequence of length n corresponds to a pattern of n-1 binary digits, the number of possible patterns is 2^(n-1).
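The rule above can be sketched in a few lines of Python. This is only an illustration: the function name pattern is my own, and the pattern is returned as a string of bits (bit 1 first) rather than as an integer.

```python
def pattern(r):
    """Return the up/down pattern of the sequence r as a string of len(r)-1 bits.

    Bit i is '1' when r[i+1] > r[i] and '0' otherwise.
    """
    return "".join("1" if b > a else "0" for a, b in zip(r, r[1:]))


# The two example sequences from the text.
print(pattern([0.5703, 0.1617, 0.2629, 0.5404, 0.3860]))
print(pattern([0.6441, 0.6715, 0.0802, 0.8980, 0.5447, 0.4748, 0.5214, 0.0110]))
```

The two calls print 0110 and 1010010, matching the patterns given above.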

What does the distribution of p look like? For a fixed n, the probability that a sequence of n random numbers will correspond to a particular pattern p is given by:

(-1)^w \int_0^1 \int_{p_1}^{\tau_1} \int_{p_2}^{\tau_2} \cdots  \int_{p_{n-1}}^{\tau_{n-1}} d \tau_n d \tau_{n-1} \cdots d \tau_2 d \tau_1

where p_i is bit i of p (so each inner lower limit is either 0 or 1) and w is the Hamming weight of p (i.e., the number of 1s it contains).
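One way to cross-check the integral for small n, without doing any calculus, is to count orderings. This is a sketch under one assumption: ties have probability zero for continuous uniform draws, so all n! rank orderings of the sequence are equally likely and the pattern depends only on the ordering. The function name pattern_probabilities is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
from math import factorial


def pattern_probabilities(n):
    """Exact probability of each pattern for a sequence of n draws.

    Counts how many of the n! equally likely rank orderings produce each
    up/down pattern. Only practical for small n, since it enumerates n! cases.
    """
    counts = Counter(
        "".join("1" if b > a else "0" for a, b in zip(perm, perm[1:]))
        for perm in permutations(range(n))
    )
    total = factorial(n)
    return {p: Fraction(c, total) for p, c in sorted(counts.items())}


for p, prob in pattern_probabilities(3).items():
    print(p, prob)
```

For n = 3 this gives 1/6 for patterns 00 and 11 and 1/3 for 01 and 10, which agrees with evaluating the integral by hand.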

Here is some Mathematica code to generate the distribution for a given value of n:

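(* n here is the number of binary digits in the pattern, limited to 1..12 *)
n = -1;
While[(n < 1) || (n > 12),
  n = Input["Enter number of binary digits: "]];
func = {1};  (* density of a single uniform draw *)
ProgressIndicator[Dynamic[pr], {1, n}]
For[i = 1, i ≤ n, i++,
  pr = i;
  (* integrate over a dummy variable t so it cannot clash with the limit x *)
  fl = Integrate[func /. x -> t, {t, 0, x}];  (* previous value below x *)
  fr = Integrate[func /. x -> t, {t, x, 1}];  (* previous value above x *)
  func = Join[fl, fr];];
freq = Integrate[func, {x, 0, 1}];  (* probability of each pattern *)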
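For readers without Mathematica, the same recursion can be sketched in Python using exact polynomial arithmetic. This is my own port, not the original code: polynomials in x are stored as lists of Fraction coefficients (constant term first), the names antiderivative and pattern_distribution are hypothetical, and the output list follows the same order that Join produces above.

```python
from fractions import Fraction


def antiderivative(c):
    """Coefficients of the antiderivative of polynomial c, with zero constant term."""
    return [Fraction(0)] + [a / (k + 1) for k, a in enumerate(c)]


def pattern_distribution(bits):
    """Exact probability of every pattern with the given number of binary digits."""
    funcs = [[Fraction(1)]]  # density of a single uniform draw
    for _ in range(bits):
        # Integral from 0 to x: the previous value lies below the new one.
        lower = [antiderivative(c) for c in funcs]
        # Integral from x to 1 equals F(1) - F(x): previous value lies above.
        upper = []
        for F in lower:
            G = [-a for a in F]
            G[0] += sum(F)  # F(1) is the sum of the coefficients of F
            upper.append(G)
        funcs = lower + upper
    # Final integral from 0 to 1 of each polynomial.
    return [sum(antiderivative(c)) for c in funcs]


print(pattern_distribution(2))
```

With two binary digits the result is 1/6, 1/3, 1/3, 1/6, and for any number of digits the probabilities sum to exactly 1.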

Here is a bar chart of the frequency distribution with n=6. The x-axis runs from 00000 to 11111.

The distribution is self-similar, which I guess makes it a fractal.

A Review of VMware Fusion

I use Parallels on my iMac at work, and have had a few problems with it: the display sometimes flickers when I’m running Windows, and once I lost the Windows partition altogether and had to reinstall. So when I purchased a new Mac mini for use at home, I decided to try VMware Fusion. I have not been disappointed.

Based on recommendations I read on the web, I chose to install Windows XP using Boot Camp, then installed Fusion and set it to use the Boot Camp partition. The main advantage of this arrangement is that the machine can be booted directly into Windows, bypassing Fusion altogether if need be. The disadvantage is that you cannot suspend a Windows session.

The manual warned that when installed this way, Windows would have to be reactivated the first time it was booted under Fusion. Although it was a bit scary to see XP “broken” — I had just paid $180 for it, after all — the reactivation process was relatively painless: I called Microsoft via a toll-free number and read the product code to their voice-recognition system. An automated voice read out a new code, which I copied down and entered into my computer. With this new code, Windows booted up without a hitch, and has been running well ever since.

My impression is that Windows “feels” snappier under Fusion than under Parallels, even though I’m running Fusion on a mini, which is a slower machine than the iMac at work where I run Parallels. Fusion also seems to be more stable. Using it, I have never had Windows lock up, and there have been no problems with display flicker, regardless of whether I’m running Windows in full-screen mode or in a window. All in all, I think Fusion is a great product and worth the money. One of these days I may even decide to install a version of Linux on my mini.

Baseball simulation

Interesting article in the Times the other day about using simulation to answer questions of strategy in baseball.

Under what conditions is bunting advantageous? When does trying to steal make sense, and when does it decrease the chances of scoring? Questions like these turn out to be ideally suited to computer programs through which millions of iterations can smooth out the peaks and valleys of randomness, and converge toward a reliable approximation.

They mention a simulation program called Diamond Mind that can be used to answer such questions.

Mathematica Home Edition

Wow. I am going to have a hard time resisting the temptation to spend $295 on this. At work we have a site license for the professional version, but I’ve never had the time to learn how to use it.

Q: How is Mathematica Home Edition different from the professional version of Mathematica?
Mathematica Home Edition is a fully functional version of Mathematica Professional with the same features.

Data Analysis Applications for Mac OS X

Mac users have a wide variety of tools available for data analysis. Via Macs in Chemistry, here is a software list. While there are some very expensive programs on the list (such as Mathematica), the median price looks to be zero!

Many of the programs linked above allow you to produce two-dimensional graphs from a list of data points. But what if you have a graph and want to determine the data points that were used to produce it? This could be thought of as the inverse of graphing data. To do this you need a program like GraphClick, which at $8 is a real bargain. I have found it to be very useful, since few academic papers make their data available.

R in the news

Great article today about R in the New York Times. I use this program every day and it is indispensable to me.

UPDATE: The author of the original Times article posted some follow-up on the Bits Blog.

Also, this is old, but I just saw it yesterday. Back in October Ian Ayres, the author of Super Crunchers, was guest-blogging on the Freakonomics blog. In this post (which otherwise contains some very interesting material) he was discussing free software for number crunching and made an ignorant remark about my favorite program, R. Numerous indignant commenters appeared to set him straight.