Instructions for using the heterogeneity test

First, rename the downloaded text file from make_trees.txt to make_trees.pl so that you can run it on your computer.

In order to run the program you need to supply four simple numbers:

1. Number of sampled sequences/chromosomes (designated "s")
2. Number of segregating sites of your putatively neutral class (synonymous mutations would be an obvious choice, but you could use others, such as non-binding sites in promoters; designated "mut_1")
3. Number of segregating sites of your selected class (usually nonsynonymous, but could be e.g. binding sites; designated "mut_2")
4. Difference in D values. The Perl program will run with either Tajima's D or Fu and Li's D. Calculate the statistic separately for your neutral and selected classes, then take the difference, Dneutral - Dselected (designated "observed").

So a simple command line (in Unix) looks like this ("i" is the number of iterations you want it to run):

perl make_trees.pl  -s 10  -mut_1 10  -mut_2 5  -observed 1.2  -i 1000  -method tajimaD
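
If you already have the two D values, the "observed" argument and the rest of the command can be put together with a few lines of Python. This is only a sketch: the D values used here (0.4 and -0.8) are made-up numbers, and build_command is a hypothetical helper that is not part of make_trees.pl. The flags are the ones shown in the command above.

```python
def build_command(s, mut_1, mut_2, d_neutral, d_selected, iterations, method="tajimaD"):
    """Return the argument list for make_trees.pl (hypothetical helper)."""
    # "observed" is the neutral-class D minus the selected-class D.
    observed = d_neutral - d_selected
    return [
        "perl", "make_trees.pl",
        "-s", str(s),
        "-mut_1", str(mut_1),
        "-mut_2", str(mut_2),
        "-observed", f"{observed:.4f}",
        "-i", str(iterations),
        "-method", method,
    ]


# Made-up D values, chosen only so that the difference is 1.2.
cmd = build_command(10, 10, 5, d_neutral=0.4, d_selected=-0.8, iterations=1000)
print(" ".join(cmd))
```

The list can be passed to subprocess.run(cmd) if you want to launch the program from Python instead of typing the command by hand.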

The other method is called "fuD". The output line may be a bit confusing at first. The number you want is the "index", which tells you where your observed value falls in the simulated distribution: the value at that index is the one closest to (actually, just larger than) your observed value. One output line from the above input looks like this:

index 941 value=1.2099 out of 1000 total (obs=1.2000)

So the p-value for a one-tailed test is [(total-index)/total], which here is (1000-941)/1000 = 0.059.
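
The same calculation can be done directly from the output line. The sketch below assumes the line always has the layout of the example above; one_tailed_p is a hypothetical helper name, not something the Perl program provides.

```python
import re


def one_tailed_p(output_line):
    """Parse an 'index ... out of ... total' line and return (total - index) / total."""
    match = re.search(r"index\s+(\d+)\s+value=\S+\s+out of\s+(\d+)\s+total", output_line)
    if match is None:
        raise ValueError("unrecognized output line: " + output_line)
    index = int(match.group(1))
    total = int(match.group(2))
    return (total - index) / total


line = "index 941 value=1.2099 out of 1000 total (obs=1.2000)"
p = one_tailed_p(line)
print(f"one-tailed p = {p:.3f}")
```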

***This test should probably now be considered a two-tailed test.  Whereas the original paper did not consider this biologically plausible, it has now been shown that balancing selection on selected mutations may lead to a significantly higher value of Tajima's or Fu and Li's D for this class of mutations.  This means that p-values should now be computed as [2*(total-index)/total] for the example above (= 0.118) and, if you get a value in the other extreme (i.e. the reported "index" is close to zero), the p-value should be computed as [2*(index/total)].
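
Both two-tailed cases can be handled by one small function. This is a sketch under two assumptions that are not stated in the instructions above: the tail is chosen as whichever end of the distribution the index is nearer to, and the result is capped at 1. The name two_tailed_p is hypothetical.

```python
def two_tailed_p(index, total):
    """Two-tailed p-value from the reported index and the total number of iterations."""
    # Assumption: use whichever tail is smaller, so an index near "total" gives
    # 2*(total-index)/total and an index near zero gives 2*index/total.
    tail = min(index, total - index)
    # Assumption: cap at 1 so that an index near the middle cannot exceed 1.
    return min(1.0, 2 * tail / total)


upper = two_tailed_p(941, 1000)  # the example above: 2*(1000-941)/1000
lower = two_tailed_p(12, 1000)   # an index close to zero: 2*12/1000
print(f"upper-tail case: {upper:.3f}")
print(f"lower-tail case: {lower:.3f}")
```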
 

Make sure you read the paper and know what the program does before running it!