Excerpt from File = PHYSIG.DOC version of 22 March 2013... a work in progress ...
Partial Documentation for:
PHYSIG Matlab Programs
Matlab code for the phylogenetic analysis of comparative data using Generalized
Least Squares
and computer-intensive methods.
PHYSIG.M (latest version is 27March 2016) Accompanies:
Blomberg, S. P., T. Garland, Jr., and A. R. Ives. 2003. Testing for phylogenetic signal in comparative data:
behavioral traits are more labile. Evolution
57:717-745. [PDF file]
REGRESSIONv2.Mb> (latest version is 27 March 2016) accompanies:
Lavin, S. R., W. H. Karasov, A. R. Ives, K. M. Middleton, and T. Garland,
Jr. 2008.
Morphometrics of the avian small intestine, compared with non-flying mammals:
a phylogenetic approach. Physiological and Biochemical
Zoology 81:526-550.
[PDF file]
The various modules of PHYSIG can be obtained from Ted Garland on request.
Just send me an email!
Theodore Garland, Jr.
Department of Evolution, Ecology, and Organismal Biology
University of California
Riverside, CA 92521
Phone: (951) 827-3524 = Ted's office (with answering machine)
Phone: (951) 827-5903 = Dept. office
Facsimile: (951) 827-4286 = Dept. office
E-mail: tgarland@ucr.edu
http://www.biology.ucr.edu/
http://www.biology.ucr.edu/people/faculty/Garland/Garland2.html
PDF files of my papers can be found here:
http://www.biology.ucr.edu/people/faculty/Garland/GarlandPublications.html
List of Computer Programs in
MatLab (and 1 in Xlisp-Stat)
(Figs. refer to Blomberg
et al., 2003).
PHYSIG.DOC: Documentation as Microsoft Word file.
PHYSIG.M (latest version is 27March 2016): Tests for statistical significance of phylogenetic signal
(tendency for related species to resemble each other) by randomization procedure.
Computes K statistic, which indicates amount of phylogenetic signal relative
to expectation under Brownian motion character evolution along the specified
phylogenetic
tree (topology plus branch lengths. Also computes other
MSE
statistics.
Allows
transformation
of
tree
prior
to analyses, by user-specified value of OU or ACDC parameter.
Uses ACDC.m, ACDCDetNE1.m, mse0ratio.m, mseStarratio.m, OU.m, OUDetNE1.m,
PHYSIGfunct.m, and scalebydet.m, which must
be in the same folder as PHYSIG.M.
PHYSIGOU.M: Maximum likelihood estimate of OU transformation
parameter (d), randomization test for whether d differs significantly from
zero, MSE statistics.
Uses GetMSEOUfunct.m, mseratio.m, PHYSIGfunct.m, PHYSIGOUfunct.m, scalebydet.m,
and tolerance.m.
PHYSIGACDC.M: Maximum likelihood estimate of ACDC parameter (g),
randomization test for whether g differs significantly from zero, MSE statistics.
Uses GetMSEACDCfunct.m, mseratio.m, PHYSIGACDCfunct.m, PHYSIGfunct.m,
scalebydet.m, and tolerance.m.
PHYOUH0d.M: Tests null hypothesis that d = user-specified value
(typically 1), via a randomization test.
Uses GetMSEOUfunct.m, OU.m, PHYSIGfunct.m, PHYSIGOUfunct.m, and tolerance.m.
PHYACDCH0g.M: Tests null hypothesis that g = a user-specified
value (typical 1), via a randomization test.
Uses ACDC.m,GetMSEACDCH0gfunct.m, PHYSIGACDCfunct.m, and PHYSIGfunct.m.
PHYSIGER.M: Mainly a research tool. Analyzes simulated data from
PDSIMUL.EXE of the PDAP package to calculate Type I or Type II error rates
of the randomization test that is implemented by PHYSIG.M.
Uses PHYSIGfunct.m
PHYSIGDG.M: Mainly a research tool. Analyzes simulated data from
PDSIMUL.EXE to estimate d and g, as well as computation of MSE statistics.
Uses mseratio.m, PHYSIGACDCfunct.m, PHYSIGfunct.m, and PHYSIGOUfunct.m.
REGRESSION.M: Computes simple and multiple regressions via the method
of Generalized Least Squares (GLS), e.g., as described in:
Garland, T., Jr., and A. R. Ives. 2000. Using the past to predict the present:
Confidence intervals for regression
equations in phylogenetic comparative methods.
American Naturalist 155:346-364.
Rohlf, F. J. 2001. Comparative methods for the analysis of continuous variables:
geometric interpretations.
Evolution 55:2143-2160.
This program also outputs the conventional (nonphylogenetic) versions of the
analyses.
Note
also that
if
you
perform
these
analyses
on
a
star
phylogeny
then
all
results
will be identical to what you would obtain via a conventional (nonphylogenetic)
regression
analysis. This is a good reality check!
Beginning in January 2008, a newer version named REGRESSIONv2.M is
available. It allows simultaneous estimation of branch length transformations
(transformation of the
phylogenetic variance-covariance matrix) under the OU and ACDC models, as well
as via Grafen's rho and Pagel's lambda. It accompanies this paper:
Lavin, S. R., W. H. Karasov, A. R. Ives, K. M. Middleton, and T. Garland, Jr.
2008. Morphometrics of the avian small intestine, compared with non-flying
mammals: a phylogenetic approach. Physiological and Biochemical
Zoology 81:526-550. [PDF file]
This program reports both likelihoods and AIC values, thus facilitating a "model
selection" approach.
The latest version is 27 March 2016.
TREECONV.LSP: Converts phylogenetic variance-covariance matrix to bracket format phylogenetic tree and vice versa. Particularly useful for visualizing results of OU and ACDC transformations (e.g., see Figs.
1, 8).
General Instructions
Most of the programs take as input two ASCII (plain text) files.
1. Tip data file. One contains the tip (comparative) data to be analyzed. The file can be as simple as a single column of numbers, representing data for one trait (e.g., body size). It can contain multiple columns of trait data, and it can also include tip names in the first column.
One simple way to create a tip data file is to cut and paste the bottom block from a PDI file format from PDAP, a DOS set of programs that is also available from Ted Garland.
2. Phylogenetic variance-covariance matrix. This is a square matrix. Diagonals represent the branch-length distance from root to each tip. Off-diagonals represent the branch-length distance from the root to the last common ancestor of each pair of tips.
One way to create this matrix is with the PDDIST.EXE program of PDAP:
b. Choose M for matrix output.
c. Use of a header it optional (the MatLab code does not use it if it is there).
d. Do not scale all values.
e. Do not write in a compact format without exponents unless all of your branch lengths are in whole numbers or at least have few decimal places.
Crucial: the rows in the tip data file must be
in the same order as the left-to-right (and top-to-bottom) order of the
phylogenetic matrix!!! Otherwise, all results will be nonsense.
Garbage in, garbage out. Note that PDTREE and PDDIST always saves PDI and
DSC files in this form, so it is convenient to use these two programs in
concert to create your tip data and phylogenetic matrices.
As example data sets, seven files are included:
49LBR70.PDI
This is the original PDTREE.EXE data file (tree and tip data) from the following
paper:
Garland, T., Jr., A. W. Dickerman, C. M. Janis, and J. A. Jones. 1993. Phylogenetic
analysis of covariance by computer simulation. Systematic Biology 42:265-292.
Trait one is log10 body mass (kg) and trait two is log10 home range area (km2),
as taken from their Table 1. Branch lengths in the original tree (file 49LBR.PDI)
were in years and hence very large numbers. This can cause numerical problems
when inverting matrices, etc. Therefore, the present file has been rescaled
in PDTREE.EXE such that branch lengths are in millions of years and the total
height of the tree is thus 70 rather than 70,000,000.
49LBR_1.PDI
This is the above tree but with all branch lengths set to be unity (1.00).
You can do this easily in our PDTREE.EXE program by selecting "(C)onstant" branch
lengths.
49LBR70.DSC
This is the phylogenetic variance-covariance matrix produced by PDDIST.EXE
(again, plain ASCII text).
49LBR_1.DSC
This is the phylogenetic variance-covariance matrix produced by PDDIST.EXE
for the tree with all branch lengths set to be constant (1.00).
49STAR.DSC
This is the phylogenetic variance-covariance matrix produced by PDDIST.EXE
for the tree collapsed to be star (no hierarchical structure) and with all
terminal branch lengths then set to be constant (1.00).
49LBR.TIP
This is just the bottom data block cut out from 49LBR.PDI and saved as plain
ASCII text. The first column is the species designations as 2-character codes;
the second column is log10 body mass (kg); the third column is log10 home
range area (km2).
49.TIP
This is like 49LBR.TIP except that it also contains:
log10 hind leg length in the 4th column;
log10 front leg length in the 5th column;
metatarsal/femur ratio in the 6th column
metacarpal/humerus ratio in the 7th column;
log10 group size in the 8th column;
log10 maximal sprint speed in the 9th column.
These other traits, except for group size, are from this paper:
Garland, T., Jr., and C. M. Janis. 1993. Does metatarsal/femur ratio predict
maximal running speed in cursorial mammals? J. Zool., Lond. 229:133-151.
The DSC and TIP files can be fed directly into the various PHYSIG modules (including
REGRESSION.M). Results (e.g., the K statistic) can be compared with what
is shown in the Appendix of Blomberg et al. (2003).
PHYSIG.M
Start with this program. It will perform the randomization test for phylogenetic signal and also calculate the K statistic to gauge how much phylogenetic signal is present.
After this, try PHYSIGOU.M and PHYSIGACDC.M.
PHYSIG_LL.M:
Newer version of PHYSIG.M (August 2005) that includes calculation of ln likelihood for how well the tip data are fit by the specified phylogeny versus a star phylogeny.
PHYSIGER.M
PHYSIGER.M takes as input from the user a single .DSC matrix (from PDDIST.EXE or other suitable program), which indicates the expected variance-covariance matrix of the tip data given the specified phylogeny, and a .SIM file from program PDSIMUL.EXE of the PDAP package, which will include multiple sets of data simulated along the specified phylogeny. Depending on the .SIM file and .DSC matrix, PHYSIGER.M can be used to calculate the Type I error rate or the power of the randomization test for phylogenetic signal (as implemented in PHYSIG.M).
Type I error rate:
Type I error rate is the probability of rejecting
the null hypothesis when the null hypothesis is actually true. To create
data with no phylogenetic signal, one can simulate data along a star phylogeny.
These data can then be analyzed with PHYSIGER.M, but with a .DSC matrix
that corresponds to a hierarchical phylogeny with the same number of tips.
Given that the data possess no phylogenetic signal, we should not detect
signal, irrespective of what sort of tree we use to analyze the data. Occasionally,
however, we will detect signal, but this should occur only about 5% of
the time if Type I error rate is putatively 0.05.
PHYSIGER.M will analyze each of the simulated data
sets by the same permutation test as in PHYSIG.M. (One may wish to do only
100 permutations for each of the simulated data sets, in order to save
time.) PHYSIGER.M will output the proportion of simulated data sets for
which 5% or fewer of the permuted MSEs are < the original MSE; that
is, the proportion of simulated data sets for which we seem to have detected
signal, even though none should exist on average because they were created
by simulation along a star phylogeny.
Power:
Inputting data that have been simulated along a hierarchical
phylogeny allows one to determine the power of the permutation test implemented
in PHYSIG.M. The statistical power of a test is the probability of rejecting
the null hypothesis when it is in fact false (power = 1 – Type II error
rate). The null hypothesis is no phylogenetic signal. Therefore, we can
create data for which the null hypothesis is false by simulating data along
a phylogeny that is hierarchical. The more hierarchical the tree, the more
signal will be present (on average). For a given shape tree, the larger
the tree the more signal will tend to be present (see Blomberg et al.,
2003). Thus, the larger the tree of a given shape, the greater the power
to detect signal.
The number of simulated data sets, and the number
of permutations for each test are supplied by the user. Each simulated
data set is treated as an "original" or "true" data set. PHYSIGER.M calculates
the MSE for that data set and tree (the same tree [matrix] is used throughout).
Then the data are permuted on the tree a large number of times (usually
1,000) and an MSE is calculated for each permutation. If, for any one simulation,
the original MSE is less than 95% of the permuted MSEs (i.e., original
MSE is significantly less than the permuted MSEs), then that fact is tallied.
The program outputs the proportion of simulated data sets which showed
significant results. A one-tailed test is given for the proportion of simulations
which were significant at the 5% level. A two-tailed test is given by recording
the number of simulations for which the original MSE was less than 2.5%
of the permuted MSEs or greater than 97.5% of the permuted MSEs.
The output for PHYSIGER.M consists of a histogram
of the distribution of permuted MSEs for each simulated data set (they
flash on the screen one after the other), and a statement as to the number
of MSEs which were less than the original for each simulation. At the end
of the run, the proportion of simulated data sets for which the original
MSE was less than 5% of the permuted MSEs for each simulation is given.
Similarly, for the two-tailed test, the proportion of simulated data sets
where the MSE was less than 2.5% of the permuted MSEs or greater than 97.5%
of the permuted MSEs is given, and the two-tailed error rate is simply
the sum of these last two values.
What we did in Blomberg et al. (2003):
We measured Type I error rate using a 32-tip tree.
We simulated data along S32star.pdi, then analyzed those data with three
hierarchical trees, S32.DSC, C32.DSC, C32ONE.DSC. We used PHYSIGER.M to
calculate the power of the permutation test for different sample sizes
(8, 16, 32, etc.) and different tree topologies (symmetrical, ladder-shaped,
ladder-shaped with all branch lengths = 1). We also examined the effect
of the OU parameter (effect of "starness") on power by transforming 49lbr.inp
(from Garland et al., 1993) using different values of d and measuring the
power for each tree. Power increases rapidly with sample size and symmetrical
trees yield slightly higher power (Fig. 2). Power also increases as "starness"
decreases and the tree becomes more hierarchical (Fig. 3).
REGRESSION.M
This program will perform regressions via generalized least squares. You
will be asked to specify the tip data file (e.g., 49.TIP) and also the phylogenetic
variance-covariance matrix (e.g., 49LBR70.DSC). You will also be asked about
the sizes of these files (e.g., how many columns) and which columns in the
tip data file contain your dependent (Y) and independent (X) variables.
Output includes conventional (non-phylogenetic) results as well as GLS results.
The latter are identical to what can be obtained with phylogenetically independent
contrasts (verified in Ted's 49.RUN SPSS for DOS syntax file).
REGRESSIONv2.M
Like REGRESSION.M, this program performs multiple regressions via both ordinary least squares and (phylogenetic) generalized least squares. In addition, it can implement automatic transformations of the phylogenetic variance-covariance matrix (the DSC file) based on restricted maximum likelihood (REML), including under an OU model, Grafen's (1989) rho, and Pagel's lambda. For more details, see this paper:
Lavin, S. R., W. H. Karasov, A. R. Ives, K. M. Middleton, and T. Garland, Jr. 2008. Morphometrics of the avian small intestine, compared with non-flying mammals: a phylogenetic approach. Physiological and Biochemical Zoology 81:526-550. [provides Matlab Regressionv2.m, released as part of the PHYSIG package] [PDF file]
TREECONV5.LSP
(version of 6 June 2002)
Xlisp must be installed on your computer.
Depending on platform and installation, start Xlisp by clicking on "Wxls32.exe"
Two windows will open:
XLISP-STAT
Listener
Go to File Menu in XLISP-STAT window.
"Dribble" will turn on session log (diary).
"Load" will load code, e.g., "treeconv5.lsp"
To run again, type "(MAIN)" or reload Xlisp
The program does 2 things:
1. reads in a matrix (any name) and list of tip names (any name, but must be in same order as matrix), then outputs a bracket format tree. As of 16 Nov. 2002, note that the tip data name can NOT have a header row.
2. reads in bracket format tree (must have branch lengths) and outputs
a matrix and list of tip names
Notes
When attempting to convert a matrix (e.g., as output by PHYSIGOU.M or by PHYSIGACDC.M) to a tree, matrices that reflect trees with some very short branch lengths may cause problems. If so, then try increasing the Tolerance in TREECONV5.LSP. Just open the file in any text editor and you will find these lines near the top:
;;;;;;;;;; VALUE FOR BLOCK STRUCTURE TOLERANCE. YOU MAY NEED TO TWEAK THIS IF THE PROGRAM DOES NOT RUN CORRECTLY ;;;;;;;;;
(DEF *TOLERANCE* 0.00000000001)
Try increasing the Tolerance by a factor of 10 and see if the program then works. If not, try increasing it some more, etc., until it works. If this happens, make sure to check the resulting tree to confirm that it is indeed the same topology as your original tree (e.g., before transformation by the OU or ACDC model), except possibly for polytomies.
Printing Trees
You have various options. One is to use the TreeView program by Rod Page, which is available for all major operating systems and can be downloaded at:
http://taxonomy.zoology.gla.ac.uk/rod/treeview.html
TreeView will read in the bracket format trees produced by TREECONV5.LSP or by PDTREE.EXE (one of the modules in our older DOS program package PDAP).
In PDTREE, these would be saved with the .BRK extension. To read in a file
in TreeView, just click on File, Open, then All files, which will show a listing
of all files in the specified folder. Then click on your bracket format file
file.
TreeView can also write bracket format trees in the "PHYLIP/Newick 8:45" format,
which are compatible with TREECONV5.LSP and PDTREE. Make sure to check the "Include
edge lengths" box to save branch lengths! If you want to use PDTREE,
then make sure you name them with the .BRK extension or PDTREE will not understand
them.
References
Al-kahtani, M. A., C. Zuleta, E. Caviedes-Vidal, and T. Garland, Jr. 2004. Kidney mass and relative medullary thickness of rodents in relation to habitat, body size, and phylogeny. Physiological and Biochemical Zoology 77:346-365. (plus online Appendix B). [PDF file]
Ashton, K. G. 2004. Comparing phylogenetic signal in intraspecific and interspecific body size datasets. Journal of Evolutionary Biology 17:1157-1161.
Blomberg, S. P., T. Garland, Jr., and A. R. Ives. 2003. Testing for phylogenetic signal in comparative data: behavioral traits are more labile. Evolution 57:717-745. [PDF file]
Buchwalter, D. B., D. J. Cain, C. A. Martin, L. Xie, S. N. Luoma, and T. Garland, Jr. 2008. Aquatic insect ecophysiological traits reveal phylogenetically based differences in dissolved cadmium susceptibility. Proceedings of the National Academy of Science U.S.A. 105:8321-8326. [uses Regressionv2.m] [PDF file]
Freckleton, R. P., P. H. Harvey, and M. Pagel. 2002. Phylogenetic analysis and comparative data: a test and review of evidence. American Naturalist 160:712-726.
Garland, T., Jr., and A. R. Ives. 2000. Using the past to predict the present: Confidence intervals for regression equations in phylogenetic comparative methods. American Naturalist 155:346-364. [PDF file]
Garland, T., Jr., A. F. Bennett, and E. L. Rezende. 2005. Phylogenetic approaches in comparative physiology. Journal of Experimental Biology 208:3015-3035. [PDF file]
Gartner, G. E. A., J. W. Hicks, P. R. Manzani, D. V. Andrade, A. S. Abe, T. Wang, S. M. Secor, and T. Garland, Jr. 2010. Phylogeny, ecology, and heart position in snakes. Physiological and Biochemical Zoology 83:43-54. [PDF file]
Lavin, S. R., W. H. Karasov, A. R. Ives, K. M. Middleton, and T. Garland, Jr. 2008. Morphometrics of the avian small intestine, compared with non-flying mammals: a phylogenetic approach. Physiological and Biochemical Zoology 81:526-550. [provides Matlab Regressionv2.m, released as part of the PHYSIG package] [PDF file]
Rezende, E. L., F. Bozinovic, and T. Garland, Jr. 2004. Climatic adaptation and the evolution of basal and maximum rates of metabolism in rodents. Evolution 58:1361-1374. [PDF file]
Spoor, F., T. Garland, Jr., G. Krovitz, T. M. Ryan, M. T. Silcox, and A. Walker. 2007. The primate semicircular canal system and locomotion. Proceedings of the National Academy of Science U.S.A. 104:10808-10812. [uses Regressionv2.m] [PDF file]
Swanson, D. L., and T. Garland, Jr. 2009. The evolution of high summit metabolism and cold tolerance in birds and its impact on present-day distributions. Evolution 63:184-194. [uses Regressionv2.m] [PDF file]Warne, R. W., and E. L. Charnov. 2008. Reproductive allometry and the size-number trade-off for lizards. American Naturalist 172:E80-E98. [uses Regressionv2.m]