PHYSIG: Phylogenetic Signal

Excerpt from File = PHYSIG.DOC version of 22 March 2013... a work in progress ...

Partial Documentation for:

PHYSIG Matlab Programs

Matlab code for the phylogenetic analysis of comparative data using Generalized Least Squares
and computer-intensive methods.

PHYSIG.M (latest version is 27March 2016) Accompanies:

Blomberg, S. P., T. Garland, Jr., and A. R. Ives. 2003. Testing for phylogenetic signal in comparative data:
      behavioral traits are more labile. Evolution 57:717-745.  [PDF file]

REGRESSIONv2.Mb> (latest version is 27 March 2016) accompanies:

Lavin, S. R., W. H. Karasov, A. R. Ives, K. M. Middleton, and T. Garland, Jr. 2008.
      Morphometrics of the avian small intestine, compared with non-flying mammals:
      a phylogenetic approach. Physiological and Biochemical Zoology 81:526-550.
      [PDF file]

The various modules of PHYSIG can be obtained from Ted Garland on request.
Just send me an email!

Theodore Garland, Jr.
Department of Evolution, Ecology, and Organismal Biology
University of California
Riverside, CA 92521
Phone: (951) 827-3524 = Ted's office (with answering machine)
Phone: (951) 827-5903 = Dept. office
Facsimile: (951) 827-4286 = Dept. office
E-mail: tgarland@ucr.edu
http://www.biology.ucr.edu/
http://www.biology.ucr.edu/people/faculty/Garland/Garland2.html

PDF files of my papers can be found here:
http://www.biology.ucr.edu/people/faculty/Garland/GarlandPublications.html

 

List of Computer Programs in MatLab (and 1 in Xlisp-Stat)
     (Figs. refer to Blomberg et al., 2003).

PHYSIG.DOC: Documentation as Microsoft Word file.

PHYSIG.M (latest version is 27March 2016): Tests for statistical significance of phylogenetic signal (tendency for related species to resemble each other) by randomization procedure.
Computes K statistic, which indicates amount of phylogenetic signal relative to expectation under Brownian motion character evolution along the specified phylogenetic tree (topology plus branch lengths. Also computes other MSE statistics. Allows transformation of tree prior to analyses, by user-specified value of OU or ACDC parameter.
Uses ACDC.m, ACDCDetNE1.m, mse0ratio.m, mseStarratio.m, OU.m, OUDetNE1.m, PHYSIGfunct.m, and scalebydet.m, which must be in the same folder as PHYSIG.M.

PHYSIGOU.M: Maximum likelihood estimate of OU transformation parameter (d), randomization test for whether d differs significantly from zero, MSE statistics.
Uses GetMSEOUfunct.m, mseratio.m, PHYSIGfunct.m, PHYSIGOUfunct.m, scalebydet.m, and tolerance.m.

PHYSIGACDC.M: Maximum likelihood estimate of ACDC parameter (g), randomization test for whether g differs significantly from zero, MSE statistics.
Uses GetMSEACDCfunct.m, mseratio.m, PHYSIGACDCfunct.m, PHYSIGfunct.m, scalebydet.m, and tolerance.m.

PHYOUH0d.M: Tests null hypothesis that d = user-specified value (typically 1), via a randomization test.
Uses GetMSEOUfunct.m, OU.m, PHYSIGfunct.m, PHYSIGOUfunct.m, and tolerance.m.

PHYACDCH0g.M: Tests null hypothesis that g = a user-specified value (typical 1), via a randomization test.
Uses ACDC.m,GetMSEACDCH0gfunct.m, PHYSIGACDCfunct.m, and PHYSIGfunct.m.

PHYSIGER.M: Mainly a research tool. Analyzes simulated data from PDSIMUL.EXE of the PDAP package to calculate Type I or Type II error rates of the randomization test that is implemented by PHYSIG.M.
Uses PHYSIGfunct.m

PHYSIGDG.M: Mainly a research tool. Analyzes simulated data from PDSIMUL.EXE to estimate d and g, as well as computation of MSE statistics.
Uses mseratio.m, PHYSIGACDCfunct.m, PHYSIGfunct.m, and PHYSIGOUfunct.m.

REGRESSION.M: Computes simple and multiple regressions via the method of Generalized Least Squares (GLS), e.g., as described in:
Garland, T., Jr., and A. R. Ives. 2000. Using the past to predict the present: Confidence intervals for regression
      equations in phylogenetic comparative methods. American Naturalist 155:346-364.
Rohlf, F. J. 2001. Comparative methods for the analysis of continuous variables: geometric interpretations.
      Evolution 55:2143-2160.
This program also outputs the conventional (nonphylogenetic) versions of the analyses.
Note also that if you perform these analyses on a star phylogeny then all results will be identical to what you would obtain via a conventional (nonphylogenetic) regression analysis. This is a good reality check!
Beginning in January 2008, a newer version named REGRESSIONv2.M is available. It allows simultaneous estimation of branch length transformations (transformation of the phylogenetic variance-covariance matrix) under the OU and ACDC models, as well as via Grafen's rho and Pagel's lambda. It accompanies this paper:
Lavin, S. R., W. H. Karasov, A. R. Ives, K. M. Middleton, and T. Garland, Jr. 2008. Morphometrics of the avian small intestine, compared with non-flying mammals: a phylogenetic approach. Physiological and Biochemical Zoology 81:526-550.  [PDF file]
This program reports both likelihoods and AIC values, thus facilitating a "model selection" approach.
The latest version is 27 March 2016.


TREECONV.LSP: Converts phylogenetic variance-covariance matrix to bracket format phylogenetic tree and vice versa. Particularly useful for visualizing results of OU and ACDC transformations (e.g., see Figs. 1, 8).


General Instructions

Most of the programs take as input two ASCII (plain text) files.

1. Tip data file. One contains the tip (comparative) data to be analyzed. The file can be as simple as a single column of numbers, representing data for one trait (e.g., body size). It can contain multiple columns of trait data, and it can also include tip names in the first column.

One simple way to create a tip data file is to cut and paste the bottom block from a PDI file format from PDAP, a DOS set of programs that is also available from Ted Garland.

2. Phylogenetic variance-covariance matrix. This is a square matrix. Diagonals represent the branch-length distance from root to each tip. Off-diagonals represent the branch-length distance from the root to the last common ancestor of each pair of tips.

One way to create this matrix is with the PDDIST.EXE program of PDAP:

a. Select option 5 to produce what is named a DSC matrix.

b. Choose M for matrix output.

c. Use of a header it optional (the MatLab code does not use it if it is there).

d. Do not scale all values.

e. Do not write in a compact format without exponents unless all of your branch lengths are in whole numbers or at least have few decimal places.

Crucial: the rows in the tip data file must be in the same order as the left-to-right (and top-to-bottom) order of the phylogenetic matrix!!! Otherwise, all results will be nonsense. Garbage in, garbage out. Note that PDTREE and PDDIST always saves PDI and DSC files in this form, so it is convenient to use these two programs in concert to create your tip data and phylogenetic matrices.


As example data sets, seven files are included:

49LBR70.PDI
This is the original PDTREE.EXE data file (tree and tip data) from the following paper:
Garland, T., Jr., A. W. Dickerman, C. M. Janis, and J. A. Jones. 1993. Phylogenetic analysis of covariance by computer simulation. Systematic Biology 42:265-292.
Trait one is log10 body mass (kg) and trait two is log10 home range area (km2), as taken from their Table 1. Branch lengths in the original tree (file 49LBR.PDI) were in years and hence very large numbers. This can cause numerical problems when inverting matrices, etc. Therefore, the present file has been rescaled in PDTREE.EXE such that branch lengths are in millions of years and the total height of the tree is thus 70 rather than 70,000,000.

49LBR_1.PDI
This is the above tree but with all branch lengths set to be unity (1.00). You can do this easily in our PDTREE.EXE program by selecting "(C)onstant" branch lengths.

49LBR70.DSC
This is the phylogenetic variance-covariance matrix produced by PDDIST.EXE (again, plain ASCII text).

49LBR_1.DSC
This is the phylogenetic variance-covariance matrix produced by PDDIST.EXE for the tree with all branch lengths set to be constant (1.00).

49STAR.DSC
This is the phylogenetic variance-covariance matrix produced by PDDIST.EXE for the tree collapsed to be star (no hierarchical structure) and with all terminal branch lengths then set to be constant (1.00).

49LBR.TIP
This is just the bottom data block cut out from 49LBR.PDI and saved as plain ASCII text. The first column is the species designations as 2-character codes; the second column is log10 body mass (kg); the third column is log10 home range area (km2).

49.TIP
This is like 49LBR.TIP except that it also contains:
log10 hind leg length in the 4th column;
log10 front leg length in the 5th column;
metatarsal/femur ratio in the 6th column
metacarpal/humerus ratio in the 7th column;
log10 group size in the 8th column;
log10 maximal sprint speed in the 9th column.

These other traits, except for group size, are from this paper:
Garland, T., Jr., and C. M. Janis. 1993. Does metatarsal/femur ratio predict maximal running speed in cursorial mammals? J. Zool., Lond. 229:133-151.


The DSC and TIP files can be fed directly into the various PHYSIG modules (including REGRESSION.M). Results (e.g., the K statistic) can be compared with what is shown in the Appendix of Blomberg et al. (2003).


PHYSIG.M

    Start with this program. It will perform the randomization test for phylogenetic signal and also calculate the K statistic to gauge how much phylogenetic signal is present.

After this, try PHYSIGOU.M and PHYSIGACDC.M.

PHYSIG_LL.M:      

    Newer version of PHYSIG.M (August 2005) that includes calculation of ln likelihood for how well the tip data are fit by the specified phylogeny versus a star phylogeny.   


PHYSIGER.M

    PHYSIGER.M takes as input from the user a single .DSC matrix (from PDDIST.EXE or other suitable program), which indicates the expected variance-covariance matrix of the tip data given the specified phylogeny, and a .SIM file from program PDSIMUL.EXE of the PDAP package, which will include multiple sets of data simulated along the specified phylogeny. Depending on the .SIM file and .DSC matrix, PHYSIGER.M can be used to calculate the Type I error rate or the power of the randomization test for phylogenetic signal (as implemented in PHYSIG.M).

Type I error rate:

    Type I error rate is the probability of rejecting the null hypothesis when the null hypothesis is actually true. To create data with no phylogenetic signal, one can simulate data along a star phylogeny. These data can then be analyzed with PHYSIGER.M, but with a .DSC matrix that corresponds to a hierarchical phylogeny with the same number of tips. Given that the data possess no phylogenetic signal, we should not detect signal, irrespective of what sort of tree we use to analyze the data. Occasionally, however, we will detect signal, but this should occur only about 5% of the time if Type I error rate is putatively 0.05.
    PHYSIGER.M will analyze each of the simulated data sets by the same permutation test as in PHYSIG.M. (One may wish to do only 100 permutations for each of the simulated data sets, in order to save time.) PHYSIGER.M will output the proportion of simulated data sets for which 5% or fewer of the permuted MSEs are < the original MSE; that is, the proportion of simulated data sets for which we seem to have detected signal, even though none should exist on average because they were created by simulation along a star phylogeny.

Power:

    Inputting data that have been simulated along a hierarchical phylogeny allows one to determine the power of the permutation test implemented in PHYSIG.M. The statistical power of a test is the probability of rejecting the null hypothesis when it is in fact false (power = 1 – Type II error rate). The null hypothesis is no phylogenetic signal. Therefore, we can create data for which the null hypothesis is false by simulating data along a phylogeny that is hierarchical. The more hierarchical the tree, the more signal will be present (on average). For a given shape tree, the larger the tree the more signal will tend to be present (see Blomberg et al., 2003). Thus, the larger the tree of a given shape, the greater the power to detect signal.
    The number of simulated data sets, and the number of permutations for each test are supplied by the user. Each simulated data set is treated as an "original" or "true" data set. PHYSIGER.M calculates the MSE for that data set and tree (the same tree [matrix] is used throughout). Then the data are permuted on the tree a large number of times (usually 1,000) and an MSE is calculated for each permutation. If, for any one simulation, the original MSE is less than 95% of the permuted MSEs (i.e., original MSE is significantly less than the permuted MSEs), then that fact is tallied. The program outputs the proportion of simulated data sets which showed significant results. A one-tailed test is given for the proportion of simulations which were significant at the 5% level. A two-tailed test is given by recording the number of simulations for which the original MSE was less than 2.5% of the permuted MSEs or greater than 97.5% of the permuted MSEs.
    The output for PHYSIGER.M consists of a histogram of the distribution of permuted MSEs for each simulated data set (they flash on the screen one after the other), and a statement as to the number of MSEs which were less than the original for each simulation. At the end of the run, the proportion of simulated data sets for which the original MSE was less than 5% of the permuted MSEs for each simulation is given. Similarly, for the two-tailed test, the proportion of simulated data sets where the MSE was less than 2.5% of the permuted MSEs or greater than 97.5% of the permuted MSEs is given, and the two-tailed error rate is simply the sum of these last two values.

What we did in Blomberg et al. (2003):

    We measured Type I error rate using a 32-tip tree. We simulated data along S32star.pdi, then analyzed those data with three hierarchical trees, S32.DSC, C32.DSC, C32ONE.DSC. We used PHYSIGER.M to calculate the power of the permutation test for different sample sizes (8, 16, 32, etc.) and different tree topologies (symmetrical, ladder-shaped, ladder-shaped with all branch lengths = 1). We also examined the effect of the OU parameter (effect of "starness") on power by transforming 49lbr.inp (from Garland et al., 1993) using different values of d and measuring the power for each tree. Power increases rapidly with sample size and symmetrical trees yield slightly higher power (Fig. 2). Power also increases as "starness" decreases and the tree becomes more hierarchical (Fig. 3).
 


REGRESSION.M

          This program will perform regressions via generalized least squares. You will be asked to specify the tip data file (e.g., 49.TIP) and also the phylogenetic variance-covariance matrix (e.g., 49LBR70.DSC). You will also be asked about the sizes of these files (e.g., how many columns) and which columns in the tip data file contain your dependent (Y) and independent (X) variables.
          Output includes conventional (non-phylogenetic) results as well as GLS results. The latter are identical to what can be obtained with phylogenetically independent contrasts (verified in Ted's 49.RUN SPSS for DOS syntax file).


REGRESSIONv2.M

           Like REGRESSION.M, this program performs multiple regressions via both ordinary least squares and (phylogenetic) generalized least squares. In addition, it can implement automatic transformations of the phylogenetic variance-covariance matrix (the DSC file) based on restricted maximum likelihood (REML), including under an OU model, Grafen's (1989) rho, and Pagel's lambda. For more details, see this paper:

Lavin, S. R., W. H. Karasov, A. R. Ives, K. M. Middleton, and T. Garland, Jr. 2008. Morphometrics of the avian small intestine, compared with non-flying mammals: a phylogenetic approach. Physiological and Biochemical Zoology 81:526-550. [provides Matlab Regressionv2.m, released as part of the PHYSIG package] [PDF file]


TREECONV5.LSP
(version of 6 June 2002)

Xlisp must be installed on your computer.

Depending on platform and installation, start Xlisp by clicking on "Wxls32.exe"

Two windows will open:

XLISP-STAT

Listener

Go to File Menu in XLISP-STAT window.

"Dribble" will turn on session log (diary).

"Load" will load code, e.g., "treeconv5.lsp"

To run again, type "(MAIN)" or reload Xlisp

The program does 2 things:

1. reads in a matrix (any name) and list of tip names (any name, but must be in same order as matrix), then outputs a bracket format tree. As of 16 Nov. 2002, note that the tip data name can NOT have a header row.

2. reads in bracket format tree (must have branch lengths) and outputs a matrix and list of tip names
 
Notes

When attempting to convert a matrix (e.g., as output by PHYSIGOU.M or by PHYSIGACDC.M) to a tree, matrices that reflect trees with some very short branch lengths may cause problems. If so, then try increasing the Tolerance in TREECONV5.LSP. Just open the file in any text editor and you will find these lines near the top:

;;;;;;;;;; VALUE FOR BLOCK STRUCTURE TOLERANCE. YOU MAY NEED TO TWEAK THIS IF THE PROGRAM DOES NOT RUN CORRECTLY ;;;;;;;;;

(DEF *TOLERANCE* 0.00000000001)

Try increasing the Tolerance by a factor of 10 and see if the program then works. If not, try increasing it some more, etc., until it works. If this happens, make sure to check the resulting tree to confirm that it is indeed the same topology as your original tree (e.g., before transformation by the OU or ACDC model), except possibly for polytomies.

 


Printing Trees

          You have various options. One is to use the TreeView program by Rod Page, which is available for all major operating systems and can be downloaded at:

http://taxonomy.zoology.gla.ac.uk/rod/treeview.html

          TreeView will read in the bracket format trees produced by TREECONV5.LSP or by PDTREE.EXE (one of the modules in our older DOS program package PDAP). In PDTREE, these would be saved with the .BRK extension. To read in a file in TreeView, just click on File, Open, then All files, which will show a listing of all files in the specified folder. Then click on your bracket format file file.
          TreeView can also write bracket format trees in the "PHYLIP/Newick 8:45" format, which are compatible with TREECONV5.LSP and PDTREE. Make sure to check the "Include edge lengths" box to save branch lengths! If you want to use PDTREE, then make sure you name them with the .BRK extension or PDTREE will not understand them.


References

Al-kahtani, M. A., C. Zuleta, E. Caviedes-Vidal, and T. Garland, Jr. 2004. Kidney mass and relative medullary thickness of rodents in relation to habitat, body size, and phylogeny. Physiological and Biochemical Zoology 77:346-365. (plus online Appendix B). [PDF file]

Ashton, K. G. 2004. Comparing phylogenetic signal in intraspecific and interspecific body size datasets. Journal of Evolutionary Biology 17:1157-1161.

Blomberg, S. P., T. Garland, Jr., and A. R. Ives. 2003. Testing for phylogenetic signal in comparative data: behavioral traits are more labile. Evolution 57:717-745. [PDF file]

Buchwalter, D. B., D. J. Cain, C. A. Martin, L. Xie, S. N. Luoma, and T. Garland, Jr. 2008. Aquatic insect ecophysiological traits reveal phylogenetically based differences in dissolved cadmium susceptibility. Proceedings of the National Academy of Science U.S.A. 105:8321-8326. [uses Regressionv2.m] [PDF file]

Freckleton, R. P., P. H. Harvey, and M. Pagel. 2002. Phylogenetic analysis and comparative data: a test and review of evidence. American Naturalist 160:712-726.

Garland, T., Jr., and A. R. Ives. 2000. Using the past to predict the present: Confidence intervals for regression equations in phylogenetic comparative methods. American Naturalist 155:346-364. [PDF file]

Garland, T., Jr., A. F. Bennett, and E. L. Rezende. 2005. Phylogenetic approaches in comparative physiology. Journal of Experimental Biology 208:3015-3035. [PDF file]

Gartner, G. E. A., J. W. Hicks, P. R. Manzani, D. V. Andrade, A. S. Abe, T. Wang, S. M. Secor, and T. Garland, Jr. 2010. Phylogeny, ecology, and heart position in snakes. Physiological and Biochemical Zoology 83:43-54. [PDF file]

Lavin, S. R., W. H. Karasov, A. R. Ives, K. M. Middleton, and T. Garland, Jr. 2008. Morphometrics of the avian small intestine, compared with non-flying mammals: a phylogenetic approach. Physiological and Biochemical Zoology 81:526-550. [provides Matlab Regressionv2.m, released as part of the PHYSIG package] [PDF file]

Rezende, E. L., F. Bozinovic, and T. Garland, Jr. 2004. Climatic adaptation and the evolution of basal and maximum rates of metabolism in rodents. Evolution 58:1361-1374. [PDF file]

Spoor, F., T. Garland, Jr., G. Krovitz, T. M. Ryan, M. T. Silcox, and A. Walker. 2007. The primate semicircular canal system and locomotion. Proceedings of the National Academy of Science U.S.A. 104:10808-10812. [uses Regressionv2.m] [PDF file]

Swanson, D. L., and T. Garland, Jr. 2009. The evolution of high summit metabolism and cold tolerance in birds and its impact on present-day distributions. Evolution 63:184-194. [uses Regressionv2.m] [PDF file]

Warne, R. W., and E. L. Charnov. 2008. Reproductive allometry and the size-number trade-off for lizards. American Naturalist 172:E80-E98. [uses Regressionv2.m]