bootscore v3.11

by Jeet Sukumaran

What's New

In Version 3.11:

In Version 3.0:

In Version 2.0:

Introduction

"bootscore" is a program that maps non-parametric bootstrap support for nodes/branches on topology (typically a topology resulting from a best tree search on phylogenetic data). The latest version works in one of two modes. In its default bipartition-counting mode, it identifies all distinct bipartitions in the tree to be evaluated, and then scans through a file of bootstrap replicates to identify the percentage or proportional frequency of occurance of each of those bipartitions in each of the bootstrap trees. It can also operate in a clade-counting mode, in which it identifies all distinct monophyletic groups in the tree being assessed, and then counts the number of bootstrap trees in which that particular monophyletic group is recovered. The program outputs a NEXUS treefile with the topology of the given tree and with the bipartition/clade support indicated via node labels or branch lengths (in the former case, the branch lengths of the original tree is retained).

It currently does not take into account tree weight comments in assessing support, though this feature will probably be added in the next version. And no, I have not yet written a GUI for it, and probably will not get around to writing one for it anytime soon.

How to Get the Program

The latest public release of this program can be downloaded from:

http://sourceforge.net/projects/bootscore/

How to Set Up the Program

Install Python If You Do Not Already Have It

The program is a Python script, and, as such, you will need to have a Python interpreter installed on your system. If you are using a Linux, Unix or Macintosh (OS X) operating system, you should already have a Python interpreter ready to go, and you can skip this step. Otherwise, you must download and install Python from: http://www.python.org. Microsoft Windows users should also refer to the Python Windows FAQ (http://www.python.org/doc/faq/windows.html) after installing Python, and pay particular attention to the "How do I run a Python program under Windows?" section, as it will help them greatly in getting Python up and running on the system path.

I have developed and tested it using Python version 2.4. The previous versions of the program did not work under Python 2.3, but this one should.

Extract the Script File

This step varies depending on the operating system and the particular programs that you have installed. In most cases, simply double-clicking on the file that you have downloaded should kick off the process. Otherwise, open up a terminal window, navigate to the directory in which you have downloaded the package, and then type:

tar xvf bootscore-3.11.tar.gz

This should create a folder called "bootscore-3.11" with the main program script file, "bootscore.py", and supporting files.

Install the Script as an Executable on the System Path

You will make life easier for yourself by making the script executable and placing it on the system path. On Linux and Macintosh systems, the following command will do the trick, assuming that you have administrator privileges:

sudo cp bootscore.py /usr/bin

You will be prompted for your password, after which a copy of the script file will be placed on the system path, meaning that you will be able to invoke the program from any location on your computer without needing a local copy in your current folder.

Microsoft Windows users should refer to "How do I make python scripts executable?" in the Python Windows FAQ for details on how to achieve the same ends.

Data Required by the Program

You will need to provide:

  1. A tree file specifying the topology (or topologies) that you want to evaluate, in Newick or NEXUS format.
  2. A file of bootstrap trees, also in Newick or NEXUS format.

Typically, (1) will be the result of your best tree search or searches, while (2) will be the result of your non-parametric bootstrap runs. You can have more than one tree specified in the treefile given in (1); "bootscore" will score each tree independently, but save them all in a single tree block in the same output file. "bootscore" can handle TRANSLATE blocks without a problem, but apart from that, the labels for the same taxa must be identical, down to the last character and including case, across all tree statements for the analysis to be valid.
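Because a single mismatched character in a taxon label will quietly invalidate the analysis, it can be worth checking the labels before a long run. The following is only a minimal sketch of such a check, not part of bootscore itself: the function name and the taxon labels are hypothetical, and it assumes that you have already extracted the labels from each file as plain lists of strings. It is written for a current Python 3 interpreter.

```python
def label_mismatches(reference_labels, other_labels):
    """Compare two collections of taxon labels, character for character.

    Returns a pair of sorted lists: labels found only in the reference,
    and labels found only in the other collection. The comparison is
    case-sensitive, so "Hyla_versicolor" and "hyla_versicolor" differ.
    """
    reference = set(reference_labels)
    other = set(other_labels)
    return sorted(reference - other), sorted(other - reference)


# Hypothetical labels: the second list has a lower-case typo.
best_tree_labels = ["Hyla_cinerea", "Hyla_versicolor", "Hyla_gratiosa"]
bootstrap_labels = ["Hyla_cinerea", "hyla_versicolor", "Hyla_gratiosa"]

missing, extra = label_mismatches(best_tree_labels, bootstrap_labels)
print("Only in best tree:", missing)
print("Only in bootstrap trees:", extra)
```

If both lists come back empty, the two sets of labels agree exactly.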

In version 2.0, the NEXUS file-parsing engine was rewritten from scratch, and it can now handle any NEXUS or Newick compliant files, including those containing comment blocks, special characters in identifier names, etc.

Program Usage Description and Examples

The current version of the program is a command-line utility that is used as follows:

bootscore.py [options] -t <treefile> -b <bootstrapfile> -o <outputfile>

The above assumes that you have set up bootscore as an executable on the system path. If not, you will need to pass the script filename to the python interpreter:

python bootscore.py [options] -t <treefile> -b <bootstrapfile> -o <outputfile>

(Assuming that the "bootscore.py" script file is located in the current folder as well.)

You can also use the long version of the options or parameters, which involve more typing, but are less cryptic and easier to remember. The long versions of the options are preceded by two dashes instead of one, and are followed by an equals sign before the option value (if any) is specified:

bootscore.py [options] --tree=<treefile> --bootstraps=<bootstrapfile> --output=<outputfile>

Or, if you have not installed the bootscore script as an executable on the system path, but have placed it in the current folder:

python bootscore.py [options] --tree=<treefile> --bootstraps=<bootstrapfile> --output=<outputfile>

By default, bootscore outputs a treefile in which the topology and branch lengths correspond to those given in the original treefile, with bipartition or clade support given as percentages in the internal node labels. So, for example, assuming that you have a best estimated tree topology file "hyla16s.best.tre" and a set of non-parametric bootstrap trees "hyla16s.bs100.tre", and copies of both files are sitting directly in the current folder, then the following command will create a treefile "hyla16s.bestbs.tre" in the same folder, with bootstrap support for each clade (i.e., the percentage of bootstrap trees in which that clade was found) indicated by node labels:

bootscore.py -t hyla16s.best.tre -b hyla16s.bs100.tre -o hyla16s.bestbs.tre

Of course, the data files do not have to reside in the same folder, as long as you provide the full path (relative or absolute) from the current directory for each of the files:

bootscore.py -t /home/jeet/data/hyla/hsearch/hyla16s.best.tre -b /home/jeet/data/hyla/boots/hyla16s.bs100.tre -o /home/jeet/data/hyla/support/hyla16s.bestbs.tre

If you want the support values indicated by proportional frequencies instead of percentages, use the proportion option, "-p", or "--proportions":

bootscore.py -t hyla16s.best.tre -b hyla16s.bs100.tre -o hyla16s.bestbs.tre -p
bootscore.py --tree=hyla16s.best.tre --bootstraps=hyla16s.bs100.tre --output=hyla16s.bestbs.tre --proportions

If you want the support values indicated by branch lengths instead of node labels, use the branch length option, "-v", or "--support-as-lengths":

bootscore.py -t hyla16s.best.tre -b hyla16s.bs100.tre -o hyla16s.bestbs.tre -v
bootscore.py --tree=hyla16s.best.tre --bootstraps=hyla16s.bs100.tre --output=hyla16s.bestbs.tre --support-as-lengths

Or, combining the parameters:

bootscore.py -t hyla16s.best.tre -b hyla16s.bs100.tre -o hyla16s.bestbs.tre -v -p
bootscore.py --tree=hyla16s.best.tre --bootstraps=hyla16s.bs100.tre --output=hyla16s.bestbs.tre --support-as-lengths --proportions
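
To make the difference between these output choices concrete, here is a small sketch of how a single support value might be rendered under each combination. It is an illustration only and is not taken from the bootscore source: the function names, the default of zero decimal places, and the exact text layout of the annotated clade are all assumptions. It is written for a current Python 3 interpreter.

```python
def format_support(count, total, proportions=False, decimals=0):
    """Format support as a percentage (default) or as a proportion.

    count    -- number of bootstrap trees containing the group
    total    -- total number of bootstrap trees
    decimals -- decimal places to report (assumed default of 0)
    """
    scale = 1.0 if proportions else 100.0
    value = scale * count / total
    return "%.*f" % (decimals, value)


def annotate(subtree, support, branch_length, support_as_length=False):
    """Attach a support string to a Newick subtree string.

    As a node label, the original branch length is kept; as a branch
    length, the support value takes the place of the original length.
    """
    if support_as_length:
        return "%s:%s" % (subtree, support)
    return "%s%s:%s" % (subtree, support, branch_length)


# A clade found in 87 of 100 bootstrap trees, on a branch of length 0.05.
print(annotate("(A,B)", format_support(87, 100), "0.05"))
print(annotate("(A,B)", format_support(87, 100, proportions=True, decimals=2),
               "0.05", support_as_length=True))
```

The first call corresponds to the default behaviour (percentage as node label), and the second to the combination of the branch length and proportion options.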

Other Options and Settings

As noted, the current version of the program counts support in terms of bipartitions or splits shared between the tree being assessed and the bootstrap trees, while the previous versions counted clades. If you wish to assess support in terms of clades, then you can revert to the previous metric by using the "--clade-support-mode" option.
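
One practical consequence of counting clades is that the result depends on where each tree is rooted. The sketch below illustrates this with trees written as nested Python tuples. It is a simplified illustration under my own assumptions (the tuple representation, the function names, and the five-taxon example are hypothetical) and does not reproduce the program's actual parser or data model. It is written for a current Python 3 interpreter.

```python
def clades(tree):
    """Return the set of clades in a rooted tree given as nested tuples.

    A terminal is a string; an internal node is a tuple of children.
    Each clade is the frozenset of terminal labels below an internal
    node. The root's clade (all taxa) is left out as uninformative.
    """
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset()
        for child in node:
            members |= visit(child)
        found.add(members)
        return members

    found.discard(visit(tree))
    return found


def clade_support(target, replicates):
    """Proportion of replicate trees in which each target clade appears."""
    replicate_clades = [clades(rep) for rep in replicates]
    total = float(len(replicates))
    return dict(
        (clade, sum(1 for rc in replicate_clades if clade in rc) / total)
        for clade in clades(target)
    )


target = ((("A", "B"), "C"), ("D", "E"))
rep1 = ((("A", "B"), "C"), ("D", "E"))   # identical to the target
rep2 = (("A", "B"), ("C", ("D", "E")))   # same unrooted tree, rooted elsewhere

support = clade_support(target, [rep1, rep2])
for clade in sorted(support, key=sorted):
    print("".join(sorted(clade)), support[clade])
```

Here the clade A+B+C is found in only one of the two replicates, even though both replicates have the same unrooted topology as the target; the split separating A, B and C from D and E is present in both.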

Other option settings allow you to specify the number of decimal places with which to report support values ("-d" or "--decimals"), disable the inclusion of taxa blocks in the results tree file ("--no-taxa-block"), save the results as a Phylip-format file rather than NEXUS ("--phylip"), automatically overwrite the output file if it already exists ("-r" or "--replace"), or run without progress messages ("-q" or "--quiet"). Finally, invoking the help option ("--help") provides a summary of all the options and parameters.

How the Program Works

The first version of the program (1.0) was essentially a lexical processor: trees were maintained and manipulated as simple strings (the tree statement). This was slow, unwieldy, error-prone and relatively inflexible. Subsequent versions of the program, however, employ a full-fledged n-ary tree data model, which not only makes processing much faster, but also makes programming and debugging a lot easier.

The third version of the program changed how support was assessed, to bring it in line with the PAUP* scoring model. Currently, the set of bootstrap trees is examined to see what proportion of them contain internal nodes that recover the same bipartitions as internal nodes on the tree to be assessed. A "bipartition" is defined by the two groups of terminals formed if the (unrooted) tree is bisected or split at the given node. The previous versions of the program employed a different metric, counting clades (i.e. monophyletic groups defined by each node) rather than bipartitions (splits). The bipartition procedure yields the exact same support values as the PAUP* model.
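
The general idea of bipartition counting can be sketched in a few lines. What follows is not the program's own code: it is a minimal illustration under assumptions of my own, with trees represented as nested Python tuples rather than the n-ary tree model described above, and with hypothetical function names and taxa. It is written for a current Python 3 interpreter.

```python
def terminals(node):
    """Return the frozenset of terminal labels at or below a node."""
    if isinstance(node, str):
        return frozenset([node])
    labels = frozenset()
    for child in node:
        labels |= terminals(child)
    return labels


def bipartitions(tree):
    """Return the non-trivial bipartitions of a tree given as nested tuples.

    Each bipartition is an unordered pair (a frozenset) of the two groups
    of terminals on either side of an internal branch, so the result does
    not depend on where the tree happens to be rooted.
    """
    all_taxa = terminals(tree)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return
        below = terminals(node)
        other = all_taxa - below
        # Skip the root and any branch that isolates a single terminal.
        if len(below) > 1 and len(other) > 1:
            splits.add(frozenset([below, other]))
        for child in node:
            visit(child)

    visit(tree)
    return splits


def bipartition_support(target, replicates):
    """Proportion of replicates containing each bipartition of the target."""
    replicate_splits = [bipartitions(rep) for rep in replicates]
    total = float(len(replicates))
    return dict(
        (split, sum(1 for rs in replicate_splits if split in rs) / total)
        for split in bipartitions(target)
    )


target = ((("A", "B"), "C"), ("D", "E"))
replicates = [
    ((("A", "B"), "C"), ("D", "E")),   # identical to the target
    (("A", "B"), ("C", ("D", "E"))),   # same splits, different rooting
    ((("A", "C"), "B"), ("D", "E")),   # keeps ABC|DE only
    ((("A", "C"), "D"), ("B", "E")),   # shares no split with the target
]
support = bipartition_support(target, replicates)
for split, value in support.items():
    print(" | ".join(sorted("".join(sorted(side)) for side in split)), value)
```

In this example the split AB|CDE is found in two of the four replicates (0.5) and ABC|DE in three of the four (0.75). Recomputing the terminal set at every node keeps the sketch short but is wasteful; a real implementation would cache these sets or encode them as bit fields.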

Limitations

While I do not consider it a limitation in any way, some people might find it discouraging that "bootscore" is a command-line program rather than one with a graphical interface. Unfortunately, I find GUI programming tedious and time-consuming, taking several times longer to code and debug than the part that does the actual work, as well as being less interesting intellectually. As such, while I am entertaining the idea of eventually writing a graphical front-end for this program, I do not see this happening any time soon. Also, the program is rather slow. Some of this is probably due to the inherent slowness of an interpreted language like Python, though I have no doubt that the code itself could use a healthy dose of refactoring and optimization.

Bugs, Suggestions, Comments, etc.

If you encounter any problems, errors, crashes etc. while using this program, please let me know at jeetsukumaran@frogweb.org. If you include the term "bootscore" anywhere on the subject line (e.g. "Problem such-and-such with bootscore"), it would help greatly with getting through the spam filter. Please include all the datafiles involved, as well as the complete command used (with all the options and parameters) and the complete error message returned (simply cutting-and-pasting the terminal text should work fine). Please feel free to contact me if you have any other questions, suggestions or comments as well.

How to Cite this Program

If you use this program in your analysis, please cite it as:

Sukumaran, J. 2007. bootscore: A Bootstrap Tree Scoring Utility. Version 3.11. http://sourceforge.net/projects/bootscore.

License and Warranty

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details (see http://www.gnu.org/copyleft/gpl.html).