PHYLIP

PHYLIP has three programs, FITCH, KITSCH, and NEIGHBOR, for analyzing data that come in the form of a matrix of pairwise distances between all pairs of taxa, such as distances based on molecular sequence data, gene frequency genetic distances, amounts of DNA hybridization, or immunological distances.

FITCH:  This program implements the Fitch-Margoliash and least-squares methods, along with other related methods. It includes a global search option, which roughly triples the run time. In KITSCH global search is the only option, and NEIGHBOR does not offer it at all. FITCH is a comparatively slow algorithm.

KITSCH:  This program carries out the Fitch-Margoliash and Least Squares methods, plus a variety of others of the same family, with the assumption that all tip species are contemporaneous, and that there is an evolutionary clock (in effect, a molecular clock). This means that branches of the tree cannot be of arbitrary length, but are constrained so that the total length from the root of the tree to any species is the same. The quantity minimized is the same weighted sum of squares described in the Distance Matrix Methods documentation file.
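The weighted sum of squares minimized by FITCH and KITSCH can be illustrated with a short Python sketch. It assumes the form given in the PHYLIP distance-methods documentation, Q = Σ w_ij (D_ij - d_ij)², where D_ij is the observed distance, d_ij is the distance along the tree, and w_ij = 1/D_ij^P, with P = 2 for Fitch-Margoliash and P = 0 for ordinary least squares. The function name `weighted_ss` and the list-of-lists matrix layout are illustrative choices, not part of PHYLIP.

```python
def weighted_ss(observed, fitted, power=2.0):
    """Weighted sum of squares between observed distances D and tree distances d.

    observed, fitted: square symmetric matrices (lists of lists).
    power: P in the weight w_ij = 1 / D_ij**P
           (2 = Fitch-Margoliash, 0 = ordinary least squares).
    """
    n = len(observed)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):  # each unordered pair counted once
            D = observed[i][j]
            d = fitted[i][j]
            if power == 0:
                w = 1.0
            elif D > 0:
                w = 1.0 / D ** power
            else:
                raise ValueError("1/D**P weights need positive observed distances")
            total += w * (D - d) ** 2
    return total
```

For example, with observed distances 0.2, 0.4, 0.5 and tree distances 0.2, 0.4, 0.6, the only misfit is 0.1 on the last pair, giving 0.01 under least squares and 0.01/0.5² = 0.04 under Fitch-Margoliash weighting. FITCH and KITSCH search over tree topologies and branch lengths to make this quantity as small as possible; the sketch only scores one fitted matrix.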

NEIGHBOR:  This program implements the Neighbor-Joining method of Saitou and Nei (1987) and the UPGMA method of clustering. The program was written by Mary Kuhner and Jon Yamato, using some code from program FITCH. An important part of the code was translated from FORTRAN code from the neighbor-joining program written by Naruya Saitou and by Li Jin, and is used with the kind permission of Drs. Saitou and Jin.
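UPGMA, the clustering alternative offered by NEIGHBOR, repeatedly merges the two closest clusters and places their common ancestor at half their distance, so every tip ends up the same distance from the root (the same clock assumption KITSCH makes). The following is a minimal Python sketch of that idea, not PHYLIP's code; the function name and the four-decimal Newick output are illustrative assumptions.

```python
def upgma(names, dist):
    """Minimal UPGMA sketch returning a rooted, ultrametric Newick string."""
    # cluster id -> [Newick fragment, number of tips, height of its root]
    clusters = {i: [name, 1, 0.0] for i, name in enumerate(names)}
    # distances between clusters, keyed by (smaller id, larger id)
    D = {(i, j): dist[i][j] for i in clusters for j in clusters if i < j}
    next_id = len(names)
    while len(clusters) > 1:
        (a, b), dab = min(D.items(), key=lambda item: item[1])  # closest pair
        height = dab / 2  # clock assumption: both lineages grow equally
        frag_a, size_a, h_a = clusters.pop(a)
        frag_b, size_b, h_b = clusters.pop(b)
        del D[(a, b)]
        # Distance from the merged cluster to each other cluster is the
        # size-weighted average of the two old distances.
        for c in clusters:
            d_ac = D.pop((min(a, c), max(a, c)))
            d_bc = D.pop((min(b, c), max(b, c)))
            D[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        clusters[next_id] = [
            f"({frag_a}:{height - h_a:.4f},{frag_b}:{height - h_b:.4f})",
            size_a + size_b,
            height,
        ]
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"
```

For three populations with D(A,B) = 2 and D(A,C) = D(B,C) = 6, it returns `(C:3.0000,(A:1.0000,B:1.0000):2.0000);`, in which every tip is 3 units from the root.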

NEIGHBOR constructs a tree by successive clustering of lineages, setting branch lengths as the lineages join. The tree is not rearranged thereafter. The tree does not assume an evolutionary clock, so that it is in effect an unrooted tree. It should be somewhat similar to the tree obtained by FITCH. The program cannot evaluate a User tree, nor can it prevent branch lengths from becoming negative. However, the algorithm is far faster than FITCH or KITSCH. This makes it particularly effective in their place for large studies or for bootstrap or jackknife resampling studies, which require runs on multiple data sets.
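The joining step can be sketched in Python. This follows the neighbor-joining criterion as it is usually stated (join the pair minimizing Q(i, j) = (n - 2) D(i, j) - r_i - r_j, where r_i is the sum of row i of the current matrix); it is an illustration, not NEIGHBOR's code. As in NEIGHBOR, the tree is never rearranged after a join and negative branch lengths are not prevented. The function name and the four-decimal Newick output are assumptions.

```python
def neighbor_joining(names, dist):
    """Minimal neighbor-joining sketch; returns an unrooted Newick string.

    names: taxon labels (at least three); dist: square symmetric matrix.
    Clusters are identified by their Newick fragments.
    """
    nodes = list(names)
    D = {a: {b: dist[i][j] for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(D[a][b] for b in nodes) for a in nodes}  # row sums
        # Choose the pair minimizing Q(a, b) = (n - 2) D(a, b) - r_a - r_b
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                a, b = nodes[x], nodes[y]
                q = (n - 2) * D[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from a and b to their new common ancestor u
        la = D[a][b] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        lb = D[a][b] - la
        u = f"({a}:{la:.4f},{b}:{lb:.4f})"
        D[u] = {u: 0.0}
        for c in nodes:
            if c not in (a, b):
                D[u][c] = D[c][u] = (D[a][c] + D[b][c] - D[a][b]) / 2
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    # Three nodes left: join them at a single unrooted trifurcation
    a, b, c = nodes
    la = (D[a][b] + D[a][c] - D[b][c]) / 2
    lb = (D[a][b] + D[b][c] - D[a][c]) / 2
    lc = (D[a][c] + D[b][c] - D[a][b]) / 2
    return f"({a}:{la:.4f},{b}:{lb:.4f},{c}:{lc:.4f});"
```

For example, `neighbor_joining(list("abcde"), [[0,5,9,9,8],[5,0,10,10,9],[9,10,0,8,7],[9,10,8,0,3],[8,9,7,3,0]])` joins a and b first (branch lengths 2 and 3) and returns `(d:2.0000,e:1.0000,(c:4.0000,(a:2.0000,b:3.0000):3.0000):2.0000);`.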

 

File Setup and Output File Saving:


 

Tree Drawing: 

GENEDIST

Double click on the GENEDIST icon

Enter file name and extension, bear.inf

 

Genetic Distance Matrix program, version 3.6b
Settings for this run:
  A   Input file contains all alleles at each locus?  One omitted at each locus
  N                        Use Nei genetic distance?  Yes
  C                Use Cavalli-Sforza chord measure?  No
  R                   Use Reynolds genetic distance?  No
  L                         Form of distance matrix?  Square
  M                      Analyze multiple data sets?  No
  0              Terminal type (IBM PC, ANSI, none)?  (none)
  1            Print indications of progress of run?  Yes
 
  Y to accept these or type the letter for one to change


Type “A” and enter to include all alleles

Type “Y” and enter to run GENEDIST; this will create an outfile.
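The Nei distance selected by the N option can be sketched in Python, assuming GENEDIST uses Nei's (1972) standard distance D = -ln(J_XY / √(J_X J_Y)), where J_X and J_Y are the sums of squared allele frequencies in each population and J_XY is the sum of their products, averaged over loci. The `all_alleles` flag mirrors the A setting: when one allele per locus is omitted from the file, its frequency is reconstructed as 1 minus the others. The function name and data layout are hypothetical.

```python
import math


def nei_distance(pop_x, pop_y, all_alleles=True):
    """Nei's (1972) standard genetic distance between two populations.

    pop_x, pop_y: one list of allele frequencies per locus, alleles in the
    same order in both populations.
    all_alleles: False means the last allele at each locus was omitted
    (PHYLIP's "One omitted at each locus"); it is rebuilt as 1 - sum(listed).
    """
    jx = jy = jxy = 0.0
    for fx, fy in zip(pop_x, pop_y):
        if not all_alleles:
            fx = list(fx) + [1.0 - sum(fx)]
            fy = list(fy) + [1.0 - sum(fy)]
        jx += sum(p * p for p in fx)               # homozygosity in X
        jy += sum(q * q for q in fy)               # homozygosity in Y
        jxy += sum(p * q for p, q in zip(fx, fy))  # shared-allele term
    # Averaging each J over loci divides all three by the same L, which
    # cancels in the ratio, so plain sums are enough.
    identity = jxy / math.sqrt(jx * jy)
    return -math.log(identity)
```

For one locus with frequencies (1.0, 0.0) and (0.5, 0.5), the identity is 0.5/√0.5 ≈ 0.707 and D = ½ ln 2 ≈ 0.347. Populations that share no alleles give an identity of 0, an infinite distance, and the sketch raises a ValueError from `math.log`.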

Rename the outfile as outfile1

NEIGHBOR
Double click on the NEIGHBOR icon


Enter outfile name, outfile1 

Settings for this run:
  N       Neighbor-joining or UPGMA tree?  Neighbor-joining
  O                        Outgroup root?  No, use as outgroup species  1
  L         Lower-triangular data matrix?  No
  R         Upper-triangular data matrix?  No
  S                        Subreplicates?  No
  J     Randomize input order of species?  No. Use input order
  M           Analyze multiple data sets?  No
  0   Terminal type (IBM PC, ANSI, none)?  (none)
  1    Print out the data at start of run  No
  2  Print indications of progress of run  Yes
  3                        Print out tree  Yes
  4       Write out trees onto tree file?  Yes
 

  Y to accept these or type the letter for one to change
 

Use the default settings and type “Y” and enter to run NEIGHBOR; this will create another outfile and an outtree.


Rename outtree as outtree1

DRAWTREE



Double click on DRAWTREE
Enter tree file name, outtree1


Enter font file (font1 – font6), font3

Rooted tree plotting program version 3.6b
 
Here are the settings:
 0  Screen type (IBM PC, ANSI):  (none)
 P       Final plotting device:  Postscript printer
 V           Previewing device:  Tektronix graphics screen
 H                  Tree grows:  Horizontally
 S                  Tree style:  Phenogram
 B          Use branch lengths:  (no branch lengths available)
 L             Angle of labels:  90.0
 R      Scale of branch length:  Automatically rescaled
 D       Depth/Breadth of tree:  0.53
 T      Stem-length/tree-depth:  0.05
 C    Character ht / tip space:  0.3333
 A             Ancestral nodes:  Centered
 F                        Font:  Times-Roman
 M          Horizontal margins:  1.65 cm
 M            Vertical margins:  2.16 cm
 #              Pages per tree:  one page per tree
 
 Y to accept these or type the letter for one to change
 
Type “P” and enter to choose a final plotting device

From the menu choose “W” and enter (this will allow the final tree to be imported as a picture file into PowerPoint).  Set the resolution to 500 x 500.
Type “L” and enter to change the angle of the labels

From the options choose “R” and enter (this orientation allows the tree to be read more easily)


Type “Y” and enter to see a preview of the tree


If this is the tree you would like to plot, click on “File” and then on “Plot” (if not, click on “File” and then on “Change Parameters”).

DRAWTREE will generate a plotfile


The plotfile can now be imported into PowerPoint as a picture



Assessing Confidence:

 

SEQBOOT

Double click on the SEQBOOT icon

Enter file name and extension, bear.inf

 

Bootstrapping algorithm, version 3.6b
 

Settings for this run:
  D      Sequence, Morph, Rest., Gene Freqs?  Molecular sequences
  J  Bootstrap, Jackknife, Permute, Rewrite?  Bootstrap
  %    Regular or altered sampling fraction?  regular
  B      Block size for block-bootstrapping?  1 (regular bootstrap)
  R                     How many replicates?  100
  W              Read weights of characters?  No
  C                Read categories of sites?  No
  S     Write out data sets or just weights?  Data sets
  I             Input sequences interleaved?  Yes
  0      Terminal type (IBM PC, ANSI, none)?  (none)
  1       Print out the data at start of run  No
  2     Print indications of progress of run  Yes
 
  Y to accept these or type the letter for one to change

 

Type “D” and enter three times to change the data type to Gene Freqs

Type “A” and enter to include all alleles

Type “R” and enter to change the number of replicates, then type 1000 and enter

Type “Y” and enter

The program will ask for a random number seed

Enter 9 and enter to run SEQBOOT; the program will generate an outfile. (The “random” number seed must be an odd number between 1 and 32766 of the form 4n+1; for example, 9 = 4 × 2 + 1.)
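What SEQBOOT does with gene-frequency data can be sketched as resampling loci with replacement, assuming whole loci are the units being resampled. The seed check encodes the rule stated above. Python's `random` module is used here, not PHYLIP's own generator, so the replicates will differ from SEQBOOT's even with seed 9; the function names are hypothetical.

```python
import random


def valid_phylip_seed(seed):
    """Seed rule quoted in this handout: of the form 4n+1, between 1 and 32766."""
    return 1 <= seed <= 32766 and seed % 4 == 1


def bootstrap_loci(loci, replicates, seed):
    """Resample whole loci with replacement, once per bootstrap replicate.

    loci: list of per-locus records (e.g. allele-frequency tables).
    Uses Python's random generator, NOT PHYLIP's, so the replicates will
    not match SEQBOOT's output even with the same seed.
    """
    if not valid_phylip_seed(seed):
        raise ValueError("seed must be of the form 4n+1 between 1 and 32766")
    rng = random.Random(seed)
    n = len(loci)
    # Each replicate has as many loci as the original data set
    return [[loci[rng.randrange(n)] for _ in range(n)] for _ in range(replicates)]
```

Each of the 1000 replicates has as many loci as the original data set; some loci appear more than once and others not at all, which is what lets the later steps measure how consistently each group of populations is recovered.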

Rename the outfile as outfile3

 

GENEDIST

Double click on the GENEDIST icon

Enter file name, outfile3

 

Genetic Distance Matrix program, version 3.6b
 

Settings for this run:
  A   Input file contains all alleles at each locus?  One omitted at each locus
  N                        Use Nei genetic distance?  Yes
  C                Use Cavalli-Sforza chord measure?  No
  R                   Use Reynolds genetic distance?  No
  L                         Form of distance matrix?  Square
  M                      Analyze multiple data sets?  No
  0              Terminal type (IBM PC, ANSI, none)?  (none)
  1            Print indications of progress of run?  Yes
 
  Y to accept these or type the letter for one to change

 

Type “A” and enter to include all alleles
Type “M” and enter to indicate multiple data sets, then enter 1000 (the number of replicates generated by SEQBOOT)

Type “Y” and enter to run GENEDIST; this will create an outfile.

Rename the outfile as outfile4

 

NEIGHBOR


Double click on the NEIGHBOR icon


Enter outfile name, outfile4

 
Settings for this run:
  N       Neighbor-joining or UPGMA tree?  Neighbor-joining
  O                        Outgroup root?  No, use as outgroup species  1
  L         Lower-triangular data matrix?  No
  R         Upper-triangular data matrix?  No
  S                        Subreplicates?  No
  J     Randomize input order of species?  No. Use input order
  M           Analyze multiple data sets?  No
  0   Terminal type (IBM PC, ANSI, none)?  (none)
  1    Print out the data at start of run  No
  2  Print indications of progress of run  Yes
  3                        Print out tree  Yes
  4       Write out trees onto tree file?  Yes
 

  Y to accept these or type the letter for one to change

 

Type “M” and enter to indicate multiple data sets, then enter 1000. The program will ask for a random number seed; enter an odd random number and enter.



Type “Y” and enter to run NEIGHBOR; this will create an outfile and an outtree.

Rename the outfile as outfile5 and the outtree as outtree2

 

CONSENSE

Double click on the CONSENSE icon

Enter treefile name, “CInput”

 

Consensus tree program, version 3.6b
 
Settings for this run:
 C         Consensus type (MRe, strict, MR, Ml):  Majority rule (extended)
 O                                Outgroup root:  No, use as outgroup species  1
 R                Trees to be treated as Rooted:  No
 T           Terminal type (IBM PC, ANSI, none):  (none)
 1                Print out the sets of species:  Yes
 2         Print indications of progress of run:  Yes
 3                               Print out tree:  Yes
 4               Write out trees onto tree file:  Yes
 
Are these settings correct? (type Y or the letter for one to change)

 

Use the default settings and type “Y” and enter to run CONSENSE; this will create an outfile and an outtree.

The outfile will have a rough tree with bootstrap values (for each group, the number of the 1000 replicate trees that contain it); the outtree is the input file for drawing the tree in DRAWGRAM or TreeView (bootstrap values must be added by hand in PowerPoint).  Open both of these files with Notepad to view the results.
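The counting behind these bootstrap values can be sketched in Python. The sketch assumes each tree has already been reduced to a set of groups (frozensets of population names, defined relative to a common outgroup) and implements the extended majority-rule idea: groups are accepted from most to least frequent, skipping any that conflict with a group already accepted. It is not CONSENSE's code, and `majority_rule_extended` is a hypothetical name.

```python
from collections import Counter


def compatible(g, h):
    """Two groups can coexist on one tree if they are disjoint or nested."""
    return g.isdisjoint(h) or g <= h or h <= g


def majority_rule_extended(trees):
    """Sketch of extended majority-rule consensus.

    trees: list of sets of groups; each group is a frozenset of taxon names.
    Returns {group: number of trees containing it} for the accepted groups.
    """
    counts = Counter(g for tree in trees for g in tree)
    accepted = {}
    # Most frequent groups first; keep a group only if it fits with all kept ones
    for group, count in sorted(counts.items(), key=lambda kv: -kv[1]):
        if all(compatible(group, other) for other in accepted):
            accepted[group] = count
    return accepted
```

Groups found in more than half of the trees can never conflict with one another, so they are always kept; lower-frequency groups only fill in the remaining resolution. With 1000 replicates, a count of 950 corresponds to 95% bootstrap support.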

 

DRAWGRAM

Double click on the DRAWGRAM icon

Enter the treefile name, outtree

Enter font file (font1 – font6), font3


Rooted tree plotting program version 3.6b
 
Here are the settings:
 0  Screen type (IBM PC, ANSI):  (none)
 P       Final plotting device:  Postscript printer
 V           Previewing device:  Tektronix graphics screen
 H                  Tree grows:  Horizontally
 S                  Tree style:  Phenogram
 B          Use branch lengths:  (no branch lengths available)
 L             Angle of labels:  90.0
 R      Scale of branch length:  Automatically rescaled
 D       Depth/Breadth of tree:  0.53
 T      Stem-length/tree-depth:  0.05
 C    Character ht / tip space:  0.3333
 A             Ancestral nodes:  Centered
 F                        Font:  Times-Roman
 M          Horizontal margins:  1.65 cm
 M            Vertical margins:  2.16 cm
 #              Pages per tree:  one page per tree
 

 Y to accept these or type the letter for one to change

 

Type “P” and enter to choose a final plotting device


From the menu choose “W” and enter (this will allow the final tree to be imported as a picture file into PowerPoint).  Set the resolution to 500 x 500.


Type “Y” and enter to see a preview of the tree


If this is the tree you would like to plot, click on “File” and then on “Plot” (if not, click on “File” and then on “Change Parameters”).


DRAWGRAM will generate a plotfile


The plotfile can now be imported into PowerPoint as a picture


  

TreeView

 

TreeView allows you to view tree files made in PHYLIP and other programs.  TreeView is better than the PHYLIP drawing programs at letting the user easily view the same data as different types of trees.

 

Open the program TreeView

 

Click on “File” and then on “Open”.

Select the outtree created by CONSENSE (should be in the exe folder in the Phylip directory)

The program defaults to a slanted cladogram.  This tree preserves the groupings but not the relative amount of genetic distance between populations.

 

Click on “Tree”

From the menu select “Phylogram”

The resulting tree illustrates the groupings of the populations from CONSENSE and indicates the amount of genetic distance between populations (branch length is proportional to genetic distance).
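On a phylogram, the genetic distance the tree implies between two populations is the sum of the branch lengths along the path joining them. A small Python sketch, assuming the tree is stored as a hypothetical dictionary mapping each non-root node to its parent and the length of the branch above it (this layout is an illustration, not a TreeView or PHYLIP format):

```python
def distances_up(parent, node):
    """Map node and each of its ancestors to the path length from node."""
    dist, out = 0.0, {node: 0.0}
    while node in parent:
        node, branch = parent[node]  # step to the parent across one branch
        dist += branch
        out[node] = dist
    return out


def path_length(parent, a, b):
    """Sum of branch lengths on the path between tips a and b.

    parent: dict mapping each non-root node to (parent node, branch length).
    """
    up_from_a = distances_up(parent, a)
    node, dist = b, 0.0
    # Climb from b until reaching an ancestor of a (their common ancestor)
    while node not in up_from_a:
        node, branch = parent[node]
        dist += branch
    return dist + up_from_a[node]
```

For the tree (d:2,e:1,(c:4,(a:2,b:3):3):2), stored with an unnamed root "R" and internal nodes "u" and "v", the path from a to d is 2 + 3 + 2 + 2 = 9.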

 

Now select “Radial” from the “Tree” menu

This is another way to view both the clustering of the populations and the genetic distance between them.  One advantage of this style of tree is that it is obviously unrooted (the other trees are unrooted as well, but this is less obvious).

 

Now from the “Tree” menu select “Rectangular Cladogram”

This tree preserves the groupings but not the relative amount of genetic distance between populations.

 

As an exercise in interpreting trees in light of biological data, use your output file as well as the tree files titled “example1”, “example2”, and “example3” in the class folder.  Compare the different tree structures to the range maps in the PowerPoint file titled “Bear Range” to suggest hypotheses about how the bears may have come to occupy their current range in California.  In particular, think about the relationship of SAND, SANB, and SLO to the rest of the populations.

___________________________________________________________________________________

© Copyright 1986-2004 by the University of Washington. Written by Joseph Felsenstein. Permission is granted to copy this document provided that no fee is charged for it and that this copyright notice is not removed.