PhyloBuilder is designed to assist biologists in the phylogenomic analysis of a protein, starting with the identification of family members (homologs) and proceeding to multiple sequence alignment and phylogenetic tree construction. The PhyloBuilder "pipeline" combines many of Berkeley Phylogenomics Group's tools to create a protein family based on your own protein of interest. These tools include the FlowerPower algorithm (see below), used to identify protein family members, and the SCI-PHY program (see below), used to identify groups of proteins ("subfamilies") within your protein family that have close evolutionary and functional relationships.
FlowerPower is a protein homology clustering algorithm. Like PSI-BLAST, it iterates over alignment, profile construction, and homolog identification. Unlike PSI-BLAST, it includes phylogenetic tree construction, subfamily identification, and subfamily HMM construction in the clustering and alignment process. This enables FlowerPower to avoid some of the common pitfalls of protein clustering methods, particularly profile drift. FlowerPower can also be parameterized for use in phylogenomic analysis for protein functional classification, where global-global alignment of all proteins in the set is required for accurate inference of molecular function.
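The iterated approach that FlowerPower shares with PSI-BLAST can be illustrated with a toy loop. This is a minimal sketch of generic iterated profile search, not FlowerPower itself: it omits tree construction, subfamily identification, and subfamily HMMs, and it assumes sequences are already aligned and of equal length. The function names, the frequency-based score, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
from collections import Counter


def build_profile(seqs):
    """Per-column residue frequencies for equal-length, pre-aligned sequences."""
    profile = []
    for i in range(len(seqs[0])):
        counts = Counter(s[i] for s in seqs)
        total = sum(counts.values())
        profile.append({res: n / total for res, n in counts.items()})
    return profile


def score(profile, seq):
    """Mean profile frequency of the residues in seq (toy score, 0.0 to 1.0)."""
    return sum(col.get(res, 0.0) for col, res in zip(profile, seq)) / len(profile)


def iterative_search(seed, database, threshold=0.5, max_rounds=10):
    """Rebuild the profile from current members each round and recruit
    every database sequence that scores at or above the threshold."""
    members = [seed]
    for _ in range(max_rounds):
        profile = build_profile(members)
        new = [s for s in database
               if s not in members and score(profile, s) >= threshold]
        if not new:
            break  # converged: no further homologs recruited
        members.extend(new)
    return members


# Hypothetical toy data, not real protein sequences.
database = ["MKVLG", "MKILG", "QQQQQ", "MRILG"]
family = iterative_search("MKVLA", database)
print(family)
```

In this toy data set, "MRILG" scores below the threshold against the seed alone and is recruited only in the second round, after the profile has broadened. The same mechanism, left unchecked, is what lets unrelated sequences creep in as profile drift.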
SCI-PHY is Berkeley Phylogenomics Group's "Subfamily Classification In PHYlogenomics" program. Given a multiple sequence alignment, SCI-PHY uses a minimum-encoding-cost criterion to create subfamilies intended to represent proteins with close evolutionary and functional relationships. Outputs include subfamily definitions and hidden Markov models (HMMs) in both HMMER and UCSC SAM formats.
SATCHMO is Berkeley Phylogenomics Group's "Simultaneous Alignment and Tree Construction using Hidden Markov mOdels" program. SATCHMO simultaneously constructs a tree and a set of multiple sequence alignments, one for each internal node of the tree. The alignment at a given node contains all sequences within its sub-tree, and predicts which positions in those sequences are alignable and which are not.
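The shape of that output can be sketched as a small data structure. This is an illustration under stated assumptions, not SATCHMO's actual file format or internal representation: the TreeNode class, its field names, and the is_consistent check are hypothetical. Each internal node carries an alignment covering every sequence in its sub-tree, plus one flag per column marking whether that column is predicted alignable.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TreeNode:
    """Hypothetical node type; field names are illustrative only."""
    name: str
    children: List["TreeNode"] = field(default_factory=list)
    # Aligned row (with '-' gaps) for each sequence in this node's sub-tree.
    alignment: Dict[str, str] = field(default_factory=dict)
    # One flag per alignment column: True if the column is predicted alignable.
    alignable: List[bool] = field(default_factory=list)

    def leaf_names(self) -> List[str]:
        """Names of all leaf sequences below (or at) this node."""
        if not self.children:
            return [self.name]
        names = []
        for child in self.children:
            names.extend(child.leaf_names())
        return names


def is_consistent(node: TreeNode) -> bool:
    """Check that every internal node's alignment covers exactly the
    sequences in its sub-tree and has one flag per column."""
    if not node.children:
        return True  # leaves carry no alignment in this sketch
    covers_subtree = set(node.alignment) == set(node.leaf_names())
    same_width = all(len(row) == len(node.alignable)
                     for row in node.alignment.values())
    return (covers_subtree and same_width
            and all(is_consistent(c) for c in node.children))


# Hypothetical toy tree ((a, b), c) with made-up sequences.
a, b, c = TreeNode("a"), TreeNode("b"), TreeNode("c")
ab = TreeNode("ab", [a, b],
              {"a": "MKV-L", "b": "MKVAL"},
              [True, True, True, False, True])
root = TreeNode("root", [ab, c],
                {"a": "MKV-L", "b": "MKVAL", "c": "MRI-L"},
                [True, False, True, False, True])
print(is_consistent(root))
```

In this made-up example the root marks fewer columns as alignable than the node "ab" does, since its alignment spans more divergent sequences. That pattern is a property of the toy data, chosen for illustration.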