[phylocom-user] phylomatic: error trying to match large taxa file against R20100701 megatree.

Cam Webb cwebb at oeb.harvard.edu
Tue Sep 6 03:44:24 PDT 2011


Dear Gustavo,

This has become a common error as people build larger and larger trees. To 
solve it properly, Steve and I need to redesign the memory allocation in the 
recursive function Fy2newRec().  If anyone here is a C programmer, we would 
really appreciate your help with fixing this.
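
For any C programmer who wants to experiment, here is a minimal sketch of one 
possible direction: write the Newick text straight to a stream while walking 
the tree, so no large string is ever assembled in memory. It is not based on 
the actual Fy2newRec() source. The FyTree struct and the functions 
write_node() and fy_write_newick() are hypothetical names made up for this 
illustration; only the field names up, bl and taxon echo the ones used in 
this message. Unlike the AWK script below, the sketch omits the root's 
branch length.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical container for three of the fy columns. This is NOT
   phylocom's own tree struct; the field names only echo it. */
typedef struct {
    int nnodes;          /* number of nodes; node 0 is the root */
    const int *up;       /* up[i]: parent of node i (up[0] is ignored) */
    const double *bl;    /* bl[i]: branch length from node i to its parent */
    const char **taxon;  /* taxon[i]: node name, "" if unnamed */
} FyTree;

/* Write the subtree rooted at node n. Recursion depth equals tree
   depth, and nothing is concatenated in memory. */
static void write_node(FILE *out, const FyTree *t, const int *ldown,
                       const int *rsister, int n)
{
    int c;

    if (ldown[n] != -1) {
        fputc('(', out);
        for (c = ldown[n]; c != -1; c = rsister[c]) {
            if (c != ldown[n])
                fputc(',', out);
            write_node(out, t, ldown, rsister, c);
        }
        fputc(')', out);
    }
    fputs(t->taxon[n], out);
    if (n != 0)                      /* this sketch omits the root's length */
        fprintf(out, ":%f", t->bl[n]);
}

/* Stream the whole tree as Newick. Returns 0 on success, -1 on bad
   input or allocation failure. */
int fy_write_newick(FILE *out, const FyTree *t)
{
    int *ldown, *rsister, *last;
    int i, p, status = 0;

    assert(out != NULL && t != NULL);
    if (t->nnodes <= 0)
        return -1;

    /* first-daughter, right-sister and last-daughter links, three
       ints per node, allocated once up front */
    ldown = malloc((size_t)t->nnodes * sizeof *ldown);
    rsister = malloc((size_t)t->nnodes * sizeof *rsister);
    last = malloc((size_t)t->nnodes * sizeof *last);
    if (ldown == NULL || rsister == NULL || last == NULL)
        status = -1;

    for (i = 0; status == 0 && i < t->nnodes; i++)
        ldown[i] = rsister[i] = last[i] = -1;

    /* append each non-root node to its parent's daughter list */
    for (i = 1; status == 0 && i < t->nnodes; i++) {
        p = t->up[i];
        if (p < 0 || p >= t->nnodes || p == i) {
            status = -1;             /* parent ID out of range */
        } else {
            if (last[p] == -1)
                ldown[p] = i;
            else
                rsister[last[p]] = i;
            last[p] = i;
        }
    }

    if (status == 0) {
        write_node(out, t, ldown, rsister, 0);
        fputs(";\n", out);
    }
    free(ldown);
    free(rsister);
    free(last);
    return status;
}
```

Calling fy_write_newick(stdout, &t) on a three-node tree (root "root" with 
tips A and B at lengths 1.0 and 2.5) prints (A:1.000000,B:2.500000)root; 
followed by a newline.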

For the moment, here is a hack to enable you to make very large trees with 
phylomatic (requiring some basic unix skills):

1) Edit the code of phylomatic.c (at line 249 in version 4.2), deleting or 
commenting out:

   Fy2newRec(Prune(P, keep));

and replacing with:

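   /* Print the tree in tabular fy format, one node per line, instead
      of building the Newick string. Field 4 (the daughter list) is
      left empty. Declare `int i;' in this function if it does not
      already have one. */
   for (i = 0; i < P.nnodes; i++)
     {
       printf("%d\t%d\t%d\t\t%d\t%f\t%s\n",
         i, P.up[i], P.noat[i], P.depth[i], P.bl[i], P.taxon[i]);
     }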

2) Recompile, with

   $ make

3) Run as usual, saving the output to a file. Give it a suffix of `.fy'. 
Fy-format is a simple tabular description of a tree; see below for the 
columns.

   $ ./phylomatic -t taxa -f R20100701.new -l > bigtree.fy

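4) Save the AWK script at the end of this message as `fy2new' and make it 
executable with

   $ chmod u+x fy2new

The script's first line assumes gawk is installed as /bin/gawk; edit that 
line if your gawk lives elsewhere (for example /usr/bin/gawk).

5) Run this script on your fy-formatted output (the leading `./' is needed 
unless fy2new is in a directory on your PATH):

   $ ./fy2new bigtree.fy > bigtree.new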

This should give you a large newick tree.

Sorry you're having this problem.  We'll be working on this memory 
limitation for the next version of phylocom/matic.

Best,

Cam


----------------- AWK script, fy2new ------------------------------

#!/bin/gawk -f

# fy2new
# converts a tabular `fy format' phylogeny to Newick, parenthetical format

# (c) 2011 Cam Webb
# Distributed under (open source) BSD 3-clause licence

# fy-format = tab-delimited fields
#   1. nodeID (starting at 0 for root node)
#   2. parent node nodeID
#   3. number of daughter nodes
#   4. comma-delimited list of daughter nodeIDs (optional)
#   5. depth of node (number of edges from root)
#   6. branch length to parent node (a float)
#   7. node name (terminal or interior node)

BEGIN{

   if(ARGC != 2)
     {
       print "  Usage: fy2new file.fy";
       exit;
     }

   FS="\t";
   NoBl = 1;
}

{
   NodeNo = $1;
   up[NodeNo] = $2;
   noat[NodeNo] = $3;
   bl[NodeNo] = $6;
   taxon[NodeNo] = $7;

   if ($6 != "") { NoBl = 0 };
   if (nnodes < NodeNo) {nnodes = NodeNo};
}

END{

   if(ARGC != 2) exit ;

   nnodes = nnodes+1;

   # SetNodePointers

   for (x = 0; x < nnodes; x++)
     {
       ldown[x] = -99;
       rsister[x] = -99;
       first[x] = 1;
     }

   for (x = 0; x < nnodes; x++)
     {
       # starting at terms
       if (noat[x] == 0)
         {
           y = x;
           # while not yet at the root
           while (y != 0)
             {
               # is this the first daughter in new structure?
                 if (ldown[up[y]] == -99)
                   {
                     ldown[up[y]] = y;
                   }
                 # if not, find the dangling sister
                 else
                   {
                     # start at ldown
                     mark = ldown[up[y]];
                     # move to node with an empty rsister
                     while (rsister[mark] != -99)
                       {
                         mark = rsister[mark];
                       }
                     rsister[mark] = y;
                   }

                 # test for refollowing old routes
                 if (first[up[y]] == 1) {y = up[y]; first[y] = 0;}
                 else break;

             }
         }
     }

   # Recurse through levels

   tmp = "";
   printf("%s;\n", downPar(0, tmp));

}

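# Recursively build the Newick string for the subtree rooted at node atn.
# x, which and tmpnext are local variables.
function downPar(atn, tmp,           x, which, tmpnext )
{
   which = 0;

   # terminal node: name, plus branch length if any were read
   if (noat[atn] == 0)
     {
       tmp = taxon[atn];
       if (!NoBl)
         {
           tmp = tmp  ":" bl[atn];
         }
     }
   # interior node: parenthesized list of daughters, then name
   else
     {
       x = ldown[atn];
       tmp =  "(";
       tmp = tmp downPar(x, tmpnext[which]);

       x = rsister[x]; which++;

       while (x != -99)
         {
           tmp = tmp  ",";
           tmp = tmp downPar(x, tmpnext[which]);
           x = rsister[x]; which++;
         }
       tmp = tmp  ")";
       tmp = tmp  taxon[atn];
       if (!NoBl)
         {
           # Droptail is never set in this script, so it is 0 and the
           # root keeps its branch length too
           if ((atn != 0) || (Droptail == 0))
             {
               tmp = tmp  ":" bl[atn];
             }
         }
     }

   return tmp;
}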

# END OF FILE



