CryptoBase  


Welcome to the "How To" Page of CryptoBase!

     This page explains how to use CryptoBase to explore the C. neoformans serotype A genome and provides some general information about the database. Three types of pages are particularly useful. The "Search Page" lets users identify genes using a wide variety of powerful criteria. The "Results Pages" display the results of searches in graphical form. "Gene Pages" describe individual genes.

Search Page

     The Search Page lets users find a set of genes by building a query from a number of criteria. The page is divided into two halves: the left half holds the criteria for "inclusion" (a search for the genes you want to see), and the right half holds the criteria for "exclusion" (a search for the genes you do not want to see in the results).
      In each half, the first option is a keyword search. Words entered in the box next to "NR Hit contains words" identify C. neoformans genes whose BLAST hits in the nonredundant (NR) protein database contain those words.
      The second option on the Search Page restricts the search based on the similarity of C. neoformans genes to those in other fungal species. We have compared all genes in CryptoBase to the gene annotations or genomic DNA sequences of a variety of fungal species, so users can restrict a search by the strength of sequence homology between C. neoformans genes and those of other fungi. Users select species and BLAST expect values ("E values") for inclusion in the search (left half of the page) and for exclusion from the search (right half of the page). Users can also choose between the "AndSpecies" and "OrSpecies" options. With "AndSpecies", the C. neoformans genes returned must show homology at the given E value or better to every selected species. With "OrSpecies", they need only show homology to at least one of the selected species.
      An example is given below in which a user has searched for genes that show homology at a BLAST E value of 1e-100 or better to the genomes of Ustilago maydis or Phanerochaete chrysosporium, but that have no homology at an E value of 1e-4 or better to Saccharomyces cerevisiae, Aspergillus nidulans, or Neurospora crassa.
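The AndSpecies/OrSpecies logic and the example query above can be sketched in Python. This is a minimal illustration, not CryptoBase's implementation: the data layout (a dict holding each gene's best E value per species), the function names, and the toy genes and E values are hypothetical. We also assume that the exclusion half applies the same And/Or test and removes every gene that passes it, and we read the example's exclusion list as "OrSpecies" (a hit to any one of the three species excludes the gene).

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical layout: species -> best BLAST E value for one gene; no hit = absent.
BestEValues = Dict[str, float]
# (species list, E value cutoff, "AndSpecies" or "OrSpecies")
SpeciesQuery = Tuple[Iterable[str], float, str]


def has_homology(best: BestEValues, species: str, max_evalue: float) -> bool:
    """True if the gene hits `species` at `max_evalue` or better (i.e. lower)."""
    return best.get(species, float("inf")) <= max_evalue


def species_criterion(best: BestEValues, species: Iterable[str],
                      max_evalue: float, mode: str) -> bool:
    """AndSpecies: every species must pass; OrSpecies: at least one must."""
    tests = [has_homology(best, s, max_evalue) for s in species]
    if mode == "AndSpecies":
        return all(tests)
    if mode == "OrSpecies":
        return any(tests)
    raise ValueError(f"unknown mode: {mode!r}")


def species_search(genes: Dict[str, BestEValues],
                   include: SpeciesQuery, exclude: SpeciesQuery) -> List[str]:
    """Genes that pass the inclusion test and do not pass the exclusion test."""
    return sorted(gene for gene, best in genes.items()
                  if species_criterion(best, *include)
                  and not species_criterion(best, *exclude))


# The example query from the text, run on three toy genes (invented E values).
genes = {
    "gene_a": {"Ustilago maydis": 1e-150},
    "gene_b": {"Phanerochaete chrysosporium": 1e-120,
               "Saccharomyces cerevisiae": 1e-30},   # excluded: S. cerevisiae hit
    "gene_c": {"Ustilago maydis": 1e-50},            # too weak for the 1e-100 cutoff
}
include = (["Ustilago maydis", "Phanerochaete chrysosporium"], 1e-100, "OrSpecies")
exclude = (["Saccharomyces cerevisiae", "Aspergillus nidulans", "Neurospora crassa"],
           1e-4, "OrSpecies")
print(species_search(genes, include, exclude))  # ['gene_a']
```

Switching the inclusion mode to "AndSpecies" would require each gene to reach the 1e-100 cutoff for both U. maydis and P. chrysosporium, so none of these toy genes would be returned.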
      The final option on the Search Page restricts searches based on motifs present in proteins. Users can ask for C. neoformans proteins that contain particular motifs from either the SMART or PFAM database.
      Most importantly, the text-based, species-based, and motif-based search restrictions can be combined, allowing for a wide variety of highly flexible queries.
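How the three kinds of criteria and the two halves of the page might combine can be sketched as follows. This is a hypothetical illustration: the `Gene` record, the helper names, and the sample genes, CDSids, and motif name are invented, and several semantics are our assumptions rather than documented behavior. A keyword search requires every word to appear (as a substring, ignoring case) in a single NR hit description; a motif search requires every selected motif; criteria within one half are combined with AND; and genes matched by the exclusion half are subtracted from those matched by the inclusion half.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Gene:
    cds_id: str                                   # hypothetical CDSid
    nr_hit_descriptions: List[str]                # descriptions of BLAST hits against NR
    best_evalues: Dict[str, float] = field(default_factory=dict)  # species -> best E value
    motifs: Set[str] = field(default_factory=set)                 # SMART/PFAM motif names


Criterion = Callable[[Gene], bool]


def nr_hit_contains_words(text: str) -> Criterion:
    """Keyword criterion: one NR hit description contains every word (substring, any case)."""
    words = text.lower().split()
    return lambda g: any(all(w in d.lower() for w in words)
                         for d in g.nr_hit_descriptions)


def homology_to(species: str, max_evalue: float) -> Criterion:
    """Species criterion: best hit to `species` at `max_evalue` or better."""
    return lambda g: g.best_evalues.get(species, float("inf")) <= max_evalue


def has_motifs(names: Set[str]) -> Criterion:
    """Motif criterion: the protein carries every selected motif."""
    return lambda g: names <= g.motifs


def run_query(genes: List[Gene], include: List[Criterion],
              exclude: List[Criterion]) -> List[str]:
    """Left half selects genes; right half selects genes to remove."""
    def matches_all(g: Gene, criteria: List[Criterion]) -> bool:
        return all(c(g) for c in criteria)

    # An empty exclusion half removes nothing (rather than matching every gene).
    return [g.cds_id for g in genes
            if matches_all(g, include)
            and not (exclude and matches_all(g, exclude))]


genes = [
    Gene("CDS1", ["chitin synthase"], {"Ustilago maydis": 1e-120}, {"Chitin_synth_1"}),
    Gene("CDS2", ["chitin synthase 2"], {"Saccharomyces cerevisiae": 1e-80}, {"Chitin_synth_1"}),
    Gene("CDS3", ["hypothetical protein"]),
]
include = [nr_hit_contains_words("chitin synthase"), has_motifs({"Chitin_synth_1"})]
exclude = [homology_to("Saccharomyces cerevisiae", 1e-4)]
print(run_query(genes, include, exclude))  # ['CDS1']
```

Representing each criterion as a small predicate keeps the halves independent: adding a new kind of restriction only means writing one more function that takes a `Gene` and returns a boolean.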

Search Results

     The Search Results pages display the C. neoformans genes that meet the search criteria entered by the user. For each gene, the pages list an internal identifier called the CDSid, the gene name, and a proposed description, along with an image, generated by SMART, of the protein motifs in that gene. Clicking the CDSid link takes users to the Gene Prediction page for that gene. An example of a Search Results Page is shown below.

Gene Prediction Page

      At the top of the Gene Prediction page is the name of the predicted ORF, followed by a snapshot of its region of the genome. Below that is general information about the gene, including the proposed name and description, its homology to other organisms, and the results of the SMART analysis. The BLAST hits against the nonredundant protein database come next. At the bottom is information about the gene's coding sequence and protein sequence, as well as the microarray oligonucleotide generated for it.


Gene Naming Conventions

      Genes with a hit to a gene in S. cerevisiae are given the equivalent name. If multiple C. neoformans genes have the same S. cerevisiae gene as their best hit, 100 is added to the number in the S. cerevisiae gene name and the C. neoformans homologs are named sequentially. C. neoformans genes without an S. cerevisiae hit are given the standard CN# designation of their serotype D homolog, as assigned by TIGR and Stanford. C. neoformans genes described in the literature are named according to the first refereed literature citation. We propose that naming conflicts be resolved by the groups involved and that the webmaster be informed once they reach a consensus.
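One possible reading of the S. cerevisiae-based naming rules is sketched below. The helper `assign_names`, its inputs, and the sample CDSids and CN# name are hypothetical. We assume that every homolog sharing a best hit receives a numbered name (so two homologs whose best hit is CDC28 become CDC128 and CDC129), that the homologs are ordered by CDSid, and that the S. cerevisiae name ends in a number. The literature-based rule and conflict resolution are left to the groups involved and are not modeled.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional


def assign_names(best_sc_hit: Dict[str, Optional[str]],
                 serotype_d_names: Dict[str, str]) -> Dict[str, str]:
    """Sketch of the S. cerevisiae-based naming rules (hypothetical helper).

    best_sc_hit      -- CDSid -> name of its best S. cerevisiae hit, or None
    serotype_d_names -- CDSid -> CN# designation of the serotype D homolog
    """
    # Group C. neoformans genes by the S. cerevisiae gene they hit best.
    by_hit: Dict[str, List[str]] = defaultdict(list)
    for cds_id, hit in best_sc_hit.items():
        if hit is not None:
            by_hit[hit].append(cds_id)

    names: Dict[str, str] = {}
    for hit, cds_ids in by_hit.items():
        if len(cds_ids) == 1:
            # A single homolog takes the S. cerevisiae name unchanged.
            names[cds_ids[0]] = hit
            continue
        # Several homologs share one best hit: split e.g. "CDC28" into
        # "CDC" + 28, add 100, then number the homologs sequentially.
        match = re.fullmatch(r"([A-Za-z]+)(\d+)", hit)
        if match is None:
            raise ValueError(f"cannot split gene name {hit!r} into prefix + number")
        prefix, number = match.group(1), int(match.group(2))
        # Assumption: homologs are ordered by CDSid.
        for offset, cds_id in enumerate(sorted(cds_ids)):
            names[cds_id] = f"{prefix}{number + 100 + offset}"

    # Genes without an S. cerevisiae hit take their serotype D CN# name.
    for cds_id, hit in best_sc_hit.items():
        if hit is None:
            names[cds_id] = serotype_d_names[cds_id]
    return names


print(assign_names({"c1": "ACT1", "c2": "CDC28", "c3": "CDC28", "c4": None},
                   {"c4": "CN123"}))
# {'c1': 'ACT1', 'c2': 'CDC128', 'c3': 'CDC129', 'c4': 'CN123'}
```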

Database Feature Improvements

      Please e-mail suggestions for feature improvements to the webmaster. Suggestions will be discussed internally, and features deemed useful and feasible to implement will be added.