Mar 27, 2016 - Basic Searching

Basic Searches in MuscleDB

When you search the atlas, you can filter the database by:
1. gene symbol (like ‘Per1’)
2. gene ontology (like ‘GTPase activity’)
3. muscle tissue type
4. expression level
5. p-value (statistical significance of the difference between tissues, based on a two-way ANOVA)
6. change in expression, relative to another tissue type.

Starting a basic search

All the search options are located in a menu at the far left side, in charcoal grey. To hide or show the options, click the three lines just to the right of the MuscleDB title at the very top of the page.

by gene symbol

If you want to search by gene name, you’ll need to know the NCBI Gene Symbol or the UCSC transcript name. For instance, for the gene period circadian clock 1, you can enter either Per1 or uc007jpf.1 into the ‘select a gene’ box.

  • Searches are case-insensitive (it doesn’t matter if you type “Per” or “per” or even “PER”).
  • All searches match from the start of the gene symbol. So when you enter “per”, you will get Per1, Per2, Perm1, …
  • To return only an exact match, add a $ to the end of your search. For instance, to select FST and not FSTL, type “FST$”.
  • To return a series of gene symbols that end in numbers, add [0-9] after the search term. For instance, to select Per1, Per2, and Per3 (but not Perm1), search “Per[0-9]”.
  • Nerdy detail: the search queries are built upon regular expression matching, so most regex syntax should work, with some modifications for R.
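The matching rules above can be sketched in a few lines. This is only an illustration: the site builds its queries on R's regular expressions, so Python's `re` module stands in here, and the symbol list and the `search_symbols` helper are hypothetical.

```python
import re

# Hypothetical gene symbols, for illustration only.
SYMBOLS = ["Per1", "Per2", "Per3", "Perm1", "Fst", "Fstl1"]

def search_symbols(query, symbols):
    """Return the symbols that match the query from their first character, ignoring case."""
    pattern = re.compile(query, re.IGNORECASE)
    # re.match only succeeds at the start of the string, which mimics
    # the "searches match from the start of the gene symbol" rule.
    return [s for s in symbols if pattern.match(s)]

print(search_symbols("per", SYMBOLS))       # prefix search, any case
print(search_symbols("FST$", SYMBOLS))      # $ anchors the end: exact match
print(search_symbols("Per[0-9]", SYMBOLS))  # symbols with a digit after "Per"
```

The same three queries should behave similarly in the search box, although R's regex dialect may differ in details.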

by gene ontology

Let’s say you wanted instead to search by gene ontology — in other words, by gene class. Select the ‘search ontology’ button, and enter your ontology term. In this example, we’ll look for all genes with GTPase activity.

  • As you start typing ‘GTPase activity’, a dropdown menu will appear with all the possible gene classes.
  • It may take a few seconds for the list of possibilities to pop up.
  • You can only select one gene ontology term.
  • You can combine searching both gene symbols and ontologies.

selecting muscle tissues

You can also select which muscle tissues interest you. By default, all tissues are checked. For this example, we’ll look at just expression in the atria and eye tissues. Deselect the other muscles in the toolbar at the left.

Displaying the information

At the bottom of the plot options, just below ‘advanced filtering’, are the different ways to display the data. You can choose to show:

  • plot (default): a bar graph of the expression levels in the tissues (in FPKM, Fragments Per Kilobase per Million reads) for each transcript, and options to save the plots.
  • table: numeric table with the gene symbols, transcript names, expression levels in the tissues (in FPKM, Fragments Per Kilobase per Million reads), and the q-value (difference between tissues from a two-way ANOVA).
  • volcano plot: a comparison of two muscles, showing the logarithm of the q-value versus the logarithm of the fold change in expression.
  • heat map: a dynamic heat map comparing the expression level of each transcript for each tissue.
  • compare genes: a series of scatter plots comparing the expression levels to a particular reference tissue.

Mar 27, 2016 - Advanced Filtering

Advanced filtering in the Muscle Transcriptome Atlas

Once you have basic searching down, you can further limit your search results by filtering by expression level, q-value, and/or fold change.

  • expression level allows you to select transcripts within a range of expression values (in FPKM)
  • q-value allows you to select transcripts with q-values ≤ a value
  • fold change allows you to select transcripts with a fold change in expression ≥ a value, relative to a reference tissue.

Filtering by expression level

To filter by expression level, tick the expression level box, located just below the search button. This will cause a dropdown menu to appear, where you can put in the minimum and maximum expression level values, in FPKM. For instance, you might want to filter out transcripts with very low expression levels (< 1 FPKM). The graphs and tables will automatically update.

  • If any muscle tissue has expression within the range of values, the transcript will be included.
  • The table will update, and the 1,462 transcripts we had are filtered down to 1,173.

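As a rough sketch of the "any tissue within range" rule described above: the transcript names, FPKM values, and the `filter_by_expression` helper below are made up for illustration, and treating both bounds as inclusive is an assumption.

```python
# Hypothetical FPKM values per tissue, for illustration only.
transcripts = {
    "tx_A": {"atria": 0.2, "eye": 0.4},   # below the range in every tissue
    "tx_B": {"atria": 0.3, "eye": 12.0},  # within the range in one tissue
    "tx_C": {"atria": 55.0, "eye": 80.0}, # within the range in both tissues
}

def filter_by_expression(transcripts, min_fpkm, max_fpkm):
    """Keep a transcript if at least one tissue lies within [min_fpkm, max_fpkm]."""
    return {
        name: by_tissue
        for name, by_tissue in transcripts.items()
        # Assumption: both bounds are inclusive.
        if any(min_fpkm <= value <= max_fpkm for value in by_tissue.values())
    }

kept = filter_by_expression(transcripts, min_fpkm=1.0, max_fpkm=100.0)
print(sorted(kept))
```

Note that a transcript such as tx_B is kept even though its atria expression is below 1 FPKM, because one tissue is enough to pass.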
Filtering by q-value

To filter by q-value, tick the q-value box, located just below the search button. This will cause a dropdown menu to appear, where you can select the maximum q-value. The graphs and tables will automatically update.

  • q is the false discovery rate, calculated using the Benjamini & Hochberg (1995) method
  • q-values can be entered either as a decimal or in scientific notation.
  • q-values can only be filtered for the entire set of 10 muscles, each pairwise set, or each group of muscle types: cardiac (atria, left ventricle, right ventricle) or skeletal (diaphragm, eye, EDL, FDB, plantaris, soleus)
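For readers curious about what a q-value is, here is a minimal sketch of the standard Benjamini & Hochberg adjustment, followed by a threshold filter. It is not the site's own code; the p-values and the `benjamini_hochberg` helper are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values never increase
    # as the p-values get smaller.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values

# Hypothetical p-values for four transcripts.
p = [0.01, 0.04, 0.03, 0.005]
q = benjamini_hochberg(p)

# A threshold typed in scientific notation parses to the same number as a decimal.
threshold = float("3e-2")
selected = [i for i, value in enumerate(q) if value <= threshold]
print(q, selected)
```

With a maximum q-value of 0.03, only the transcripts whose adjusted values are at or below that threshold remain.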

Filtering by fold change

To filter by fold change, tick the fold change box, located just below the search button. This will cause a dropdown menu to appear, where you can select the reference tissue and the numeric threshold for the fold change. The graphs and tables will automatically update.

  • Fold change for a transcript is calculated by dividing its expression in each muscle by its expression in the reference tissue.
  • Transcripts with a fold change ≥ the threshold in any of the muscle tissues will be selected.
  • As a result, the filtered transcripts will be ones where the expression is upregulated relative to the reference tissue.
  • If the reference tissue isn’t one of the muscles selected in the muscle filter, it will be added.
  • Note that fold change filters are best used in conjunction with an expression filter. If the reference tissue has low expression (< 1 FPKM), any fold change will be inflated, since you’re dividing by a number less than one.
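The fold change rules above, including the low-expression caveat, can be sketched as follows. The FPKM values and both helper functions are hypothetical, and the sketch assumes the reference expression is greater than zero.

```python
# Hypothetical FPKM values per tissue, for illustration only.
transcripts = {
    "tx_A": {"atria": 10.0, "eye": 40.0, "soleus": 5.0},
    "tx_B": {"atria": 0.5, "eye": 4.0, "soleus": 0.25},   # low reference expression
    "tx_C": {"atria": 20.0, "eye": 10.0, "soleus": 30.0},
}

def fold_changes(by_tissue, reference):
    """Divide expression in each other tissue by expression in the reference tissue."""
    ref_value = by_tissue[reference]  # assumption: ref_value > 0
    return {
        tissue: value / ref_value
        for tissue, value in by_tissue.items()
        if tissue != reference
    }

def passes_fold_change(by_tissue, reference, threshold):
    """True if any tissue reaches the threshold relative to the reference."""
    return any(fc >= threshold for fc in fold_changes(by_tissue, reference).values())

selected = [
    name for name, by_tissue in transcripts.items()
    if passes_fold_change(by_tissue, reference="atria", threshold=2.0)
]
print(selected)
print(fold_changes(transcripts["tx_B"], "atria"))
```

Here tx_B passes with an 8-fold change in the eye even though its expression is only 4 FPKM there, because the reference value is below 1 FPKM. Pairing the fold change filter with an expression filter removes cases like this.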