MSAReveal Versions

Send suggestions and bug reports to
Previous versions: 2.8 2.7 2.6 2.5 2.0 1.0

Version 2.81, March 13, 2021

When the header contains more than one apparent UniProt ID code, and the codes are distinct, a red note listing the numbers of such sequences appears above the first table of sequences. The red color was inadvertently being applied to the entire sequence listing.
Strain ID codes were occasionally misinterpreted as additional UniProt IDs.

Version 2.8, October 12-13, 2016

Invalid or non-numeric wrap lengths were not being properly handled. (Fixed 10/12/16.)
Sequence descriptions now wrap as needed to fit the browser window. (Fixed 10/12/16.)
Checking or unchecking the color options, before processing any sequences, failed to update the coloring of the amino acids near the checkboxes. (Fixed 10/13/16.)

Version 2.7, October 1, 2016

In Statistics, the non-numeric columns (Taxon, Gene, UniProt, and 3D) were not sorting correctly.

Version 2.6, September 28, 2016

"100% identical" means that all the amino acids in a given column are the same. When >50% of the columns in an alignment are 100% identical, a new option is offered. Highlighted in yellow above the sequence tables is a checkbox to "Highlight Differences". When checked, colors will be applied only to columns that are not 100% identical. This makes it easy to find columns with differences when nearly all the columns are 100% identical. For an example, click Consensus wrap test + highlight differences under Show Demos & Tests. (Thanks to Orly Dym of the Weizmann Institute of Science for an example that raised this idea.)
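The "Highlight Differences" rule above can be sketched as follows. This is a minimal illustration, not MSAReveal's code. It assumes equal-length aligned strings and treats the gap "-" like any other character. Both assumptions are made for this sketch only. The function names are hypothetical.

```python
from typing import List


def identical_columns(seqs: List[str]) -> List[bool]:
    """For each column, True when every sequence has the same character there.

    Assumption for this sketch: sequences are aligned to equal length, and gaps
    ("-") are compared like any other character.
    """
    length = len(seqs[0])
    assert all(len(s) == length for s in seqs), "sketch assumes equal lengths"
    return [len({s[i] for s in seqs}) == 1 for i in range(length)]


def offer_highlight_differences(seqs: List[str]) -> bool:
    """The option is offered only when more than 50% of columns are 100% identical."""
    flags = identical_columns(seqs)
    return sum(flags) > 0.5 * len(flags)


def columns_to_color(seqs: List[str]) -> List[int]:
    """With the box checked, only columns that are not 100% identical get colors."""
    return [i for i, same in enumerate(identical_columns(seqs)) if not same]
```

For three sequences that differ only at two of eight columns, 6 of 8 columns are identical. The checkbox would therefore be offered, and only those two columns would be colored.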

Version 2.5, September 25, 2016
This version makes it easier to analyze large numbers of sequences. For example, click the button Show Demos & Tests, then the link Larger Examples, and process the 401-sequence alignment 110KAA-MHC1-401seqs.fasta.

The number of sequences and the total number of amino acids are now shown at the top of the results, so you no longer have to find the bottom of Statistics Part I to get these values.
When you search for a sequence fragment, using the Find slot, the numbers of hits are now listed in the Statistics report, in addition to being above the display of sequences. The Found hits column is sortable, as are all columns in Statistics. This makes it easy to gather together all taxa with hits, and to see which taxa have the highest numbers of hits, or the lowest.
Statistics now gives the minimum, average, and maximum values for each column.
New options allow hiding the row numbers, gene names, UniProt identifiers, and/or consensus. Hiding these makes the table more compact, which is useful for showing snapshots in slides.
New help has been provided about How To Take Snapshots For Slides.
A pink banner "Processing ... one moment please ..." now appears when an action is likely to take more than a few seconds. It has a fixed (non-scrollable) position, and disappears automatically when the action is completed. Example: checking or unchecking a color to be applied to the sequence listing.
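The per-sequence hit counts in Statistics depend on a Find that ignores gaps, as described under version 2.0. A minimal sketch of that idea follows. It is an illustration only: ambiguous-character matching is not covered, and the choices to report overlapping hits and to match case-insensitively are assumptions. The function names are hypothetical.

```python
from typing import Dict, List, Tuple


def find_fragment(aligned: str, fragment: str) -> List[Tuple[int, int]]:
    """Find `fragment` in one aligned sequence, ignoring gap characters ("-").

    Returns (first_column, last_column) pairs in 0-based alignment coordinates.
    Assumptions: matching is case-insensitive and overlapping hits are all reported.
    """
    # Remember which alignment column each ungapped residue came from.
    columns = [i for i, ch in enumerate(aligned) if ch != "-"]
    residues = "".join(aligned[i] for i in columns).upper()
    target = fragment.replace("-", "").upper()
    hits: List[Tuple[int, int]] = []
    if not target:
        return hits
    start = residues.find(target)
    while start != -1:
        hits.append((columns[start], columns[start + len(target) - 1]))
        start = residues.find(target, start + 1)
    return hits


def hit_counts(alignment: Dict[str, str], fragment: str) -> Dict[str, int]:
    """Number of hits per sequence, comparable to a sortable 'Found hits' column."""
    return {name: len(find_fragment(seq, fragment)) for name, seq in alignment.items()}
```

A hit that spans gap columns, such as "KTA" in "MK--TAY", is still found. Its reported span covers the gap columns in between.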

The Statistics table has been reorganized. Counts of amino acids and their derivative quantities are now gathered at the end of the table (the second section when wrapped).
When you touch a column in the Consensus, the amino acid with the highest frequency in that column is now boldface, and its percentage of all sequences is given.
Navigation through long sequence listings is easier. There are new links to jump to the next or previous (wrapped) block, or the top or bottom of the listing, and to jump from sequences to statistics to full headers.
The header of Statistics may scroll out of view with large numbers of sequences. Because the table is sortable, periodically repeating the header is not feasible. Now, a tooltip on every cell in Statistics tells what that cell is reporting.
As in previous versions, when the taxon is not recognized in a header with an unusual format, the entire header is shown in the taxon column. In this version, the header is truncated when it is too long to fit well in the results table, and a link appears that jumps to the corresponding row number in the Full Headers section. Demonstration: "ConSurf Clean UniProt (unusual header format)".
Additional help is provided for coloring or counting a specified amino acid or subset.
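The consensus tooltip change reports the most frequent amino acid in a column and its percentage of all sequences. A minimal sketch of that calculation follows. The sketch makes three assumptions that may not match MSAReveal: gaps never count as the top residue but do count toward the denominator, case is ignored, and ties go to the residue seen first.

```python
from collections import Counter
from typing import List, Optional, Tuple


def top_residue(column: List[str]) -> Optional[Tuple[str, float]]:
    """Most frequent amino acid in one column and its percentage of all sequences.

    Assumptions: gaps ("-") are excluded as candidates but still count in the
    denominator; letters are compared case-insensitively; Counter.most_common
    breaks ties by first appearance.
    """
    counts = Counter(ch.upper() for ch in column if ch != "-")
    if not counts:
        return None  # column contains only gaps
    residue, n = counts.most_common(1)[0]
    return residue, 100.0 * n / len(column)
```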

When no genus/species was recognized in the headers, the consensus was one column too far to the right in the second and subsequent wrapped sequence tables.
The checkbox "Wrap Results" was reset when a color checkbox was changed. Pressing "Process Sequences" did not force "Wrap Results" to obey "Wrap initial display".
All vertical bars "|" were replaced with spaces in the Full Headers display. Example: "6: Enolase TCOFFEE".

Version 2.0, September 5, 2016
This version is much more complicated than version 1. Please report bugs and requests to

A powerful Find enables you to search for sequence fragments, regardless of gaps. Finding ambiguous sequences is supported. See Finding Sequence Fragments.
If you don't want the sequence numbers to begin with 1, you may now add "start=N" to the header of the sequence. N can be positive, negative, or zero. For example, if you want the first residue of the mature protein to be number 1, and it is preceded by a 7-residue signal sequence, you can add "start=-6" to the header. The signal sequence will then be numbered -6 to 0, and 1 will be the first residue of the mature protein. This is illustrated in the demo "3: Pilins Pa unaligned".
A new button, All or None, makes it easy to check or uncheck all the color options in one click.
Glycine may now be colored light green. Formerly it was the only amino acid with no color assigned. Proline remains a more intense green.
A count of glycines in each sequence is now included in the Statistics table.
Browsers: The Firefox browser displays the output of MSAReveal with highly responsive scrolling and tooltip display. Other popular browsers are much less responsive with larger alignments. The browser in use is now detected, and Firefox is recommended or required depending on the total number of amino acids in the job. See Browser-Specific Behavior.
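The "start=N" numbering can be sketched as follows. The regular expression is a hypothetical way to pick the setting out of a header. The sketch also assumes that gap columns receive no number and do not advance the count, which is the usual convention for residue numbering in alignments. Neither detail is taken from MSAReveal itself.

```python
import re
from typing import List, Optional

# Hypothetical pattern: "start=" followed by an optional minus sign and digits.
START_RE = re.compile(r"\bstart=(-?\d+)")


def start_number(header: str) -> int:
    """Number for the first residue: N from "start=N", otherwise 1."""
    m = START_RE.search(header)
    return int(m.group(1)) if m else 1


def residue_numbers(header: str, aligned: str) -> List[Optional[int]]:
    """Residue number for each alignment column; gap columns get None (assumption)."""
    n = start_number(header)
    numbers: List[Optional[int]] = []
    for ch in aligned:
        if ch == "-":
            numbers.append(None)
        else:
            numbers.append(n)
            n += 1
    return numbers
```

With "start=-6", the seven signal-sequence residues are numbered -6 through 0, and the eighth residue (the first of the mature protein) is numbered 1.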

The gray period ("dot") in the consensus line has been replaced with a gray colon. This seemed more consistent with usual conventions, for example in Clustal.

Clicking the button Process Sequences now displays "Processing sequences ..." before the browser freezes (which happens with very large alignments). Processing 1,068,780 amino acids takes 1 min 20 sec in Firefox on a late 2014 MacBook Pro.
Checking Wrap Results now wraps all three sections (Sequences, Statistics, and Full Headers).
The range of column numbers is now shown above each table when the sequences are wrapped.
The Statistics table now shows the grand total number of amino acids for all sequences at the bottom. When the table is sorted by clicking on a header, the grand total remains at the bottom.
When ambiguous amino acid characters (BJOUXZ) occur in any sequence, a button appears offering to find all of them. Ditto for illegal characters. (Legal characters are A to Z, a to z, and dash "-".)
A mechanism is provided for including a hyperlink in a description, while avoiding having the header line break at the space after "<a" in the Full Headers display. See Hyperlinks in Descriptions.
A dividing line is now placed between every 5 sequences for improved readability.
Sequences with >>Descriptions (added in their headers) now have their taxa shown in dark red, so it is obvious which sequences have such descriptions. Demo: 5: Long (Gal4).
MSAReveal has now been tested with an alignment containing over one million amino acids. Demo: click on Larger Examples, then load and process titin-MAFFT-178KAA.fasta, which has 178,130 amino acids in total. Pasting it into the box six times gives 1,068,780 amino acids. Processing takes about 1 min 20 sec on a late 2014 MacBook Pro.
Additional "Demo & Test" examples have been added. These illustrate all possible consensus outcomes for 2, 3, 4, and 5 sequences. Also there are more cases under "Larger Examples", covering a wider range of total amino acids (useful in testing different browsers).
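The character rules listed for this version are: legal characters are A to Z, a to z, and dash; BJOUXZ are flagged as ambiguous. A minimal sketch of that classification follows. It assumes the ambiguous check ignores case. The function names and return labels are hypothetical.

```python
from typing import Dict, List

AMBIGUOUS = set("BJOUXZ")  # the set listed in these notes


def classify(ch: str) -> str:
    """Label one character as 'gap', 'illegal', 'ambiguous', or 'standard'."""
    if ch == "-":
        return "gap"
    if not (ch.isascii() and ch.isalpha()):
        return "illegal"  # anything other than A-Z, a-z, or "-"
    # Assumption: lower-case b, j, o, u, x, z are also treated as ambiguous.
    return "ambiguous" if ch.upper() in AMBIGUOUS else "standard"


def flagged_positions(seq: str) -> Dict[str, List[int]]:
    """0-based positions of ambiguous and illegal characters in one sequence."""
    found: Dict[str, List[int]] = {"ambiguous": [], "illegal": []}
    for i, ch in enumerate(seq):
        kind = classify(ch)
        if kind in found:
            found[kind].append(i)
    return found
```

A non-empty list under either key corresponds to the case where a "find them" button would be offered.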

In the consensus, when there are only 2 sequences, colons (formerly dots) were not being shown when both amino acids in a column belong to the same similarity group. The Demo "2: Tiny aligned" has been changed to illustrate a colon.
The tooltip from a consensus residue was not showing amino acid frequencies, or was showing incorrect frequencies from a previous processing run, when there were only 2 sequences.
When the sequences are of different lengths, the consensus was not always counting empty rows as non-identical residues. The correct behavior can be seen in demonstration "5: Tiny unaligned, all 5 consensus outcomes".
When the sequences are of different lengths, if the longest sequence was the top row, after all the other sequences had ended, every residue in the top row erroneously generated a black upper case "consensus". A clear example is "5: Tiny unaligned, all 5 consensus outcomes".
When a sequence contains illegal characters, they are highlighted in red. Touching one of them showed "undefined" in the tooltip.
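The corrected behavior for sequences of different lengths is that an empty cell past the end of a shorter sequence counts as non-identical. It can be sketched as follows. This is an illustration with hypothetical function names. None is used here to represent an empty cell.

```python
from itertools import zip_longest
from typing import List, Optional


def padded_columns(seqs: List[str]) -> List[List[Optional[str]]]:
    """Columns of sequences with different lengths; None marks an empty cell."""
    return [list(col) for col in zip_longest(*seqs, fillvalue=None)]


def column_is_identical(column: List[Optional[str]]) -> bool:
    """An empty cell never matches, so a column containing one is not identical."""
    if any(ch is None for ch in column):
        return False
    return len(set(column)) == 1
```

If the longest sequence is in the top row, its residues beyond the end of the other sequences fall in columns that contain empty cells. Those columns are therefore never reported as identical.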

Version 1.0, August 23, 2016

Credits Help License: bottom of main page.
Bug reports or feature requests to