Bioinformatics experiments. Exp 1. Analysis of mechanisms of Cd2+ impact on MAPK signalling through DUSPs (or “Catch me (Cd2+) if you can”). Part 2. MSA (Multiple Sequence Alignment)

in #steemstem, 6 years ago (edited)

The objects of the entire 1st experiment of this series of posts are MKPs (MAP kinase phosphatases).
The subject is the possible mechanisms cadmium could “use” to influence MKPs (and thus the MAPK signalling pathway).
The purpose of the 1st experiment is to use some bioinformatics tools to find those possible mechanisms.

Let’s try to carry out an MSA (multiple sequence alignment) of one of the MKPs, namely DUSP1.

We are going to use the Clustal Omega tool to do it [1].

Proteins, DNAs and RNAs are polymers built from repeated monomers, so each of them can be written as a sequence of letters.
Alignment is the process of finding similarities between different sequences.
MSA is an alignment of three or more such sequences at once.
In a nutshell, we give the tool our sequences (protein sequences in our case), and it analyses them and tries to find identical (or similar) monomers at corresponding positions in them.

(if this concept is new to you, it will probably become much clearer when you look at the image near the bottom of the post)

You can use this tool, for example:
1. to make a clear illustration for your article. Let’s say you know which residues you are looking for and you know that they are conserved (so you could even do the alignment by hand). The tool helps you present that information in an easy-to-understand and less time-consuming way.

2. to compare the sequences of a particular protein from different organisms and figure out whether anything is conserved there. If there are identical (or similar) residues, you can assume that those regions are responsible for some important functions (structure defines function) and probably represent a protein domain/motif. Then you can analyse those proteins more carefully.


As we mentioned in the previous post, a Cys residue is responsible for the catalytic activity of these phosphatases [2].
Usually the catalytic motif ((V)-HC-XX-X-XX-R-(S/T) in our case) is highly conserved among different organisms. So it should not be a surprise that the catalytic Cys residues of DUSPs from different organisms line up in 1 column.
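To make the motif notation more concrete, here is a minimal Python sketch that searches a sequence for it. It assumes one particular reading of (V)-HC-XX-X-XX-R-(S/T): an optional V, then H and C, then any five residues, then R, then S or T. The function name and the toy sequence are made up for illustration; the sequence was not taken from UniProt.

```python
import re

# Assumed reading of (V)-HC-XX-X-XX-R-(S/T):
# optional V, H, the catalytic C (captured), any 5 residues, R, then S or T.
CATALYTIC_MOTIF = re.compile(r"V?H(C).{5}R[ST]")


def find_catalytic_motifs(sequence):
    """Return (1-based position of the Cys, matched text) for every hit."""
    hits = []
    for match in CATALYTIC_MOTIF.finditer(sequence):
        cys_position = match.start(1) + 1  # convert 0-based index to 1-based
        hits.append((cys_position, match.group()))
    return hits


# Toy sequence (hypothetical, for illustration only).
toy_sequence = "MKLAVHCQAGISRSATLC"
print(find_catalytic_motifs(toy_sequence))
```

Note that the last Cys of the toy sequence is not reported, because it is not inside the motif. That is exactly the kind of "other" Cys this experiment is interested in.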


What would be interesting to see (for the purposes of our experiment) is whether there are other conserved Cys residues in those proteins.

Aside from the catalytic centre, enzymes also have sites where other molecules bind to regulate their activity (allosteric/regulatory sites). In this series of posts (where we are discussing MKPs) those Cys residues are very important, because Cd2+ could possibly influence MKP activity indirectly through them. (We will discuss this in the next posts; this possible "indirect" effect is the main focus of the 1st Exp.)

So, let’s try to align the DUSP1 sequences of different organisms and see if there are any conserved Cys residues (other than the Cys of the catalytic site).

First of all, we need to get those sequences. For that we’ll use UniProt.
UniProt (Universal Protein Resource) is the central place to get protein sequences and information about them [3].

We’re going to analyse DUSP1 because searching for other MKPs gives us just 1-2 DUSP entries (and searching for DUSP1 gives 5 entries).
After searching UniProt for DUSP1 you’ll see the list of entries. In the “Filter by” panel (on the left) choose the “Reviewed” option. This removes the “Unreviewed” entries, so we get only entries annotated (documented) by experts rather than automatically generated annotations (for more information on this go to https://www.uniprot.org/help/about). At the time of writing (May, 2019) we get 68 results. Many of them are proteins which are not actually DUSPs, but somehow relate to them. We need only DUSP1, so from the first 5 entries we keep 4: the 4th and 5th entries are almost identical (and belong to one organism), so we’ll use just one of them (the one with the Q91790 entry identifier).
That gives us the sequences of 4 organisms (Xenopus laevis (Amphibians) and Homo sapiens, Mus musculus, Rattus norvegicus (Mammals)).
Click the “Column” option to get rid of some unnecessary columns (leave only “Length”, “Organism”, “Entry name”, “Gene name”, “Protein names”). Click “Save” at the bottom of the modal window. Then check the 4 chosen entries and download them in ‘FASTA (canonical)’ format with the help of the “Download” option.

It should look like…


(the image was created by me with Notepad/Paint, and you can use it if you want. Sequences were obtained from UniProt)


|| Useful tip
FASTA is a text format used to represent DNA/RNA/protein sequences.
Aside from the sequences themselves, it may also contain some meta-information in a header line that starts with ">" and precedes each sequence, such as the UniProt identifier, species name, full protein name etc. [4]. This is similar to using Markdown at Steemit (it also contains some meta-information, aside from the text itself).
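If you want to work with the downloaded file in code, the format is simple enough to parse by hand. Below is a minimal sketch in Python. The function names are my own, and the two records in SAMPLE are made-up toy entries (not real UniProt data). The header layout "db|accession|entry_name description" follows the way UniProt writes its FASTA headers.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []  # start a new record
        elif header is not None:
            chunks.append(line)  # a sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_uniprot_header(header):
    """Split 'db|accession|entry_name description' into its parts."""
    identifier, _, description = header.partition(" ")
    db, accession, entry_name = identifier.split("|")
    return {"db": db, "accession": accession,
            "entry_name": entry_name, "description": description}


# Toy records (hypothetical identifiers and sequences).
SAMPLE = """>sp|X00001|TOY1_EXAMPLE Toy phosphatase OS=Example organism
MKLAVHCQAG
ISRSATLC
>sp|X00002|TOY2_EXAMPLE Toy phosphatase OS=Another organism
MKLSVHCQAGISRSATIC
"""

for header, sequence in parse_fasta(SAMPLE):
    info = split_uniprot_header(header)
    print(info["accession"], info["entry_name"], len(sequence))
```

For real work a library such as Biopython is probably the safer choice, since it handles edge cases this sketch ignores.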



Then copy the result UniProt gave you into a text file and save it.

Then we go to the Clustal Omega website.

Paste the sequences into its input field, leave all options as they are, click “Submit” at the bottom and wait…

You’ll see a set of options on the results page. Choose “Show Colors”.



(The image above was created by me with the help of Clustal Omega. DUSP1 sequences of 4 organisms (Xenopus laevis (Amphibians) and Homo sapiens, Mus musculus, Rattus norvegicus (Mammals)) were used for this example. You can use the image if you want)

where
"-" (hyphen) - a gap, i.e. a possible insertion or deletion mutation;

different characters in 1 column - a substitution (point mutation);

"*" - a fully conserved residue (identical in all sequences);

":" - residues with strongly similar properties;

"." - residues with weakly similar properties;

[5]
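The "*" mark in the legend above is easy to reproduce yourself. Here is a minimal Python sketch that builds such a line for a toy alignment. It covers only "*"; the ":" and "." marks depend on Clustal's own residue similarity groups, which are not reproduced here. The function name and the toy alignment are made up for illustration.

```python
def conservation_line(aligned):
    """Return a string with '*' under every fully conserved column.

    `aligned` is a list of already aligned sequences of equal length,
    where '-' marks a gap. Columns containing a gap or more than one
    kind of residue get a space.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    marks = []
    for column in zip(*aligned):  # walk through the alignment column by column
        residues = set(column)
        if len(residues) == 1 and "-" not in residues:
            marks.append("*")
        else:
            marks.append(" ")
    return "".join(marks)


# Toy alignment (hypothetical sequences).
toy_alignment = ["MKC-LT",
                 "MRCALT",
                 "MKCALS"]
for row in toy_alignment:
    print(row)
print(conservation_line(toy_alignment))
```

In the toy alignment columns 1, 3 and 5 get a "*", column 4 contains a gap, and columns 2 and 6 contain substitutions.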


So…
All conserved Cys residues are marked with thin black vertical bars (12 in our case).
The (V)-HC-XX-X-XX-R-(S/T) motif is marked with a yellow horizontal bar (and as we can see, it is indeed highly conserved in all 4 organisms).
Possible point mutations at sites with a Cys residue are marked with orange bars (5 in our case).
At the beginning of the sequences we can see a **deletion mutation** (magenta bar).
As you probably noticed, the protein sequence of Xenopus laevis is a little bit different from the mammalian ones: the alignment shows a gap at the beginning and 5 substitutions at Cys sites (orange bars) between Xenopus laevis and the mammals.
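Counting the conserved Cys columns by eye is fine for 4 sequences, but the same check can be sketched in a few lines of Python. The function below sorts the alignment columns into "Cys in every sequence" and "Cys in some sequences only". The name cysteine_columns and the toy alignment are made up for illustration; to reproduce the numbers above you would feed it the aligned DUSP1 sequences from Clustal Omega.

```python
def cysteine_columns(aligned):
    """Classify alignment columns that contain Cys ('C').

    Returns two lists of 1-based column numbers:
    - conserved: every sequence has 'C' in that column;
    - variable: only some sequences have 'C' there
      (a substitution or a gap in the others).
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    conserved, variable = [], []
    for position, column in enumerate(zip(*aligned), start=1):
        cys_count = column.count("C")
        if cys_count == len(column):
            conserved.append(position)
        elif cys_count > 0:
            variable.append(position)
    return conserved, variable


# Toy alignment (hypothetical sequences).
toy_alignment = ["MC-KCAC",
                 "MCAKSAC",
                 "MCAKCAC"]
conserved, variable = cysteine_columns(toy_alignment)
print("conserved Cys columns:", conserved)
print("variable Cys columns:", variable)
```

Column numbers here refer to positions in the alignment, not to residue numbers in any single protein, because gaps shift the numbering.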


|| Useful tip
To get a nicely formatted reference for an article/paper on PubMed (a free portal/search engine for abstracts/citations on life sciences/medical topics), choose the "Send to" option, check "File" (in the "Choose Destination" category), choose "Summary (text)" in the "Format" category, and click "Create file". [6]



Glossary

alignment - the process of finding similarities between different sequences of proteins/DNAs/RNAs;

motif
a distinctive sequence on a protein or DNA, having a three-dimensional structure that allows binding interactions to occur
[Oxford Dictionary of English, 3rd Edition, Oxford University Press 2010]

domain
a distinct region of a complex molecule or structure
[Oxford Dictionary of English, 3rd Edition, Oxford University Press 2010]

allosteric site/regulatory site - the site of an enzyme which is used by other molecules to regulate activity of that enzyme;

catalytic/active site - the site of an enzyme where a chemical reaction happens with the enzyme's substrates;

residue - a monomer of polymeric chain (of proteins, DNAs, RNAs);

annotation - the process of associating information with a particular protein/gene



References

1. Clustal Omega

2. Caunt CJ, Keyse SM. Dual-specificity MAP kinase phosphatases (MKPs): shaping the outcome of MAP kinase signalling. FEBS J. 2013 Jan;280(2):489-504. doi: 10.1111/j.1742-4658.2012.08716.x. Epub 2012 Aug 28. Review. PubMed PMID: 22812510; PubMed Central PMCID: PMC3594966

3. UniProt

4. https://en.wikipedia.org/wiki/FASTA_format

5. https://en.wikipedia.org/wiki/Clustal

6. https://www.nlm.nih.gov/bsd/pubmed.html


Hello @alexbiojs, it is important to take some considerations into account when using the steemstem label. One of them is not to publish more than once a day. It is advisable to publish at most 3 or 4 posts per week, otherwise you may be classified as spam.

Hi.
I didn't know that.
thanks
no problem.
from now on, will follow your advice