User:ElNando888/Blog/Conformational 3
(back to Conformational 2)
You may have noticed that in the previous chapter I "conveniently" left out an important issue: the states of a switch (or of a single-state molecule) are definitely not limited to the MFE and a couple of suboptimal structures. There are usually many of them, and that's where things get messy...
The various shapes that a single RNA sequence can form are not completely random. They can often be regarded as "families". Taking the design by JR discussed earlier as an example, we can see that:
This: <img style="background-color: dimgrey; width: 100px;" src="/wiki/images/Stst2_t2.png" alt="" /> is quite similar to this: <img style="background-color: dimgrey; width: 100px;" src="/wiki/images/Stst2_t1.png" alt="" /> and
this: <img style="background-color: dimgrey; width: 100px;" src="/wiki/images/Stst2_t3.png" alt="" /> vaguely resembles this: <img style="background-color: dimgrey; width: 100px;" src="/wiki/images/Stst2_t5.png" alt="" /> which in turn is very similar to this: <img style="background-color: dimgrey; width: 100px;" src="/wiki/images/Stst2_t4.png" alt="" />
So, in a given RNA solution, which is the dominant family? And is it possible to determine a specific structure which could be described as representative?
----
Let's just take an example. This is the console output of my hacked RNAfold in partition function mode (option -p):
GGAAAGAGGACCACUGCAGGAUAUAGUAGUGAUCAUCUACUAGAAGGGUAGUGGGUGGAUUUGUCCUAUGCUAACUUCGGUUAGCAAAAAGAAACAACAACAACAAC
......(((((((((((........)))))).((((((((((......))))))))))....))))).(((((((....))))))).....................
minimum free energy = -31.80 kcal/mol
,,,...({{{(((((((......{{||||,,.,,,,||||||.....}))))))}},,.,,,}))),.(((((((....))))))).....................
free energy of ensemble = -32.74 kcal/mol
...........((((((......((((((.......)))))).....))))))...............(((((((....)))))))..................... {-25.10 d=18.83}
frequency of mfe structure in ensemble 0.21723; ensemble diversity 21.80
There are three dot-bracket notation lines:
- The first one indicates the minimum free energy structure
- The second one is a very approximate 1D rendering of the dot plot. Pairing probabilities are summed for each base, and the character shown summarizes the general behavior at that position: a parenthesis indicates strong pairing in a specific direction (5' or 3'); a curly brace marks a base that will probably pair in that direction, although more weakly; a pipe marks a position that also tends to pair, but with no strongly defined preference for a direction; a comma marks a position that is mostly unpaired; and a dot marks a position that is almost certainly unpaired in the whole ensemble.
- The third one is called the ensemble centroid: it is simply the set of pairs that each have a better than 50% chance of forming.
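To make the second line a bit more concrete, here is a minimal Python sketch of how such a condensed string could be derived from pair probabilities. Everything in it is an assumption made for illustration: the function names, the toy `pair_probs` dictionary, and in particular the two-thirds cut-offs, which I believe are close to what RNAfold uses but which should not be taken as its exact rules.

```python
def condensed_symbol(p_unpaired, p_open, p_close):
    """Pick one pseudo-bracket character for a single position.

    p_open:  summed probability of pairing with a partner on the 3' side
    p_close: summed probability of pairing with a partner on the 5' side
    The 2/3 cut-offs are illustrative assumptions, not a specification.
    """
    strong = 2.0 / 3.0
    if p_unpaired > strong:
        return "."
    if p_open > strong:
        return "("
    if p_close > strong:
        return ")"
    p_paired = p_open + p_close
    if p_paired > p_unpaired:
        if p_open / p_paired > strong:
            return "{"
        if p_close / p_paired > strong:
            return "}"
        return "|"  # tends to pair, but with no preferred direction
    return ","      # mostly unpaired


def condensed_string(length, pair_probs):
    """pair_probs maps (i, j), 0-based with i < j, to a pairing probability."""
    p_open = [0.0] * length
    p_close = [0.0] * length
    for (i, j), p in pair_probs.items():
        p_open[i] += p   # i pairs with something on its 3' side
        p_close[j] += p  # j pairs with something on its 5' side
    return "".join(
        condensed_symbol(max(0.0, 1.0 - o - c), o, c)
        for o, c in zip(p_open, p_close)
    )


# Hypothetical toy ensemble for a 6-nt sequence (made-up numbers).
toy_probs = {(0, 5): 0.9, (1, 4): 0.45, (1, 3): 0.15}
print(condensed_string(6, toy_probs))
```

The point of the sketch is only that each character summarizes a whole row and column of the dot plot, which is why this line is so approximate.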
In many cases, the MFE and the ensemble centroid are identical or very close. Sometimes, though, they differ, as in the example above.
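The centroid rule is simple enough to sketch in a few lines of Python. This is not RNAfold's code, just an illustration under stated assumptions: the function names and the toy `pair_probs` dictionary are made up, and the pair probabilities are assumed to come from a partition function computed over nested (pseudoknot-free) structures, in which case the pairs above 50% cannot conflict with each other.

```python
def centroid_structure(length, pair_probs, threshold=0.5):
    """Keep every pair whose probability exceeds the threshold.

    pair_probs maps (i, j), 0-based with i < j, to a pairing probability.
    """
    structure = ["."] * length
    for (i, j), p in pair_probs.items():
        if p > threshold:
            structure[i] = "("
            structure[j] = ")"
    return "".join(structure)


def pairs_of(dot_bracket):
    """Return the set of (i, j) pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for pos, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            pairs.add((stack.pop(), pos))
    return pairs


def base_pair_distance(a, b):
    """Number of pairs present in one structure but not in the other."""
    return len(pairs_of(a) ^ pairs_of(b))


# Hypothetical toy ensemble for a 6-nt sequence (made-up numbers).
toy_probs = {(0, 5): 0.9, (1, 4): 0.45, (1, 3): 0.15}
toy_centroid = centroid_structure(6, toy_probs)
print(toy_centroid, base_pair_distance("((..))", toy_centroid))
```

Feeding the MFE line and the centroid line from the console output above to `base_pair_distance` would give one simple way to quantify how far apart the two structures are.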
----
You may want to ask: why bother with centroids? In response, I would suggest that you read "RNA secondary structure prediction by centroids in a Boltzmann weighted ensemble". You would then realize that centroids are a reality of RNA folding, a different point of view if you will, one that we miss completely when we focus exclusively on the MFE.
And to conclude this chapter, I have yet another request for the dev team: the centroid structure is very easy to obtain once the partition function has been computed, which means that it would be possible to draw this structure in the same dot plots we have in EteRNA. I'd suggest putting it in the same corner as the MFE (lower-left), simply plotted differently (something like grey diamonds?).