< User:ElNando888 | Blog
Creating puzzles is an art. Some players have reached amazing heights of difficulty in their compositions. Their secrets? To be perfectly honest, I have absolutely no clue. Though I could imagine that they learned about some ambiguities present in the software, and learned how to use them.
For instance, this structure looks rather simple. And yet...
An important fact about free energies and orientation: they are rotationally symmetric. Here for instance, due to the nearest neighbor rules and the fact that the closing pairs are identical, you can exchange the tetraloop at the bottom and the hook at the top, and you must get the same free energy total.
And it does work as expected...
Though, something's not exactly right. Why is the puzzle solved in one case, and not in the other?
If we check what the MFE is in this second case, we notice that the "problem" has to do with a possible ambiguity: U12 may bind with either G23 or G24, and both resulting structures have the exact same free energy, -10.1 kcal/mol.
Checking back the very first example, we see that here too, both structures have the same free energy, but this time, the software decided to make the other option the "major" one.
Is there any hidden RNA folding rule in this phenomenon?
I don't know much about RNA, but I know software and algorithms. Let's take an example. Suppose we have a list of pairs of the form (label; value), like
- (E; 5)
- (A; 3)
- (D; 2)
- (B; 1)
- (C; 5)
And now, we want a program that will tell us which label has the best value. The simple way to write the program is to say:
winner = nobody
for all doublets in the list (taken alphabetically)
if the value is better than the current best, make that label the winner
show who's the winner
What is going to happen? The program will output C.
First remark: the program is not designed to give you a list, only a single element. One way or the other, the result can, and on occasion will, be ambiguous.
Second remark: a tiny change in the program:
if the value is greater than or equal to the current best, ...
and you change the outcome: the output will now be E.
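The two variants above can be tried side by side. This is only a minimal sketch of the pseudocode, not the code of any folding engine; the names `entries` and `pick_winner` are made up for the illustration, and the comparison is passed in as a parameter so that the "tiny change" is the only difference between the two runs.

```python
import operator

# The list from the text, as (label, value) pairs.
entries = [("E", 5), ("A", 3), ("D", 2), ("B", 1), ("C", 5)]


def pick_winner(pairs, beats):
    """Scan the pairs alphabetically by label and keep a single winner.

    `beats(value, best)` decides whether a new value replaces the current
    best. It is the only thing that differs between the two variants.
    """
    winner, best = None, None
    for label, value in sorted(pairs):  # tuples sort by label first
        if best is None or beats(value, best):
            winner, best = label, value
    return winner


strict = pick_winner(entries, operator.gt)   # "better than the current best"
lenient = pick_winner(entries, operator.ge)  # "greater than or equal to"
print(strict, lenient)  # prints: C E
```

Both C and E hold the best value, 5. Which one gets reported depends only on how the comparison treats a tie.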
And it turns out that things are even worse than we thought with our examples. There is another structure that reaches the exact same free energy.
What can we say about the folding algorithm?
- it probably tries to maximize the number of pairs
- in case of ambiguities, it will prefer the pair whose bases are closest to the ends
And this is essentially my conclusion with this case: the discrepancies have nothing to do with RNA, they are artifacts generated by algorithms.
Update: Höglahoo contributed these examples to the discussion (thanks a lot)
This example seems to indicate another rule, "minimize the number of stacks in multiloops", which makes the left option less desirable. Then, the rule about the distance to the ends would apply: base 9 is closer to base 1 than base 30 is to base 50.
The examples below seem indicative of yet another type of rule: priorities between pairs.
Here, the fully paired option (on the left) is inferior energy-wise. The choice comes down to UG vs AU.
And apparently UG < AU
This time, maximizing the number of pairs takes precedence.
And this seems to indicate that AU is also better than UA
Boiling down what we know so far, the programmatically induced rules, in order of priority, would be:
- minimize the number of stacks in multiloops
- maximize the number of pairs
- pairs priorities, of which we now know that AU > UG and AU > UA
- and the pair closest to either end
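One way to picture "in order of priority" is as a lexicographic comparison between structures that are already tied on free energy. The sketch below is purely illustrative and is not the folding software's actual code. The `Candidate` fields, the `PAIR_RANK` table and all the numbers except the distances 8 and 20 (taken from the base 9 / base 30 example) are assumptions. Since we only know that AU > UG and AU > UA, UG and UA are given the same rank here.

```python
from dataclasses import dataclass

# Hypothetical ranks (lower is preferred). The observations only establish
# AU > UG and AU > UA, so UG and UA share a rank in this sketch.
PAIR_RANK = {"AU": 0, "UG": 1, "UA": 1}


@dataclass
class Candidate:
    """One of several structures tied on free energy (illustrative fields)."""
    name: str
    multiloop_stacks: int  # rule 1: fewer is preferred
    num_pairs: int         # rule 2: more is preferred
    deciding_pair: str     # rule 3: looked up in PAIR_RANK
    distance_to_end: int   # rule 4: smaller is preferred


def priority_key(c):
    # Tuples compare element by element, so a later rule only matters
    # when every earlier rule is tied.
    return (c.multiloop_stacks, -c.num_pairs,
            PAIR_RANK[c.deciding_pair], c.distance_to_end)


def preferred(candidates):
    return min(candidates, key=priority_key)


# Invented candidates: only the distances 8 and 20 come from the text.
left = Candidate("left", 0, 10, "AU", 20)
right = Candidate("right", 0, 10, "AU", 8)
stacked = Candidate("stacked", 1, 12, "AU", 1)

print(preferred([left, right, stacked]).name)  # prints: right
```

Here `stacked` loses on the first rule despite having more pairs, and `left` and `right` are separated only by the last rule, the distance to the ends.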