User:ElNando888/Blog/Outsider

Until a few hours ago, I always thought of the nucleobases as being simple members of a family, each individual being in essence functionally equivalent. A few subgroups (purines/pyrimidines), a few pairing affinities, yes, but there would always be some form of dichotomy, another member of the family with similar properties. Today, I think I learned that one of them is clearly different from the three others...

Incongruity

It all began as I was looking for something else (of course). A secondary structure popped up, I can't recall on which sciency website that was, and it occurred to me that something was seriously off. According to that record, this is what was supposed to be happening:

Whadda... ??

So, they were telling me that a lone Guanine and a lone Cytosine flanked by two solid canonical pairs would choose to ignore each other. Is someone pulling my glycosidic bond here?

A good friend

Naturally, I needed proof. So I went to visit my good ol' friend FRABASE. And I was up for a big surprise...

The query

Query submitted (Loop)
Case sensitive (modified residues sensitive search): No
Strand shift operation: Yes
Sequence 1 length: 3
Sequence 1 nucleotides: CGU
Sequence 2 length: 3
Sequence 2 nucleotides: ACG
Experimental method: Any
Include all models of the structure: No

returned

Number of matching fragments: 90

Uh? Yes, 90. A lot, but not so many that I couldn't contemplate checking them all, or at least most of them.

Strange 3D conformations

How do you decide what is a canonical pair in 3 dimensions? Well, bases are planar objects, with essentially 3 edges. If you consider the way that two bases may relate to each other, you will find 6 parameters: a translational displacement and a rotational angle, each along 3 different axes. And canonical pairs are simply 2 bases that fit certain requirements on these parameters.

Note: for those 90 hits mentioned above, there are probably a bazillion hits for normally formed GU/AC quads. The ratio between those two sets is simply a function of how tolerant the algorithm(s) were when they annotated the 3D data sets with dot-bracket notations. I don't think I will be investigating these details, because for my purpose, it is enough to know that certain instabilities do occur.
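To make the dot-bracket side of this concrete, here is a minimal sketch of how one could spot the motif in an annotated structure: two unpaired positions facing each other, flanked on both sides by pairs. The function names and the toy hairpin sequence are made up for illustration; this is not how FRABASE itself searches, and pseudoknot brackets are ignored.

```python
def pair_table(dot_bracket):
    """Map each paired index to its partner; unpaired indices are absent."""
    stack, pairs = [], {}
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            pairs[i], pairs[j] = j, i
    return pairs


def one_by_one_loops(seq, dot_bracket):
    """Find unpaired positions i and j that face each other between two pairs.

    Returns tuples (i, j, 5' triplet around i, 5' triplet around j).
    """
    pairs = pair_table(dot_bracket)
    hits = []
    for i in range(1, len(seq) - 1):
        if i in pairs or (i - 1) not in pairs or (i + 1) not in pairs:
            continue
        j_outer, j_inner = pairs[i - 1], pairs[i + 1]
        # Exactly one position sits between the two partners, and we only
        # report the loop once, from the 5' side.
        if j_outer - j_inner == 2 and j_inner > i + 1:
            j = j_inner + 1
            if j not in pairs:
                hits.append((i, j, seq[i - 1:i + 2], seq[j - 1:j + 2]))
    return hits


# Toy hairpin (invented): CGU ... ACG with the middle G and C left as dots.
print(one_by_one_loops("CGUGAAAACG", "(.(....).)"))
```

On the toy input, the middle G (index 1) and the middle C (index 8) are reported together with their CGU and ACG triplets, which is the same pattern as the query above.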

So, in the picture below, it becomes rather clear why this CG was cataloged as noncanonical (and then accordingly depicted as a dot in the secondary structure).

Apparently, the bases are not in a good enough alignment of their respective planes. And if the hydrogen bonds do form, they are likely to be weaker than in more properly formed GC pairs.
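For what it's worth, "alignment of their respective planes" can be put into a number. The sketch below estimates each base plane from three of its ring atoms and returns the angle between the two planes. It is only a rough stand-in: real annotation tools fit a plane through all ring atoms and check all six parameters, and the coordinates here are invented, not taken from any PDB entry.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def plane_normal(p1, p2, p3):
    """Unit normal of the plane through three non-collinear points."""
    n = _cross(_sub(p2, p1), _sub(p3, p1))
    length = math.sqrt(sum(c * c for c in n))
    if length == 0:
        raise ValueError("points are collinear, no unique plane")
    return tuple(c / length for c in n)


def interplanar_angle(base_a, base_b):
    """Angle in degrees (0 to 90) between the planes of two bases.

    Each base is given as three (x, y, z) ring-atom coordinates.
    """
    na, nb = plane_normal(*base_a), plane_normal(*base_b)
    # abs() folds the result so that flipped normals count as parallel.
    cos_t = abs(sum(x * y for x, y in zip(na, nb)))
    return math.degrees(math.acos(min(1.0, cos_t)))


# Invented coordinates: two coplanar "bases", then a perpendicular one.
flat_a = ((0, 0, 0), (1, 0, 0), (0, 1, 0))
flat_b = ((3, 0, 0), (4, 0, 0), (3, 1, 0))
upright = ((0, 0, 0), (1, 0, 0), (0, 0, 1))
print(interplanar_angle(flat_a, flat_b))
print(interplanar_angle(flat_a, upright))
```

A well-formed pair would come out close to 0 degrees; the further away from that, the more likely an annotation tool is to write a dot instead of a bracket.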

But there's also an interesting feature in this picture: the marked atom is an oxygen, and I started wondering why it was there. I couldn't find any even remotely valid answer, so I kept browsing other structures from the FRABASE result set.

Many different conformations appeared, but I was able to see a common trait in this group of structures. In quite a few, I would find the same O2 of the Cytosine apparently trying to "approach" the AU pair. In other cases, it seemed to be a Nitrogen from the opposite Guanine. And once or twice, it was even the O2' on the ribose attached to the Cytosine.

Then I remembered a recent conversation I had with Rhiju, about electrostatics. It occurred to me that these atoms, apparently attracted to a point above them, are all somewhat electronegative. And it dawned on me that there was something "special" about the AU pair. Well, not really the pair...

The Outsider

From the same model as above. I removed the "noise", and highlighted the Watson-Crick edges of the bases. Can you spot it now?

Yep, the Adenine only has 2 electronegative atoms on that edge, not 3.
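The count can be checked from the standard atom names alone. In the sketch below, each base is listed with the three atoms along its Watson-Crick edge, and nitrogen and oxygen are counted as the electronegative ones. The table and function names are mine, and treating "starts with N or O" as "electronegative" is a deliberate simplification.

```python
# Atoms along the Watson-Crick edge, in standard PDB naming.
# Adenine's third position is C2, which only carries a hydrogen.
WC_EDGE = {
    "A": ("N6", "N1", "C2"),
    "G": ("O6", "N1", "N2"),
    "C": ("N4", "N3", "O2"),
    "U": ("O4", "N3", "O2"),
}


def electronegative_count(base):
    """Number of N or O atoms on the Watson-Crick edge of a base."""
    return sum(1 for atom in WC_EDGE[base] if atom[0] in "NO")


for base in "AGCU":
    print(base, electronegative_count(base))
```

Adenine comes out with 2, the three others with 3.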

And "curiously", the spot where the third atom is missing seems to be the black hole attracting electronegative atoms from the GC pair below. In some cases, the effect is even quite dramatic.

More examples

Apparently, the electrostatic stabilization of this specific local area is so important that some molecules may elect to give up (!) the GC canonical pair altogether and adopt a most unusual double T-shaped stacking conformation.

(PDB accession 3J2A, residue 744 in chain N)

Notice that here again, the O2 atom of the Cytosine is in the vicinity of the "missing" atom of the Adenine.

After looking at so many examples, I wondered if a different nucleobase in the place of that Adenine would not have allowed for more stability. But which one? Well, we need a purine, in order to keep the overall 3D shape of the structure, with 3 electronegative atoms, and that can pair with Uracil... Easy as pie, right?

And now...

What do you make of this?

What do you know? Changing the A for a G does improve the stability in this context, despite the weaker pairing.
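The three criteria used to pick the replacement (a purine, 3 electronegative atoms on the Watson-Crick edge, able to pair with Uracil) can be written as a tiny filter over the four standard bases. The property table is a hand-written assumption for this sketch: "pairs with U" only covers the Watson-Crick AU pair and the GU wobble, and modified bases are left out entirely.

```python
# Hand-written properties of the four standard RNA bases (illustrative only).
BASES = {
    "A": {"purine": True,  "wc_heteroatoms": 2, "pairs_with_U": True},   # Watson-Crick
    "G": {"purine": True,  "wc_heteroatoms": 3, "pairs_with_U": True},   # wobble
    "C": {"purine": False, "wc_heteroatoms": 3, "pairs_with_U": False},
    "U": {"purine": False, "wc_heteroatoms": 3, "pairs_with_U": False},
}


def replacement_candidates(bases=BASES):
    """Bases that are purines, have 3 N/O atoms on the WC edge, and pair with U."""
    return [
        name
        for name, props in bases.items()
        if props["purine"]
        and props["wc_heteroatoms"] == 3
        and props["pairs_with_U"]
    ]


print(replacement_candidates())
```

Only Guanine passes all three tests, at the price of a wobble pair instead of a Watson-Crick one.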

Moral

So it would seem that the different Adenine would be at times a little rebellious, and that it doesn't feel so comfortable in the perfect order of the double helix. Does it surprise you now that this outsider feels so at home in the vast free spaces of the unpaired segments of RNA? :)

Little reminder

Please never forget that I am no scientist in this field, and it would even be debatable if I am one in my own field (software and computing).