User:ElNando888/Blog/What if somebody already knows?

From Eterna Wiki

When someone faces a problem, or is baffled by something inexplicable, it is quite often the case that someone else has already "been there, done that"...


This afternoon, I stumbled on an old document on some server at Harvard; it is the first of the links listed at the bottom of this page. Something looked weird about it ("Jonathon"? missing spaces between words?), but I figured that the document had probably been OCR'ed. Much more interesting were its mentions of certain special properties of T7 RNA Polymerase.

Behind this name (barbaric, for non-scientists) hides the engine that outputs RNA sequences as copies of DNA templates, and it's precisely the one in use in EteRNA labs.

Partly because of the "weirdness" of that page, and partly because I was simply curious, I went on to look up the scientific papers it cited, and I kept going, from one paper to the next. I don't expect anyone but the most insane EteRNA players to make the effort of reading them, but as I mentioned above, the list is provided below. And for everyone else, I'm going to try to boil down what I (think I have) learned.



To begin with, something I have known for a long time: T7 RNA Polymerase, which is found in viruses called bacteriophages, is pretty much like a racing car, especially compared to the polymerases found in bacteria and in eukaryotic cells (the latter including yours and mine). With speeds measured at around 250~300 nucleotides transcribed per second, it flat out beats bacteria (about 50 nts/sec), and us (about 3~4 nts/sec, pathetic). The racing car analogy helps me to better understand what follows.
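To put those speeds in perspective, here is a minimal Python sketch of the arithmetic. The rates are simply the midpoints of the ranges quoted above, and the 100-nucleotide length is a made-up example, not a figure taken from any EteRNA lab.

```python
# Transcription rates in nucleotides per second.
# Assumption: midpoints of the ranges quoted in the text above.
RATES_NT_PER_SEC = {
    "T7 RNAP": 275.0,     # 250~300 nts/sec
    "bacterial": 50.0,    # about 50 nts/sec
    "eukaryotic": 3.5,    # about 3~4 nts/sec
}


def transcription_seconds(length_nt, rate_nt_per_sec):
    """Time needed to transcribe length_nt nucleotides at a constant rate."""
    return length_nt / rate_nt_per_sec


if __name__ == "__main__":
    length = 100  # hypothetical design length, for illustration only
    for name, rate in RATES_NT_PER_SEC.items():
        print(f"{name}: {transcription_seconds(length, rate):.2f} s")
```

For a 100-nt strand, that comes to roughly 0.36 s for T7 RNAP, 2 s for a bacterial polymerase, and close to 29 s for ours. A constant rate is of course a simplification.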



T7 RNAP is quite specific in its binding affinity. And also, it really really really likes Guanines for starting its race. To be more precise, it's dC-rG, which means a C on the DNA and a G in the RNA. Continuing with the car analogy, the DNA is the road, the RNA is the exhaust.

The other nucleotides are possible, but it would be like driving an Indy-500 machine on a countryside road. So, the first base should always be a G. Did I mention that T7 RNAP likes G? Hell yeah! You will get the best performance if you give it another G. Same deal, the other nts are possible, but the performance suffers.

Now, here comes the little surprise. For the third base, G should be avoided. Why? Well, remember I told you it's a racing car? It so happens that giving it a third G sort of makes the road ultra-smooth, and the tires start slipping on it, which causes T7 RNAP to repeat the Gs. Not good.
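The three rules above (G first, G second, no G third) can be sketched as a small Python check. The function name `check_start` and the warning texts are made up for illustration; this only encodes my reading of the papers, not any official EteRNA tool.

```python
def check_start(rna):
    """Return a list of warnings about the first three bases of an RNA sequence.

    Rules, as described in the text above:
      - position 1 should be G
      - position 2 should preferably be G
      - position 3 should NOT be G (a GGG start may cause slippage)
    """
    rna = rna.upper()
    if len(rna) < 3:
        return ["sequence shorter than 3 nt, nothing to check"]

    warnings = []
    if rna[0] != "G":
        warnings.append("position 1 is not G")
    if rna[1] != "G":
        warnings.append("position 2 is not G")
    if rna[2] == "G":
        warnings.append("position 3 is G (GGG start, slippage risk)")
    return warnings


if __name__ == "__main__":
    for seq in ("GGAAA", "GGGAA", "AUGCC"):
        print(seq, check_start(seq))
```

The lab prefix GGAAA passes with no warnings, while a GGG start is flagged.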

Not everyone will remember, but if you've been playing long enough, you may recall that there used to be a time when the lab tails were a little different. Before Cloud Lab, the prefix was already GGAAA, but only the first three bases (or was it the first two?) were locked, not all five. This seems to tell us that the fellows at the lab knew those facts.

After that, and according to the scientific papers I just finished reading, there seem to be a few more known factors, which seem to conflict (a little) with current EteRNA lab designs:

  • It is reported that maximizing the number of Gs in the first 8 bases (except the third of course) improves transcription performance. So, I wonder why it was decided to force EteRNA players to use 2 more As in that segment. Also, in the latest round of Cloud Lab, the experimenters decided to "tune" sequences in one of the labs, because they happened to be too long. The removed bases were the 3 As. Consequently, and just for fun, I made my bot Vinnie rebel against the system 8P
  • It is also reported that any U in that same span of the first 8 bases will cause a strong drop in transcription performance. Think of it as an oil spill in the first curve of the circuit. For some reason, it makes T7 RNAP derail... I never saw anything like that mentioned anywhere in EteRNA, and since we only recently started to get more insights into the raw data (reactivity errors for instance) thanks to RMDB, I suspect that this may have gone unnoticed until now...
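The two reported factors above can be sketched as a quick look at the first 8 bases. This is only an illustration in Python: the function name `score_first_eight` is hypothetical, and the "score" is a plain count, not a measure of transcription yield.

```python
def score_first_eight(rna):
    """Inspect the first 8 bases of an RNA sequence.

    Returns a tuple (g_count, u_positions):
      - g_count: number of Gs in the first 8 bases, ignoring position 3
        (where G is to be avoided anyway)
      - u_positions: 1-based positions of any U in the first 8 bases
    """
    window = rna.upper()[:8]
    g_count = sum(
        1 for i, base in enumerate(window) if base == "G" and i != 2
    )
    u_positions = [i + 1 for i, base in enumerate(window) if base == "U"]
    return g_count, u_positions


if __name__ == "__main__":
    print(score_first_eight("GGAAAGCUACGU"))  # only the first 8 bases count
```

With the GGAAA prefix locked, at most 5 of the 7 countable positions can be G, which is the point of the first bullet.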



That source web page also clearly mentions problems with long (8+) stretches of repeated As or Us. This sounds strangely familiar...

Unfortunately, I couldn't find a more detailed description of the phenomenon, but it was mentioned in another paper, and it seems to be caused by T7 RNAP "slipping" over those stretches, making it generate shorter or even longer runs of As or Us than the template calls for.
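Spotting such stretches in a design is easy enough. Here is a minimal Python sketch using a regular expression; the function name `long_au_runs` is made up, and the threshold of 8 comes from the "8+" mentioned on that source page.

```python
import re


def long_au_runs(rna, min_len=8):
    """Find runs of at least min_len consecutive As or consecutive Us.

    Returns a list of (start_position, run) tuples, with 1-based positions.
    """
    pattern = re.compile(r"A{%d,}|U{%d,}" % (min_len, min_len))
    return [(m.start() + 1, m.group()) for m in pattern.finditer(rna.upper())]


if __name__ == "__main__":
    design = "GG" + "A" * 9 + "CC"  # toy sequence, for illustration only
    print(long_au_runs(design))
```

Note that a mixed stretch like AAAAUUUU is not reported, since the source page talks about repeats of the same base.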



Now, some of the papers listed below are quite old, and there's the possibility that EteRNA is using a mutant of T7 RNAP, maybe designed more recently, which wouldn't have the features I described above. Maybe... but on the other hand, do you believe in coincidences?