User:Meechl/R scripts

From Eterna Wiki

Want to look at the data from RMDB in R so you can make cool graphs and analyses? You’ve come to the right place! :)

I apologize in advance for any ugly scripting. I’ll try to make it look presentable before posting, but my style of coding is often trial and error until it works. I think it would be cool to get my scripts onto GitHub to share them, but right now I don't understand GitHub, so I'll start by just posting them here.

If you are using data that was already formatted (that is, if you took the data from Meechl’s Google Drive), then you can load it into R quickly and move on to the fun scripts. You can use the built-in read.table function. For information on how to use it, type ?read.table into R. You'll probably want the data saved in CSV format before loading. Example:

      Stats = read.table("C:/Users/Meechl/Documents/Eterna/csv/R94.csv", sep=",", header=TRUE, na.strings=".") # read in data
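
To see what the na.strings="." argument does without needing a file on disk, here is a minimal self-contained sketch. The column names and values are made up for illustration and are not the real RMDB columns; the temporary file just stands in for a saved .csv.

```r
# Write a tiny made-up CSV to a temporary file so the example runs anywhere.
# (Column names and values are hypothetical, not real RMDB data.)
tmp <- tempfile(fileext = ".csv")
writeLines(c("ID,Score,X1,X2",
             "1,85,0.52,.",
             "2,.,0.31,0.77"), tmp)

# Same call as above: a lone "." in a cell is read as a missing value (NA)
Stats <- read.table(tmp, sep = ",", header = TRUE, na.strings = ".")

str(Stats)             # shows the type R chose for each column
head(Stats)            # first few rows
colSums(is.na(Stats))  # number of missing values per column
```

Because the "." cells become NA, the Score and X2 columns are read as numbers rather than text, which is usually what you want before plotting.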

However, if you are taking rdat data directly from RMDB, it’s a little more complicated. First, you’ll need to load the rdat file into a spreadsheet, unstack it, and remove several characters. Then, you’ll need to run a special function that will format the data so R will understand it. How exactly do you do this?

Loading the rdat data into a spreadsheet, unstacking, and removing characters:

  1. Download the data in rdat format
  2. Load the rdat data into Excel (I use the text import wizard; this is probably doable in Google Sheets, but I haven’t tried)
    1. Open a new spreadsheet
    2. Go to “data”, “from text”, then select the .rdat file and click “import”
    3. Make sure “delimited” is checked, then hit next
    4. Make sure “tab” is checked, then hit finish
  3. Unstack the data (we want to get all of the data for each sequence to be on the same line)
    1. Open a new sheet in the current Excel file
    2. Copy and paste the first line from another file (should read: Annotation, ... X1, X2, ... E1, E2, ...)
    3. Copy and paste the annotation data for all sequences (all rows) from your initial sheet to the new sheet
    4. Copy and paste the reaction data for all sequences (all rows but the first) from your initial sheet to the new sheet
      1. Make sure to paste the data in the box right under “X1”.
      2. Scroll over to the end of the reaction data in the new sheet, and make sure that the number of reactivities you pasted was the same as the number of X’s in the header. If they aren’t the same, adjust the number of X’s in the header.
    5. Copy and paste the error data for all sequences (all rows but the first) from your initial sheet to the new sheet
      1. Make sure to paste the data in the box right under “E1”.
      2. If the number of X’s in your header was different from the number of reactivities, you will also have to adjust the number of E’s in the header to match the data.
  4. Replace every # “ ‘ and , with a space (in my experience, R gets confused by those characters)
    1. Hit ctrl-F, then switch to the replace tab
    2. In the “Find what:” box, type #
    3. In the “Replace with:” box, type one space
    4. Click “Replace all”
    5. Repeat for each of the other characters
  5. Save the file as a .csv (comma delimited) file (remember where you saved it to!)
  6. Open R!
  7. Solutions to possible problems
    1. If you are using a mac and something doesn’t work the same, sorry? I’m a PC person. :)
    2. If you can’t find the rdat file that you downloaded, going to your downloads folder and sorting by most recent usually works. Or, you can right click on the download in the browser and choose “show in folder”, and that’ll show you where it is. Then you can save it to someplace more convenient.
    3. If the search didn’t find any # “ ‘ , in the file, the file probably does have some, but the search missed them. Make sure that you didn’t select a group of boxes before hitting ctrl-F, because if you did, it will only search that selection of boxes.
    4. If you get warnings when saving as a .csv, don’t worry, just click ok.
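
The character replacement in step 4 could also be done in R rather than with find-and-replace. This is only a rough sketch of that alternative, not what I actually do: it assumes the rows are available as tab-delimited text, and the two lines below are made-up stand-ins, not real rdat content. It only handles the plain (straight) quote characters.

```r
# Made-up tab-delimited rows standing in for spreadsheet lines
lines <- c("ANNOTATION\tname:'Design #1'\tnote:\"a, b\"",
           "0.5\t1.2\t0.8")

# Replace each # " ' or , with a single space; tabs are left alone,
# so the column layout is unchanged
clean <- gsub("[#\"',]", " ", lines)

# Check that none of the problem characters are left
any(grepl("[#\"',]", clean))

# Check that every row still has the same number of tab-separated fields
lengths(strsplit(clean, "\t"))
```

To use it on a real file you would read the lines with readLines() and save the result with writeLines(), but check the output by eye first, since this has not been tested against actual RMDB downloads.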

Reading data into R:
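
This is not the special formatting function mentioned above, just a minimal sketch of a sanity check that could be run once a file is loaded. It assumes the header layout described in the unstacking steps (an annotation column, then X1, X2, ... for reactivities and E1, E2, ... for errors). The toy data frame and its values are made up.

```r
# Toy data frame standing in for a loaded, unstacked CSV (values are made up)
Stats <- data.frame(Annotation = c("seq A", "seq B"),
                    X1 = c(0.5, 0.3), X2 = c(1.2, NA),
                    E1 = c(0.1, 0.2), E2 = c(0.3, NA))

# Find the reactivity (X...) and error (E...) columns by name
x_cols <- grep("^X[0-9]+$", names(Stats), value = TRUE)
e_cols <- grep("^E[0-9]+$", names(Stats), value = TRUE)

# The header should have as many E's as X's; stop with an error if not
stopifnot(length(x_cols) == length(e_cols))

# Pull each set out as a numeric matrix, one row per sequence
react <- as.matrix(Stats[, x_cols])
errs  <- as.matrix(Stats[, e_cols])

# Example summary: mean reactivity per sequence, ignoring missing values
rowMeans(react, na.rm = TRUE)
```

If the stopifnot line fails on real data, the number of X's and E's in the header probably needs adjusting, as described in the unstacking steps.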



Generating free energy values and structures for MS2 data:

Use the script here:

Or for vienna2:

  1. Download the data into Excel
    1. Move the sequence column to be just to the right of the id# 
    2. Filter by puzzle name 
    3. Copy the first two columns (id# and sequence) into a new sheet, and save that sheet in csv format. 
  2. Open that file in Notepad, copy the whole thing
  3. Input it into the script. 
  4. Add the cst to the script, then run. 
  5. Copy and paste the output into Notepad, and save as .txt.
  6. Use the text import wizard to import that file into Excel, then copy and paste the free energies and structures into the original Excel file.
  7. Repeat for all sublabs. 
  8. Remove all # " ' , then save as .csv, and then you can load the data into R.
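
Steps 1 and 6 (pulling out the id and sequence columns for one puzzle, then pasting the free energies and structures back in) could alternatively be done in R with subsetting and merge(). This is a sketch of that alternative, not the process I actually use. All column names (ID, Puzzle, Sequence, FreeEnergy, Structure) and all values are hypothetical placeholders, and the energies are not real folding results.

```r
# Made-up stand-in for the full data set
all_data <- data.frame(ID = c(101, 102, 103),
                       Puzzle = c("Sublab A", "Sublab A", "Sublab B"),
                       Sequence = c("GGGGAAAACCCC", "GGGCAAAAGCCC", "GGCCAAAAGGCC"),
                       stringsAsFactors = FALSE)

# Step 1: keep one puzzle, and only the id and sequence columns
one_lab <- all_data[all_data$Puzzle == "Sublab A", c("ID", "Sequence")]

# Save as csv without quote characters, ready to paste into the script
out_file <- tempfile(fileext = ".csv")
write.csv(one_lab, out_file, row.names = FALSE, quote = FALSE)

# Made-up stand-in for the script output (placeholder values, any row order)
folded <- data.frame(ID = c(102, 101),
                     FreeEnergy = c(-4.1, -3.7),
                     Structure = c("((((....))))", "((((....))))"),
                     stringsAsFactors = FALSE)

# Step 6: match the output back to the original rows by ID.
# all.x = TRUE keeps rows with no output yet; they get NA.
merged <- merge(all_data, folded, by = "ID", all.x = TRUE)
merged
```

Matching by ID rather than by row position means it doesn't matter if the output comes back in a different order than the input.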


I'm slowly adding my other scripts for more analysis-type stuff here: