User:Omei/Cloud Lab Data Mining Tool
Cloud Lab Data Mining Tool
At this point, the biggest need I see in EteRNA is a way to extract meaning from the results of the thousands of lab designs that have been synthesized. Ideally, something addressing this need will be built into the EteRNA GUI. In the meantime, I decided to see what I could contribute in the way of ideas and a sample implementation.
I envision three major parts.
- A method for finding the labs that are relevant to a particular question. As a starting point, this might take the form of searching by sequence or structure motif. Existing examples are CoSSMoS and RMDB.
- A method of interactively viewing the SHAPE results for all the synthesized designs in a single lab. This is the part I am actively working on.
- A method for integrating the results of the previous two steps. For example, after finding a relevant lab (step 1) and developing a hypothesis based on it (step 2), it would be nice to be able to gather up all "analogous" data from any other relevant labs, to see whether the hypothesis holds up there as well. This is the step where I am least clear about how it should work.
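To make step 1 a little more concrete, here is a minimal sketch of what searching labs by sequence or structure motif could look like. Everything in it is an assumption made for illustration: the `Design` record, the sample data, the `find_labs` function, and the convention that `N` matches any base are all hypothetical, and this is not how CoSSMoS or RMDB actually work. The structure motif is matched as a plain dot-bracket substring, which is a much cruder test than a real structural search.

```python
import re
from typing import NamedTuple, Optional


class Design(NamedTuple):
    """Hypothetical record for one synthesized design."""
    lab: str        # name of the lab the design belongs to
    sequence: str   # RNA sequence (A, C, G, U)
    structure: str  # target structure in dot-bracket notation


# Made-up sample data, for illustration only.
DESIGNS = [
    Design("Lab A", "GGAAACGCUUCGGCGAAACC", "....((((....))))...."),
    Design("Lab A", "GGAAACCCUUCGGGGAAACC", "....((((....))))...."),
    Design("Lab B", "GGCGAAAGCC", "(((....)))"),
]


def sequence_pattern(motif: str) -> re.Pattern:
    """Compile a sequence motif; 'N' matches any single base."""
    parts = ("[ACGU]" if c == "N" else re.escape(c) for c in motif.upper())
    return re.compile("".join(parts))


def find_labs(designs, seq_motif: Optional[str] = None,
              struct_motif: Optional[str] = None):
    """Return the labs having at least one design that matches every motif given."""
    pattern = sequence_pattern(seq_motif) if seq_motif else None
    hits = []
    for d in designs:
        if pattern and not pattern.search(d.sequence):
            continue
        # Plain substring match on the dot-bracket string.
        if struct_motif and struct_motif not in d.structure:
            continue
        hits.append(d.lab)
    # Remove duplicate lab names while keeping first-seen order.
    return list(dict.fromkeys(hits))


if __name__ == "__main__":
    print(find_labs(DESIGNS, seq_motif="UUCG"))
    print(find_labs(DESIGNS, struct_motif="(((....)))"))
```

Note that the sequence and structure motifs are tested independently here; a design matches if each motif occurs somewhere in it, not necessarily at the same position. A real tool would probably want to tie the two together.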
As for my approach to step 2, I recently wrote up a short preview article. If you have any comments, you should be able to make them in that document.
You can download the most recent version of the Database Mining Tool .zip file from https://drive.google.com/folderview?id=0Bzf0qUriSfzWUmlvTU5mUTl0Rlk&usp=sharing. Eli and I are writing more documentation; you can see the current progress at https://docs.google.com/document/d/1f_jR9ydQWtMCZoKCSkhv-GSyZFnwoV50M9mIJqqj1T0/edit?usp=sharing