Wednesday, October 21, 2009

BioTorrents - a file sharing resource for scientists

Let me ask you a question. If you just wrote a new computer program or produced a large dataset, and you wanted to openly share it with the research community, how would you do that?



My answer to that question is BioTorrents!

This is a side project I have been working on lately, and since this is the first international Open Access Week, I thought I should finally announce it.

BioTorrents is a website that allows open access sharing of scientific data. It uses the popular BitTorrent peer-to-peer file sharing technology to allow rapid file transfer.

So what is the advantage of using BioTorrents?

  1. Faster file transfer

    • Have you tried to download the entire RefSeq or GEO datasets from NCBI recently? How about all the metagenomic data from CAMERA? Datasets continue to increase in size and downloading speed can be improved by allowing multiple computers/institutions to share their bandwidth.


  2. More reliable file transfer

    • BitTorrent technology has file checking built-in, so that you don't have to worry about corrupt downloads.

    • Decentralization of the data ensures that if one server is disabled, the data is still available from another user.


  3. A central repository for software and datasets


    • Rapid and open sharing of scientific findings continues to push for changes in traditional publication methods and has resulted in an increase in the use of pre-print archives, blogs, etc. However, sharing just datasets and software without a manuscript as an index is not as easy. BioTorrents allows anyone to share their data without a restriction on size (since the files are not actually hosted or transferred by BioTorrents).

    • Titles and descriptions of all data on BioTorrents can be browsed by category or searched for keywords (tag cloud coming soon).

    • As long as at least one user is sharing the data, it will remain available on BioTorrents. Software or datasets that are not popular and not hosted by any user will quietly die (they are removed from BioTorrents after 2 weeks).
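
The built-in file checking mentioned above works because a torrent records a SHA-1 hash for every fixed-size piece of the data, and a client compares each downloaded piece against that hash. Here is a minimal sketch of the idea in Perl. The function names, the toy data, and the tiny piece size are my own assumptions for illustration, and I use hex digests for readability where a real torrent file stores raw 20-byte digests.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);

# Split the data into fixed-size pieces and hash each one,
# roughly what happens when a torrent is created.
sub piece_hashes {
    my ( $data, $piece_length ) = @_;
    my @hashes;
    for ( my $offset = 0; $offset < length($data); $offset += $piece_length ) {
        push @hashes, sha1_hex( substr( $data, $offset, $piece_length ) );
    }
    return @hashes;
}

# Return the indices of pieces whose hash does not match the expected list.
sub corrupt_pieces {
    my ( $data, $piece_length, @expected ) = @_;
    my @actual = piece_hashes( $data, $piece_length );
    return grep { !defined $actual[$_] || $actual[$_] ne $expected[$_] }
        0 .. $#expected;
}

my $piece_length = 4;    # tiny on purpose; real torrents use far larger pieces
my $original     = 'ACGTACGTTTGACCA';
my @expected     = piece_hashes( $original, $piece_length );

# Simulate a download in which one byte of the third piece was damaged.
my $download = $original;
substr( $download, 9, 1 ) = 'X';

my @bad = corrupt_pieces( $download, $piece_length, @expected );
print "Pieces to re-download: @bad\n";
```

Only the damaged piece has to be fetched again (from any peer that has it), rather than the whole file.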

I am continuing to update BioTorrents, so if you have any suggestions or comments please let me know.

Wednesday, October 7, 2009

Canada gets Google StreetView

Google launched their StreetView program in major Canadian cities today. Of course I don't live there right now, but I did check out my old stomping grounds in Vancouver, BC.



They also happened to cover Chester, NS, which is where I usually spend most of my summer vacations.




Sunday, August 2, 2009

Storable.pm

Most of my programming is what I like to call "biologically driven"; that is, the main end result is not the development of the program itself, but rather the data that comes out of it. Many times this involves writing a script to read in data, do something to that data, and then write it back out to a file, which is in turn read into another script... ad infinitum.

The classic tab-delimited file is my usual choice for the intermediate format, but reading and writing these files (although simple) gets repetitive, and it gets more complicated as the data structures become more complex. I finally looked into alternatives (something I clearly should have done a while ago) and came across Storable.

Basically, it allows you to save any Perl data structure to a file and load it back again.
It is very easy to use:
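use strict;
use warnings;
use Storable qw(store retrieve);

# Reference to any data structure (a hash of arrays here, as an example)
my $data_ref = { sample1 => [ 1, 2, 3 ], sample2 => [ 4, 5 ] };

# Write the structure to disk
store($data_ref, 'my_storage_file');

# Later, in the same or a different script
my $new_data_ref = retrieve('my_storage_file');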
Check it out if you have never used it before.
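
To show where this beats a tab-delimited file, here is a slightly fuller sketch with a nested structure passed from one step to the next. The gene names and records are made up for illustration, and the temporary file just stands in for whatever intermediate file your scripts would share. I use nstore here, which writes in network byte order so the file can be read on a machine with a different architecture.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Temp qw(tempfile);

# Made-up example records: gene name => annotation record.
my %genes = (
    geneA => { length => 1200, go_terms => [ 'GO:0003677', 'GO:0005524' ] },
    geneB => { length => 845,  go_terms => ['GO:0016787'] },
);

# Temporary file standing in for the intermediate file between scripts.
my ( $fh, $file ) = tempfile( UNLINK => 1 );
close $fh;

# First "script": save the whole nested structure in one call.
nstore( \%genes, $file );

# Second "script": load it back with the nesting intact.
my $restored = retrieve($file);

for my $gene ( sort keys %$restored ) {
    my $rec = $restored->{$gene};
    printf "%s\t%d\t%s\n", $gene, $rec->{length},
        join( ',', @{ $rec->{go_terms} } );
}
```

One caution: only call retrieve on files you created yourself, since the Storable documentation warns against loading data from untrusted sources.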

Wednesday, July 15, 2009

Gene ontology tool suggestions

I have used a few GO tools in the past, but after looking at the massive list of tools on the gene ontology page I'm hoping someone can give me a good suggestion for my problem.

Basically, I have several lists of GO terms (~4-15 terms per list) and I would like to see if at a "higher" branch they share a common molecular function. Ideally, a tool that could be run from the command line and outputs significance scores would be great, but a GUI tool would also work since I have about 70 lists that I would need to run.

Note that this is slightly different from the usual over-representation analysis, which takes a list of genes as input. In my problem I am starting with GO terms.
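
To make the question concrete, here is a rough sketch of what I mean by sharing a term at a "higher" branch: walk up from each term in a list, keep the terms that every walk reaches, and report the most specific of those. The ontology below is a toy one with made-up term names (not real GO identifiers or relationships), and the sketch gives no significance score, so it is only a way of stating the problem, not a substitute for a proper tool.

```perl
use strict;
use warnings;

# Toy ontology: term => list of parent terms. All names are made up.
my %parents = (
    root               => [],
    binding            => ['root'],
    catalytic          => ['root'],
    nucleotide_binding => ['binding'],
    dna_binding        => ['binding'],
    atp_binding        => ['nucleotide_binding'],
    gtp_binding        => ['nucleotide_binding'],
);

# Collect a term together with all of its ancestors.
sub ancestors {
    my ($term) = @_;
    my %seen;
    my @queue = ($term);
    while ( defined( my $t = shift @queue ) ) {
        next if $seen{$t}++;
        push @queue, @{ $parents{$t} || [] };
    }
    return \%seen;
}

# Find terms shared by every input term, then keep only the most specific.
sub most_specific_common_ancestors {
    my @terms = @_;
    my @sets  = map { ancestors($_) } @terms;

    my @common;
    for my $candidate ( keys %{ $sets[0] } ) {
        my $missing = grep { !$_->{$candidate} } @sets;
        push @common, $candidate unless $missing;
    }

    # Drop any shared term that is a proper ancestor of another shared term.
    my %drop;
    for my $c (@common) {
        my $anc = ancestors($c);
        $drop{$_} = 1 for grep { $_ ne $c } keys %$anc;
    }
    my @kept = grep { !$drop{$_} } @common;
    return sort @kept;
}

my @shared = most_specific_common_ancestors( 'atp_binding', 'gtp_binding' );
print "Most specific shared term(s): @shared\n";
```

With about 70 lists, something like this could be run in a loop over each list; what I am still missing is a way to say whether the shared term is more specific than expected by chance.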

Any suggestions would really be welcome!

Wednesday, June 3, 2009

Syncing Mendeley and CiteULike

I have been using CiteULike for quite a while (after switching from Connotea), but more recently I started using Mendeley. Overall, I am really impressed! Mendeley is a relatively new software project (still in beta), and I am surprised by how well it works. It has some crucial features that separate it from other bookmarking tools: it syncs bookmarks and PDF files back and forth between multiple personal computers and Mendeley's online server, it organizes PDF files locally by title, author, journal, etc., it has a citation plugin for Word (so you can stop paying for EndNote), and its client software is available for Linux! Mendeley has been working so well that I was afraid I might end up abandoning CiteULike, since I most likely won't bookmark something twice.

However, yesterday it was announced that bookmarks from CiteULike can be accessed from within Mendeley. Note that this isn't just a simple one-time import: the bookmarks are kept synced and live in their own CiteULike folder within Mendeley. Although the synchronization is currently only one way, from CiteULike to Mendeley, further integration of the two tools is supposedly in the works.

This seems like a great collaboration, since CiteULike tends to focus more on the social networking aspect, while Mendeley focuses more on providing a personal reference manager.

It is nice to see companies collaborating instead of competing.

Wednesday, May 20, 2009

Automatically downloading emails in Thunderbird when using IMAP

Lots of applications have an "offline" feature that allows you to access data (email, calendar, documents, etc.) when you don't have an internet connection. These are great, but I can never remember to switch on the "offline" mode. Bandwidth and storage are rarely concerns for me, so I would prefer that applications did this by default (or at least had the option). Google Calendar is about the only program I use daily that does this without my needing to click on update/offline.

For those who use Thunderbird as their email client with IMAP instead of POP, you can have all of your emails stored locally by default, without clicking the offline mode. The trick is to set a couple of values in the advanced config editor (Options->Advanced->Config Editor):
  1. mail.server.default.autosync_offline_stores to true (you might have to create this value if it doesn't already exist. Right Click->New->Boolean)
  2. use_status_for_biff to false

More information is here.

Wednesday, May 13, 2009

Hello California!

Well, UC Davis to be more precise. I accepted a postdoctoral fellowship from Jonathan Eisen to be a part of the iSEEM project working on metagenomics. I have only been here for a few days, but first impressions are great. First, the research field is exactly what I was most interested in; second, my previous PhD research is definitely relevant; and third, I feel like I have lots to learn from the people around me.

Since my previous blog tagline is now inaccurate:
"A PhD student's point of view on bioinformatics, evolution, and microbial diversity; with an interest in cutting edge computer tools that make them all a bit easier."

I decided to radically change it to:
"A post-doc's point of view on bioinformatics, evolution, and microbial diversity; with an interest in cutting edge computer tools that make them all a bit easier."
Jonathan's opinion on open-access publishing is quite similar to my own, so in addition to blogging about microbial evolution, expect to see more posts about my views on academic publishing.