Wednesday, October 21, 2009

BioTorrents - a file sharing resource for scientists

Let me ask you a question. If you just wrote a new computer program or produced a large dataset, and you wanted to openly share it with the research community, how would you do that?

My answer to that question is BioTorrents!

This is a side project that I have been working on lately, and since this is the first international Open Access Week, I thought I should finally announce it.

BioTorrents is a website that allows open access sharing of scientific data. It uses the popular BitTorrent peer-to-peer file sharing technology to allow rapid file transfer.

So what is the advantage of using BioTorrents?

  1. Faster file transfer

    • Have you tried to download the entire RefSeq or GEO datasets from NCBI recently? How about all the metagenomic data from CAMERA? Datasets continue to increase in size and downloading speed can be improved by allowing multiple computers/institutions to share their bandwidth.

  2. More reliable file transfer

    • BitTorrent technology has file checking built-in, so that you don't have to worry about corrupt downloads.

    • Decentralization of the data ensures that if one server is disabled, the data is still available from another user.

  3. A central repository for software and datasets

    • Rapid and open sharing of scientific findings continues to push for changes in traditional publication methods and has resulted in an increase in the use of pre-print archives, blogs, etc. However, sharing just datasets and software without a manuscript as an index is not as easy. BioTorrents allows anyone to share their data without a restriction on size (since the files are not actually hosted or transferred by BioTorrents).

    • Titles and descriptions of all data on BioTorrents can be browsed by category or searched for keywords (tag cloud coming soon).

    • As long as there is at least one user sharing the data it will always be available on BioTorrents. Software or datasets that are not popular and not hosted by any user will quietly die (they are removed from BioTorrents after 2 weeks).
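The "file checking built-in" point above can be illustrated with a small sketch. BitTorrent publishes a SHA-1 hash for each fixed-size piece of a file, and a client re-hashes every piece it receives. The code below is only a toy version of that idea, not a BitTorrent client: the function names, the tiny piece size, and the example data are all assumptions made for illustration.

```python
import hashlib
from typing import List

PIECE_SIZE = 4  # bytes; tiny on purpose, real torrents use far larger pieces


def piece_hashes(data: bytes, piece_size: int = PIECE_SIZE) -> List[bytes]:
    """Split data into fixed-size pieces and return the SHA-1 digest of each."""
    return [
        hashlib.sha1(data[i:i + piece_size]).digest()
        for i in range(0, len(data), piece_size)
    ]


def corrupt_pieces(data: bytes, expected: List[bytes],
                   piece_size: int = PIECE_SIZE) -> List[int]:
    """Return the indexes of pieces whose hash differs from the published one."""
    actual = piece_hashes(data, piece_size)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


# Hypothetical example data standing in for a shared dataset
original = b"ACGTACGTTTGACCA"
published = piece_hashes(original)   # the hashes a torrent file would carry
damaged = b"ACGTACGATTGACCA"         # one byte changed in the second piece

print(corrupt_pieces(original, published))  # []
print(corrupt_pieces(damaged, published))   # [1]
```

Because only the damaged piece fails its check, a client can re-request that one piece from another peer instead of downloading the whole file again.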

I am continuing to update BioTorrents, so if you have any suggestions or comments please let me know.

Wednesday, October 7, 2009

Canada gets Google StreetView

Google launched their StreetView program in major Canadian cities today. Of course I don't live there right now, but I did check out my old stomping grounds in Vancouver, BC.

Also, they happened to do Chester, NS, which is where I usually spend most of my summer vacations.


Sunday, August 2, 2009

Most of my programming is what I like to call "biologically driven"; that is, the main end result is not the development of the program itself, but rather the data that comes out of the program. Many times this involves writing a script to read in data, do something to that data, and then write it back to a file, which is in turn read into another script, ad infinitum.

The classic tab-delimited file is my usual choice for the intermediate format, but reading and writing these files, although simple, gets repetitive and becomes more complicated for more complex data structures. I finally looked into alternatives (something I clearly should have done a while ago) and came across Storable.

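Basically, it allows you to save/open any Perl data structure to/from a file.
It is very easy to use:

use strict;
use warnings;
use Storable;

# Reference to any data structure (this hash is only an illustration)
my $data_ref = { gene => 'recA', hits => [ 1, 2, 3 ] };

# Save the structure to a file
store($data_ref, 'my_storage_file');

# Later, in the same or a different script
my $new_data_ref = retrieve('my_storage_file');

Check it out if you have never used it before.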

Wednesday, July 15, 2009

Gene ontology tool suggestions

I have used a few GO tools in the past, but after looking at the massive list of tools on the Gene Ontology page I'm hoping someone can give me a good suggestion for my problem.

Basically, I have several lists of GO terms (~4-15 terms per list) and I would like to see if at a "higher" branch they share a common molecular function. Ideally, the tool could be run from the command line and would output significance scores, since I have about 70 lists that I would need to run, but a GUI tool would also work.

Note that this is slightly different from the usual over-representation analysis, which usually takes a list of genes as input. In my problem I am starting with GO terms.
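To make the problem concrete, here is a rough sketch of what "sharing a common function at a higher branch" could mean computationally: collect the ancestors of every term in a list and keep the most specific ones they all have in common. The hierarchy below is a hand-written, simplified toy and not the real GO graph, the function names are hypothetical, and no significance score is computed.

```python
from typing import Dict, List, Set

# Toy is_a hierarchy (child -> parents); simplified by hand, NOT the real GO graph
PARENTS: Dict[str, List[str]] = {
    "molecular_function": [],
    "catalytic activity": ["molecular_function"],
    "binding": ["molecular_function"],
    "transferase activity": ["catalytic activity"],
    "hydrolase activity": ["catalytic activity"],
    "kinase activity": ["transferase activity"],
    "peptidase activity": ["hydrolase activity"],
}


def ancestors(term: str) -> Set[str]:
    """Return the term itself plus every term reachable by following parents."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def depth(term: str) -> int:
    """Length of the longest path from the term up to a root."""
    parents = PARENTS[term]
    return 0 if not parents else 1 + max(depth(p) for p in parents)


def deepest_shared_ancestors(terms: List[str]) -> List[str]:
    """Most specific terms that are ancestors of every term in the list."""
    shared = set.intersection(*(ancestors(t) for t in terms))
    best = max(depth(t) for t in shared)
    return sorted(t for t in shared if depth(t) == best)


print(deepest_shared_ancestors(["kinase activity", "peptidase activity"]))
# ['catalytic activity']
print(deepest_shared_ancestors(["kinase activity", "binding"]))
# ['molecular_function']
```

A list whose only shared ancestor is the root (as in the second call) has nothing interesting in common, which is exactly the case a significance score would need to separate from a meaningful shared branch.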

Any suggestions would really be welcome!

Wednesday, June 3, 2009

Syncing Mendeley and CiteULike

I have been using CiteULike for quite a while (after switching from Connotea), but more recently started using Mendeley. Overall, I am really impressed! Mendeley is a relatively new software project (still in beta), and I am surprised by how well it works. It has some crucial features that separate it from other bookmarking tools, such as the ability to sync bookmarks and PDF files back and forth between multiple personal computers and their online server, the ability to organize PDF files locally by title, author, journal, etc., a citation plugin for Word (so you can stop paying for EndNote), and client software that is available for Linux! Mendeley has been working so well that I was afraid I might end up abandoning CiteULike, since I most likely won't bookmark something twice.

However, yesterday it was announced that bookmarks from CiteULike can be accessed from within Mendeley. Note that this isn't just the simple ability to import the bookmarks, but that the bookmarks are kept synced and in their own CiteULike folder within Mendeley. Although the synchronization is currently only one way, from CiteULike to Mendeley, further integration of the two tools is supposedly in the works.

This seems like a great collaboration since CiteULike tends to focus more on the social networking aspect, while Mendeley focuses more on providing a personal reference manager.

It is nice to see companies collaborating instead of competing.

Wednesday, May 20, 2009

Automatically downloading emails in Thunderbird when using IMAP

Lots of applications have an "offline" feature that allows you to access data (email, calendar, documents, etc.) when you don't have an internet connection. These are great, but I can never remember to switch on the "offline" mode. Bandwidth and storage are rarely concerns, so I would prefer it if applications did this by default (or at least had the option). Google Calendar is about the only program that I use daily that does this without me needing to click on update/offline.

For those who use Thunderbird as their email client and use IMAP instead of POP, you can have all of your emails stored locally by default without clicking the offline mode. The trick is to change a couple of settings in the advanced config editor (Options->Advanced->Config Editor):
  1. Set mail.server.default.autosync_offline_stores to true (you might have to create this value if it doesn't already exist: Right Click->New->Boolean)
  2. Set use_status_for_biff to false

More information is here.

Wednesday, May 13, 2009

Hello California!

Well, UC Davis to be more precise. I accepted a postdoctoral fellowship from Jonathan Eisen to be a part of the iSEEM project working on metagenomics. I have only been here for a few days, and first impressions are great. First, the research field is exactly what I was most interested in; second, my previous PhD research is definitely of relevance; and third, I feel like I have lots to learn from the people around me.

Considering my previous blog tagline/description is now inaccurate:
"A PhD student's point of view on bioinformatics, evolution, and microbial diversity; with an interest in cutting edge computer tools that make them all a bit easier."

I decided to radically change it to:
"A post-doc's point of view on bioinformatics, evolution, and microbial diversity; with an interest in cutting edge computer tools that make them all a bit easier."
Jonathan's opinion on open-access publishing is quite similar to my own, so in addition to blogging about microbial evolution, expect to see more posts about my views on academic publishing.

Thursday, April 30, 2009

Goodbye Vancouver!

The past 4 months have been a whirlwind. On April 16th I successfully defended my PhD thesis, submitted it after some minor revisions on April 18th, and left the country on April 29th. I wouldn't recommend such a tight timeline, especially if you happen to have a 5-month-old baby as well!

My thesis will eventually be accessible (open-access of course) through SFU's library, but those who are just dying to read it now can access it here (+ appendix).

I feel obligated to give some type of advice to future PhD students. Unfortunately, I don't have any huge insight, but I would recommend not worrying too much during your graduate studies. Many times, I thought the whole thing would unravel and I would never finish, especially during years 2-3, but all of a sudden things started to fall into place. Every grad student I have ever talked to has agreed that productivity increases greatly in the last year or two, so you shouldn't worry about how long it took you to do X. I hope I am not giving the impression that doing a PhD is easy, because it is not. It is hard, and different from all other schooling. If you think of an undergrad degree as sprinting, then a PhD is more like a marathon. I was great at sprinting, but learning to be a good marathon runner required a completely new set of skills.

In between all of the moving steps (I don't want to see another cardboard box for quite a while), I had lots of time to reflect on my past 4.5 years in Vancouver, BC. Although there were some challenging times, I will greatly miss Vancouver and the people that I met during my time there. The first years of my marriage, living far away from family, the completion of my PhD, and becoming a Dad all happened in Vancouver and I will cherish the multitude of memories that accompany each of these milestones.

To end this post, I think I will list a few flashes of memories that are ingrained in my head from the past several years (in no particular order):
  • Driving across Canada and seeing the Rockies from a distance for the first time.

  • Looking out my first downtown apartment window for the first time.

  • Standing on top of the "Chief".

  • Snorkeling in the ocean with my wife along the "sunshine coast".

  • Houseboating on a quiet lake on Vancouver Island surrounded by the most beautiful scenery.

  • White water rafting near Squamish.

  • Walking the sea wall countless times, and every time still being impressed by it.

  • The various camping adventures including a jump into a cold lake to escape a never ending swarm of flies.
  • Standing at the peak of Whistler for the first time.
  • The various conferences that included travel to destinations such as Maui, Vienna, Cambridge, UK, and California.
  • The birth of my son, Gavin.
  • The happiness of reading a short letter stating that I had completed all requirements for my PhD.

Tuesday, March 31, 2009

Is PLOS One the future of scientific publishing?

I just read about PLOS One's new features through their relatively new blog, EveryOne. Although the new features are not really groundbreaking, they do provide a much improved layout and a new "Related Content" page. These changes show that PLOS One is dedicated to improving connectivity between peer-reviewed papers and commentary from comments, blogs, etc., giving me some hope that publishing may be changing (yet still at a snail's pace).

So back to the question that is asked in the title of this post, "Is PLOS One the future of scientific publishing?", I am going to have to say a tentative "Yes". I think their basis of publishing papers not on novelty, but on peer review that ensures the methods and the conclusions drawn from the results are scientifically sound, opens many doors for how scientists publish their findings. Currently, scientists compete for limited space in a "high-impact" journal. In the majority of cases papers are rejected not because their methods, results, and conclusions are invalid, but because a better paper was submitted at the same time. This competition is justified, but in its current format has various drawbacks including:
  1. Importance of research is determined by a very small number of reviewers and usually a single editor has the final decision
  2. Significance or novelty of research is very subjective and can vary widely between reviewers
  3. Significance can change over time as future experiments confirm or depend on the results of the current research (including negative results)
  4. Not making the cut (i.e. rejection) results in a large waste of time as authors have to reformat, resubmit, and respond to new reviewers' comments
Separating the evaluation of competitiveness, novelty, significance, etc. from the evaluation of scientific robustness helps reduce many of these problems. The largest hurdle to overcome using this model is to move from a journal impact factor to a paper impact factor measurement. That way, "significant" papers are still valued and recognizable in PLOS One and other journals that will likely follow their publishing methods.

Personally, I have never published in PLOS One and by no means do I think PLOS One in its current form is the pinnacle of publishing. However, I do appreciate that they are trying to change the way science publishing is currently conducted.

Tuesday, March 10, 2009

Google Calendar Available Offline

I am just starting to poke my head out of the thesis hole and noticed that Google Calendar is now available offline using Google Gears. By default it only syncs your personal calendar, but shared calendars can also be synced under the offline options.

I'm not offline that often, but it is nice to know that my calendar is always available now.

Considering Gears has been around for quite a while now, I am surprised that it took Google this long to add the offline mode for their calendar.

Thursday, February 19, 2009

HubMed Citation Manager

I just came across HubMed yesterday and I found one of their tools incredibly useful for getting references into EndNote (or other reference manager software tools). Basically, HubMed Citation Finder will take a bibliography (say from one of your favorite papers), split it into individual references, find each citation in PubMed, and return the list of references in several citation formats such as RIS, BibTeX, RDF, etc. This file is then easily imported into your reference manager's library.

It just saved me a couple of hours and would have saved me even more if I had known about it a few weeks ago.

Tuesday, February 3, 2009

Personal Genomics & The Burden of Knowing

Like many bioinformaticians, biologists, scientists, and technologists I am very interested in personal genomics. I have kept track of the start-ups that are doing personal SNP analysis and have been eagerly waiting for sequencing costs to drop to the point where the $1000 genome is possible. I envisage everyone having their personal genome done and programs to analyse the data being so widespread that even a "My Genome Facts" Facebook application would not seem out of place.

Of course I have read lots about ethical worries about how the data could be misused or how the public cannot handle the probabilities of having a certain disease. Personally, I have always thought these were blown a bit out of proportion and that personal genomics will in general be a good thing. More data is better, right?

Well, I just read an article called "The Burden of Knowing" by Catherine Elton in Boston Magazine and it really made me reconsider my previous thoughts. Elton starts out explaining about personal genomics and specifically about Knome, the first company to do complete personal genome sequencing. She then starts to delve into her personal choices regarding her risk of carrying a BRCA1 mutation. The article is extremely well written, and unless I am becoming a complete softy, quite sobering.

A small excerpt that I really enjoyed was this:
The counselors then mentioned another option: having my ovaries taken out and my breasts removed. Here we were, talking about science's ability to look along a submicroscopic piece of DNA, searching for missing letters on a strip of a gene, and yet if science found that letters were missing—if the gene had the cancer-risk mutation—the best it could do was amputate or sterilize. These options seemed as though they should have been filed away in a medieval remedy book, somewhere between leeches and bloodletting.

So did the story change my view on personal genomics? No, not completely, but I do think that getting my genome sequenced might not be as fun as I first thought. Too bad there are not many positive attributes linked to genes like "gene variant Y will allow you to live a long life despite your lack of physical exercise" or "you have an improved version of the alcohol dehydrogenase gene, so feel free to drink more beer".

Tuesday, January 27, 2009

Pseudomonas and Langille in the media

Ok, this is some serious self-promotion, but scientists (well PhD students anyway) don't get a chance to brag about their research being in the media very often. Plus, it is my blog, so why not?!

The actual science:
The research in question surrounded the sequencing of the Liverpool Epidemic Strain of Pseudomonas aeruginosa that was causing increased virulence in cystic fibrosis patients. One of the interesting things in the paper is that we identified several genes related to virulence (using STM) and that several of these genes were within genomic island and prophage regions. Of course virulence factors have been found within these types of regions before, but having actual in vivo (chronic rat lung infection model) experimental evidence that these genes are involved in virulence in an epidemic strain really makes this research notable. The research was published in Genome Research and is open access.

The media coverage:
Lancet Infectious Diseases (sorry not OA).

Vancouver Sun

Ok, now for the fun stuff:
SFU News - Notice those sleepy eyes? That is what having a 2 month old will do to you!

The story even made some news on a non-English site:
Automatic translation results in me being referred to as "blue Gull", SFU as "West gate Philippines Sand University", and UBC as "Inferior poem University".

Wednesday, January 21, 2009

Looking for a bioinformatics expert?

What I have to offer:
  • A balanced background in both biology (BSc) and computer science (BCS)
  • Soon to be completed PhD
  • Extensive research experience in bioinformatics, genomics, phylogenetics/phylogenomics, evolution, and bacterial pathogenesis
  • Some previous research experience in medical imaging, ontology development, and metagenomics
  • An impressive publishing record (7 papers, 3 first authors, 2 more first authors under review)
  • Solid computational skills including Perl programming, database design (MySQL), parallel programming, and web design (PHP & JavaScript)
  • Good communication and social skills
  • More information
What I am looking for:
  • Post-doc or job (academic or industrial)
  • Preferably, a position where I have some significant management or leadership responsibilities
  • Geographically interested in north eastern parts of North America (Ottawa down to New York), but would entertain positions elsewhere in N.A.
I didn't put any limitations on research interests, since I am open to many areas. However, anything having to do with the Human Microbiome Project, human-bacteria interactions, or metagenomics would be of particular interest.

Please email me if you are interested or if you have suggestions on some good openings.

Wednesday, January 7, 2009

Airports & Interviews

Quick question: Why do people line up or huddle around an airport gate before they are called to board? The plane is not leaving until everyone has boarded, so why would you want to sit even longer in a cramped up airplane than is absolutely necessary?

I contemplated these questions and other airport mysteries (like who would pay a $38 service fee for changing Canadian to US currency) while recently sitting in airports for 7 hours and 5 hours on two separate trips.

Fortunately, both flights were worthwhile, since one was a flight home where homemade meals and warm fireplaces greeted my new family of 3, and the second was for a job in Boston that I am quite interested in. Luckily, another interview I had arranged did not require a flight, so my sanity was slightly saved.

I would say that both interviews went fairly well, and overall I actually enjoyed the experience. Doing a PhD (or any large project I suppose), you tend to lose sight of the accomplishments that you have made along the way. Getting a chance to present my work to an audience that is genuinely interested (not just lab mates that have to be in attendance) does not happen that often and even though it can be a bit stressful, I usually find it rewarding.