Wednesday, April 14, 2010

An interview with the creator of BioTorrents
Who better to interview the creator of BioTorrents than the creator himself? :)

Interviewer: So Morgan, your article entitled “BioTorrents: A File Sharing Service for Scientific Data” was published today in PLoS One. BioTorrents uses the popular peer-to-peer file sharing protocol, BitTorrent, to allow scientists to rapidly share their results, datasets, and software. Where did this idea come from?

Morgan: Well, about 6 months ago I was downloading some genome files from NCBI's FTP site, watching the download speed hover between 50-100Kb/s, and I said to myself (much like this interview), "I wish I could download these with BitTorrent." I have used BitTorrent for downloading other non-scientific data (let's not discuss what they may be) and I know it is a much faster and more reliable way of getting large files. A few minutes later I posted to Twitter asking if anyone had thought about setting up a BitTorrent tracker for scientific data, and the response was overwhelming (well, only 1 response, but I could feel it had a larger impact). About a week later, I brought up the idea again over coffee with some members of my lab and, more importantly, my post-doc supervisor Dr. Jonathan Eisen. He thought it was a good idea and well worth pursuing, which was all I needed to push aside all my other "real" research and focus on this much more "fun" project.

Interviewer: Thanks for that long-winded response. Maybe you could comment more briefly on the benefits of using BioTorrents/BitTorrent for sharing scientific data.

Morgan: I think it is explained fairly well in the manuscript and in my previous blog post, but to reiterate, the major benefits are:
1) Faster, more reliable, and better controlled downloading of data that scales well for very large files.
2) Instant "publishing" of data, results, and software.
3) Very easy for anyone to share their data. No dedicated web server needed.

Interviewer: Who should consider sharing data on BioTorrents?

Morgan: Everyone who has something to share. Large institutions can benefit from reduced bandwidth requirements, while individual users can benefit from the simplicity of sharing with BitTorrent technology. Personally, I really like the idea of open data and the idea of sharing results before publication. How many times has someone done an all-vs-all BLAST of microbial genomes? In theory this can be done once, and that person can be recognized (referenced, co-authored, etc.) when other researchers use that data.

Interviewer: Are there any challenges/limitations to using BitTorrent with scientific data?

Morgan: BitTorrent excels at transferring very large, popular datasets. Therefore, if only one person is "seeding" a file and only one person is downloading it, most of the advantage of using BitTorrent is lost. However, even in this worst-case scenario, the transfer speed would be roughly equivalent to that of traditional file transfer methods such as FTP/HTTP, and BitTorrent still provides the benefit of error checking and ease of data transfer control (pause, resume, etc.). Another possible problem is that some institutions try to limit BitTorrent traffic since it is often considered illegal, non-work-related network traffic. However, I would encourage users at these institutions to explain to their network administrator that BitTorrent traffic is often legitimate and shouldn't be blocked.
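
(An aside on the error checking: BitTorrent splits a file into fixed-size pieces and stores a SHA-1 hash of each piece in the .torrent file, so a client can spot a damaged piece and fetch just that piece again. Below is a minimal Python sketch of the idea. The function names and the 256 KiB piece size are assumptions made for illustration; they are not taken from BioTorrents or from any particular BitTorrent client.)

```python
import hashlib

# Illustrative piece size (an assumption); each torrent declares its own.
PIECE_LENGTH = 256 * 1024


def piece_hashes(data: bytes, piece_length: int = PIECE_LENGTH) -> list[bytes]:
    """Split data into fixed-size pieces and return the SHA-1 digest of each."""
    return [
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    ]


def corrupted_pieces(
    data: bytes, expected: list[bytes], piece_length: int = PIECE_LENGTH
) -> list[int]:
    """Return the indices of pieces whose hash differs from the expected hash.

    Assumes data has the same length as the original file, so both hash
    lists line up piece for piece.
    """
    actual = piece_hashes(data, piece_length)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


if __name__ == "__main__":
    original = bytes(range(256)) * 4096      # 1 MiB of sample data, 4 pieces
    expected = piece_hashes(original)        # what the .torrent would record

    damaged = bytearray(original)
    damaged[300_000] ^= 0xFF                 # flip one byte inside piece 1

    print(corrupted_pieces(bytes(damaged), expected))  # [1]
```

Only the piece containing the flipped byte fails the check, which is why a client can repair a bad download without starting over.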

Interviewer: Why publish in PLoS One?

Morgan: I have been a big fan of PLoS One, and ever since I blogged about it last year in "Is PLOS One the future of scientific publishing?", I have wanted to submit a paper there. Also, considering that BioTorrents is aimed at improving open access to data in all fields of science, PLoS One seemed like the most obvious journal choice for our manuscript.

Langille, M., & Eisen, J. (2010). BioTorrents: A File Sharing Service for Scientific Data. PLoS ONE, 5(4). DOI: 10.1371/journal.pone.0010071


IPS said...

Great Interview -- but the interviewer seemed sort of biased in your favor ;-). Nice idea w/BioTorrents (and another publication! chi-ching).

matushiq said...

interesting idea!

Neuroskeptic said...

It's a great idea, I hope people start using it. In particular I can see this being useful in fMRI neuroimaging where the data is huge (up to a gigabyte per subject raw data, more if it's been analysed, and up to 30 subjects per study).

MarkvP said...

That is a really nice idea/paper/interview!

well done.
well done indeed.

