Wednesday, October 21, 2009

BioTorrents - a file sharing resource for scientists

Let me ask you a question. If you just wrote a new computer program or produced a large dataset, and you wanted to openly share it with the research community, how would you do that?

My answer to that question is BioTorrents!

This is a side project that I have been working on lately, and since this is the first international Open Access Week, I thought I should finally announce it.

BioTorrents is a website that allows open access sharing of scientific data. It uses the popular BitTorrent peer-to-peer file sharing technology to allow rapid file transfer.

So what is the advantage of using BioTorrents?

  1. Faster file transfer

    • Have you tried to download the entire RefSeq or GEO datasets from NCBI recently? How about all the metagenomic data from CAMERA? Datasets continue to increase in size, and download speeds can be improved by allowing multiple computers/institutions to share their bandwidth.


  2. More reliable file transfer

    • BitTorrent technology has file checking built-in, so that you don't have to worry about corrupt downloads.

    • Decentralization of the data ensures that if one server is disabled, the data is still available from another user.


  3. A central repository for software and datasets


    • Rapid and open sharing of scientific findings continues to push for changes in traditional publication methods and has resulted in an increase in the use of pre-print archives, blogs, etc. However, sharing just datasets and software without a manuscript as an index is not as easy. BioTorrents allows anyone to share their data without a restriction on size (since the files are not actually hosted or transferred by BioTorrents).

    • Titles and descriptions of all data on BioTorrents can be browsed by category or searched for keywords (tag cloud coming soon).

    • As long as there is at least one user sharing the data, it will always be available on BioTorrents. Those pieces of software or datasets that are not popular and not hosted by a user will quietly die (removed from BioTorrents after 2 weeks).
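To make the built-in file checking from point 2 a bit more concrete: BitTorrent splits the shared content into fixed-size pieces, and the .torrent file records a SHA-1 hash for each piece, so a client can spot a damaged piece and fetch it again. Here is a minimal Python sketch of that idea. The function names and the 256 KiB piece size are my own illustrative assumptions, not code from BioTorrents or from any particular BitTorrent client.

```python
import hashlib
from typing import List

# Illustrative piece size (an assumption); each torrent chooses its own.
PIECE_LENGTH = 256 * 1024


def piece_hashes(data: bytes, piece_length: int = PIECE_LENGTH) -> List[bytes]:
    """Return the SHA-1 digest of each fixed-size piece of data."""
    return [
        hashlib.sha1(data[offset:offset + piece_length]).digest()
        for offset in range(0, len(data), piece_length)
    ]


def corrupt_pieces(data: bytes, expected: List[bytes],
                   piece_length: int = PIECE_LENGTH) -> List[int]:
    """Return the indices of pieces that are missing or fail their hash check."""
    actual = piece_hashes(data, piece_length)
    return [
        i for i in range(len(expected))
        if i >= len(actual) or actual[i] != expected[i]
    ]


# Hypothetical 1 MiB "dataset", which splits into four 256 KiB pieces.
original = bytes(range(256)) * 4096
expected = piece_hashes(original)

# Flip one byte inside the second piece to simulate a corrupt download.
damaged = bytearray(original)
damaged[300_000] ^= 0xFF

print(corrupt_pieces(original, expected))        # []
print(corrupt_pieces(bytes(damaged), expected))  # [1]
```

Only the piece that fails its check needs to be downloaded again, not the whole file, which matters a lot for multi-gigabyte datasets.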

I am continuing to update BioTorrents, so if you have any suggestions or comments please let me know.


13 comments:

Egon Willighagen said...

I think this is a great idea, and have been pondering about it myself for a long time. Now, Open Data in chemistry is quite rare, so I had never followed up on the idea.

So, practically... how bio does a data set have to be to be allowed on your website? A copy of PubChem, would that be OK? Or CrystalEye (crystal structures or small molecules)?

Morgan Langille said...

Egon,

Anything science related is welcome on BioTorrents. I considered calling it ScienceTorrents or something more general but since biology is what I know best, I called it BioTorrents. I can make a couple of extra general categories such as physics and chemistry so that it is more obvious that they are welcome.

tharris said...

Morgan - Great idea!

I think for this to really succeed the central server needs to act not just as a signpost to available torrents, but as a seed of at least some torrents.

For a time, I was seeding large datasets for projects I work on as a distribution mechanism for collaborators. But with small numbers of interested users, most are leechers downloading directly from the source, defeating the entire purpose of the exercise.

Common datasets like NR might be seeded by many folks and need no assistance. But it would be great to give datasets with smaller user communities a leg up by acting as a seed.

Morgan Langille said...

Todd,

Yes, I think I will have to act as a seed for many of the torrents. Hopefully the community will grow enough that others with servers will help seed many of the torrents as well.
There is an RSS feed so I am considering just automatically downloading all torrents so that I can help seed them.

I have set up our own server so that it is very easy to upload and seed things on BioTorrents. Basically, the user has to just move their data into a particular directory and run a script (available here).

Ideally, it would be great to get some of the larger organizations (e.g. NCBI, EBI, JGI, CAMERA, etc.) to also host their data via BioTorrents in addition to FTP.

Benjamin Good said...

Love it. Concur with comments regarding need for seeds. Suggest talking with Francis Ouellette about NCBI and other data. Also suggest trying to present it at ISMB - perhaps you will find your seeders there.

Mike Chelen said...

BitTorrent is a great way to move this kind of large file. On behalf of everyone interested in biology, thanks for setting up this resource! :)

Concur about RSS, it is an ideal method that anyone can use to automatically help mirror and seed the files.

The end result is very fast transfers, with advanced features such as error correction and pause/resume, available to experienced and novice users alike.

Very excited to see all the progress being made here!

Todd Harris said...

Maybe BioTorrents should establish ultrapeers to ensure that even small projects or files in less demand always have seeders...

I'm sure that the readers of this thread have access to substantive storage and bandwidth...

Mike Chelen said...

With RSSDler or a BitTorrent client with integrated enclosure downloads it is possible to automate download and seeding of all files published on BioTorrents. Then if a contributor doesn't have a lot of bandwidth, they can disconnect after uploading only 1 copy of the file set. Even if the other seeds aren't as fast, then there will be at least a single copy of the file available, and other seeders can hop on if they have extra bw or the file is particularly interesting.

Pearlie Guerrier said...

I also think that is a great idea. I especially like the part about the file checking built-in, so that you don't have to worry about corrupt downloads. These are common issues and it's a good solution.
