Tuesday, May 15, 2007

Bioinformatics for biologists

I just read the review "Bioinformatics Software for Biologists in the Genomics Era" by Kumar and Dudley. Basically, the authors outline the need for improved bioinformatics software that can be easily used by biologists. I completely agree with the authors, but by the time I was done reading the article I was somewhat annoyed. My overall problem is that they just keep reiterating that the people developing the tools need to make them more user-friendly, without giving any real review of possible solutions or of the roadblocks that need to be overcome.
They state that:
  1. command-line programs are bad, GUIs are good
  2. being able to submit batch processes is good
  3. documentation is good
  4. clear, human-readable results are good
  5. tools that run on all operating systems are good
  6. being able to connect multiple tools in a pipeline (with no programming required) is good

I think even the most amateur programmer is aware of these issues, so I am left wondering who the intended audience of the article is. In addition, they don't reference any of the current tools that are being developed to improve the situation. In particular, they propose:
"Within the context of the user-friendly software, we favor a solution where the existing implementations of computational methods can be incorporated “as is,” without requiring any significant effort from the developer of the program that is being incorporated. We refer to this approach as “Application Linking,” which is similar to “wrapping” (Spitznagel and Garlan, 2003). The aim of Application Linking is to allow existing user-friendly applications to seamlessly host third-party scripts and applications through its graphical interface, such that the user is abstracted from the intricate nuances of the hosted application’s non-visual execution requirements (e.g., process control, system I/O, and control files)."
Ummm.... I guess they haven't heard of Taverna and web services?
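
To make the "wrapping" idea in that quote a bit more concrete, here is a minimal sketch in Python of what hiding a tool's "non-visual execution requirements" might look like. This is my own illustration, not anything from the paper or from Taverna: the function name `run_wrapped_tool` is hypothetical, and the "third-party tool" is just a one-line stand-in that counts the sequences in a FASTA file.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_wrapped_tool(command, input_text):
    """Run a command-line tool on input_text and return its stdout as a string.

    The caller never deals with the temporary input file or process control.
    """
    with tempfile.TemporaryDirectory() as workdir:
        # Write the user's data to the kind of file the tool expects.
        input_path = Path(workdir) / "input.txt"
        input_path.write_text(input_text)
        # Run the unmodified tool "as is", passing the file path as an argument.
        result = subprocess.run(
            command + [str(input_path)],
            capture_output=True,
            text=True,
            check=True,  # raise CalledProcessError if the tool fails
        )
    return result.stdout


# Hypothetical stand-in for a third-party tool: counts FASTA header lines.
COUNT_FASTA = [
    sys.executable,
    "-c",
    "import sys; print(sum(line.startswith('>') for line in open(sys.argv[1])))",
]

if __name__ == "__main__":
    fasta = ">seq1\nACGT\n>seq2\nGGCC\n"
    print(run_wrapped_tool(COUNT_FASTA, fasta).strip())
```

A GUI or workflow engine would call something like `run_wrapped_tool` behind a button or a pipeline node, so the biologist only ever sees the input box and the result.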

As a student who is writing one of these programs, let me elaborate on what I think is the main problem. Scientific credit is based on publishing, not on producing good tools. What does this mean to me as a PhD student? It means that the time spent on producing a robust web tool would be better spent on making additional tools or conducting more biologically relevant analyses that will lead to more publishable papers. Am I proud of this? No, especially since I have a strong interest in reducing programming redundancy, but for now it seems that I don't have much of a choice.

1 comment:

Pedro Beltrao said...

The same discussion applies to databases. After having published a paper on a database, what is the incentive to keep it up to date or to add a new feature? Not only will a second publication on the same database be harder to get, but so will the funding to sustain the research. There should be alternative funding options for maintaining and improving useful tools and databases.