Wednesday, October 3, 2007

Least Publishable Unit (LPU)

I have recently been thinking about the Least Publishable Unit (LPU) theory in academia. I am now a month into the fourth year of my PhD and have only just submitted my first first-author research paper on my thesis work, so I am starting to panic slightly. I do have a previous first-author research paper from undergrad research, 3 other non-first-author papers, a submitted first-author book chapter, and a Nature Reviews Microbiology paper soon to be submitted. However, I would like to have another couple of first-author papers in the next year and a half, so that I can graduate with a decent PhD career under my belt.

In my experience, the life sciences tend to publish more content less often, whereas computer scientists tend to publish very often with smaller amounts of research. Bioinformatics overlaps both of these fields, so publishing rates differ depending on your research topic. For instance, if you are developing new tools, you would probably be publishing at a greater rate than if you are using bioinformatics to find some new biologically interesting result (although this is certainly not always the case).

I would like to think that I have been focusing more on biology and thus my publishing has been slightly behind. However, I now have the skills and knowledge to quickly crank out a couple of useful tools that would probably be publishable (I feel like this would somehow be selling out, but maybe not).
Also, if I did go this route, would publishability depend on how much effort was involved, or rather on how useful the tool really is?

Recently, I wrote a script that uses gene synteny to improve ortholog detection between two genomes. It is not overly complicated and uses previously developed tools (genome alignment and local alignment tools), but I think it is incredibly useful and improves upon the basic reciprocal best BLAST hit approach that is primarily in use. My research is not focused on ortholog prediction, and the tool was made so that I would not have to manually annotate 5,500 bacterial genes (as part of a bacterial genome project), so I have to wonder, "Is it publishable?" I guess the only way to find out is by submission.
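
To make the idea concrete, here is a minimal sketch of how synteny can rescue orthologs that reciprocal best hits miss. It is not the actual script: it assumes the BLAST results have already been parsed into dictionaries of bit scores, and it approximates synteny by gene order within a small window rather than by a genome alignment. All function names, the `window` parameter, and the toy data are hypothetical.

```python
from typing import Dict, List, Optional

# query gene -> {subject gene: bit score}; assumed to be parsed from BLAST output
Hits = Dict[str, Dict[str, float]]


def best_hit(hits: Hits, gene: str) -> Optional[str]:
    """Return the highest-scoring subject for a gene, or None if it has no hits."""
    candidates = hits.get(gene, {})
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


def reciprocal_best_hits(a_vs_b: Hits, b_vs_a: Hits) -> Dict[str, str]:
    """Pair gene a with gene b when each is the other's best hit."""
    pairs = {}
    for a in a_vs_b:
        b = best_hit(a_vs_b, a)
        if b is not None and best_hit(b_vs_a, b) == a:
            pairs[a] = b
    return pairs


def neighbours(order: List[str], gene: str, window: int) -> List[str]:
    """Genes within `window` positions of `gene` in the genome's gene order."""
    i = order.index(gene)
    return order[max(0, i - window):i] + order[i + 1:i + 1 + window]


def synteny_support(a, b, order_a, order_b, anchors, window=2) -> int:
    """Count neighbours of a whose anchored partner is a neighbour of b."""
    near_b = set(neighbours(order_b, b, window))
    return sum(1 for n in neighbours(order_a, a, window) if anchors.get(n) in near_b)


def synteny_orthologs(a_vs_b, b_vs_a, order_a, order_b, window=2) -> Dict[str, str]:
    """Start from reciprocal best hits, then use them as anchors to place
    unpaired genes on the candidate hit with the most conserved neighbours."""
    anchors = reciprocal_best_hits(a_vs_b, b_vs_a)
    pairs = dict(anchors)
    taken = set(pairs.values())
    for a in order_a:
        if a in pairs:
            continue
        best, best_key = None, None
        for b, score in a_vs_b.get(a, {}).items():
            if b in taken:
                continue
            support = synteny_support(a, b, order_a, order_b, anchors, window)
            # Require at least one conserved neighbour; break ties by bit score.
            if support > 0 and (best_key is None or (support, score) > best_key):
                best, best_key = b, (support, score)
        if best is not None:
            pairs[a] = best
            taken.add(best)
    return pairs


# Toy data: a2's best hit is the distant paralog bX, so reciprocal best hits skip it.
order_a = ["a1", "a2", "a3", "aX"]
order_b = ["b1", "b2", "b3", "bX"]
a_vs_b = {"a1": {"b1": 200}, "a2": {"bX": 150, "b2": 140},
          "a3": {"b3": 210}, "aX": {"bX": 220}}
b_vs_a = {"b1": {"a1": 200}, "b2": {"a2": 140},
          "b3": {"a3": 210}, "bX": {"aX": 220, "a2": 150}}

print(reciprocal_best_hits(a_vs_b, b_vs_a))
print(synteny_orthologs(a_vs_b, b_vs_a, order_a, order_b))
```

In the toy data, reciprocal best hits leave `a2` unpaired because its top hit belongs to another gene, while the synteny step assigns it to `b2` because the flanking genes are already paired. A real tool would also need to handle rearrangements, multiple contigs, and score thresholds, which this sketch ignores.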


Benjamin Good said...

Ahh yes, the LPU, I know it well..
I think the short answer to your post is that, yes, you should probably go ahead and crank out as many LPUs as you can, because it will impress the first people to process your various requests for money: administrators who don't really know what you are doing. While this is, of course, important, I wouldn't fret too much about it if you are truly bound for academic glory. When you apply for an academic position, you will actually be reviewed by people who do know what the heck you are doing and who understand that sometimes one paper that takes 2 years of work to produce is actually more important than 6 done in one-month bursts.

Now for the long answer :) seeing as how I'm in the mood...

The LPU can be turned from its current, rather silly and rather scientifically disheartening state to one that is actually beneficial to the progress of science. I think, especially in bioinformatics, the mode of publication of things like your syntenic script is headed for a very important and fundamental change. Models for various disciplines will vary, so I'll elaborate only on the bioinformatics 'application note' species, though I think similar models will crop up all over the place. I think the LPU will and should be transmogrified from a manuscript for reading to a functional unit ready for use. That's right, I'm talking about workflows, and I'm talking about LPUs as components of workflows. Every time a new script or algorithm or database or whatever computational unit you like is 'published', it should be mandatory that a working implementation, suitable for sensible incorporation with related data and algorithms, be deposited in a repository such as MyExperiment (password = 'dolphin'). Publishing such workflow components should count as publishing 'papers' even without an extensive write-up, and thus the LPU will change. Rather than contributing to ever greater piles of records in PubMed that serve very little actual purpose (aside from keeping the NLP people in business), LPUs will contribute to the global, collaborative synthesis of a single, unified platform for bioinformatics. LPUs may even get smaller as a result. So-called shim services that simply convert one format or identifier to another are actually extremely valuable despite their apparently diminutive nature.

Jan Aerts said...


You're quite right in saying that we should publish useful stuff such as workflows instead of just the text in papers. Promising as MyExperiment may be for that, I see one issue: it's not for everyone. Many people (like myself) do a lot of scripting in Perl/Python/Ruby, rather than trying to find and link together the web services to do the same thing, which would just be overly complicated and time-consuming. Of course, it is possible to incorporate little scripts in Taverna, but these have to be in java-speak.
So in the end, most of the bioinformatics research as described here (i.e. parsing, scripting, ...) is overly difficult with Taverna. As soon as it's possible to just plug your own home-made scripts into Taverna, its adoption will soar (and therefore MyExperiment's as well).

Benjamin Good said...


Ya, I agree that it's a pain in the butt to use Taverna for many tasks right now. I must confess that I've given up on several occasions and fallen back on one-time-use hacks. But they are really working to change that. I just participated in an experiment on workflow construction/re-use run by Antoon Goderis that seems to be designed to provide the motivation for a massive overhaul of the user interface.

As the interface improves and more and more working workflows are added to the repository, I suspect (and hope!) that it will end up being easier and more reliable to find/assemble a workflow for most tasks than it would be to write a new script from scratch. That's the idea anyway.

Until then, producing and publishing a workflow in the repository should be seen as an act for the public, rather than the private, good. Much like...

Bill Hooker said...

I like the way Benjamin thinks, and I think his model will map quite well to "wet bench" work too. But since we're not there (publishing workflows) yet, you could think about BMC Research Notes (disclaimer: I'm an editorial associate) or BMC Bioinformatics to publish that script.