Beyond Lab: tips for bioinformatics

Thursday, February 19, 2009

tips for bioinformatics

This was from a talk by Joel Dudley, originally posted by Shirley Wu at http://shirleywho.wordpress.com/2009/02/11/tips-and-tricks-for-software-engineering-in-bioinformatics-talk-by-joel-dudley/.

Quite useful if you want to start programming something...

1. Learn UNIX. It’s quick, it’s powerful, it’s easy to learn. What often takes several lines to code in a scripting language can usually be reduced to a single line on the command line.

2. Be a jack of all trades, but master of ONE. That is, be familiar with most programming languages, but be really good at one of them. In the hierarchy of languages, VB and C are more “primitive” while Ruby and Python are more “advanced”; he recommends starting with one of the more advanced languages if you are new to programming. Out of Ruby and Python, Python will probably give you more bang for your buck, due to the smorgasbord of libraries available and its broad acceptance (e.g. academic labs, Google). In addition, there are lots of bridges between languages, such as Jython (Java and Python) and JRuby (Java and Ruby), so expert knowledge of one is usually sufficient for you to make a lot of things work practically everywhere.

3. Don’t reinvent the wheel. “Frameworks are your friends.” Take advantage of large existing projects like BioPython/Perl/Ruby/Java, Django, Rails, etc., which contain lots of ready-to-go code for practically everything. Use the internet to find existing code solutions - e.g. Koders is like a Google search for open source code on the web.

4. Learn one text editor really well. Take your pick of Emacs, vi, or a GUI-based editor like TextMate for Macs. The advantage of Emacs and vi is that one or both will be installed on pretty much any system you come across.

5. “Don’t trust yourself”, i.e. use code versioning. Examples are Subversion, CVS, and git. You can even outsource your code hosting with github. Combine this with project management in GForge.

6. Don’t be afraid to use more than 3 letters to define a variable. Having short variable names won’t make the code run faster. It will, however, make the code more difficult for others (and you, 3 months from now) to understand!
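
To make tip 6 concrete, here is a small sketch of the same calculation written twice. The function names and the GC-fraction task are my own illustration, not something from the talk; the point is only that the second version costs nothing at runtime and reads much better.

```python
# Terse version: it runs, but the reader has to guess what s, n and r mean.
def f(s):
    n = s.count("G") + s.count("C")
    r = n / len(s)
    return r


# Descriptive version: same logic, same speed, but the names explain themselves.
def gc_fraction(dna_sequence):
    gc_base_count = dna_sequence.count("G") + dna_sequence.count("C")
    return gc_base_count / len(dna_sequence)


if __name__ == "__main__":
    # Both calls compute the same value for this made-up sequence.
    print(f("ATGCGC"), gc_fraction("ATGCGC"))
```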

7. Balance architecture and accomplishment. You may be tempted to create something that is complete, elegant, and perfectly structured. This will likely be a waste of time. It’s ok to sacrifice a little bit of structure to get something that actually works.

8. Automate documentation. Documentation is necessary, but it’s a pain to write. So come up with a convention for your headers and make it automatic. Use available tools like Doxygen, JavaDoc, and RDoc, many of which are free.
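
As a rough illustration of tip 8 in Python: the talk names Doxygen, JavaDoc and RDoc, and the closest standard-library equivalent I know of is `pydoc`, so that is what this sketch assumes. The `reverse_complement` function and its docstring layout are hypothetical; any consistent header convention would do.

```python
import pydoc


def reverse_complement(dna_sequence):
    """Return the reverse complement of a DNA sequence.

    Args:
        dna_sequence: String of upper-case A, C, G and T characters.

    Returns:
        The reverse-complemented sequence as a string.
    """
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(dna_sequence))


# pydoc builds the reference text straight from the docstring,
# so there is no separate manual to keep in sync with the code.
documentation_text = pydoc.render_doc(reverse_complement, renderer=pydoc.plaintext)

if __name__ == "__main__":
    print(documentation_text)
```

Running `python -m pydoc -w your_module` writes the same information as an HTML page, which is about as automatic as documentation gets.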

The above are generic for academic-level software engineering. Some tips that more specifically address high-throughput biomedical computing:

9. Kill the flat file (sort of). This is the most common file format used in bioinformatics, but it hardly lends itself to efficient computation. A common task we want to do with the file is read in the data and store it keyed so that we can look up specific pieces of the data later. Hate databases? Cringe at SQL? If you can represent your data as key/value pairs, consider using an embeddable database like the open source BerkeleyDB (now licensed by Oracle), which requires no administration. If you don’t mind SQL, but hate the administration, SQLite allows you to create embedded, serverless databases. Other options that go beyond the relational database concept are CouchDB (“a distributed, fault-tolerant and schema-free document-oriented database accessible via a RESTful HTTP/JSON API”) and Hypertable (“a high performance distributed data storage system”).
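
Here is a minimal sketch of the SQLite route from tip 9, using Python's built-in `sqlite3` module. The three-column tab-separated layout, the gene IDs, and the table and function names are all assumptions made up for the example; a real flat file would need its own parsing.

```python
import sqlite3

# Hypothetical flat-file content: tab-separated gene ID, symbol, chromosome.
flat_file_lines = [
    "G0001\tBRCA1\t17",
    "G0002\tTP53\t17",
    "G0003\tCFTR\t7",
]


def build_gene_index(lines, database_path=":memory:"):
    """Load tab-separated records into a serverless SQLite table keyed by gene ID."""
    connection = sqlite3.connect(database_path)
    connection.execute(
        "CREATE TABLE IF NOT EXISTS genes ("
        "gene_id TEXT PRIMARY KEY, symbol TEXT, chromosome TEXT)"
    )
    rows = (line.rstrip("\n").split("\t") for line in lines)
    connection.executemany("INSERT OR REPLACE INTO genes VALUES (?, ?, ?)", rows)
    connection.commit()
    return connection


def lookup_gene(connection, gene_id):
    """Return (symbol, chromosome) for a gene ID, or None if it is absent."""
    cursor = connection.execute(
        "SELECT symbol, chromosome FROM genes WHERE gene_id = ?", (gene_id,)
    )
    return cursor.fetchone()


connection = build_gene_index(flat_file_lines)

if __name__ == "__main__":
    print(lookup_gene(connection, "G0002"))
```

Passing a file name instead of `":memory:"` keeps the index on disk, so the flat file only has to be parsed once.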

10. New ways to do parallel computing. Determine whether your tasks are loosely coupled (independent) or tightly coupled. Although personal computers and laptops are coming out with more cores, most programs only use one at a time. Find ways to utilize idle cores - e.g. there is a way to do this in R. Think in terms of MapReduce. Take advantage of cloud computing, like Amazon’s EC2. Use platforms like Hadoop and Disco to make parallel computing applications. A cool example of this is Cloudburst-Bio, a massively parallel project for genome assembly from next-generation sequencing that uses MapReduce.
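
For loosely coupled tasks on a single multi-core machine, the MapReduce way of thinking from tip 10 can be sketched with Python's standard `multiprocessing` module. This is not Hadoop or Disco, just the same idea at desk scale; the k-mer counting task, the reads, and all names are invented for the example.

```python
from collections import Counter
from functools import reduce
from multiprocessing import Pool

KMER_LENGTH = 3  # illustrative choice, not taken from the talk


def count_kmers(sequence):
    """Map step: count every k-mer in one sequence, independently of the others."""
    return Counter(
        sequence[start:start + KMER_LENGTH]
        for start in range(len(sequence) - KMER_LENGTH + 1)
    )


def merge_counts(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right


def parallel_kmer_counts(sequences, worker_count=None):
    """Run the map step across worker processes, then reduce in the parent."""
    # Pool(processes=None) starts one worker per available core.
    with Pool(processes=worker_count) as pool:
        partial_counts = pool.map(count_kmers, sequences)
    return reduce(merge_counts, partial_counts, Counter())


if __name__ == "__main__":
    reads = ["ATGATG", "GATGAT", "TTTTTT"]  # made-up reads
    print(parallel_kmer_counts(reads).most_common(3))
```

Because each read is counted independently, the map step needs no communication between workers; tightly coupled problems do not split this cleanly.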

11. Embrace hardware. New (and old) hardware is available that can give you significant speedups in biomedical computation, notably graphics processing units (GPUs), which have been used to accelerate molecular dynamics. Hardware vendors like NVIDIA are starting to respond; you can now get GPU workstations like NVIDIA’s Tesla personal supercomputer, offering speedups of hundreds of times over traditional workstations. So if you don’t want to utilize the cloud, you can get an affordable and powerful cluster that fits on top of your desk. Aside from GPUs, there are field-programmable gate arrays (FPGAs) - chips you can program after manufacturing.

12. Playing nice with others. Think a bit about data exchange formats - but definitely use them! Suggestions are JSON, YAML, and, of course, XML. When working in teams, use an “agile software development” strategy - mainly many fast iterations of the specification-development-feedback cycle. Use tools to automate the development process, such as unit testing and the granddaddy, “make”. Tools like BaseCamp (and perhaps Science 2.0 versions like Laboratree) can help with the more general project management aspects.
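
Two of the suggestions in tip 12, a standard exchange format and unit testing, fit in one small sketch. It uses only Python's standard `json` and `unittest` modules; the sample record and its field names are hypothetical.

```python
import json
import unittest


def serialize_sample(sample):
    """Encode a sample record as JSON text that other languages can parse."""
    return json.dumps(sample, sort_keys=True)


def deserialize_sample(json_text):
    """Decode JSON text back into a Python dictionary."""
    return json.loads(json_text)


class SampleRoundTripTest(unittest.TestCase):
    def test_round_trip_preserves_record(self):
        # Hypothetical record; field names are for illustration only.
        sample = {"sample_id": "S1", "tissue": "liver", "expression": [1.5, 0.0, 3.25]}
        self.assertEqual(deserialize_sample(serialize_sample(sample)), sample)


if __name__ == "__main__":
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SampleRoundTripTest)
    unittest.TextTestRunner().run(suite)
```

A collaborator working in R, Ruby or Java can read the same JSON text without knowing anything about the Python side, and the test catches the day someone changes the record in a way that no longer survives the round trip.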

————————————————-

In summary:

Focus on the goal (biology or medicine).
Don’t be clever (you’ll trick yourself).
Value your time.
Outsource everything but genius.
Use tools available to you.
And have fun. ;)

Slides for Joel’s presentation are up on Slideshare http://www.slideshare.net/jtdudley/tips-and-tricks-for-bioinformatics-software-engineering.

by Beyond Lab

