Thursday, February 19, 2009

tips for bioinformatics

This was from a talk by Joel Dudley, originally posted by Shirley Wu at http://shirleywho.wordpress.com/2009/02/11/tips-and-tricks-for-software-engineering-in-bioinformatics-talk-by-joel-dudley/.

Quite useful if you want to start programming something...


1. Learn UNIX. It’s quick, it’s powerful, it’s easy to learn. What often takes several lines to code in a scripting language can usually be reduced to a single line on the command line.

2. Be a jack of all trades, but master of ONE. That is, be familiar with most programming languages, but be really good at one of them. In the hierarchy of languages, VB and C are more “primitive” while Ruby and Python are more “advanced” - he recommends starting with one of the more advanced languages if you are new to programming. Out of Ruby and Python, Python will probably give you more bang for your buck, due to the smorgasbord of libraries available and broad acceptance (e.g. academic labs, Google). In addition, there are lots of bridges between languages, such as Jython (Java and Python) and JRuby (Java and Ruby), so expert knowledge of one is usually sufficient for you to make a lot of things work practically everywhere.

3. Don’t reinvent the wheel. “Frameworks are your friends.” Take advantage of large existing projects like BioPython/Perl/Ruby/Java, Django, Rails, etc., which contain lots of ready-to-go code for practically everything. Use the internet to find existing code solutions - e.g. Koders is like a Google search for open source code on the web.

4. Learn one text editor really well. Take your pick of Emacs, vi, or a GUI-based editor like TextMate for Macs. The advantage of Emacs and vi is that they will be installed on pretty much any system you come across.

5. “Don’t trust yourself”, i.e. use code versioning. Examples are Subversion, CVS, and git. You can even outsource your code hosting with GitHub. Combine this with project management in GForge.

6. Don’t be afraid to use more than 3 letters to define a variable. Having short variable names won’t make the code run faster. It will, however, make the code more difficult for others (and you, 3 months from now) to understand!
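As a small illustration (the function and variable names below are made up for this example, not taken from the talk), here are two versions of the same GC-content calculation. They do exactly the same work; only the names differ.

```python
# Terse version: correct, but the reader has to work out what s, n and c mean.
def gc(s):
    n = len(s)
    c = sum(1 for x in s if x in "GC")
    return c / n if n else 0.0


# Descriptive version: same logic, same speed, still readable three months from now.
def gc_fraction(dna_sequence):
    sequence_length = len(dna_sequence)
    gc_base_count = sum(1 for base in dna_sequence if base in "GC")
    return gc_base_count / sequence_length if sequence_length else 0.0
```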

7. Balance architecture and accomplishment. You may be tempted to create something that is complete, elegant, and perfectly structured. This will likely be a waste of time. It’s ok to sacrifice a little bit of structure to get something that actually works.

8. Automate documentation. Documentation is necessary, but it’s a pain to write. So come up with a convention for your headers and make it automatic. Use available tools like Doxygen, JavaDoc, and RDoc, many of which are free.
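The talk names Doxygen, JavaDoc, and RDoc; as a rough Python equivalent (swapped in here for illustration, not something the talk mentions), a docstring written to a fixed convention can be picked up by the standard library without any extra writing. The function and its docstring layout are assumptions chosen for this sketch.

```python
import inspect


def reverse_complement(dna_sequence):
    """Return the reverse complement of a DNA sequence.

    Args:
        dna_sequence: String of uppercase A, C, G, T characters.

    Returns:
        The complementary strand, read in the opposite direction.
    """
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(dna_sequence))


# Documentation tools read the same docstring the programmer sees in the source.
documentation_text = inspect.getdoc(reverse_complement)
print(documentation_text)
```

Running `python -m pydoc` on a module written this way renders the same docstrings as a help page.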

The above are generic for academic-level software engineering. Some tips that more specifically address high-throughput biomedical computing:

9. Kill the flat file (sort of). This is the most common file format used in bioinformatics, but it hardly lends itself to efficient computation. A common task is to read in the data from the file and store it keyed so that we can look up specific pieces of the data later. Hate databases? Cringe at SQL? If you can represent your data as key/value pairs, consider using an embeddable database like the open source BerkeleyDB (now licensed by Oracle), which requires no administration. If you don’t mind SQL, but hate the administration, SQLite allows you to create embedded, serverless databases. Other options that go beyond the relational database concept are CouchDB (“a distributed, fault-tolerant and schema-free document-oriented database accessible via a RESTful HTTP/JSON API”) and Hypertable (“a high performance distributed data storage system”).
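A minimal sketch of the SQLite route, using Python's built-in sqlite3 module. The three tab-separated records, the table layout, and the lookup_gene helper are all hypothetical, invented for this example.

```python
import sqlite3

# Hypothetical tab-separated flat file contents: gene symbol, chromosome, description.
flat_file_lines = [
    "BRCA1\t17\tbreast cancer 1",
    "TP53\t17\ttumor protein p53",
    "CFTR\t7\tcystic fibrosis transmembrane conductance regulator",
]

# ":memory:" keeps the sketch self-contained; pass a file path to persist the database.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE genes (symbol TEXT PRIMARY KEY, chromosome TEXT, description TEXT)"
)
connection.executemany(
    "INSERT INTO genes VALUES (?, ?, ?)",
    (line.split("\t") for line in flat_file_lines),
)
connection.commit()


def lookup_gene(symbol):
    """Return (chromosome, description) for a gene symbol, or None if absent."""
    cursor = connection.execute(
        "SELECT chromosome, description FROM genes WHERE symbol = ?", (symbol,)
    )
    return cursor.fetchone()


print(lookup_gene("TP53"))
```

The primary key on the symbol column gives the keyed lookup described above, with no server to install or administer.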

10. New ways to do parallel computing. Determine whether your tasks are loosely coupled (independent) or tightly coupled. Although personal computers and laptops are coming out with more cores, most programs only use one at a time. Find ways to utilize idle cores - e.g. there is a way to do this in R. Think in terms of MapReduce. Take advantage of cloud computing, like Amazon’s EC2. Use platforms like Hadoop and Disco to build parallel computing applications. A cool example of this is Cloudburst-Bio, a massively parallel project for mapping next-generation sequencing reads that uses MapReduce.
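To make “think in terms of MapReduce” concrete, here is a single-process sketch that counts k-mers in a few reads. It is not Hadoop or Disco code; the reads, the k-mer length, and the function names are assumptions made up for this example, and it only shows the shape of the map, shuffle, and reduce steps.

```python
from collections import defaultdict

# Hypothetical short reads standing in for sequencing data.
reads = ["ACGTAC", "GTACGT", "ACGT"]
KMER_LENGTH = 3


def map_read_to_kmers(read):
    """Map step: emit a (k-mer, 1) pair for every k-mer in one read."""
    return [(read[i:i + KMER_LENGTH], 1) for i in range(len(read) - KMER_LENGTH + 1)]


def shuffle(mapped_pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for pairs in mapped_pairs:
        for kmer, count in pairs:
            grouped[kmer].append(count)
    return grouped


def reduce_counts(grouped):
    """Reduce step: sum the values collected for each key."""
    return {kmer: sum(counts) for kmer, counts in grouped.items()}


# Each read is mapped independently (loosely coupled), so this map() call is the
# part that a process pool or a cluster could run in parallel.
kmer_counts = reduce_counts(shuffle(map(map_read_to_kmers, reads)))
print(kmer_counts)
```

Because the map step touches one read at a time and shares no state, the same structure carries over when the reads are split across cores or machines.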

11. Embrace hardware. New (and old) hardware is available that can give you significant speedups in biomedical computation, notably graphics processing units (GPUs), which have been used to accelerate molecular dynamics. Hardware vendors like NVIDIA are starting to respond; you can now get GPU workstations like NVIDIA’s Tesla personal supercomputer, offering speedups in the hundreds over traditional workstations. So if you don’t want to utilize the cloud, you can get an affordable and powerful cluster that fits on top of your desk. Aside from GPUs, there are field-programmable gate arrays (FPGAs) - chips you can program after manufacturing.

12. Playing nice with others. Think a bit about data exchange formats - but definitely use them! Suggestions are JSON, YAML, and, of course, XML. When working in teams, use an “agile software development” strategy - mainly many fast iterations of the specification-development-feedback cycle. Use tools to automate the development process, such as unit testing and the granddaddy, “make”. Tools like Basecamp (and perhaps Science 2.0 versions like Laboratree) can help with the more general project management aspects.
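As a quick sketch of the data exchange point, the snippet below writes a record to JSON and reads it back with Python's standard json module. The record and its field names are hypothetical, chosen only for illustration.

```python
import json

# Hypothetical record; the field names are illustrative only.
sample_record = {
    "sample_id": "S001",
    "organism": "Homo sapiens",
    "expression": {"TP53": 12.5, "BRCA1": 3.25},
}

# Serialize to text that any JSON-aware language or tool can read.
json_text = json.dumps(sample_record, indent=2, sort_keys=True)

# Parse it back; a collaborator's script would do the same on their side.
restored_record = json.loads(json_text)
print(restored_record["expression"]["TP53"])
```

The round trip is lossless for plain strings, numbers, lists, and dictionaries, which is usually enough for passing results between scripts written in different languages.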

————————————————-

In summary:

Focus on the goal (biology or medicine).
Don’t be clever (you’ll trick yourself).
Value your time.
Outsource everything but genius.
Use tools available to you.
And have fun. ;)


Slides for Joel’s presentation are up on Slideshare http://www.slideshare.net/jtdudley/tips-and-tricks-for-bioinformatics-software-engineering.

by Beyond Lab
