Gene annotation is the 'interesting' bit of genomics. Quite a lot of genome sequencing work has been done, some of it (especially the human bits) very highly publicised. And while genome sequencing is probably useful (more on that, maybe, in a more ethically-inclined post), on its own it's not terribly exciting. You're left with a big database full of mindless streams of nucleotides and one big embarrassing question:
What does it all do?
Gene annotation attempts to answer that: it tries to work out which proteins each gene codes for and, essentially, what the end function of the genome is and what each piece of DNA is used for. There are two main approaches: just using DNA, and using data from protein/cDNA sources. Either approach can be comparative or non-comparative, which gives four methods:
1) Just using DNA: Non-Comparative
This relies on getting a program such as GENSCAN to, quite literally, scan along the DNA looking for the beginnings and ends of genes, based on sequence patterns it has been told to recognise. Not so good for function, but useful enough for finding the damn genes in the first place. It's also relatively cheap, and you can leave it running overnight.
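To make the "scan along the DNA" idea a bit more concrete, here is a deliberately naive sketch. It is not how GENSCAN works (GENSCAN uses a probabilistic model of gene structure); it just looks for the simplest possible pattern, a start codon followed in the same reading frame by a stop codon, on the forward strand only. The function name `find_orfs` and the `min_codons` cutoff are made up for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=3):
    """Return (start, end) positions of ATG...stop stretches on the forward strand.

    Toy pattern scan only: ignores introns, the reverse strand and everything
    else a real gene finder has to model.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):                      # try all three reading frames
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                       # remember the first start codon
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3)) # end position includes the stop codon
                start = None                    # go looking for the next start
    return sorted(orfs)

print(find_orfs("CCATGAAACCCGGGTAGCC"))
```

Anything this simple will drown in false positives on a real genome, which is roughly why the real tools are so much cleverer about the patterns they look for.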
2) Just using DNA: Comparative
Like it says, this compares your DNA with other, previously annotated pieces of DNA to see if there are any very similar bits it can ascribe function to. It's a good starting point, especially now that the pool of annotated genomes is growing, but it's really bad at finding gene start points, especially when there are 'introns': bits of DNA inside a gene that are spliced out and never actually turned into protein. (Incidentally, well over 95% of the human genome, introns included, doesn't code for protein.) An example of this approach, if anyone's interested, is TWINSCAN.
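As a rough illustration of "find the most similar annotated bit", here is a toy sketch that scores a query against a handful of annotated sequences by how many short subsequences (k-mers) they share. This is a crude stand-in for proper sequence alignment and is not what TWINSCAN does; the function names, the example labels and the choice of `k=4` are all assumptions made for the example.

```python
def kmers(seq, k=4):
    """Set of all length-k substrings of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def best_annotated_match(query, annotated, k=4):
    """Return (label, score) for the annotated sequence most similar to query.

    annotated maps a (hypothetical) function label to a DNA sequence.
    The score is the Jaccard similarity of the two k-mer sets, from 0.0 to 1.0.
    """
    q = kmers(query, k)
    best_label, best_score = None, 0.0
    for label, seq in annotated.items():
        ref = kmers(seq, k)
        union = q | ref
        score = len(q & ref) / len(union) if union else 0.0
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score

annotated = {
    "kinase-like": "ATGGCGTACGTTAGC",
    "transporter-like": "TTTTCCCCGGGGAAAA",
}
print(best_annotated_match("ATGGCGTACGTTAGC", annotated))
```

Note that a score like this tells you a region looks like something known, but nothing about exactly where the gene starts, which is the weakness mentioned above.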
3) cDNA/Protein data: Non-Comparative
cDNA, just to clarify, is DNA that has been reverse transcribed from RNA templates (usually processed mRNA); i.e. it's essentially all the DNA that will get turned into protein, without any of the introns. A good way to use this is to make cDNA 'libraries', i.e. all the cDNA from the cell stored on plasmids, then choose one at random, see what it makes and, at the same time, find where it is in the genome. Simple and useful.
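The "find where it is in the genome" step can be sketched like this. Because the cDNA has no introns, it should match the genome as a series of separate blocks (the exons) with gaps between them. The code below is a minimal greedy sketch using exact matching only; real tools tolerate sequencing errors and use splice-site signals. The name `map_cdna_to_genome` and the `min_exon` cutoff are invented for illustration.

```python
def map_cdna_to_genome(cdna, genome, min_exon=4):
    """Place a cDNA onto a genome as a list of (start, end) exact-match blocks.

    Greedy: repeatedly take the longest prefix of the remaining cDNA that
    occurs in the genome downstream of the previous block.
    Returns None if some part of the cDNA cannot be placed.
    """
    cdna, genome = cdna.upper(), genome.upper()
    blocks = []
    c_pos, g_pos = 0, 0
    while c_pos < len(cdna):
        remaining = cdna[c_pos:]
        # Try the longest prefix first and shrink until something matches.
        for length in range(len(remaining), min_exon - 1, -1):
            hit = genome.find(remaining[:length], g_pos)
            if hit != -1:
                blocks.append((hit, hit + length))
                c_pos += length
                g_pos = hit + length
                break
        else:
            return None  # no prefix of at least min_exon bases could be placed
    return blocks

exon1, intron, exon2 = "ATGGCC", "GTAAGTTTTCAG", "CATTACATAG"
genome = "AA" + exon1 + intron + exon2 + "CC"
print(map_cdna_to_genome(exon1 + exon2, genome))
```

The gaps between the blocks are the introns. One caveat of the greedy approach: if the intron happens to begin with the same base as the next exon, the block boundary can end up shifted by a base or so.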
4) cDNA/Protein data: Comparative
This compares your genome with bits of cDNA from other genomes, where the cDNA has a known function. Protein comparison is even more useful: seeing which protein yours most resembles provides structural information as well as functional, and lets you build up homologous families of proteins with similar function (if you have enough genomes). Also, if you have enough protein data you can say you're doing 'proteomics', and the more 'omics' words in your project, the more funding you're likely to get :)
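Building up protein families can be sketched as a simple grouping exercise: any protein that is similar enough to a member of a family joins that family. The sketch below uses Python's standard `difflib.SequenceMatcher` as a very rough similarity score, which is an assumption for the sake of a short example; real pipelines use proper alignments with amino acid substitution matrices. The protein names, sequences and the 0.7 threshold are all made up.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Crude similarity between two amino acid strings, from 0.0 to 1.0."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()

def group_into_families(proteins, threshold=0.7):
    """Group proteins (name -> sequence) into families by single linkage.

    A protein joins every family containing at least one member it resembles
    at or above the threshold; families it bridges are merged.
    """
    families = []
    for name, seq in proteins.items():
        matching = [
            fam for fam in families
            if any(similarity(seq, proteins[member]) >= threshold for member in fam)
        ]
        merged = [name]
        for fam in matching:
            merged.extend(fam)
            families.remove(fam)
        families.append(merged)
    return [sorted(fam) for fam in families]

proteins = {
    "humanA": "MKTAYIAKQR",
    "mouseA": "MKTAYIAKQK",
    "yeastB": "GGSPLLWWDE",
}
print(group_into_families(proteins))
```

If one member of a family has a known function or structure, that becomes a reasonable first guess for the rest, which is the whole point of the comparison.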
By the way, all of these comparative methods are based on homologous evolutionary relationships between the genomes, so anyone who says that scientists never use evolution is WRONG (and probably pissing off the evo-devo people as well).
As always, any questions are welcome; leave them in the comments and I'll get back to you.
Disclaimer: This post was written while half asleep. Any spelling/grammar mistakes are therefore completely the fault of the writer's Brain On Sleep.