Field of Science


I was hoping to get another paper-analysis post in before I left, but I ran out of time. I'm off to America tomorrow morning (*early* tomorrow morning) for a synthetic biology conference. I get the feeling it's going to be utterly mad, and leave me completely exhausted by the time I get back (on Tuesday evening).

Expect some residual synthetic biology stuff when I get back! I'm hoping to scribble down enough for a post while I'm there, and type it up when I get back. I could bring my laptop along, but I'm trying to keep my luggage down to hand-luggage and I suspect I wouldn't have the time. Also, I'm not quite sure of the etiquette of conference-blogging. Some of this stuff might have publishing-potential but not yet been constructed into a paper, and I don't want to accidentally 'out' someone's research.

As a quick teaser, here's a picture of what me and my fellow summer-project lab rats will be talking about. All the pigments were made in E. coli:

NextGen sequencing: What Is It Good For?

Sequencing DNA has become a major industry. The genetic code of an organism contains huge amounts of data, and the potential for a greater understanding of how it works at an intracellular level. Whole centers and genome sequencing factories now exist to fill this need. While most of the sequencing is still done using a modified and more efficient version of Sanger's original dideoxy method, next-generation sequencing machines are starting to emerge that can achieve what is imaginatively named massively parallel sequencing. Massive amounts of DNA can be sequenced in parallel, and we're talking MASSIVE amounts of DNA. Illumina/Solexa machines can sequence hundreds of thousands of DNA molecules all in parallel.

The basic Sanger sequencing method is shown below (image taken from the Science Creative Quarterly, which also has a very good description of the process for those more interested in DNA sequencing).

There is a catch in massively parallel sequencing, however. Sequencing works by breaking a large DNA molecule down into smaller 'reads'. Each read is then sequenced, and the reads can be stuck back into the right order (with varying accuracy) once they have all been completed. Sanger sequencing (diagram above) can produce reads up to 1000 base pairs long. NextGen sequencing is lucky if it manages 350 base pairs, and its reads tend not to be quite as accurate either.

What they are is cheap, which gives geneticists an important tool: large numbers of short genome reads generated at very low cost. While these NextGen techniques are being improved, and there are many people looking into making them more effective for de novo gene sequencing, they are also being put to use in other areas, where the ability to sequence large numbers of short genomic sequences at low cost is hugely beneficial.

The most obvious areas are those where you don't need a particularly long sequence, such as when you just need to find the site of origin of a particular length of DNA. This is particularly useful for looking at transcribed portions of the DNA (those parts that are actually turned into proteins). Sequencing short bits of the transcribed RNA copy (that is used to make the protein) allows this to be compared to the original DNA sequence to find where the DNA corresponding to the protein is and, possibly more importantly, provides concrete evidence that it is being transcribed. In this situation the short reads aren't a problem, although there are still issues with the accuracy.
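To make the "find where it came from" idea concrete, here is a minimal sketch of mapping one short read back onto a genome by exact string matching. The sequences and the function names (`reverse_complement`, `map_read`) are made up for illustration, and real read mappers tolerate sequencing errors and use indexed searches rather than anything this naive.

```python
def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite DNA strand, read 5' to 3'."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def map_read(genome: str, read: str) -> list[tuple[int, str]]:
    """Return (position, strand) for every exact match of the read.

    Positions are 0-based and refer to the forward strand of the genome.
    """
    hits = []
    for strand, query in (("+", read), ("-", reverse_complement(read))):
        start = genome.find(query)
        while start != -1:
            hits.append((start, strand))
            start = genome.find(query, start + 1)
    return sorted(hits)


# Hypothetical toy data: a 33-base "genome" and a 9-base read
# (imagine the read is a cDNA copy of a bit of transcribed RNA).
genome = "ATGGCGTACGTTAGCCTAGGATCCGTTAACGGA"
read = "CCTAGGATC"
print(map_read(genome, read))
```

A read that maps to exactly one place tells you where that transcript came from; a read that maps to several places is one reason short reads can be awkward.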

Another application is to look for novel small RNAs. These are small sections of RNA which regulate gene expression. They were discovered fairly recently (in plants originally) so there's quite a lot of excitement about them. As they're only small, the length of the reads is not a problem. Pyrosequencing (a form of NextGen sequencing) was used to discover the Piwi-interacting RNAs, which are linked to transcriptional silencing in germ line cells.

NextGen sequencing also has a role in protein-coding gene annotation. Protein-coding genes can be quite long, and would require several reads from NextGen techniques, but the low cost of these methods means that they are starting to be used for annotating protein-coding regions. Integrating them with paired-end sequencing (which allows the reads to be re-connected more easily) removes some of the problems of shorter reads, and novel techniques are continually being explored to increase the accuracy.

NextGen machines are also starting to be used more for metagenomics, which works by taking random soil or water samples and sequencing every bit of DNA you can find, regardless of which organism it comes from. A metagenomics project in the Sargasso Sea (strangely enough most of these projects tend to take place in warmer climates... no one appears to do metagenomics in, say, Iceland) produced over 1.2 million unknown gene sequences. These are suspected to be from 'unculturable' bacteria, which for some reason just don't grow in the lab, and metagenomics has revealed a huge number of these bacteria within the ecosystem.

If you want a novel genome sequenced your best bet is still to send it down to the Sanger Centre and be very polite to everyone who works there, but the growth of cheaper machines with massively parallel sequencing provides a whole range of new applications. Even if NextGen machines never quite reach the accuracy and read length of Sanger machines, there are still many areas in science to which they provide a large benefit.


EDIT: I have been informed by people who know a lot more about this than me that NextGen sequencers are now pretty much exclusively used for whole gene sequencing. It appears my knowledge is a little out of date. However, this post is still an interesting exploration of the other applications of NextGen sequencers, so I'll leave it as it stands.


Morozova, O., & Marra, M. (2008). Applications of next-generation sequencing technologies in functional genomics Genomics, 92 (5), 255-264 DOI: 10.1016/j.ygeno.2008.07.001

Hutchison, C. (2007). DNA sequencing: bench to bedside and beyond Nucleic Acids Research, 35 (18), 6227-6237 DOI: 10.1093/nar/gkm688

Making Mutants

Nowadays there are many different techniques for looking at gene and protein structure and functions. You can make protein crystal structures, you can see what substrates the protein binds to, you can do various chemical assays to open the protein up and see what it looks like inside. The most classic way however is the scientific equivalent of hitting it until it stops working, and then seeing what you've damaged. This technique, in a slightly more sophisticated wording, seems to be the cornerstone of much of biochemistry, and probably developmental biology as well.

In most cases the 'hitting' is a lot less random than I've probably made it sound. Say you have a stretch of DNA that binds to a protein, and you want to know which parts of the DNA actually physically bind to the protein. The best way to do this is to get the sequence of interest and change the base pairs (that make up the DNA sequence) very specifically, to see which changes stop the protein binding.

Let's start with the hypothetical DNA sequence AATATAT. In order to find which bases bind to your protein, you need to make a few very specific point mutations. The way most of the labs around me do this is with a kit from Stratagene called QuikChange (R). Your DNA sequence is likely to be inside a plasmid (a small circular piece of DNA), so you design a small primer with the change you want to make, i.e. AAGATAT. Add this to the plasmid, along with some polymerase to extend the DNA and some nucleotides to extend it with, and you get a perfect copy of the plasmid, perfect except for the small difference of the T-to-G mutation.
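The primer design step is really just "copy the sequence, change one letter", which is easy to sketch in code. This is only an illustration using the toy seven-base sequence from above; the function names (`point_mutant`, `all_point_mutants`) are my own invention, and real mutagenic primers are considerably longer than seven bases and need checking for things like melting temperature.

```python
BASES = "ACGT"


def point_mutant(sequence: str, index: int, new_base: str) -> str:
    """Return a copy of the sequence with one base (0-based index) changed."""
    if not 0 <= index < len(sequence):
        raise IndexError("index is outside the sequence")
    if len(new_base) != 1 or new_base not in BASES:
        raise ValueError("new_base must be one of A, C, G or T")
    if sequence[index] == new_base:
        raise ValueError("that base is already there, so nothing would change")
    return sequence[:index] + new_base + sequence[index + 1:]


def all_point_mutants(sequence: str) -> list[str]:
    """Return every sequence that differs from the original at exactly one base."""
    return [
        point_mutant(sequence, i, base)
        for i, original_base in enumerate(sequence)
        for base in BASES
        if base != original_base
    ]


original = "AATATAT"
primer = point_mutant(original, 2, "G")  # third base, T to G
print(primer)
print(len(all_point_mutants(original)), "possible single-base changes")
```

In practice you would pick a handful of these changes rather than ordering a primer for every one, but listing them all shows how quickly the options add up even for a tiny binding site.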

The original plasmid will have been grown up in E. coli, which means it has been methylated; some of the DNA bases will have methyl groups attached. The new copy made in the tube has not. To get rid of the original plasmid (after all, you only want your mutated copy, not the un-mutated original) you use a restriction enzyme (DpnI) that literally chops up methylated DNA, leaving nothing behind but your mutated sequence.

Then you add your protein, and see how it binds. If the binding is still just as strong, then that clearly wasn't an important residue. If the binding is weaker, or if less of the protein binds, then that might have been one of the important ones.

(You can of course just use PCR with mutated primers to create single mutants. But you do run a risk of introducing other accidental mutations through the PCR process. And when it doesn't work it's incredibly irritating. I did some research over the summer which proved conclusively that it is possible for PCR mutagenesis to not work for a continuous period of over two months.)

Protists and their plastids

This post was chosen as an Editor's Selection for ResearchBlogging.org.

A quick skim through this blog reveals fairly quickly that I have a slight fixation on bacteria. I like to research them, read about them, and then blog about them, most specifically about their cell walls. However life contains more than just bacteria, and occasionally, strange though it might seem, people write papers about such non-bacterial things, and they end up on my desk with a small post-it attached reminding me that I have a presentation for my supervision group coming up.

So for the sake of my supervision, and to prevent myself becoming too scientifically blinkered, I took a quick foray this weekend into the murky world of protists, the strange and wonderful organisms that occupy the taxonomic equivalent of the 'misc.' drawer in a filing cabinet. The creatures that are neither plant, nor animal, nor demonstrably bacteria. Many of them are single celled, some of them photosynthesise, and they all seem to occupy little evolved niches of their own, producing proteins with no noticeable homologues in any other branch of life.

The paper has the rather terrifying title of: "Rampant polyuridylylation of plastid gene transcripts in the dinoflagellate Lingulodinium". And I am not ashamed to admit that I had to go double-check the meaning of several of those words.

Dinoflagellates are little organisms that live in water, and mostly look a little like the picture on the right. Many of them are marine organisms, making up a large amount of the photosynthesising biomass in the ocean, and occasionally blooming to form 'red tides', leading to whole sweeps of water turning bright red (possibly occasionally on biblical command). The photosynthetic ones contain chloroplasts, which are wrapped up in three membranes, rather than the usual two. These, like all chloroplasts, contain their own genetic material (known as plastid genes), although unlike plant plastids, they don't seem to contain very many, and those that they do possess are found on little minicircles.

What the paper is interested in is whether there are any other genes in the chloroplast which aren't in minicircle form. There are, after all, only 12 genes encoded on the minicircles, which is a small number for a plastid. In order to explore this, it uses a characteristic property of the dinoflagellate species it's working with. All organisms, when making proteins, make them from an mRNA copy of the genetic code. This mRNA copy tends to have a long string of adenosine residues added to the end, in order to prevent the mRNA getting degraded. This happens in our dinoflagellate species as well, but it doesn't happen to the plastid genes.

However, instead of getting multiple adenosine repeats, the plastid genes get multiple uracil repeats. It's just a different base, but it allows the mRNA made in the nucleus and the mRNA made by the chloroplast to be separated. You can probe for adenosine-enriched and adenosine-depleted mRNA, as shown on the gel below (A and B show different species). The psbA mRNA is clearly strongly present in A+ (adenosine enriched) and therefore codes for a nuclear encoded protein. Conversely, the 23S RNA is A- (adenosine depleted) and is coded for in the chloroplast, from a plastid gene.
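The sorting rule described above (adenosine tail means nucleus, uracil tail means chloroplast) can be written down as a tiny classifier. This is a toy sketch rather than anything the paper did in software: the function name `classify_by_tail`, the ten-base cut-off and the example transcripts are all assumptions of mine.

```python
def classify_by_tail(transcript: str, min_run: int = 10) -> str:
    """Guess where an mRNA was made from the last `min_run` bases of its tail.

    The transcript is expected in the RNA alphabet (A, C, G, U).
    """
    if min_run < 1:
        raise ValueError("min_run must be at least 1")
    tail = transcript.upper()[-min_run:]
    if len(tail) < min_run:
        return "unclassified"  # too short to say anything
    if set(tail) == {"A"}:
        return "A+ (nuclear)"
    if set(tail) == {"U"}:
        return "A- (plastid)"
    return "unclassified"


# Made-up transcripts, purely for illustration.
print(classify_by_tail("AUGGCCGUAC" + "A" * 20))
print(classify_by_tail("AUGGCCGUAC" + "U" * 20))
print(classify_by_tail("AUGGCCGUACGGCAUCGAUC"))
```

On the bench the separation is done chemically rather than by reading sequences, but the logic is the same: the tail is the label.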

(Image taken from reference below)
The paper selected 300 random poly-uridine mRNAs (A-) and sequenced them to see if they corresponded to genes found in minicircles, or whether they might be plastid genes held in some different architecture. All the A- mRNA corresponded to the 12 genes discovered in the minicircle. They carried out rarefaction analysis to see if their sample size was large enough, and apparently it was. In fact, 300 clones was way in excess of the number needed to find a further, non-minicircle gene.
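To get a feel for why 300 clones is a lot, here is a back-of-envelope calculation. It is not the rarefaction analysis the paper ran, just a simpler sketch that assumes each clone is an independent random draw and that a hypothetical extra gene makes up some fixed fraction of the A- transcripts. The 1% figure and the function names are my own assumptions.

```python
import math


def chance_of_missing(fraction: float, clones: int) -> float:
    """Probability that none of `clones` random clones comes from a gene
    that makes up `fraction` of the transcript pool."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return (1.0 - fraction) ** clones


def clones_needed(fraction: float, confidence: float = 0.95) -> int:
    """Smallest number of clones that finds such a gene at least once
    with the given confidence."""
    if not 0.0 < fraction < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("fraction and confidence must be between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - fraction))


# Hypothetical: an undiscovered gene contributing 1% of the A- transcripts.
print(f"chance of missing it in 300 clones: {chance_of_missing(0.01, 300):.3f}")
print(f"clones needed for 95% confidence: {clones_needed(0.01)}")
```

The rarer the hypothetical gene's transcript, the more clones you need, which is why a proper rarefaction curve built from the real clone counts is the better tool.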

This suggests that minicircles are the only architecture for plastid genes and, importantly, that there really are only 12 genes contained in the chloroplast of the dinoflagellate Lingulodinium. This is a very small number of genes; all the rest have somehow migrated to the nucleus, leaving these 12 behind. And it's still very much an open question as to why these have been left behind. The paper, in its discussion section, puts forward the possibility of size. The genes that have been left behind all code for some of the longer proteins usually found in chloroplasts, although the paper does have the good grace to admit that that's not the most convincing of arguments.

It's worlds away from my little bacteria. But still just as fascinating.
Wang, Y. (2006). Rampant polyuridylylation of plastid gene transcripts in the dinoflagellate Lingulodinium Nucleic Acids Research, 34 (2), 613-619 DOI: 10.1093/nar/gkj438

Damage Response Systems

Antibiotics can attack many targets in bacteria, and one very popular target is the bacterial cell wall. Bacteria have been fighting natural antibiotics (produced by fungi and other bacteria) for millions of years, and have a variety of genetic strategies to aid resistance against synthetically developed drugs. Cell-wall antibiotic defence strategies fall into two major responses, which I'll illustrate with the example of bacitracin, as this is the antibiotic I've been studying for my lab work.

Firstly, a specific response against the attacking antibiotic. These can take the form of antibiotic degrading-enzymes, or efflux pumps, which move the antibiotic out of the cell. In the case of bacitracin, it's an efflux pump (encoded by the bcrABC cassette), which uses energy from ATP to transport the bacitracin out of the cell.

The most interesting thing about this system, and indeed many of the antibiotic-specific response systems, lies in its evolutionary origins. The cassette originally came from a bacterium called Bacillus licheniformis, which is the bacterium that makes the bacitracin antibiotic in the first place. Soil bacteria tend to make a huge number of antibiotics, for defense and invasion, and if you make an antibiotic, it's a good idea to have some way of ensuring that it doesn't destroy your own cellular systems. In fact, given that this is an efflux pump, it might not even have evolved as a defense mechanism... just a pathway for moving the bacitracin into the environment once it had been made, as it is a secreted antibiotic.

These ABC transporter systems are found fairly frequently as well. In B. subtilis (one of the better studied Bacillus species) eight out of the forty antibiotic genes have ABC transporter systems next to them. This is because, unlike eukaryotes (like people), which can often have genes for similar systems on wildly different parts of the chromosome, bacteria like to keep genes used for similar functions close together. They don't have much genome, and they don't have the space or the protection of a nuclear membrane, so they have to be more efficient about packaging.

The second type of response is a more generalised system; rather than responding to a particular antibiotic, it is instead a cellular response to the damaged cell wall. As an example, the LiaRS system (a two-component response system) is activated in response to four different cell wall attacking antibiotics (all of which interfere with the rate limiting step of cell-wall building, the lipid II cycle). The Sensor (LiaS) has a short histidine kinase domain which is buried in the membrane. This recognises membrane damage and uses the energy from ATP to phosphorylate the Response Regulator (LiaR), which then leads to gene activation.

The Lia system is more than just a two-component system, however. As well as the sensor and responder, there is a third protein, LiaF, which keeps the system 'switched off' when the cell wall is not damaged. This is shown diagrammatically below:

Image from second reference (Jordan et al 2006)

When the cell wall is damaged, the LiaF inhibition is removed, and the LiaS can phosphorylate the LiaR, leading to a change in gene expression, which produces the appropriate response.
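The chain of events described above is simple enough to write as an on/off model. This is a deliberately crude sketch: the class and attribute names are mine, and a real cell responds in a graded way rather than flipping between two states.

```python
from dataclasses import dataclass


@dataclass
class LiaSystem:
    """Toy on/off model of the LiaF / LiaS / LiaR signalling chain."""

    lia_r_phosphorylated: bool = False

    def sense(self, cell_wall_damaged: bool) -> bool:
        """Update the system for the current state of the cell wall.

        Returns True when the target genes would be switched on.
        """
        # LiaF holds the sensor in check unless the cell wall is damaged.
        lia_f_inhibiting = not cell_wall_damaged
        # LiaS (the sensor kinase) is only active once LiaF lets go.
        lia_s_active = not lia_f_inhibiting
        # Active LiaS phosphorylates LiaR (the response regulator).
        self.lia_r_phosphorylated = lia_s_active
        # Phosphorylated LiaR changes gene expression.
        return self.lia_r_phosphorylated


cell = LiaSystem()
print("healthy wall, genes on?", cell.sense(cell_wall_damaged=False))
print("damaged wall, genes on?", cell.sense(cell_wall_damaged=True))
```

Writing it out like this makes the role of LiaF obvious: take it away and the system would be stuck in the 'on' position.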

Unlike the specific responses, these pathways are often present within the bacteria, as a natural response to cell wall damage. These are not so much resistance mechanisms, as survival mechanisms, that are strongly selected for in times of antibiotic stress. The damage caused by clinical concentrations of antibiotic is usually too much for such systems to cope with, but they form an adequate defense against antibiotic levels in the soil.

Ohki, R., Tateno, K., Okada, Y., Okajima, H., Asai, K., Sadaie, Y., Murata, M., & Aiso, T. (2003). A Bacitracin-Resistant Bacillus subtilis Gene Encodes a Homologue of the Membrane-Spanning Subunit of the Bacillus licheniformis ABC Transporter Journal of Bacteriology, 185 (1), 51-59 DOI: 10.1128/JB.185.1.51-59.2003

Jordan, S., Junker, A., Helmann, J., & Mascher, T. (2006). Regulation of LiaRS-Dependent Gene Expression in Bacillus subtilis: Identification of Inhibitor Proteins, Regulator Binding Sites, and Target Genes of a Conserved Cell Envelope Stress-Sensing Two-Component System Journal of Bacteriology, 188 (14), 5153-5166 DOI: 10.1128/JB.00310-06

How to destroy a bacterial cell wall...

The cell wall is extremely important for bacteria, as it allows them to maintain their existence as a single celled organism, and protects them from the harsh conditions of the outside world. Most bacteria cannot survive without a cell wall, which is why it's such a great target for antibiotics. And as the cell wall is constantly being recycled, disrupting the process to create new cell wall is as good as destroying what's already there.

Bacterial cell walls are made of strands of glycopeptide (shown below, picture from Kimball's biology pages) crosslinked together to form a mesh, which provides a strong support around the cell. There's a whole pathway of enzymes involved in creating the structure, and blocking them, or preventing them from working efficiently, is a quick and easy way of killing off bacteria.

This is the strategy used by methicillin, an antibiotic that used to be talked about a lot a while ago (it's the 'M' in MRSA) but has been neglected by the media lately in favour of swine-flu and other viruses. Methicillin is a β-lactam antibiotic, which means that it binds to one of the enzymes involved in cell wall metabolism, blocking its active site. More specifically, it binds to the enzyme that creates the cross-links between the glycopeptides (PBP2). No new cell wall can be created, and therefore no more bacteria.

Resistance to this takes several forms. MRSA simply uses a variant of the enzyme, with a deeper active site, so that while the cell-wall precursor substrate can bind, the antibiotic cannot. Protection from a wide variety of different β-lactams can be achieved by β-lactamases, bacterial enzymes which break down the antibiotics. Multi-efflux pumps also exist; these are proteins that span the bacterial cell wall and essentially pump out any antibiotics that make their way into the cell before they can cause any harm.

Vancomycin is the drug that is still most commonly used against MRSA, although some resistance is (as always) beginning to arise. Unlike methicillin, vancomycin does not bind to any bacterial enzymes; instead it binds directly to the cell wall precursors. The part it binds to is shown below, as a close up from the earlier diagram of the cell wall:

The incredibly inexpertly added D-ala in red at the bottom shows the precursor form of this section of the cell wall (Ala, Glu and Lys are the short-hand form of amino-acids, so this is just a short protein chain. The L- D- labels show what form the amino-acids are in). The vancomycin binds to the final D-ala-D-ala, preventing it from being processed and halting construction of the cell wall.

Two different methods of preventing cell wall growth; one antibiotic binding to the enzyme, the other to the substrate. And both, sadly, have been defeated by resistance already. In the case of the β-lactams, a wide variety of resistance mechanisms exist, from actively destroying the antibiotic, to pumping it out of the cell, or bypassing the enzyme completely. In the case of vancomycin, some bacteria have started producing peptide chains ending in D-ala-D-ser, or D-ala-D-lac, and there have been reports of VRSA from America.
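The vancomycin story boils down to one question: does the precursor chain still end in D-ala-D-ala? Here is a toy check of exactly that. The function name and the way the chain is written out as a tuple are my own, and the full five-residue order is an assumption based on the standard precursor rather than something spelled out in this post.

```python
# Assumed precursor peptide chain, written from the cell-wall end outwards.
PRECURSOR = ("L-Ala", "D-Glu", "L-Lys", "D-Ala", "D-Ala")


def vancomycin_can_bind(stem) -> bool:
    """Return True if the chain still ends in the D-Ala-D-Ala target."""
    return tuple(stem[-2:]) == ("D-Ala", "D-Ala")


# The two resistant endings mentioned above: swap the final residue.
ends_in_lactate = PRECURSOR[:-1] + ("D-Lac",)
ends_in_serine = PRECURSOR[:-1] + ("D-Ser",)

for name, stem in [
    ("normal", PRECURSOR),
    ("D-ala-D-lac", ends_in_lactate),
    ("D-ala-D-ser", ends_in_serine),
]:
    print(name, "-> vancomycin binds:", vancomycin_can_bind(stem))
```

Real binding is a matter of degree rather than yes or no, but a single changed residue at the end of the chain is all it takes to spoil the fit.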

The pathway for creating bacterial cell walls contains multiple steps, and both methicillin and vancomycin halt just one of these, the cross-linking of the short peptide chains near the end. Bacitracin, another cell-wall directed antibiotic, works further upstream in the pathway. I've just been given a whole stack of papers on bacitracin, so I'll probably be writing more about that in the future!