Monday, September 1, 2014

Lunch with the Captain

These days I’m having trouble finding time to write, especially to blog.

My colleagues and I are busy building a team and a large network of collaborators for a series of related malaria elimination projects.  Our initial goal in this project is to wipe malaria out in very specific populations.  If this works, and from our initial work at a smaller scale it appears as though it can, it will be vastly scaled up – reaching throughout Southeast Asia.

The impetus for this work is the so-called evolutionary arms race.  This part of the world has a very long history of popping out drug and multi-drug resistant strains of falciparum malaria (Wongsrichanalai et al., 2001; Wongsrichanalai, Pickard, Wernsdorfer, & Meshnick, 2002).  We (malaria workers) roll out a new line of defense (antimalarials) against our chosen adversary, and our adversary quickly develops a defense strategy against us.  These strains can subsequently move from this part of the world to others, parts of sub-Saharan Africa for example, where the malaria burden is much heavier and the results would be much more devastating (Payne, 1987).

Occasionally there are deaths from malaria infections here along the Thailand-Myanmar border (though usually the major toll the illness takes here is in time spent ill and therefore unable to work).  Not that long ago, a 15-year-old boy died from malaria.  He was not far from health care clinics that would have treated him.  The story I hear is that he was without close family members, he lived alone and worked in the agricultural fields, and that he essentially lay in those fields dying from the disease through an apparent gap in his and his community’s social network.  Everyone was devastated.  If complete drug resistance were to reach Africa, this story would be magnified in both space and time.  Even where the social networks were strong, the health clinics wouldn’t be able to adequately treat people with malaria.  The geographic reach would be huge and the number of deaths would likely increase dramatically.  This can’t happen.

Today our last, best tool against malaria is artemisinin and its derivatives.  But already throughout Southeast Asia researchers and health care workers are seeing parasites survive much longer in the human hosts after being treated with artemisinin (Ashley et al., 2014).  How much longer will it work at all?  And should we really wait to find out?  It often feels as though everyone around here has been doing the same “malaria control” game for a very long period of time, regardless of the fact that the outcome is always the same.  Our drugs stop working and we have to start over again.  Sometimes this problem is exacerbated by a lack of information and/or the dissemination of scientific knowledge.  Many of my Thai colleagues who actually work in direct malaria care in this area just learned last year (2013) that resistance to artemisinins might be occurring or even growing and spreading in their region.  A major scientific paper on this (that I’m aware of) came out 5 years ago (Dondorp, Nosten, & Yi, 2009), with rumors of it almost 10 years ago (Noedl et al., 2008)!  Shouldn’t the people who live in the war zone know that a war is happening!?  What a failure of science – and of our strategy over the long term.  It is time for a change.

So what we’re working on is a tool that we’re calling “targeted chemo-elimination.”  Essentially this is a form of mass drug administration.  That is, everyone in a targeted community would take drugs (antimalarials) regardless of whether or not they felt sick (some recent thoughts on this here, here, and here).  It is much more complicated than this, though.  It isn’t a single strong dose of the antimalarials: we’ll be using a cocktail of drugs in the hope of avoiding further driving resistance, and since the administration will occur over time, in several stages, we’ll be able to vary this cocktail if necessary.

Logistically this is extremely difficult to pull off.  It is hard enough to get people in easy-to-reach populations in places like the U.S. to take medicine when they feel sick, let alone to take a vaccine that would prevent them from being sick.  How do we go about convincing people in extremely remote populations, frequently in the middle of old or continuing conflict zones, to take medicine, over a long period of time, regardless of whether or not they are currently feeling sick?  It isn’t easy.

But it can be done and the way to do it is through community engagement – drawing on notions and principles well-known in anthropology and other social sciences.  It can happen when there is understanding, trust, and social cohesion.  Sometimes these things are lacking in our target communities between members of the community, and/or between us and members of the community, and it is therefore important to build them up.  Sometimes we need to plant a seed, water it, foster it, and help it to grow.

This is exhausting work, physically, psychologically, and emotionally.

A little while back I made a trip to one of the communities in our target area, to visit local people and share some of what our project is about.  I wound up eating lunch at a table full of “freedom fighters”, some dressed in fatigues and drinking whisky out of small coffee cups.  A captain who was sitting at the table gave me a history lesson, translated to English through one of my colleagues who speaks both my tongue and the local language.  I heard stories about being betrayed by colonialists who promised these people their own land but never followed through and of people who were willing to die for that land, many of whom did in fact pay that price.

Among the things he said to me was that he admired two major things about Americans.  One is that their time is their money (time is extremely valuable).  And the other is that they realize they have a burden, to help others, that is bigger than a mountain (we were sitting at the base of a relatively large one).

I don’t know if this generalization is true of all Americans and I don’t care to go into that.  But I do know that time is of the essence and that I feel a burden.  There is a lot of work to do, and not so much time in which to do it.

*** As always, my opinions are my own.  This post and my opinions do not necessarily reflect those of Shoklo Malaria Research Unit, Mahidol Oxford Tropical Medicine Research Unit, or the Wellcome Trust. 

Ashley, E. A., Dhorda, M., Fairhurst, R. M., Amaratunga, C., Lim, P., Suon, S., … White, N. J. (2014). Spread of Artemisinin Resistance in Plasmodium falciparum Malaria. New England Journal of Medicine, 371(5), 411–423. doi:10.1056/NEJMoa1314981

Dondorp, A., Nosten, F., & Yi, P. (2009). Artemisinin resistance in Plasmodium falciparum malaria. New England Journal of Medicine, 455–467.

Noedl, H., Se, Y., Schaecher, K., Smith, B., Socheat, D., & Fukuda, M. (2008). Evidence of artemisinin-resistant malaria in western Cambodia. N Engl J Med, 359(24), 2619–2620.

Payne, D. (1987). Spread of chloroquine resistance in Plasmodium falciparum. Parasitology Today (Personal Ed.), 3(8), 241–6.

Wongsrichanalai, C., Pickard, A. L., Wernsdorfer, W. H., & Meshnick, S. R. (2002). Epidemiology of drug-resistant malaria. Lancet Infectious Diseases, 2, 209–218.

Wongsrichanalai, C., Sirichaisinthop, J., Karwacki, J. J., Congpuong, K., Miller, R. S., Pang, L., & Thimasarn, K. (2001). Drug resistant malaria on the Thai-Myanmar and Thai-Cambodian borders. Southeast Asian J Trop Med Public Health, 32(1), 41–49.

Friday, August 29, 2014

Genomic cold fusion? Part II. Realities of mapping

Mapping to find genomic causes of a trait of interest, like a disease, is done when the basic physiology is not known—maybe we have zero ideas, or the physiology we think is involved doesn’t show obvious differences between cases and controls.  If you know the biology, you won't have to use mapping methods, because you can explore the relevant genes directly.  Otherwise, and today often, we have to go fishing, in the genome, to find places that may vary in association—statistical regularity—with the trait.

The classical way to do this is called linkage analysis.  That term generally refers to tracing cases and marker variants in known families.  If parents transmit a causal allele (variant at some place in the genome) to their children, then we can find clusters of cases in those families, but no cases in other families (assuming one cause only).  We have Mendel’s classical rules for the transmission pattern and can attempt to fit that pattern to the data—for example, to exclude some non-genetic trait sharing.  After all, family members might share many things just because they have similar interests or habits.  Even disease can be due to shared environmental exposures. Mendelian principles allow us, with enough data, to discriminate.

“Enough data” is the catch.  Linkage analysis works well if there is a strong genetic signal.  If there is only one cause, we can collect multiple families and analyze their transmission patterns jointly.  Or, in some circumstances, we can collect very large, multi-generational families (often called pedigrees) and try to track a marker allele with the trait across the generations.  This has worked very well for some very strong-effect variants conferring very high risk for very specific, even quite rare, disorders.  That is because linkage disequilibrium, the association between a marker allele and a causal variant due to their shared evolutionary history (as described in Part I), ties the two together in these families.

But it is often very costly or impractical to collect actual large pedigrees that include many children each generation, and multiple generations.  Family members who have died cannot be studied and medical records may be untrustworthy, or family members may have moved, refuse to participate in a study, or be inaccessible for many reasons.  So a generation or so ago the idea arose that if we collect cases from a population we may also collect copies of nearby marker alleles in linkage disequilibrium (shared evolutionary history in the population).  As described in Part I, a marker allele has been transmitted through many generations of an unknown but assumed pedigree, and so will have been transmitted in that pedigree along with the causal variant.  This is implicit linkage analysis, called genomewide association analysis (GWAS), about which we’ve commented many times in the past.  GWAS looks for association between marker and causal site in implicit but assumed pedigrees, and is another form of linkage analysis.

When genetic causation is simple enough, this will work.  Indeed, it is far easier and less costly to collect many cases and controls than many deep pedigrees, so that a carefully designed GWAS can identify causes that are reasonably strong.  But this may not work when a trait is ‘complex’ and has many different genetic and/or environmental contributing causes.

If causation is complex, families provide a more powerful kind of sample to use in searching for genetic factors.  The reason is simple: in general a single family will be transmitting fewer causal variants than a collection of separate families.  Related to this is the reason that isolate populations, like Finland or Iceland, can in principle be good places to search, because they represent very large, even if implicit, pedigrees.  Sometimes the pedigree can actually be documented in such populations.
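To put rough numbers on that intuition, here is a minimal Python sketch.  It is an illustration only, not taken from any real study: the function name, the 200 causal variants, and their frequency of 0.001 are all made-up assumptions, and chromosomes are treated as independent draws from the population.

```python
def expected_distinct_variants(variant_freqs, n_chromosomes):
    """Expected number of distinct causal variants seen at least once
    among n_chromosomes independently sampled chromosomes."""
    return sum(1.0 - (1.0 - q) ** n_chromosomes for q in variant_freqs)


# Hypothetical trait: 200 different rare causal variants, each at frequency 0.001.
freqs = [0.001] * 200

# A pedigree founded by two people carries 4 founder chromosomes.
family = expected_distinct_variants(freqs, n_chromosomes=4)

# A sample of 2,000 unrelated people carries 4,000 chromosomes.
population = expected_distinct_variants(freqs, n_chromosomes=4000)

print(f"family: {family:.2f}, population sample: {population:.1f}")
```

Under these assumed values the family is expected to carry fewer than one of the causal variants while the big sample carries nearly all of them, which is the sense in which a family gives a less heterogeneous signal.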

If causation is complex, then linkage analysis in families will hopefully be better than big population samples for finding causal contributors, simply because a family will be segregating (transmitting) fewer different causal variants than a big population.  We might find the variant in linkage analysis in a big family, or an isolate population, but of course if there are many different variants, a given family may point us only to one or two of them.  For this reason, many argue that family analysis is useless for complex traits.  One commenter on a previous Tweet we made from our course likened linkage analysis for complex traits to ‘cold fusion’.  That comparison is mistaken.

Association analysis, the main alternative to linkage analysis, is just a combining of many different implicit families, for the population-history reason we’ve described here and in Part I.  The more families you combine, whether they are explicit or implicit, the more variation, including statistical ‘noise’, you incorporate.  The rather paltry findings of many GWAS are a testament to this fact, explaining as they have only a small fraction of most traits to which that method has been applied.  Worse, the greater the sample of this type, like cases vs controls, the more environmental variation you may be grouping together, again greatly watering down even the weak signal of many or, probably, by far most genetic causal factors.

In fact, if you are forced to go fishing for genetic cause, you may well be fishing in dreamland because you may simply be in denial of the implications of causal complexity.  In fact, all mapping is a form of linkage analysis.  Instead, one should tailor one’s approach to the realities of data and trait.  Some complex trait genes have been found by linkage analysis (e.g., the BRCA breast-cancer associated genes), though of course here we might quibble about the definition of 'complexity'. 

Sneering at linkage analysis because it is difficult to get big families, or because even single deep families may themselves be transmitting multiple causes (as is often found in isolate studies, in fact), is often simply a circle-the-wagons defense of Big Data studies that capture huge amounts of funding with relatively little payoff to date.

A biological approach?
Many linkage and association analyses are done because we don’t understand the basic biology of a trait well enough to go straight to ‘candidate’ genes to detect, prevent, or develop treatment for a trait.  Today, even though this approach has been the rule for nearly 20 years now, with little payoff, the defense is often still that more, more and even more data will solve the problem.  But if causation is too complex this can also be a costly, self-interested, weak defense.

The argument is that if we have whole genome sequence on huge numbers of people, or even everyone in a population, or in many populations so we can pool data, we will find the pot of gold (or is it cold fusion?) at the end of the rainbow.

One argument for this is to search population-wide genome sequenced biomedical data bases for variants that may be transmitted from parents to offspring, but that are so rare that they cannot generate a useful signal in huge, pooled GWAS studies.  This usually will still be in the form of linkage analysis if a marker in a given causal gene is transmitted with the trait in occasional families but the same gene is identified, even if via different families.  That is, if variation in the same gene is found to be involved in different individuals, but with different specific alleles, then one can take that gene seriously as a causal candidate.

This sometimes works, but usually only when the gene’s biology is known enough to have a reason to suspect it.  Otherwise, the problem is that so much is shared between close family members (whether implicitly or explicitly in known pedigrees) that if you don’t know the biology there will be too much to search through, too much co-transmitted variation.  Causal variation need not be in regular ‘genes’, but can be, and for complex traits seems typically to be, in regulatory or other regions of the genome, whose functional sites may not be known.  Also, we all harbor variation in genes that is not harmful, and we all carry ‘dead’ genes without problems, as many studies have now shown.

If one knows enough biology to suspect a set of genes, and finds variants of known effect (such as truncating a gene’s coding region so a normal protein isn’t made) in different affected individuals, then one has strong evidence s/he has found a target gene.  There are many examples of this for single-gene traits.  But for complex traits, even most genes that have been identified have only weak effects—the same variant most of the time is also found in healthy, unaffected individuals.  In this case, which seems often to be the biological truth, there is no big-cause gene to be found, or a gene has a big-cause only in some unusual genotypes in the rest of the genome.

Even knowing the biology doesn't say whether a given gene's protein code is involved rather than its regulation or other related factors (like making the chromosomal region available in the right cells, downregulating its messenger RNA, and other genome functions).  Even in multiple instances of a gene region, there may be many nucleotide variants observed among cases and controls.  The hunt is usually not easy even knowing the biology--and this is, of course, especially true if the trait isn't well-defined, as is often the case, or if it is complex or has many different contributors.

Big Data, like any other method, works when it works.  The question is when and whether it is worth its cost, regardless of how advantageous it is for investigators who like playing with (or having and managing) huge resources.  Whether or not it is any less ‘cold fusion’ than classical linkage analysis in big families is debatable.

Again, most searches for causal variation in the genome rest on statistical linkage between marker sites and causal sites due to shared evolutionary history.  Good study design is always important.  Dismissal of one method over another is too often little more than advocacy of a scientist’s personal intellectual or vested interests.

The problem is that complex traits are properly named:  they are complex. Better ideas are needed than what are being proposed these days.  We know that Big Data is ‘in’ and the money will pour in that direction.  From such data bases all sorts of samples, family or otherwise, can be drawn.  Simulation of strategies (such as with programs like our ForSim that we discussed in our recent Logical Reasoning course in Finland) can be done to try to optimize studies. 

In the end, however, fishing in a pond of minnows, no matter how it’s done, will only find minnows. But these days they are very expensive minnows.

Thursday, August 28, 2014

Genomic cold fusion? Part I. Rational and irrational aspects of mapping

I’m sitting here on a smooth, quiet train from Zurich to Innsbruck, a few days after the mini-course that we taught in Helsinki. In this post I want to make a few reflections on things said by people reacting to Facebook or Twitter messages about the course, comments that were too short to do justice to what we actually said.

In particular, the issues have to do with the nature of genome mapping strategies and what they are or mean.  There seems to be a good bit of confusion in this area, perhaps because of a lack of proper explanation of what these methods do, and why and how they work.

First, nobody should be doing mapping, looking for genes causally responsible for traits, unless they have some legitimate reason for believing that a trait is substantially affected by genes—that is, that variation in the trait or risk of a trait like a disease is causally associated with variation in a particular spot in the genome.  Such a reason, at best, would be that the trait seems to segregate in families as if caused by a single Mendelian factor.  If the evidence is weaker than that—as it so often is—then mapping becomes the more problematic.

If we don’t know the part of the genome that affects the trait, then we use many measured variable sites, called markers, that span the genome with the idea that wherever the causal site is, it will be near one of our markers.  Essentially, that is, we are searching for statistically significant associations between the marker and trait, based on some basically subjectively chosen measure, like a p-value, in samples that we believe are appropriate for detecting causal effects.

What is perhaps not widely appreciated is the nearly essential way that such searches rely on evolutionary assumptions.  We say ‘nearly’ because if one happens by huge luck to genotype the causal site itself, the test for association may be a bit more direct, as we’ll try to explain.

Mapping is based on evolutionary history
Evolution, or population history, generates the variation that causes the trait effect, and the variation we use as markers.  Mutational events generating these variants occur when they occur, and we choose markers based on the idea that they vary in our chosen type of sample, and that the instances of a given marker allele (variant) are descendant copies of some original mutation.  These instances of the same allele are said to be identical by descent (IBD) from that common ancestral copy.  Sets of instances of the marker also mark nearby chromosomal regions that have been passed down the same chain of descent.  That shared region is called a haplotype, and it gradually shortens over the post-mutation generations by a process called recombination.

If at some later time in the history of the haplotype ‘tagged’ by the marker variant another mutation occurs in a gene and alters that gene’s effects to generate the trait we are interested in, then the marker variant will be present in subsequent descendant copies of that twice-hit haplotype, and the causal signal will be associated with the presence of the marker variant.  This is called linkage disequilibrium (LD), and is the reason that mapping works.  That is, mapping works because of shared evolutionary (population) history of the marker and causal variants.

A hypothetical, simple example
[I’m continuing this post a couple of days from when I started it on the train to Innsbruck, and now finishing it in a nice hotel in Old Town, overlooking the Inn river.  Beautiful!]

Let’s say that we have a marker at which some people have a G nucleotide and others a T.  And let’s say the disease causal site, D, is near the G/T site, and that the D mutation, wherever it is on the chromosome, arose on a copy of the chromosome that has the G on it at the marker site.  Then, what we hope is that the disease will be associated with the G: that noticeably more people with the disease than without it will have the G.  This is the kind of association between trait-cause and marker that mapping is looking for.  But what can make it happen?

If we’re lucky everyone with the D allele at the causal site will have the trait (the ‘D’ mutation is fully penetrant, as we’d say).  And if there has been no recombination, and no other way to get the trait, then nobody with a T at the marker will also have the D variant—none of the T-bearers will have the disease.  Cases will have the G, controls the T.
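As a concrete illustration of what 'association' means here, the following is a minimal Python sketch that tabulates marker alleles in cases and controls and computes an ordinary Pearson chi-square statistic for the resulting 2x2 table.  The data are invented, the function names are ours for this example, and a chi-square test is only one of many statistics a real mapping study might use.

```python
from collections import Counter


def allele_counts(samples):
    """Count marker alleles separately in cases and controls.

    `samples` is a list of (marker_allele, has_trait) pairs.
    """
    counts = Counter(samples)
    return (
        counts[("G", True)],   # cases carrying G
        counts[("T", True)],   # cases carrying T
        counts[("G", False)],  # controls carrying G
        counts[("T", False)],  # controls carrying T
    )


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # an empty row or column carries no information
    return n * (a * d - b * c) ** 2 / denom


# Hypothetical perfect association: every case has G, every control has T.
perfect = [("G", True)] * 50 + [("T", False)] * 50
# Hypothetical absence of association: G and T equally common in both groups.
none = [("G", True)] * 25 + [("T", True)] * 25 + [("G", False)] * 25 + [("T", False)] * 25

stat_perfect = chi_square_2x2(*allele_counts(perfect))
stat_none = chi_square_2x2(*allele_counts(none))
print(stat_perfect, stat_none)
```

With perfect association the statistic reaches its maximum for the sample size, and with no association it is zero; real data fall somewhere in between, for the reasons discussed next.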

This sort of perfect association depends on when the D-mutation, wherever it is on the chromosome, occurred relative to the mutation that produced the T at the marker.  We usually pick marker sites because we know that the variation (here, G vs T) is common in the population, and that means that the mutation is rather old.  Enough generations have passed for there to be a substantial fraction of T-bearing, and G-bearing people in the population.

If the ‘D’ mutation occurred right after this G-T marker’s mutation, then all copies of the G variant at the marker will also have the trait.  But if the trait-mutation occurred much later, then only a few of the G-bearing chromosomes will have the trait-causing D variant.  The association, even if true, will be weak.  And if the D-site is far from the G-T marker site, there’s a trap: even if the D mutation occurred long enough ago for most G-bearers also to have the trait, there will also have been enough time for recombination to switch the D-site onto a T-bearing marker chromosome.  The G-D association will no longer be perfect.
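The effect of distance and time can be sketched with the standard textbook model in which the marker-cause association shrinks by a factor of (1 - recombination fraction) each generation.  This is a minimal Python sketch under strong assumptions that the post itself does not spell out (a large, randomly mating population, no selection, no new mutation), and the starting value, recombination fractions, and generation count are made up for illustration.

```python
def expected_association(initial, recomb_frac, generations):
    """Expected marker-cause association after some generations, starting from
    `initial`, given the per-generation recombination fraction between the sites."""
    if not 0.0 <= recomb_frac <= 0.5:
        raise ValueError("recombination fraction must be between 0 and 0.5")
    return initial * (1.0 - recomb_frac) ** generations


# Hypothetical values: the same starting association, 100 generations later.
near = expected_association(initial=0.25, recomb_frac=0.001, generations=100)  # closely linked sites
far = expected_association(initial=0.25, recomb_frac=0.1, generations=100)     # distant sites

print(f"near marker: {near:.4f}, far marker: {far:.2e}")
```

In this sketch a closely linked marker keeps most of its association over 100 generations, while a distant one loses essentially all of it, which is why markers need to be dense enough that one is near the causal site.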

Likewise, if there are many different causes of the trait, then some cases will not be due to the D-variant (tagged by the G-allele at the nearby marker), even if the latter really is also a cause.  We’ll have cases with the T-marker variant, and in this case it’s not because of recombination.  The more causes of the trait, the weaker its association with any specific marker, like the G-T one.
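A back-of-the-envelope version of this dilution, as a minimal Python sketch: suppose a fraction of cases is due to the D-variant and always carries G, while the remaining cases, due to other causes, carry G only at its background population frequency.  Both numbers, and the function name, are assumptions made up for this illustration, and recombination is ignored.

```python
def g_frequency_in_cases(fraction_due_to_d, g_background):
    """Expected frequency of the G marker allele among cases when only some
    cases are caused by the D-variant (which is always found with G)."""
    return fraction_due_to_d * 1.0 + (1.0 - fraction_due_to_d) * g_background


# Hypothetical: G has a background frequency of 0.5, so controls sit near 0.5.
for fraction in (1.0, 0.5, 0.1):
    print(fraction, g_frequency_in_cases(fraction, g_background=0.5))
```

As the share of cases explained by D falls, the G frequency in cases drifts toward the background frequency seen in controls, and the case-control difference that mapping relies on shrinks with it.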

Science or cold fusion?
So mapping is a multiple-edged sword.  Now, there are several ways to try to find trait-associated parts of the genome.  One is called linkage mapping, the other association mapping (genomewide association, or GWAS).  And one can also think that causal sites can be found not  by relying on linkage-disequilibrium, but simply by looking for causal variants directly.

These various strategies have their strong and weak points, and there is equally strong disagreement as to which to apply when.  That’s why someone can, sometimes sneeringly, claim that this or that approach is ‘cold fusion’: that it’s imaginary, and won’t or can’t work.  But mapping for complex traits is not doing very well.  As we’ve posted many times (and many others have repeatedly observed), we are usually explaining only a rather small if not trivial fraction of causation by mapping.  So the issues are serious, regardless of the vested interests of those contending with them.

In our next post we’ll discuss some of these issues about methods.

Choose a blog post and vote!

Choose your favorite blog post among the nominees at 3 Quarks Daily, and vote here!  (There are 3 MT nominees -- just sayin'!)

Wednesday, August 20, 2014

Blogging isn't catastrophic, but the opposite could be.

Ken and I just had an article published in Evolutionary Anthropology:

Catastrophes in evolution: Is Cuvier's world extinct or extant?

It's open access, so no need for a subscription to read it.

It's the second one we've done (first one is here). The piece is largely the product of many discussions we've had, mainly over email, and these discussions were sparked by posts we had each written for the MT.

Beyond how satisfying it was to have these discussions with Ken and to write this paper with him, it was a great excuse to read Elizabeth Kolbert's articles in The New Yorker (here and here) as well as her wonderful book that accompanies them:

Although the subtitle's irksome if you're not keen on separating human behavior from nature, the book's incredibly insightful. And, it's captivating if you just love tales of exploration and discovery, and if you eat up details about kit, gear and extraordinary travel conditions. It was sometimes difficult to read through my jealousy, and I consider that reason alone to recommend this read, regardless of the compelling scientific history, the exciting albeit depressing cutting-edge knowledge, as well as the important political message that only peeks out, from under the enormous pile of scientific evidence, in her final paragraphs.

It's because of our ongoing discussions and writings and then also Kolbert's, that Ken and I got to thinking about whether and how extinction, background and mass extinctions, and especially Cuvier's pre-Darwinian notions of "catastrophism" are playing out in paleoanthropology right now. This is the overall theme of our piece linked above.

Kolbert deals briefly with Neanderthals near the end of her book. However, Ken and I weren't so much concerned with what happened to the Neanderthals as whether, for instance, we could fairly consider what happened to them to be "extinction" given what we know about their DNA living on inside, probably, billions of us today. And, because of those genetic circumstances, it naturally made us wonder whether anything we call "extinct" truly is and if it is, how could we know? This of course begs for a thoughtful consideration of species and adaptation and, seemingly, all the ol' evolutionary chestnuts that are terribly difficult to crack.

I don't think that what Ken and I contributed in Evolutionary Anthropology was far different from anything that could have occurred before blogs were invented, but blogging certainly did facilitate it. What's more, if I didn't have The Mermaid's Tale, if I wasn't routinely reading it and writing for it, I probably wouldn't be thinking this regularly and this deeply about many of these marvelous things in the first place, especially not with the unimaginably wonderful benefit of engaging with Anne and Ken.  What a catastrophe that would be.

Tuesday, August 19, 2014

Nominate a blog post for the 3 Quarks Daily science writing prize

If you write about science or if you read about science, and if you like making new friends, earning praise and winning money, or if you would like science writers to make new friends, earn praise and win money, then you should definitely, by the August 22 deadline, nominate something for this:

The 5th annual 3 Quarks Daily science writing prize!

Information here:

3QD editor Abbas Raza says:
We are very honored and pleased to announce that Frans de Waal has agreed to be the final judge for our 5th annual prize for the best blog and online writing in the category of science. Details of the previous four science (and other) prizes can be seen on our prize page.

What a fantastic judge they scored this year.

Last round of this contest--thanks to readers of the MT who voted and to the 3QD editors and that year's judge, Sean Carroll--I won the Charm Quark for "Forget bipedalism, what about babyism?"*

It's a wonderfully inspiring thing to experience and I'm so excited for the writers who will win this year's contest. Please help to make it a good contest by nominating what's turned you on, lit you up, wizened, informed, enlightened, or inspired you.  All you have to do is choose something about science that you like, dating back only as far as August 10, 2013, and then post the URL to it in the comments section HERE.

Each person can only nominate one link, which encourages writers to nominate one of their own. So don't be humble or shy or insecure. Do it!

And if you're not a writer, nominate a link that you've really enjoyed reading. Support your science writers in this often thankless service!

This isn't a ploy to get you to nominate one of mine. For good fun, I already nominated this one anyway:

But if Ken, Anne, Dan, Jim, Reed or another guest writer posted something here, or if another writer posted something anywhere else in the last year that stuck with you or that struck you, then for the love of science and science writing, please nominate them before the August 22 deadline!

*Which now has dead links to cute photos because back in 2012 I didn't know what the hell I was doing with images in Blogger.

Monday, August 18, 2014

Logical Reasoning in Helsinki

Ken and I are in Finland this week co-teaching the Logical Reasoning in Human Genetics course that Ken and Joe Terwilliger have taught a number of times in a number of places over the last 10 years.  People in the class, and/or I, may do some live tweeting at #lrhg14.

We'll be away for another week or so after the course.  We will do some blogging this week or next if we find the time.  If not, we'll be back the first week of September.
