PhD Thesis

You can get my full PhD thesis here

The Stochastic Basis of Somatic Variation


Molecular biology has ascribed most phenotypic variation to genetic variation, a view known as genetic reductionism. As such, non-genetic individuality, the fact that distinct isogenic cells behave differently under the same environmental conditions, has been a long-standing challenge. The low numbers of molecules present within a cell lead to a high variance in the rate of molecular encounters, and since the cell functions through molecular reactions that depend on such encounters, kinetic rates can be interpreted as stochastic variables. This is especially relevant for gene transcription, since one of the reactants, the chromosome itself, is frequently present in only one or two copies. In fact, the stochastic nature of gene transcription has been implicated in the fluctuations of protein numbers in single cells. Therefore, the stochastic nature of molecular reactions provides a framework for understanding the essence of non-genetic, somatic variation in protein copy numbers, as well as the applications and putative functions this phenomenon has for a population of isogenic cells.
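The effect of low copy numbers can be illustrated with a minimal simulation. The sketch below assumes the simplest birth-death model of expression (constant production rate, first-order degradation) and simulates it exactly with the Gillespie algorithm; the function name and rate values are hypothetical, not taken from the thesis. Cells with identical parameters end up with different copy numbers, and a variance-to-mean (Fano) factor close to 1 is the Poissonian signature of such simple stochastic gene expression models.

```python
import random
import statistics

def gillespie_birth_death(k_prod, k_deg, t_end, rng):
    """Exact stochastic simulation (Gillespie algorithm) of a minimal
    birth-death model: molecules are produced at constant rate k_prod and
    each one is degraded independently at rate k_deg. Returns the copy
    number at time t_end, starting from zero molecules."""
    t, n = 0.0, 0
    while True:
        a_prod = k_prod                 # propensity of production
        a_deg = k_deg * n               # propensity of degradation
        a_total = a_prod + a_deg
        t += rng.expovariate(a_total)   # exponential waiting time to next event
        if t > t_end:
            return n
        # choose which reaction fires, proportionally to its propensity
        if rng.random() * a_total < a_prod:
            n += 1
        else:
            n -= 1

# A hypothetical "isogenic population": identical parameters, independent histories
rng = random.Random(0)
counts = [gillespie_birth_death(10.0, 1.0, 20.0, rng) for _ in range(1000)]
mean = statistics.fmean(counts)
fano = statistics.variance(counts) / mean   # variance / mean; equals 1 for a Poisson
print(f"mean copy number = {mean:.2f}, Fano factor = {fano:.2f}")
```

With these rates the steady-state mean is k_prod / k_deg = 10 molecules per cell, yet individual cells scatter widely around it.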

Elucidating the Mechanism Behind Pseudo Monoallelic Expression

In diploid organisms, genes are believed to be expressed mostly in a biallelic fashion, i.e. from both alleles simultaneously. The known exceptions to this pattern of allelic expression are X-chromosome inactivation, antigen receptor genes and autosomal imprinted genes. Although the mechanisms behind these exceptions are understood to different degrees, they all share a stable and heritable pattern of expression: a cell's choice of which allele to express lasts its whole life cycle and is propagated to its lineage. Recently, it was discovered that some cytokine genes have a pseudo monoallelic pattern of allelic expression. It was termed pseudo because it differs from the other classes of monoallelically expressed genes in two major ways:
1. Not all cells in the population display a monoallelic pattern of expression: most cells do not express the gene at all, and very few express it from both alleles.
2. The particular pattern of allelic expression does not seem to be stable or heritable.

This phenomenon constituted a form of somatic variation, since these cells were isogenic and yet displayed different characteristics despite having been subjected to the same environmental history. Moreover, it was clear from the earliest observations that switching between expressed alleles was stochastic.
However, the mechanism by which this phenomenon operates remained largely elusive.
To elucidate the mechanism by which stochastic monoallelic expression operates, we mathematically formalized a number of proposals that recur in the literature and challenged these models with the available data. We confirmed the independence of the two alleles and showed that simple stochastic gene expression is implausible as an explanation for these data. Hence, we proposed a new model of eukaryotic gene transcription that combines chromatin dynamics with stochastic gene expression and is powerful enough to fit the data under plausible conditions.
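What allele independence predicts can be made concrete with a small calculation. The sketch below assumes, for illustration only, that each allele switches between OFF and ON as an independent two-state ("telegraph") process with hypothetical rates k_on and k_off; it is not the chromatin model proposed in the thesis. Under independence the fractions of non-expressing, monoallelic and biallelic cells follow a binomial pattern, so mono² = 4 · none · bi, which gives one possible check (the hypothetical `independence_ratio`) that could be applied to observed fractions.

```python
def allelic_pattern_independent(k_on, k_off):
    """Expected fractions of cells expressing 0, 1 or 2 alleles, assuming
    each allele switches independently between OFF and ON with rates
    k_on and k_off (two-state telegraph model at steady state)."""
    p = k_on / (k_on + k_off)   # steady-state probability that one allele is ON
    none = (1 - p) ** 2
    mono = 2 * p * (1 - p)
    bi = p ** 2
    return none, mono, bi

def independence_ratio(none, mono, bi):
    """Equals 1 when the two alleles behave independently (binomial pattern);
    departures from 1 would suggest the alleles are coupled."""
    return mono ** 2 / (4 * none * bi)

none, mono, bi = allelic_pattern_independent(k_on=1.0, k_off=9.0)
print(f"none={none:.3f} mono={mono:.3f} bi={bi:.3f}")
# p = 0.1 → none=0.810 mono=0.180 bi=0.010
print(f"independence ratio = {independence_ratio(none, mono, bi):.3f}")
```

A low ON probability reproduces the qualitative picture described above: most cells silent, some monoallelic, very few biallelic.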

The Origin of the Lognormal Distribution of Protein Copy Numbers

The ultimate cause of variance in protein copy numbers across an isogenic population is believed to be the stochastic nature of transcription, owing to the low numbers of molecules involved.
However, the observed distributions of protein copy numbers are well fitted by a lognormal distribution, which is not compatible with the simple Poissonian processes derived from simple models of stochastic gene expression. We therefore searched for mechanisms that could exhibit multiplicative noise, a condition that leads to a lognormal distribution. Many processes within the cell require several steps to achieve their function. Specifically, even after translation, a protein must often go through many steps, such as post-translational modifications and dimerization, before it becomes functional. Many other mechanisms, such as signaling cascades and transcriptional activation, are also multistep in nature. We modeled these mechanisms and showed that they exhibit multiplicative noise and that, given a sufficient number of steps, the distribution of their final product is expected to be close to lognormal.
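The link between multistep processes and lognormal distributions follows from the fact that the logarithm of a product is a sum of logarithms, so by the central limit theorem log(x) tends toward a normal distribution as the number of independent multiplicative steps grows. The sketch below assumes, purely for illustration, that each step multiplies the amount of material by an independent factor drawn uniformly from [0.5, 1.5]; the step count, factor range and starting amount are hypothetical and do not reproduce the thesis models.

```python
import math
import random
import statistics

def multistep_product(n_steps, rng):
    """Amount of final product after n_steps, assuming each step multiplies
    its input by an independent random factor (purely multiplicative noise)."""
    x = 100.0                            # hypothetical starting amount
    for _ in range(n_steps):
        x *= rng.uniform(0.5, 1.5)       # hypothetical per-step efficiency
    return x

def skewness(values):
    """Sample skewness; close to 0 for a symmetric (e.g. normal) distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * s ** 3)

rng = random.Random(42)
products = [multistep_product(20, rng) for _ in range(5000)]
logs = [math.log(p) for p in products]
print(f"skewness of product:      {skewness(products):.2f}")  # strongly right-skewed
print(f"skewness of log(product): {skewness(logs):.2f}")      # near 0, i.e. ~lognormal
```

The raw product has a long right tail, while its logarithm is nearly symmetric, which is the signature of an approximately lognormal distribution.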

The Impact of Heterogeneity in Signaling Modules

Modularity in protein or gene interaction networks holds the promise of understanding overall network dynamics through simple abstractions of these "modules". Although small, these networks are often nonlinear and are therefore usually characterized with computational models that describe bulk cell-population responses to perturbations. This means that these models attempt to reproduce the mean behavior of the population, in the hope that it matches the behavior of the average cell.
However, recent observations show that cell populations display remarkable heterogeneity in their transcript or protein copy numbers. Since these networks often behave nonlinearly, there is no guarantee that the average response of the population is identical to the response of an average cell, or indeed of any cell in the population.
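The gap between the population's average response and the response of the average cell is a consequence of nonlinearity (Jensen's inequality). The sketch below assumes a hypothetical module whose steady-state response is a Hill function of one protein's copy number, with copy numbers lognormally distributed across the population; K, n and the distribution parameters are illustrative, not fitted values.

```python
import math
import random
import statistics

def hill_response(protein, K=100.0, n=4):
    """Steady-state response of a hypothetical ultrasensitive module:
    a Hill function of the copy number of a key protein."""
    return protein ** n / (K ** n + protein ** n)

rng = random.Random(1)
# Hypothetical lognormal copy numbers with median equal to K
population = [rng.lognormvariate(math.log(100.0), 0.5) for _ in range(10000)]

mean_of_responses = statistics.fmean([hill_response(p) for p in population])
response_of_mean_cell = hill_response(statistics.fmean(population))
print(f"population mean response:     {mean_of_responses:.3f}")
print(f"response of the average cell: {response_of_mean_cell:.3f}")
```

Here the population mean response sits near 0.5, while a cell holding the average copy number responds at roughly 0.62, so a model fitted to the bulk curve would misdescribe that cell.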
Here we evaluate the impact of variability in protein copy numbers on increasingly complex, yet small, networks. For each of these modules, we study how a population with variable copy numbers of key proteins responds to a stimulus, both at steady state and dynamically.
We show that population bulk responses can be misleading when inferring the network within a single cell. Moreover, model inference under these conditions is further impaired because the mechanism that generates these responses at the single-cell level is unable to reproduce the mean behavior of the population. We show that the simple fact that protein amounts are distributed across a population creates several levels of heterogeneity: in whether cells respond to a given stimulus at all (some do, others don't), in the level of response, and in the timing (dynamics) of the response. This translates into very different individual behaviors within the population and makes it extremely difficult to identify mechanisms solely from population-average data. Moreover, this constitutes a mechanism for phenotypic plasticity and raises the question of whether this heterogeneity has any adaptive value for a population of isogenic cells.
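Heterogeneity in whether and when cells respond can be illustrated with a deliberately simple, hypothetical module: after a stimulus, each cell produces a downstream signal at a rate proportional to its copy number of a key protein and switches on once that signal crosses a fixed threshold. The rate constant `k`, the threshold and the lognormal copy-number distribution are assumptions. Each cell responds in an all-or-none fashion at its own time, yet the fraction of responding cells rises gradually, which a bulk measurement could mistake for a graded single-cell response.

```python
import math
import random

def response_time(protein, k=0.01, threshold=10.0):
    """Time for a hypothetical downstream signal, produced at rate k * protein
    after the stimulus, to reach a fixed activation threshold."""
    return threshold / (k * protein)

rng = random.Random(7)
population = [rng.lognormvariate(math.log(100.0), 0.5) for _ in range(10000)]
times = [response_time(p) for p in population]

# Each cell switches ON abruptly at its own time; the population-level
# fraction of responders rises gradually with the observation window.
fractions = {}
for t in (5, 10, 20, 40):
    fractions[t] = sum(t_star <= t for t_star in times) / len(times)
    print(f"t={t:>3}: fraction of cells responding = {fractions[t]:.2f}")
```

Cells with fewer copies of the protein respond later or, within a short observation window, not at all.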

The Adaptive Value of Fluctuation Structure of Protein Copy Numbers

Populations of isogenic cells display an appreciable variability in protein copy numbers.
This variability has been attributed to the stochastic nature of intracellular events and has been shown to be dynamic, i.e. protein levels fluctuate within individual cells at different rates. The diversity that results from this stochastic process opens the possibility of selection based on the amount of protein an individual cell contains. This leads to the question of whether the observed variability is beneficial for a population. Specifically, does a higher variance of the steady-state distribution of protein copy numbers confer an advantage? Do different steady-state distributions of this variability lead to different outcomes in terms of somatic adaptation? If so, this raises the possibility that genetic parameters modulating the structure of these fluctuations are subject to selection.
Here, we assess the impact of different variances on the dynamics of somatic adaptation to a new environment. We also compare different steady-state distributions in their capacity to facilitate such adaptation. We conclude that higher variances do provide an advantage, at least under certain conditions, and that different fluctuation structures (as characterized by the steady-state distribution) differ in their capacity for somatic adaptation, raising the possibility that this fluctuation structure is modulated by genetic parameters.
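One way to see why variance can matter for somatic adaptation is a static snapshot: suppose, hypothetically, that a new environment favors only cells whose copy number exceeds a threshold. The sketch compares lognormal steady-state distributions with the same arithmetic mean but different log-scale standard deviations; the hard threshold, the mean of 100 and the `sigma` values are assumptions, and the sketch ignores the fluctuation dynamics (switching rates) that shape adaptation over time. Whether higher variance helps depends on where the threshold sits relative to the mean, consistent with an advantage only under certain conditions.

```python
import math

SIGMAS = (0.2, 0.5, 1.0)   # hypothetical log-scale standard deviations

def fraction_above(threshold, mean, sigma):
    """Fraction of cells whose copy number exceeds `threshold`, assuming a
    lognormal steady-state distribution with the given arithmetic mean and
    log-scale standard deviation `sigma`."""
    mu = math.log(mean) - sigma ** 2 / 2   # keeps the arithmetic mean fixed
    z = (math.log(threshold) - mu) / (sigma * math.sqrt(2))
    return 0.5 * math.erfc(z)              # upper tail of the normal in log space

for threshold in (250.0, 50.0):
    fractions = [fraction_above(threshold, 100.0, s) for s in SIGMAS]
    print(f"threshold={threshold:g}: "
          + ", ".join(f"sigma={s}: {f:.4f}" for s, f in zip(SIGMAS, fractions)))
```

With a threshold above the mean (250), the fraction of favored cells grows with `sigma`; with a threshold below the mean (50), the broadest distribution leaves the fewest cells above it.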