
Recent years have seen tremendous growth of Bayesian approaches in reconstructing phylogenetic trees and estimating their branch lengths. Although there are currently only a few Bayesian comparative methods, their number will certainly grow as comparative biologists try to solve more complex problems. In a Bayesian framework, the quantity of interest is the **posterior probability**, calculated using Bayes' theorem:

$$ Pr(H|D) = \dfrac{Pr(D|H) \cdot Pr(H)}{Pr(D)} \label{2.19}$$

The benefit of Bayesian approaches is that they allow us to estimate the probability that the hypothesis is true given the observed data, $Pr(H|D)$. This is really the sort of probability that most people have in mind when they are thinking about the goals of their study. However, Bayes' theorem also reveals a cost of this approach. Along with the likelihood, $Pr(D|H)$, one must also incorporate prior knowledge about the probability that any given hypothesis is true, $Pr(H)$. This represents the prior belief that a hypothesis is true, even before consideration of the data at hand. This prior probability must be explicitly quantified in all Bayesian statistical analyses. In practice, scientists often seek to use “uninformative” priors that have little influence on the posterior distribution, although even the term "uninformative" can be confusing, because the prior is an integral part of a Bayesian analysis. The term $Pr(D)$ is also an important part of Bayes' theorem, and can be calculated as the probability of obtaining the data integrated over the prior distributions of the parameters:

$$Pr(D)=\int_H Pr(D|H)Pr(H)dH \label{2.20}$$

However, $Pr(D)$ is constant when comparing the fit of different models for a given data set and thus has no influence on Bayesian model selection under most circumstances (and all the examples in this book).
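To see why $Pr(D)$ drops out of comparisons, the sketch below applies Bayes' theorem to a small, hypothetical set of discrete hypotheses: three candidate values of $p_H$ with equal prior weight, scored against 63 heads in 100 flips (the data used in the worked MCMC example later in this section). With discrete hypotheses the integral in Equation 2.20 becomes a sum; dividing by it rescales every posterior by the same constant, so the ratio of two posteriors equals the ratio of likelihood times prior.

```python
from math import comb

def binomial_likelihood(p, heads, n):
    """Pr(D|H): probability of `heads` heads in `n` flips when Pr(heads) = p."""
    return comb(n, heads) * p**heads * (1 - p)**(n - heads)

def discrete_posterior(priors, likelihoods):
    """Bayes' theorem for a finite set of hypotheses.

    Pr(D) is the sum of Pr(D|H) * Pr(H) over all hypotheses, the discrete
    analogue of the integral in Equation 2.20.
    """
    joint = [lik * pri for lik, pri in zip(likelihoods, priors)]
    evidence = sum(joint)                      # Pr(D)
    return [j / evidence for j in joint], evidence

# Hypothetical hypotheses: three candidate values of p_H with equal prior weight.
candidates = [0.4, 0.5, 0.63]
priors = [1 / 3] * 3
likelihoods = [binomial_likelihood(p, 63, 100) for p in candidates]
posterior, evidence = discrete_posterior(priors, likelihoods)

for p, post in zip(candidates, posterior):
    print(f"p_H = {p:.2f}: Pr(H|D) = {post:.4f}")

# Pr(D) cancels in a ratio of posteriors: with equal priors it equals the likelihood ratio.
print(posterior[2] / posterior[1], likelihoods[2] / likelihoods[1])
```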

In our example of lizard flipping, we can do an analysis in a Bayesian framework. For model 1, there are no free parameters. Because of this, $Pr(H)=1$ and $Pr(D|H)=Pr(D)$, so that $Pr(H|D)=1$. This may seem strange, but what the result means is that our data have no influence on the structure of the model. We do not learn anything about a model with no free parameters by collecting data!

If we consider model 2 above, the parameter $p_H$ must be estimated. We can set a uniform prior between 0 and 1 for $p_H$, so that $f(p_H)=1$ for all $p_H$ in the interval [0,1]. We can also write this as “our prior for $p_H$ is U(0,1)”. Then:

$$ Pr(H|D) = \frac{Pr(D|H) \cdot Pr(H)}{Pr(D)} = \frac{P(H|p_H,N) f(p_H)}{\displaystyle \int_{0}^{1} P(H|p_H,N) f(p_H) dp_H} \label{2.21} $$

Next we note that $Pr(D|H)$ is the likelihood of our data given the model, which is already stated above as Equation \ref{2.2}. Plugging this into our equation, we have:

$$ Pr(H|D) = \frac{\binom{N}{H} p_H^H (1-p_H)^{N-H}}{\displaystyle \int_{0}^{1} \binom{N}{H} p_H^H (1-p_H)^{N-H} dp_H} \label{2.22} $$

This ugly equation actually simplifies to a beta distribution, which can be expressed more simply as:

$$ Pr(H|D) = \frac{(N+1)!}{H!(N-H)!} p_H^H (1-p_H)^{N-H} \label{2.23} $$
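Equation 2.23 is the Beta$(H+1, N-H+1)$ density, whose mean is $(H+1)/(N+2)$. As a quick numerical check, the sketch below integrates Equation 2.23 with a simple midpoint rule for $N = 100$ and $H = 63$ (the counts used later in this section) and confirms that it integrates to 1 and has that mean. The `integrate` helper is a plain illustration, not a library routine.

```python
from math import factorial

def posterior_density(p, heads, n):
    """Equation 2.23: posterior density of p_H under a U(0, 1) prior."""
    const = factorial(n + 1) / (factorial(heads) * factorial(n - heads))
    return const * p**heads * (1 - p)**(n - heads)

def integrate(f, a=0.0, b=1.0, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return sum(f(a + (k + 0.5) * width) for k in range(steps)) * width

N, H = 100, 63
total = integrate(lambda p: posterior_density(p, H, N))
mean = integrate(lambda p: p * posterior_density(p, H, N))
print(f"integral = {total:.6f}, posterior mean = {mean:.4f}, (H+1)/(N+2) = {(H + 1) / (N + 2):.4f}")
```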

We can compare this posterior distribution of our parameter estimate, $p_H$, given the data, to our uniform prior (Figure 2.3). If you inspect this plot, you see that the posterior distribution is very different from the prior; that is, the data have changed our view of the values that parameters should take. Again, this result is qualitatively consistent with both the frequentist and ML approaches described above. In this case, we can see from the posterior distribution that we can be quite confident that our parameter $p_H$ is not 0.5.

As you can see from this example, Bayes theorem lets us combine our prior belief about parameter values with the information from the data in order to obtain a posterior. These posterior distributions are very easy to interpret, as they express the probability of the model parameters given our data. However, that clarity comes at a cost of requiring an explicit prior. Later in the book we will learn how to use this feature of Bayesian statistics to our advantage when we actually do have some prior knowledge about parameter values.

Figure 2.3. Bayesian prior (dotted line) and posterior (solid line) distributions for lizard flipping. Image by the author, can be reused under a CC-BY-4.0 license.

## Section 2.4b: Bayesian MCMC

The other main tool in the toolbox of Bayesian comparative methods is the use of Markov-chain Monte Carlo (MCMC) tools to calculate posterior probabilities. MCMC techniques use a “chain” of calculations to sample the posterior distribution. MCMC requires calculation of likelihoods but not complicated mathematics (e.g. integration of probability distributions, as in Equation \ref{2.22}), and so represents a more flexible approach to Bayesian computation. Frequently, the integrals in Equation \ref{2.21} are intractable, so that the most efficient way to fit Bayesian models is by using MCMC. Also, setting up an MCMC is, in my experience, easier than people expect!

An MCMC analysis requires that one construct and sample from a Markov chain. A Markov chain is a random process that changes from one state to another with certain probabilities that depend only on the current state of the system, and not on what has come before. A simple example of a Markov chain is the movement of a playing piece in the game Chutes and Ladders; the position of the piece moves from one square to another following probabilities given by the dice and the layout of the game board. The movement of the piece from any square on the board does not depend on how the piece got to that square.

Some Markov chains have an equilibrium distribution, which is a stable probability distribution of the model’s states after the chain has run for a very long time. For Bayesian analysis, we use a technique called a **Metropolis-Hastings algorithm** to construct a special Markov chain that has an equilibrium distribution that is the same as the Bayesian posterior distribution of our statistical model. Then, using a random simulation on this chain (this is the Markov-chain Monte Carlo, MCMC), we can sample from the posterior distribution of our model.
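A minimal illustration of an equilibrium distribution, using a hypothetical two-state chain whose transition probabilities are invented for the example: whatever state the chain starts in, repeatedly applying the transition probabilities converges to the same distribution, and a long simulated run spends about that fraction of its time in each state. Metropolis-Hastings builds a chain whose equilibrium is the posterior rather than this toy distribution.

```python
import random

# Hypothetical two-state chain. Row i holds the probabilities of moving from
# state i to states 0 and 1; they depend only on the current state.
P = [[0.9, 0.1],
     [0.3, 0.7]]

def propagate(dist, P):
    """Distribution over states after one more step of the chain."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def long_run_distribution(P, start, steps=200):
    dist = list(start)
    for _ in range(steps):
        dist = propagate(dist, P)
    return dist

def visit_frequencies(P, steps, seed=1):
    """Simulate the two-state chain and record the fraction of time in each state."""
    rng = random.Random(seed)
    state, counts = 0, [0, 0]
    for _ in range(steps):
        state = 0 if rng.random() < P[state][0] else 1
        counts[state] += 1
    return [c / steps for c in counts]

from_state0 = long_run_distribution(P, [1.0, 0.0])
from_state1 = long_run_distribution(P, [0.0, 1.0])
print(from_state0, from_state1)          # both approach the equilibrium (0.75, 0.25)
print(visit_frequencies(P, 100_000))     # a long run spends roughly 75% of its time in state 0
```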

In simpler terms: we use a set of well-defined rules. These rules let us walk around parameter space, at each step deciding whether to accept or reject the next proposed move. Because of some mathematical proofs that are beyond the scope of this chapter, these rules guarantee that we will eventually be accepting samples from the Bayesian posterior distribution - which is what we seek.

The following steps use the Metropolis-Hastings algorithm to carry out a Bayesian MCMC analysis with one free parameter.

Metropolis-Hastings algorithm

1. **Get a starting parameter value.** Sample a starting parameter value, $p_0$, from the prior distribution.
2. **Propose a new parameter value.** Starting with $i = 1$, propose a new parameter for generation $i$: given the current parameter value, $p$, select a new proposed parameter value, $p'$, using the proposal density $Q(p'|p)$.
3. **Calculate three ratios.**
   - a. The prior odds ratio. This is the ratio of the prior probability of the proposed value $p'$ to that of the current value $p$.
     $$R_{prior} = \frac{P(p')}{P(p)} \label{2.24}$$
   - b. The proposal density ratio. This is the ratio of the probability of proposing the reverse move (from $p'$ back to $p$) to the probability of proposing the forward move (from $p$ to $p'$). Often, we purposefully construct a proposal density that is symmetrical. When we do that, $Q(p'|p)=Q(p|p')$ and $R_{proposal} = 1$, simplifying the calculations.
     $$R_{proposal} = \frac{Q(p|p')}{Q(p'|p)} \label{2.25}$$
   - c. The likelihood ratio. This is the ratio of probabilities of the data given the two different parameter values.
     $$R_{likelihood} = \frac{L(p'|D)}{L(p|D)} = \frac{P(D|p')}{P(D|p)} \label{2.26}$$
4. **Multiply.** Find the product of the prior odds ratio, the proposal density ratio, and the likelihood ratio.
   $$R_{accept} = R_{prior} \cdot R_{proposal} \cdot R_{likelihood} \label{2.27}$$
5. **Accept or reject.** Draw a random number $x$ from a uniform distribution between 0 and 1. If $x < R_{accept}$, accept the proposed value $p'$ ($p_i = p'$); otherwise reject it, and retain the current value $p$ ($p_i = p$).
6. **Repeat.** Repeat steps 2-5 a large number of times.
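The steps above can be sketched in Python as follows. This is a minimal sketch under two assumptions: it works with log probabilities, so the ratios become differences of logs (which avoids numerical underflow when likelihoods are tiny), and it assumes a symmetric proposal, so $R_{proposal} = 1$ is left out. The function and parameter names are illustrative. The usage lines check the sampler on a toy target (a flat prior and a standard-normal likelihood), where the sampled mean and variance should come out near 0 and 1.

```python
import math
import random

def metropolis_hastings(log_prior, log_likelihood, propose, start, n_gen, seed=None):
    """Single-parameter Metropolis-Hastings sampler.

    Works on the log scale, so the ratios become differences of logs.
    Assumes `propose(p, rng)` draws from a symmetric proposal density,
    so log R_proposal = 0 and is left out.
    """
    rng = random.Random(seed)
    p = start                                            # step 1: p_0 supplied by the caller
    chain = []
    for _ in range(n_gen):
        p_new = propose(p, rng)                          # step 2
        log_prior_new = log_prior(p_new)
        if log_prior_new == -math.inf:
            chain.append(p)                              # zero prior density: always reject
            continue
        log_r_accept = (log_prior_new - log_prior(p)     # step 3a: log R_prior
                        + log_likelihood(p_new)          # step 3c: log R_likelihood
                        - log_likelihood(p))             # step 4: sum of logs = log of product
        r_accept = math.exp(min(0.0, log_r_accept))      # capped at 1 to avoid overflow
        if rng.random() < r_accept:                      # step 5: accept or reject
            p = p_new
        chain.append(p)                                  # p_i for this generation
    return chain

# Toy check: flat prior and an unnormalized standard-normal "likelihood".
def flat_log_prior(x):
    return 0.0

def std_normal_log_likelihood(x):
    return -0.5 * x * x

def uniform_step(p, rng, eps=1.0):
    return rng.uniform(p - eps, p + eps)   # symmetric: Q(p'|p) = Q(p|p')

chain = metropolis_hastings(flat_log_prior, std_normal_log_likelihood,
                            uniform_step, 0.0, 50_000, seed=42)
mean = sum(chain) / len(chain)
var = sum((x - mean) ** 2 for x in chain) / len(chain)
print(f"mean {mean:.3f}, variance {var:.3f}")
```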

Carrying out these steps, one obtains a set of parameter values, $p_i$, where $i$ is from 1 to the total number of generations in the MCMC. Typically, the chain has a “burn-in” period at the beginning. This is the time before the chain has reached a stationary distribution, and can be observed when parameter values show trends through time and the likelihood for models has yet to plateau. If you eliminate this “burn-in” period, then, as discussed above, each step in the chain is a sample from the posterior distribution. We can summarize the posterior distributions of the model parameters in a variety of ways; for example, by calculating means, 95% credible intervals, or histograms.
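Removing the burn-in and summarizing the remaining samples takes only a few lines. In this sketch, `summarize_chain` and its `burn_in`, `thin` and `level` arguments are hypothetical names; the interval is an equal-tailed credible interval read from the sorted samples, and the toy chain is made-up data.

```python
def summarize_chain(chain, burn_in, thin=1, level=0.95):
    """Drop the burn-in, keep every `thin`-th sample, and summarize the rest.

    Returns the posterior mean and an equal-tailed credible interval read off
    the empirical quantiles of the retained samples.
    """
    kept = sorted(chain[burn_in::thin])
    n = len(kept)
    mean = sum(kept) / n
    lower = kept[int((1 - level) / 2 * (n - 1))]
    upper = kept[int((1 + level) / 2 * (n - 1))]
    return mean, (lower, upper)

# Hypothetical chain: a 10-generation upward trend (the burn-in) followed by
# values spread evenly over [0, 1].
toy_chain = [0.1 * k for k in range(10)] + [i / 1000 for i in range(1001)]
print(summarize_chain(toy_chain, burn_in=10))
```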

We can apply this algorithm to our coin-flipping example. We will consider the same prior distribution, $U(0, 1)$, for the parameter $p$. We will also define a proposal density, $Q(p'|p) \sim U(p - \epsilon, p + \epsilon)$. That is, we will add or subtract a small number ($\epsilon \le 0.01$) to generate proposed values of $p'$ given $p$.

To start the algorithm, we draw a value of $p$ from the prior. Let’s say for illustrative purposes that the value we draw is 0.60. This becomes our current parameter estimate. For step two, we propose a new value, $p'$, by drawing from our proposal distribution. We can use $\epsilon = 0.01$ so the proposal distribution becomes $U(0.59, 0.61)$. Let’s suppose that our new proposed value is $p' = 0.595$.

We then calculate our three ratios. Here things are simpler than you might have expected for two reasons. First, recall that our prior probability distribution is $U(0, 1)$. The density of this distribution is a constant (1.0) for all values of $p$ and $p'$. Because of this, the prior odds ratio for this example is always:

$$ R_{prior} = \frac{P(p')}{P(p)} = \frac{1}{1} = 1 \label{2.28}$$

Similarly, because our proposal distribution is symmetrical, $Q(p'|p)=Q(p|p')$ and $R_{proposal} = 1$. That means that we only need to calculate the likelihood ratio, $R_{likelihood}$, for $p$ and $p'$. We can do this by plugging our values for $p$ (or $p'$) into Equation \ref{2.2}:

$$ P(D|p) = \binom{N}{H} p^H (1-p)^{N-H} = \binom{100}{63} 0.6^{63} (1-0.6)^{100-63} = 0.068 \label{2.29} $$

Likewise,

$$ P(D|p') = \binom{N}{H} p'^H (1-p')^{N-H} = \binom{100}{63} 0.595^{63} (1-0.595)^{100-63} = 0.064 \label{2.30}$$

The likelihood ratio is then:

$$ R_{likelihood} = \frac{P(D|p')}{P(D|p)} = \frac{0.064}{0.068} = 0.94 \label{2.31}$$

We can now calculate $R_{accept} = R_{prior} \cdot R_{proposal} \cdot R_{likelihood} = 1 \cdot 1 \cdot 0.94 = 0.94$. We next choose a random number between 0 and 1; say that we draw $x = 0.34$. Because our random number $x$ is less than $R_{accept}$, we accept the proposed value of $p'$. If the random number that we drew were greater than 0.94, we would reject the proposed value, and keep our original parameter value $p = 0.60$ going into the next generation.
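The arithmetic of this first generation can be reproduced directly. Computed without intermediate rounding, the likelihood ratio is about 0.935, slightly below the 0.94 obtained from the rounded likelihoods above; either way, the draw $x = 0.34$ leads to acceptance.

```python
from math import comb

N, H = 100, 63          # 63 heads in 100 flips
p, p_new = 0.60, 0.595  # current and proposed values from the text

def likelihood(q):
    """Equation 2.2: binomial probability of H heads in N flips."""
    return comb(N, H) * q**H * (1 - q)**(N - H)

r_prior = 1.0           # U(0, 1) prior has density 1 at both values
r_proposal = 1.0        # U(p - eps, p + eps) proposal is symmetric
r_likelihood = likelihood(p_new) / likelihood(p)
r_accept = r_prior * r_proposal * r_likelihood

x = 0.34                # the uniform random draw used in the text
accepted = x < r_accept
print(f"P(D|p) = {likelihood(p):.3f}, P(D|p') = {likelihood(p_new):.3f}")
print(f"R_accept = {r_accept:.3f}, accepted = {accepted}")
```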

If we repeat this procedure a large number of times, we will obtain a long chain of values of $p$. You can see the results of such a run in Figure 2.4. In panel A, I have plotted the likelihoods for each successive value of $p$. You can see that the likelihoods increase for the first ~1000 or so generations, then reach a plateau around $\ln L = -3$. Panel B shows a plot of the values of $p$, which rapidly converge to a stable distribution around $p = 0.63$. We can also plot a histogram of these posterior estimates of $p$. In panel C, I have done that, but with a twist. Because the MCMC algorithm creates a series of parameter estimates, these numbers show autocorrelation; that is, each estimate is similar to estimates that come just before and just after. This autocorrelation can cause problems for data analysis. The simplest solution is to subsample these values, picking only, say, one value every 100 generations. That is what I have done in the histogram in panel C. This panel also includes the analytic posterior distribution that we calculated above; notice how well our Metropolis-Hastings algorithm did in reconstructing this distribution! For comparative methods in general, analytic posterior distributions are difficult or impossible to construct, so approximation using MCMC is very common.
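A run similar in spirit to the one summarized in Figure 2.4 can be sketched as follows. The number of generations, random seed, burn-in length and thinning interval are assumptions chosen for illustration, so the numbers will not match the figure exactly; the check is that the thinned MCMC mean lands close to the analytic posterior mean $(H+1)/(N+2) \approx 0.627$.

```python
import math
import random
from math import comb

N, H = 100, 63
LOG_COMB = math.log(comb(N, H))

def log_likelihood(p):
    """Log of Equation 2.2, valid for 0 < p < 1."""
    return LOG_COMB + H * math.log(p) + (N - H) * math.log(1 - p)

def run_chain(n_gen=100_000, eps=0.01, seed=2024):
    rng = random.Random(seed)
    p = rng.uniform(0.0, 1.0)                    # p_0 drawn from the U(0, 1) prior
    chain = []
    for _ in range(n_gen):
        p_new = rng.uniform(p - eps, p + eps)    # symmetric proposal: R_proposal = 1
        if 0.0 < p_new < 1.0:                    # prior density is 0 outside (0, 1)
            log_r = log_likelihood(p_new) - log_likelihood(p)   # R_prior = 1 inside (0, 1)
            if rng.random() < math.exp(min(0.0, log_r)):
                p = p_new
        chain.append(p)
    return chain

chain = run_chain()
burn_in, thin = 10_000, 100                      # drop burn-in, keep 1 value per 100 generations
kept = chain[burn_in::thin]
mcmc_mean = sum(kept) / len(kept)
analytic_mean = (H + 1) / (N + 2)                # mean of the Beta(H + 1, N - H + 1) posterior
print(f"{len(kept)} samples, MCMC mean {mcmc_mean:.3f}, analytic mean {analytic_mean:.3f}")
```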

This simple example glosses over some of the details of MCMC algorithms, but we will get into those details later, and there are many other books that treat this topic in great depth (e.g. Christensen et al. 2010). The point is that we can solve some of the challenges involved in Bayesian statistics using numerical “tricks” like MCMC, that exploit the power of modern computers to fit models and estimate model parameters.

## Section 2.4c: Bayes factors

Now that we know how to use data and a prior to calculate a posterior distribution, we can move to the topic of Bayesian model selection. We already learned one general method for model selection using AIC. We can also do model selection in a Bayesian framework. The simplest way is to calculate and then compare the posterior probabilities for a set of models under consideration. One can do this by calculating Bayes factors:

$$ B_{12} = \frac{Pr(D|H_1)}{Pr(D|H_2)} \label{2.32}$$

Bayes factors are ratios of the marginal likelihoods $P(D|H)$ of two competing models. Each marginal likelihood represents the probability of the data averaged over the prior distribution of the model's parameters. It is important to note that these marginal likelihoods differ from the likelihoods used above for AIC model comparison in an important way. With AIC and other related tests, we calculate the likelihoods for a given model and a particular set of parameter values; in the coin flipping example, the likelihood for model 2 when $p_H = 0.63$. By contrast, Bayes factors’ marginal likelihoods give the probability of the data averaged over all possible parameter values for a model, weighted by their prior probability.
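Because a marginal likelihood averages the likelihood over the prior, for model 2 it can be approximated by drawing values of $p_H$ from the $U(0,1)$ prior and averaging Equation 2.2 over them; for a uniform prior the exact answer is $1/(N+1)$. This simple prior-sampling estimator is shown only to illustrate the definition: it becomes very inefficient when the posterior is much narrower than the prior, and it is not one of the methods discussed later in this section.

```python
import random
from math import comb

N, H = 100, 63
C = comb(N, H)

def likelihood(p):
    """Equation 2.2 for model 2."""
    return C * p**H * (1 - p)**(N - H)

def marginal_likelihood_by_prior_sampling(n_draws=200_000, seed=7):
    """Average the likelihood over parameter values drawn from the U(0, 1) prior."""
    rng = random.Random(seed)
    return sum(likelihood(rng.random()) for _ in range(n_draws)) / n_draws

estimate = marginal_likelihood_by_prior_sampling()
exact = 1 / (N + 1)      # integral of Equation 2.2 over a U(0, 1) prior
print(f"prior-sampling estimate {estimate:.5f}, exact {exact:.5f}")
```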

Because they use marginal likelihoods, Bayes factors allow us to do model selection in a way that accounts for uncertainty in our parameter estimates, although, again, at the cost of requiring explicit prior probabilities for all model parameters. Such comparisons can be quite different from likelihood ratio tests or comparisons of $AIC_c$ scores. Bayes factors represent model comparisons that integrate over all possible parameter values rather than comparing the fit of models only at the parameter values that best fit the data. In other words, $AIC_c$ scores compare the fit of two models given particular estimated values for all of the parameters in each of the models. By contrast, Bayes factors make a comparison between two models that accounts for uncertainty in their parameter estimates. This will make the biggest difference when some parameters of one or both models have relatively wide uncertainty. If all parameters can be estimated with precision, results from both approaches should be similar.

Calculation of Bayes factors can be quite complicated, requiring integration across probability distributions. In the case of our coin-flipping problem, we have already done that integration in Equation \ref{2.22} to obtain the beta distribution. We can then calculate Bayes factors to compare the fit of two competing models. Let’s compare the two models for coin flipping considered above: model 1, where $p_H = 0.5$, and model 2, where $p_H$ is a free parameter with a $U(0,1)$ prior (its maximum likelihood estimate is 0.63). Then:

$$ \begin{array}{lcl}
Pr(D|H_1) &=& \binom{100}{63} 0.5^{63} (1-0.5)^{100-63} = 0.00270 \\
Pr(D|H_2) &=& \displaystyle\int_{p=0}^{1} \binom{100}{63} p^{63} (1-p)^{100-63} dp = \binom{100}{63} \beta(38,64) = 0.0099 \\
B_{21} &=& \dfrac{Pr(D|H_2)}{Pr(D|H_1)} = \dfrac{0.0099}{0.00270} = 3.67
\end{array} \label{2.33}$$

In the above example, $\beta(x, y)$ is the Beta function. Our calculations show that the Bayes factor is 3.67 in favor of model 2 compared to model 1. This is typically interpreted as substantial (but not decisive) evidence in favor of model 2. Again, we can be reasonably confident that our lizard is not a fair flipper.
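Equation 2.33 can be checked numerically. The sketch computes the Beta function through the log-gamma function from Python's standard library; the resulting Bayes factor in favor of model 2 is about 3.67.

```python
from math import comb, exp, lgamma

N, H = 100, 63

def log_beta(a, b):
    """Logarithm of the Beta function B(a, b), computed via log-gamma."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

# Model 1: p_H fixed at 0.5, so the marginal likelihood is just the likelihood.
ml_model1 = comb(N, H) * 0.5**N
# Model 2: p_H ~ U(0, 1); integrating the likelihood gives C(N, H) * B(H + 1, N - H + 1).
ml_model2 = comb(N, H) * exp(log_beta(H + 1, N - H + 1))

bf_21 = ml_model2 / ml_model1
print(f"Pr(D|H1) = {ml_model1:.5f}, Pr(D|H2) = {ml_model2:.5f}, B21 = {bf_21:.2f}")
```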

In the lizard flipping example we can calculate Bayes factors exactly because we know the solution to the integral in Equation \ref{2.33}. However, if we don’t know how to solve this equation (a typical situation in comparative methods), we can still approximate Bayes factors from our MCMC runs. Methods to do this, including arrogance sampling and stepping-stone sampling (Xie et al. 2011; Perrakis et al. 2014), are complex and beyond the scope of this book. However, one common method for approximating Bayes factors involves calculating the harmonic mean of the likelihoods over the MCMC chain for each model. The ratio of these two harmonic means is then used as an approximation of the Bayes factor (Newton and Raftery 1994). Unfortunately, this method is extremely unreliable, and probably should never be used (see Neal 2008 for more details).


## Table of Contents

- Frequentist Statistics
- The Inherent Flaws in Frequentist Statistics
- Bayesian Statistics
- Conditional Probability
- Bayes Theorem

- Bayesian Inference
- Bernoulli likelihood function
- Prior Belief Distribution
- Posterior belief Distribution

- Test for Significance – Frequentist vs Bayesian
- p-value
- Confidence Intervals
- Bayes Factor
- High Density Interval (HDI)

Before we actually delve into Bayesian statistics, let us spend a few minutes understanding *Frequentist Statistics*, the more popular version of statistics most of us come across, and the problems inherent in it.

**1 Introduction to Bayesian Statistics**

1.1 The Frequentist Approach to Statistics

1.2 The Bayesian Approach to Statistics

1.3 Comparing Likelihood and Bayesian Approaches to Statistics

1.4 Computational Bayesian Statistics

1.5 Purpose and Organization of This Book

**2 Monte Carlo Sampling from the Posterior**

2.1 Acceptance-Rejection-Sampling

2.2 Sampling-Importance-Resampling

2.3 Adaptive-Rejection-Sampling from a Log-Concave Distribution

2.4 Why Direct Methods Are Inefficient for High-Dimension Parameter Space

**3 Bayesian Inference**

3.1 Bayesian Inference from the Numerical Posterior

3.2 Bayesian Inference from Posterior Random Sample

**4 Bayesian Statistics Using Conjugate Priors**

4.1 One-Dimensional Exponential Family of Densities

4.2 Distributions for Count Data

4.3 Distributions for Waiting Times

4.4 Normally Distributed Observations with Known Variance

4.5 Normally Distributed Observations with Known Mean

4.6 Normally Distributed Observations with Unknown Mean and Variance

4.7 Multivariate Normal Observations with Known Covariance Matrix

4.8 Observations from Normal Linear Regression Model

Appendix: Proof of Poisson Process Theorem

5.1 Stochastic Processes

5.3 Time-Invariant Markov Chains with Finite State Space

5.4 Classification of States of a Markov Chain

5.5 Sampling from a Markov Chain

5.6 Time-Reversible Markov Chains and Detailed Balance

5.7 Markov Chains with Continuous State Space

**6 Markov Chain Monte Carlo Sampling from Posterior**

6.1 Metropolis-Hastings Algorithm for a Single Parameter

6.2 Metropolis-Hastings Algorithm for Multiple Parameters

6.3 Blockwise Metropolis-Hastings Algorithm

**7 Statistical Inference from a Markov Chain Monte Carlo Sample**

7.1 Mixing Properties of the Chain

7.2 Finding a Heavy-Tailed Matched Curvature Candidate Density

7.3 Obtaining an Approximate Random Sample for Inference

Appendix: Procedure for Finding the Matched Curvature Candidate Density for a Multivariate Parameter

**8 Logistic Regression**

8.1 Logistic Regression Model

8.2 Computational Bayesian Approach to the Logistic Regression Model

8.3 Modelling with the Multiple Logistic Regression Model

**9 Poisson Regression and Proportional Hazards Model**

9.1 Poisson Regression Model

9.2 Computational Approach to Poisson Regression Model

9.3 The Proportional Hazards Model

9.4 Computational Bayesian Approach to Proportional Hazards Model

**10 Gibbs Sampling and Hierarchical Models**

10.1 Gibbs Sampling Procedure

10.2 The Gibbs Sampler for the Normal Distribution

10.3 Hierarchical Models and Gibbs Sampling

10.4 Modelling Related Populations with Hierarchical Models

Appendix: Proof That Improper Jeffrey's Prior Distribution for the Hypervariance Can Lead to an Improper Posterior

## Probability and Statistics: To p or not to p?

We live in an uncertain and complex world, yet we continually have to make decisions in the present with uncertain future outcomes. Indeed, we should be on the look-out for "black swans": low-probability, high-impact events. To study, or not to study? To invest, or not to invest? To marry, or not to marry? While uncertainty makes decision-making difficult, it does at least make life exciting! If the entire future were known in advance, there would never be an element of surprise. Whether a good future or a bad future, it would be a known future. In this course we consider many useful tools to deal with uncertainty and help us to make informed (and hence better) decisions, essential skills for a lifetime of good decision-making. Key topics include quantifying uncertainty with probability, descriptive statistics, point and interval estimation of means and proportions, the basics of hypothesis testing, and a selection of multivariate applications of key terms and concepts seen throughout the course.

### Reviews

Excellent introductory course for probability and Statistics, Dr. Abdey made the course very lively with his approach of teaching. Hope to see many more online courses from you in the future.

Hi, the course taught the basics and that's exactly what I was looking for. I learnt a lot, and many things which I hear and read became clearer and more meaningful. Thank you!

## 11 Bayesian hypothesis testing

This chapter introduces common Bayesian methods of testing what we could call *statistical hypotheses*. A statistical hypothesis is a hypothesis about a particular model parameter or a set of model parameters. Most often, such a hypothesis concerns one parameter, and the assumption in question is that this parameter takes on a specific value, or some value from a specific interval. Henceforth, we speak just of a “hypothesis” even though we mean a specific hypothesis about particular model parameters. For example, we might be interested in what we will call a *point-valued hypothesis*, stating that the value of parameter $\theta$ is fixed to a specific value $\theta = \theta^*$. Section 11.1 introduces different kinds of statistical hypotheses in more detail.

Given a statistical hypothesis about parameter values, we are interested in “testing” it. Strictly speaking, the term “testing” should probably be reserved for statistical decision procedures which give clear categorical judgements, such as whether to reject a hypothesis, accept it as true, or withhold judgement because no decision can be made (yet/currently). While we will encounter such categorical decision routines in this chapter, Bayesian approaches to hypothesis “testing” are first and foremost concerned, not with categorical decisions, but with quantifying evidence in favor of or against the hypothesis in question. (In a second step, using Bayesian decision theory, which also weighs in the utility of different policy choices, we can of course also use Bayesian inference for informed decision making.) But instead of speaking of “Bayesian inference to weigh evidence for/against a hypothesis” we will just speak of “Bayesian hypothesis testing” for ease of parlance.

We consider two conceptually distinct approaches within Bayesian hypothesis testing.

- **Estimation-based testing** considers just one model. It uses the observed data $D_{\text{obs}}$ to retrieve posterior beliefs $P(\theta \mid D_{\text{obs}})$ and checks whether, *a posteriori*, our hypothesis is credible.
- **Comparison-based testing** uses Bayesian model comparison, in the form of Bayes factors, to compare two models, namely one model that assumes that the hypothesis in question is true, and one model that assumes that the complement of the hypothesis is true.
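As a rough illustration of estimation-based testing, the sketch below reuses, purely as hypothetical input, the 63 heads in 100 flips from the lizard-flipping example, assumes a uniform prior so the posterior is Beta(64, 38), and asks whether the point value $\theta^* = 0.5$ lies inside an equal-tailed 95% credible interval. The equal-tailed interval and the function name are assumptions of this sketch; the criteria developed in this chapter may differ.

```python
import random

def estimation_based_check(heads, n, theta_star=0.5, level=0.95, draws=100_000, seed=3):
    """Estimation-based check of the point-valued hypothesis theta = theta_star.

    Assumes a uniform Beta(1, 1) prior, so the posterior is Beta(heads + 1, n - heads + 1).
    Uses an equal-tailed credible interval from posterior samples as a simple
    stand-in for interval-based decision criteria.
    """
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(heads + 1, n - heads + 1) for _ in range(draws))
    lower = samples[int((1 - level) / 2 * draws)]
    upper = samples[int((1 + level) / 2 * draws) - 1]
    prob_above = sum(s > theta_star for s in samples) / draws   # Pr(theta > theta* | D_obs)
    credible = lower <= theta_star <= upper
    return (lower, upper), prob_above, credible

interval, prob_above, credible = estimation_based_check(63, 100)
print(interval, prob_above, credible)
```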

The main difference between these two approaches is that estimation-based hypothesis testing is simpler (conceptually and computationally), but less informative than comparison-based hypothesis testing. In fact, comparison-based methods give a clearer picture of the quantitative evidence for/against a hypothesis because they explicitly take into account a second alternative to the hypothesis which is to be tested. As we will see in this chapter, the technical obstacles for comparison-based approaches can be overcome. For special but common use cases, like testing directional hypotheses, there are efficient methods of performing comparison-based hypothesis testing.


## 2.10 - Bayes, Empirical Bayes and Moderated Methods

Statistical methods are divided broadly into two types: frequentist (or classical) and Bayesian. Both approaches require data generated by a known randomization mechanism from some population with some unknown parameters, with the objective of using the data to determine some of the properties of the unknowns. Both approaches also require some type of model of the population - e.g. the population is Normal, or the population has a finite mean. However, in frequentist statistics, probabilities are assigned only as the frequency of an event occurring when sampling from the population. In Bayesian statistics, the information about the unknown parameters is also summarized by a probability distribution.

To understand the difference, consider tossing a coin 10 times to examine whether it is fair or not. Suppose the outcome is 7 heads. The frequentist obtains the probability of the outcome given that the coin is fair (0.12), the p-value (Prob(7 or more heads or tails|fair)=0.34) and concludes that there is no evidence that the coin is not fair. She might also produce a 95% confidence interval for the probability of a head (0.35, 0.93). For the frequentist, there is no sense in asking the probability that the coin is fair - it either is or is not fair. The frequentist makes statements about the probability of the sample after making an assumption about the population parameter (which in this case is the probability of tossing a head).
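The frequentist numbers above can be reproduced with exact binomial calculations. The sketch assumes the p-value is two-sided as described (7 or more heads or 7 or more tails) and that the 95% interval is the Clopper-Pearson ("exact") interval, a common choice that matches (0.35, 0.93).

```python
from math import comb

n, k = 10, 7   # 7 heads in 10 tosses

def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binom_cdf(x, n, p):
    """Pr(X <= x) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(x + 1))

prob_outcome = binom_pmf(k, n, 0.5)                      # Pr(exactly 7 heads | fair)
p_value = sum(binom_pmf(i, n, 0.5) for i in range(n + 1)
              if i >= k or i <= n - k)                   # 7+ heads or 7+ tails

def bisect(f, lo=0.0, hi=1.0, iters=60):
    """Root of a monotone function f on [lo, hi] whose sign changes once."""
    lo_positive = f(lo) > 0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) > 0) == lo_positive:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Clopper-Pearson 95% interval: the lower limit solves Pr(X >= 7 | p) = 0.025,
# the upper limit solves Pr(X <= 7 | p) = 0.025.
lower = bisect(lambda p: (1 - binom_cdf(k - 1, n, p)) - 0.025)
upper = bisect(lambda p: binom_cdf(k, n, p) - 0.025)
print(f"{prob_outcome:.2f} {p_value:.2f} ({lower:.2f}, {upper:.2f})")
```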

The Bayesian, by contrast, starts with general information about similar coins: on average they are fair, but each possible value between zero and 1 has some probability, higher near 0.5 and quite low at 0 and 1. The probability assessment of the heads proportion is called the prior probability. Each person might have their individual assessment, based on their personal experience, which is called a subjective prior. Alternatively, a prior probability distribution might be selected based on good properties of the resulting estimates, called an objective prior. The data are then observed. For any particular heads proportion $\pi$ the probability of 7 heads can be computed (called the likelihood). The likelihood and the prior are then combined using Bayes' theorem to give the posterior distribution of $\pi$, which is the probability distribution of the heads proportion given the data. Since the observed proportion (7 heads in 10 tosses) is higher than 50%, the posterior will give higher probability than the prior to proportions greater than 1/2. The Bayesian can compute things like Prob(coin is biased towards heads | 7 heads in 10 tosses), although she still cannot compute Prob(coin is exactly fair | 7 heads in 10 tosses) (because the probability of any single value is zero). More information about Bayes' Theorem and Bayesian statistics, including more details about this example, can be found in [1] and [2].
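A minimal sketch of the Bayesian calculation, assuming a conjugate Beta prior. The Beta(5, 5) prior is a hypothetical choice that is symmetric about 0.5 and low near 0 and 1, as the text describes; with 7 heads in 10 tosses it updates to a Beta(12, 8) posterior, and $Pr(\pi > 0.5 \mid \text{data})$ is obtained by numerical integration.

```python
from math import exp, lgamma, log

def beta_pdf(x, a, b):
    """Beta(a, b) density at 0 < x < 1."""
    return exp((a - 1) * log(x) + (b - 1) * log(1 - x) + lgamma(a + b) - lgamma(a) - lgamma(b))

def prob_greater(a, b, threshold=0.5, steps=20_000):
    """Pr(pi > threshold) under Beta(a, b), by the midpoint rule."""
    width = (1 - threshold) / steps
    return sum(beta_pdf(threshold + (k + 0.5) * width, a, b) for k in range(steps)) * width

prior_a = prior_b = 5                                   # hypothetical prior peaked at 0.5
heads, n = 7, 10
post_a, post_b = prior_a + heads, prior_b + n - heads   # conjugate update: Beta(12, 8)

print(prob_greater(prior_a, prior_b))   # prior Pr(biased towards heads), 0.5 by symmetry
print(prob_greater(post_a, post_b))     # posterior Pr(biased towards heads | 7 heads in 10)
```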

In Bayesian statistics unknown quantities are given a prior distribution. When we are measuring only one feature, this is controversial: what is the "population" that a quantity like the population mean could come from? However, in high-throughput analysis, this is a natural approach. Each feature could be considered a member of a population of features. So, for example, the mean expression of gene A (in all possible biological samples) can be thought of as a draw from the distribution of mean expression across all genes. When doing differential expression analysis we could put a distribution on $\mu_X-\mu_Y$ or on the binary outcome: the gene is/is not differentially expressed.

Due to the information added by the prior, Bayesian analyses tend to be more "powerful" than frequentist analyses. (I put "powerful" in quotes, because it does not mean the same thing to the Bayesian as to the frequentist due to the different formulation.) As well, Bayesians directly address questions like "What is the probability that this gene overexpresses in tumor samples" which are of interest to biologists. So, why don't we use Bayesian methods all the time?

Unfortunately, Bayesian models are quite difficult to set up except in the simplest cases. For example, Bayesian models are available to replace the t-tests we have already looked at, but more complex models, including analyses like one-way ANOVA, are difficult to specify because there are multiple dependent parameters. (Recall that one-way ANOVA is an extension of the two-sample t-test to 3 or more populations. With G populations there are G(G-1)/2 pairwise differences in means, for example 10 when G = 5, and these differences and their dependencies would all need priors.) Another problem is that investigators with different priors would draw different inferences from the same data, which seems contrary to the idea of objective evidence based on the data. Although the influence of the prior can be shown to be overwhelmed by the data once there is enough data, sample sizes in many studies are too small for this to happen.
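The claim that enough data overwhelm the prior can be checked with a conjugate beta-binomial sketch in Python. The two investigators, their Beta(2, 8) and Beta(8, 2) priors, and the assumption that 70% of tosses come up heads at every sample size are all hypothetical.

```python
# Two hypothetical investigators with conflicting Beta(a, b) priors on a heads proportion.
PRIORS = {"skeptic": (2, 8), "enthusiast": (8, 2)}  # assumed (a, b) values

def posterior_mean(a, b, heads, n):
    """Mean of the conjugate Beta(a + heads, b + n - heads) posterior."""
    return (a + heads) / (a + b + n)

def compare(n, frac_heads=0.7):
    """Posterior means for both investigators and the gap between them."""
    heads = round(frac_heads * n)  # assume 70% heads observed at every sample size
    means = {name: posterior_mean(a, b, heads, n) for name, (a, b) in PRIORS.items()}
    return means, abs(means["skeptic"] - means["enthusiast"])

for n in (10, 100, 10_000):
    means, gap = compare(n)
    print(n, {k: round(v, 4) for k, v in means.items()}, "gap:", round(gap, 4))
```

With these priors the gap between the two posterior means is 6/(n + 10): 0.3 at n = 10, but only about 0.0006 at n = 10,000.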

Software is available to replace t-tests with Bayesian tests for the simplest differential expression scenario. However, because this software is not extensible to more complex situations, we will not be using it in this class. For some problems, however, Bayesian methods provide powerful analysis tools.

**Empirical Bayes**

In high-throughput biology we have a population of features as well as a population of samples. For each feature we can obtain estimates of the parameters of interest, such as means, variances or differences in means. The histograms of these estimates (over all the genes) provide an estimate of the prior for the population of features, called the empirical prior. This leads to a set of frequentist methods called empirical Bayes methods, which are generally more powerful than "one-feature-at-a-time" methods.

The idea of empirical Bayes methods is to use the Bayesian set-up but to estimate the priors from the population of all features. Formally speaking, empirical Bayes methods are frequentist methods, which produce p-values and confidence intervals. However, because we have the empirical priors, we can also use some of the probabilistic ideas from Bayesian analysis. We will be using empirical Bayes methods for differential expression analysis.
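A minimal Python sketch of the empirical Bayes idea, under simplifying assumptions that are not from the text: each gene has one normally distributed estimate with known standard deviation 1, and the empirical prior is taken to be normal, with its mean and variance estimated by the method of moments from all genes' estimates (a numerical stand-in for reading them off the histogram). Each gene's estimate is then replaced by its posterior mean under that prior.

```python
import random
import statistics

rng = random.Random(7)
G, SIGMA = 2_000, 1.0  # number of genes and known per-gene sampling SD (both assumed)

# Simulated truth: gene effects drawn from a population of features, N(0.5, 1).
true = [rng.gauss(0.5, 1.0) for _ in range(G)]
obs = [rng.gauss(t, SIGMA) for t in true]  # one noisy estimate per gene

# Empirical prior N(m, tau2), estimated by moments from all genes' estimates.
m = statistics.fmean(obs)
tau2 = max(statistics.variance(obs) - SIGMA ** 2, 0.0)

# Posterior mean under the empirical prior: shrink each estimate toward m.
weight = tau2 / (tau2 + SIGMA ** 2)
eb = [m + weight * (x - m) for x in obs]

def mse(est):
    """Mean squared error of a list of estimates against the simulated truth."""
    return statistics.fmean((e - t) ** 2 for e, t in zip(est, true))

print(f"empirical prior: mean {m:.2f}, variance {tau2:.2f}, weight {weight:.2f}")
print(f"MSE one-at-a-time: {mse(obs):.3f}   MSE empirical Bayes: {mse(eb):.3f}")
```

On simulated data like this, the shrunken estimates typically have roughly half the mean squared error of the one-gene-at-a-time estimates.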

**Moderated Methods**

Empirical Bayes methods are related to another set of methods called moderated methods or James-Stein estimators. These are based on a remarkable result by James and Stein: when estimating quantities that can be expressed as expectations (which includes both population means and population variances) for 3 or more populations, a weighted average of the sample estimate from each population and a quantity computed from all the populations is better on average (for means, in terms of total squared error) than using just the data from that particular population. (Translated into our setting, this means that if you want to know whether genes 1 through G are differentially expressed, you should use the data from all the genes to make this determination for each gene, even though for gene i its own expression levels are weighted more heavily.) This is called Stein's paradox. Further information can be found in [3] as well as in a brief Wikipedia article. The result is paradoxical because it does not matter whether the populations have anything in common: it holds even if the quantities we want to estimate are mean salaries of NFL players, mean mass of galaxies, mean cost of a kilo of apples in cities of a certain type, and mean air pollution indices.
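Stein's paradox can be seen in a short Python simulation. The sketch uses the positive-part James-Stein estimator that shrinks toward the grand mean (a common variant that needs at least 4 populations; the original estimator, which shrinks toward a fixed point, needs at least 3). The ten "true" means are arbitrary hypothetical numbers standing in for unrelated quantities, and the noise variance is assumed known and equal to 1.

```python
import random

rng = random.Random(11)
# Ten unrelated "true" means (hypothetical values), each observed once with N(0, 1) noise.
THETA = [0.5, -1.2, 2.0, 0.3, -0.7, 1.1, 0.0, -0.4, 0.9, 1.5]
P, REPS = len(THETA), 20_000

def james_stein(x):
    """Positive-part James-Stein estimate shrinking toward the grand mean (needs P >= 4)."""
    xbar = sum(x) / len(x)
    ss = sum((xi - xbar) ** 2 for xi in x)
    shrink = max(0.0, 1 - (len(x) - 3) / ss)  # noise variance is 1 here
    return [xbar + shrink * (xi - xbar) for xi in x]

loss_raw = loss_js = 0.0
for _ in range(REPS):
    x = [t + rng.gauss(0, 1) for t in THETA]
    loss_raw += sum((xi - t) ** 2 for xi, t in zip(x, THETA))
    loss_js += sum((ji - t) ** 2 for ji, t in zip(james_stein(x), THETA))

print(f"average total squared error, raw: {loss_raw / REPS:.2f}")
print(f"average total squared error, JS : {loss_js / REPS:.2f}")
```

The raw estimates average a total squared error near 10 (one unit per population), and the shrunken estimates do noticeably better, even though nothing links the ten means.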

Moderated methods are very intuitive for "omics" data: we always have many more than 3 features, and because the features form a population, the result is less paradoxical. Empirical Bayes methods are moderated methods in which the weighting is determined by the empirical prior. Methods that do not fit under the empirical Bayes umbrella rely on ad hoc weights chosen using other statistical methods.
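The moderated variance used by LIMMA has this weighted-average form: a gene's own variance $s_g^2$ with $d_g$ residual degrees of freedom is combined with a prior value $s_0^2$ carrying $d_0$ degrees of freedom, $\tilde{s}_g^2 = (d_0 s_0^2 + d_g s_g^2)/(d_0 + d_g)$. In LIMMA, $d_0$ and $s_0^2$ are estimated from all genes; in this Python sketch they are simply set by hand, so all the numbers are hypothetical.

```python
import math

def moderated_variance(s2, d, s0_2, d0):
    """Weighted average of a gene's variance s2 (d df) and a prior value s0_2 (d0 df)."""
    return (d0 * s0_2 + d * s2) / (d0 + d)

def two_sample_t(diff, s2, n1, n2):
    """t statistic for a difference in means, given a pooled variance s2."""
    return diff / math.sqrt(s2 * (1 / n1 + 1 / n2))

# Hypothetical gene: 3 vs 3 samples (d = 4 residual df), tiny sample variance by chance.
n1 = n2 = 3
d = n1 + n2 - 2
diff, s2 = 0.3, 0.001
# Hypothetical prior values; LIMMA would estimate these from all genes.
s0_2, d0 = 0.05, 4.0

s2_mod = moderated_variance(s2, d, s0_2, d0)
print("ordinary t :", round(two_sample_t(diff, s2, n1, n2), 2))
print("moderated t:", round(two_sample_t(diff, s2_mod, n1, n2), 2))
```

Here the gene's implausibly small sample variance gives an ordinary t of about 11.6, which the moderated version pulls down to about 2.3. In LIMMA the moderated t is referred to a t distribution with $d_0 + d_g$ degrees of freedom.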

Empirical Bayes and moderated methods have been popularized by a number of software packages first developed for differential expression analysis of gene expression microarrays, in particular LIMMA (an empirical Bayes method), SAM (a moderated method) and MAANOVA (a moderated method).

We will start our analyses with microarray data. We will perform *t*-tests and then will use empirical Bayes *t*-tests to gain power. The more power you have for a given p-value, the smaller both your false discovery rates and your false non-discovery rates will be.

We improve power by having adequate sample size, good experimental design and a good choice of statistical methodology.

[1] López Puga, J., Krzywinski, M., & Altman, N. (2015). Points of significance: Bayes' theorem. *Nature Methods*, 12(4), 277-278. PMID: 26005726.

[2] López Puga, J., Krzywinski, M., & Altman, N. (2015). Points of significance: Bayesian statistics. *Nature Methods*, 12(5), 377-378. doi:10.1038/nmeth.3368

[3] Efron, B., & Morris, C. (1977). Stein's paradox in statistics. *Scientific American*, 236, 119-127.


## 2.13 Exercises

Generate 1,000 random 0/1 variables that model mutations occurring along a gene sequence of length 1,000. Mutations occur independently at a rate of $10^{-4}$ per position. Then sum the 1,000 positions to count how many mutations occur in a sequence of length 1,000.

Find the correct distribution for these mutation sums using a goodness of fit test and make a plot to visualize the quality of the fit.
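One possible Python sketch of this exercise (the book works in R). It simulates many sequences, assuming 5,000 replicates, and compares the observed counts with a Poisson distribution with $\lambda = 1000 \times 10^{-4} = 0.1$, the usual approximation to a binomial with many trials and a small success probability. Counts of 2 or more are pooled so that no expected count is tiny, and a Pearson chi-square statistic measures the fit.

```python
import math
import random

RATE, LENGTH = 1e-4, 1_000
N_SEQS = 5_000  # assumed number of simulated sequences

def mutation_count(rng):
    """Mutations in one sequence: LENGTH independent 0/1 draws, each 1 with probability RATE."""
    return sum(rng.random() < RATE for _ in range(LENGTH))

rng = random.Random(2024)
counts = [mutation_count(rng) for _ in range(N_SEQS)]

# Candidate model: Poisson(lam) with lam = LENGTH * RATE = 0.1.
lam = LENGTH * RATE
max_k = 2  # categories 0, 1 and ">= 2", pooled so expected counts are not tiny
observed = [sum(c == k for c in counts) for k in range(max_k)]
observed.append(sum(c >= max_k for c in counts))
probs = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(max_k)]
probs.append(1 - sum(probs))
expected = [N_SEQS * p for p in probs]

# Pearson chi-square statistic; lam is fixed, not estimated, so df = 3 - 1 = 2.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print("observed:", observed)
print("expected:", [round(e, 1) for e in expected])
print("chi-square statistic (2 df):", round(chi2, 2))
```

A plot of observed against expected frequencies, for example side-by-side bars, would visualize the fit; plotting is left out to keep the sketch free of dependencies.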

Make a function that generates $n$ random uniform numbers between $0$ and $7$ and returns their maximum. Execute the function for $n=25$. Repeat this procedure $B=100$ times. Plot the distribution of these maxima.

What is the maximum likelihood estimate of the maximum of a sample of size 25 (call it $\hat{\theta}$)?

Can you find a theoretical justification and the true maximum $\theta$?
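A Python sketch of the simulation (the book uses R). For a Uniform(0, θ) sample, the likelihood is zero whenever θ is below the largest observation and decreases in θ above it, so the maximum likelihood estimate $\hat{\theta}$ is the sample maximum. The sketch also compares the simulated maxima with the standard result $E[\max] = \theta\, n/(n+1)$, which shows that $\hat{\theta}$ tends to fall slightly below the true $\theta = 7$.

```python
import random

def max_of_uniforms(n, rng, upper=7.0):
    """Return the maximum of n independent Uniform(0, upper) draws."""
    return max(rng.uniform(0, upper) for _ in range(n))

rng = random.Random(1)
n, B = 25, 100
maxima = [max_of_uniforms(n, rng) for _ in range(B)]

# The MLE of theta for a Uniform(0, theta) sample is the sample maximum.
# It is biased low: E[max] = theta * n / (n + 1) for n i.i.d. draws.
print("mean of the B maxima:", round(sum(maxima) / B, 3))
print("theta * n / (n + 1) :", round(7.0 * n / (n + 1), 3))  # 6.731
```

A histogram of `maxima` from any plotting library gives the requested plot; it piles up just below 7.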

A sequence of three nucleotides (a **codon**) taken in a coding region of a gene can be translated into one of 20 possible amino acids. There are $4^3=64$ possible codon sequences, but only 20 amino acids. We say the **genetic code** is redundant: there are several ways to *spell* each amino acid.

The multiplicity (the number of codons that code for the same amino acid) varies from 1 to 6. The different codon spellings of each amino acid do not occur with equal probabilities. Let's look at the data for the standard laboratory strain of *Mycobacterium tuberculosis* (H37Rv):

The codons for the amino acid proline are of the form CC* (where * stands for any of the 4 letters, using the computer notation for a regular expression), and they occur with the following frequencies in *Mycobacterium tuberculosis*:

a) Explore the data in mtb, using the function table to tabulate the AmAcid and Codon variables.

b) How was the PerThous variable created?

c) Write an R function that you can apply to the table to find which of the amino acids shows the strongest **codon bias**, i.e., the strongest departure from uniform distribution among its possible spellings.
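A possible codon-bias function, sketched in Python rather than the R the exercise asks for. For each amino acid with more than one codon, it computes the Pearson chi-square statistic against uniform codon usage and divides it by the amino acid's total count, so that amino acids observed different numbers of times can be compared. The toy counts are hypothetical and are not the mtb data.

```python
def codon_bias(counts_by_aa):
    """Chi-square divergence from uniform codon usage, per amino acid.

    counts_by_aa maps an amino acid to {codon: count}. The Pearson chi-square
    statistic against equal codon frequencies is divided by the total count.
    Amino acids with a single codon are skipped: they cannot show codon bias.
    """
    scores = {}
    for aa, codons in counts_by_aa.items():
        if len(codons) < 2:
            continue
        total = sum(codons.values())
        expected = total / len(codons)
        chi2 = sum((obs - expected) ** 2 / expected for obs in codons.values())
        scores[aa] = chi2 / total
    return scores

# Hypothetical toy counts for illustration only; these are NOT the mtb data.
toy = {
    "Pro": {"CCA": 10, "CCC": 40, "CCG": 120, "CCT": 30},
    "Gly": {"GGA": 50, "GGC": 60, "GGG": 45, "GGT": 45},
    "Met": {"ATG": 80},
}
scores = codon_bias(toy)
strongest = max(scores, key=scores.get)
print(scores, "-> strongest bias:", strongest)
```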

Display GC content in a running window along the sequence of *Staphylococcus aureus*. Read in the sequence from a FASTA file.

a) Look at the complete staph object and then display the first three sequences in the set.

b) Find the GC content in sequence windows of width 100.

c) Display the GC content in a sliding window as a fraction.

d) How could we visualize the overall trends of these proportions along the sequence?
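The window computation itself is simple. The book's solution uses the Bioconductor function letterFrequency; as an illustration of the same counting, here is a plain-Python sketch applied to a hypothetical toy sequence rather than the *S. aureus* data.

```python
def gc_fraction_windows(seq, width=100, step=None):
    """GC fraction in windows of `width` bases; non-overlapping unless `step` is given."""
    step = step or width
    seq = seq.upper()
    fractions = []
    for start in range(0, len(seq) - width + 1, step):
        window = seq[start:start + width]
        fractions.append((window.count("G") + window.count("C")) / width)
    return fractions

# Hypothetical toy sequence, not the S. aureus data read from the FASTA file.
toy = "AT" * 100 + "GC" * 50 + "ATGC" * 25
print(gc_fraction_windows(toy, width=100))               # [0.0, 0.0, 1.0, 0.5]
print(len(gc_fraction_windows(toy, width=100, step=1)))  # 301 sliding windows
```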

b) We can compute the frequencies using the function letterFrequency.

c) Plotting the sliding-window values gives Figure 2.26.

Figure 2.26: GC content along sequence 364 of the *Staphylococcus aureus* genome.

d) We can look at the overall trends by smoothing the data using the function lowess along a window.

Figure 2.27: Smoothed GC content along sequence 364 of the *Staphylococcus aureus* genome.
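The book smooths with lowess. As a simpler stand-in, and explicitly a different technique, a centred moving average also reveals the overall trend; this Python sketch applies one to a list of window GC fractions.

```python
def moving_average(values, k=5):
    """Centred moving average over k points (k odd); edges average the points available."""
    half = k // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

# Example on hypothetical window GC fractions.
print(moving_average([0, 0, 1, 1, 0, 0], k=3))
```

Unlike lowess, a moving average weights all points in the window equally and fits no local regression, so it reacts more bluntly to isolated extreme windows.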

We will see later an appropriate way of deciding whether a window has an abnormally high GC content, using the idea that as we move along the sequence, we are always in one of several possible **states**. However, we don't directly observe the state, just the sequence. Such models are called **hidden (state) Markov models**, or HMMs for short (see http://en.wikipedia.org/wiki/Hidden_Markov_model). The *Markov* in the name refers to how these models describe dependencies between neighboring positions; *hidden* indicates that the state is not directly observed.
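To show how an HMM recovers hidden states from a sequence, here is a compact Viterbi sketch in Python with two hypothetical states, AT-rich and GC-rich. All start, transition and emission probabilities are invented for illustration; in practice they would be estimated from data.

```python
import math

STATES = ("AT-rich", "GC-rich")
# All probabilities below are hypothetical, chosen only for illustration.
START = {"AT-rich": 0.5, "GC-rich": 0.5}
TRANS = {"AT-rich": {"AT-rich": 0.9, "GC-rich": 0.1},
         "GC-rich": {"AT-rich": 0.1, "GC-rich": 0.9}}
EMIT = {"AT-rich": {"A": 0.35, "T": 0.35, "G": 0.15, "C": 0.15},
        "GC-rich": {"A": 0.15, "T": 0.15, "G": 0.35, "C": 0.35}}

def viterbi(seq):
    """Most probable hidden-state path for seq (Viterbi algorithm in log space)."""
    # score[s]: best log-probability of any path ending in state s at the current base
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backptrs = []
    for base in seq[1:]:
        new_score, ptr = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda r: score[r] + math.log(TRANS[r][s]))
            new_score[s] = score[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][base])
            ptr[s] = prev
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=score.get)
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]

seq = "ATATATAT" + "GCGCGCGC" + "ATATAT"
path = viterbi(seq)
print("".join("g" if s == "GC-rich" else "a" for s in path))
```

On this toy sequence the most probable path switches to the GC-rich state exactly for the central GC run, because the emission gain over 8 bases outweighs the cost of two state changes.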

Redo a figure similar to Figure 2.17, but include two other distributions: the uniform (which is $B(1,1)$) and $B(\tfrac{1}{2},\tfrac{1}{2})$. What do you notice?

Whereas beta distributions with both parameters larger than one are unimodal, the $B(0.5,0.5)$ distribution is U-shaped (bimodal, with its density rising toward both 0 and 1), and $B(1,1)$ is flat and has no mode.
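A quick numerical check of these shapes in Python, evaluating the beta density at a few points. The Beta(10, 30) row is an arbitrary example with both parameters above one.

```python
import math

def beta_pdf(x, a, b):
    """Density of the Beta(a, b) distribution at x in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

for a, b in [(0.5, 0.5), (1, 1), (10, 30)]:
    row = [round(beta_pdf(x, a, b), 3) for x in (0.01, 0.25, 0.5, 0.75, 0.99)]
    print(f"B({a}, {b}):", row)
```

The $B(0.5,0.5)$ row is largest at the ends, the $B(1,1)$ row is constant at 1, and the $B(10,30)$ row peaks in the interior, near its mode at 9/38.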