Bayesian quantile regression with lasso and adaptive lasso: analysis of immunoglobulin and prostate cancer data

Original link: http://tecdat.cn/?p=22702

Bayesian regression quantiles have received widespread attention in the recent literature. This article implements Bayesian coefficient estimation and variable selection in regression quantiles (RQ), with lasso and adaptive lasso penalties.

Abstract

Further modeling features are included for summarizing results and for producing path plots, posterior histograms, autocorrelation plots and quantile plots.


Introduction

Regression quantiles (RQ), proposed by Koenker and Bassett (1978), model the conditional quantiles of an outcome of interest as a function of predictors. Since its introduction, quantile regression has attracted great interest in the theoretical literature and has been widely used in many research fields, such as econometrics, marketing, medicine, ecology, and survival analysis (Neelon et al., 2015; Davino et al., 2013; Hao and Naiman, 2007). Suppose we have an observed sample {(x_i, y_i); i = 1, 2, ..., n}, where y_i denotes the dependent variable and x_i denotes the k-dimensional vector of covariates.

Bayesian quantile regression

Tobit RQ provides a method for describing the relationship between a non-negative dependent variable and a vector of covariates; it can be expressed as a quantile regression model in which the dependent variable is not fully observed. There is considerable literature on Tobit quantile regression models; see Powell (1986), Portnoy (2003), Portnoy and Lin (2010), and Kozumi and Kobayashi (2011) for an overview. Consider the model

y_i = max(y^0, y_i*),  i = 1, ..., n,

Here y_i is the observed dependent variable, y_i* is the corresponding latent (unobserved) dependent variable, and y^0 is a known censoring point. It can be shown that the RQ coefficient vector β can be consistently estimated by the solution of the following minimization problem

min_β Σ_{i=1}^n ρ_τ(y_i − max(y^0, x_i'β)),

where ρ_τ(u) = u(τ − I(u < 0)) is the check (loss) function and I(·) is the indicator function.
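The check function ρ_τ(u) = u(τ − I(u < 0)) that drives this minimization can be made concrete with a small self-contained sketch in base R (the data here are simulated, not from the paper): minimizing the total check loss at τ = 0.5 recovers the sample median.

```r
# Check (loss) function: rho_tau(u) = u * (tau - I(u < 0))
rho <- function(u, tau) u * (tau - (u < 0))

set.seed(1)
y <- rnorm(500)

# Total check loss for a candidate location b
obj <- function(b, tau) sum(rho(y - b, tau))

# Minimize over b at tau = 0.5; the minimizer is (close to) the sample median
bhat <- optimize(obj, interval = range(y), tau = 0.5)$minimum
```

Replacing τ = 0.5 by another value in (0, 1) gives the corresponding sample quantile instead of the median.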

Yu and Stander (2007) proposed a Bayesian method for Tobit RQ, using the asymmetric Laplace distribution (ALD) for the errors and the Metropolis-Hastings (MH) algorithm to draw β from its posterior distribution.

Real data examples

Let’s consider some real data examples.

Immunoglobulin G data

This data set includes serum concentrations of immunoglobulin G (g/L) in 298 children aged 6 months to 6 years; it is discussed in detail by Isaacs et al. (1983) and was also used by Yu et al. (2003). To illustrate, a Bayesian quantile regression model can be fitted to this data set as follows.

fit = Brq(serum ~ age, tau = 0.5)

The summary function provides parameter estimates and 95% credible intervals.


Plot the data and then overlay the five fitted RQ lines on the scatter plot.

R> taus = c(0.05, 0.25, 0.5, 0.75, 0.95)
R> for (i in 1:5) {
 + fit = Brq(serum ~ age, tau = taus[i])
 + abline(fit, col = i)
 + }
R>
R> for (i in 1:5) {
 + fit = Brq(serum ~ age + I(age^2), tau = taus[i])
 + curve(coef(fit)[1] + coef(fit)[2]*x + coef(fit)[3]*x^2, add = TRUE, col = i)
 + }


Figure 2: Scatterplot and RQ fit of immunoglobulin G data.


This figure shows a scatter plot of the immunoglobulin G data for 298 children aged 6 months to 6 years. Superimposed on the plot are the fitted RQ lines (left panel) and RQ curves (right panel) for τ ∈ {0.05, 0.25, 0.50, 0.75, 0.95}.

Path plots can be used to evaluate the convergence of the Gibbs sampler to its stationary distribution. We report only the path plots and posterior histograms of each parameter for τ = 0.50 (Figure 3), produced with the following code

plot(fit,"tracehist",D=c(1,2))

The Gibbs sampling output can be summarized graphically by generating path plots, posterior histograms, and autocorrelation plots, either singly or in combination: paths with histograms, paths with autocorrelations, histograms with autocorrelations, or all three together; the plotting function has an option controlling which combination is drawn. In Figure 3, the path plots of the coefficients for the immunoglobulin G data show relatively few steps in which the sampler jumps from one remote region of the posterior space to another. Furthermore, the histograms show that the marginal densities are, as expected, approximately the stationary univariate normal distributions.


Figure 3: Path and density plots of coefficients for the immunoglobulin G data set when τ = 0.50.
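These diagnostics are generic. As a sketch in base R (this is not the package's own plotting code, and the draws are simulated stand-ins), the same three plots plus a simple mixing check can be produced for any vector of posterior draws:

```r
set.seed(42)
draws <- rnorm(2000)  # stand-in for the Gibbs draws of one coefficient

op <- par(mfrow = c(1, 3))
plot(draws, type = "l", main = "Path")      # path (trace) plot
hist(draws, main = "Posterior histogram")   # marginal posterior
acf(draws, main = "Autocorrelation")        # mixing check
par(op)

# For a well-mixing chain the lag-1 autocorrelation is near zero
lag1 <- acf(draws, plot = FALSE)$acf[2]
```

With real output one would pass the draws of each coefficient in turn; strong lag-1 autocorrelation suggests thinning the chain or running it longer.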

Prostate cancer data

In this subsection, we illustrate the performance of Bayesian quantile regression on the prostate cancer data set (Stamey et al., 1989). This data set investigated the relationship between the level of (log) prostate-specific antigen (lpsa) and eight clinical covariates in patients awaiting radical prostatectomy.

These covariates are: log volume of cancer (lcavol), log weight of prostate (lweight), age (age), log volume of benign prostatic hyperplasia (lbph), seminal vesicle invasion (svi), log capsule penetration (lcp), Gleason score (gleason), and percentage of Gleason score 4 or 5 (pgg45).

In this subsection, we assume that the dependent variable (lpsa) has been centered to have mean zero and that the predictors have been standardized. To illustrate, we consider the Bayesian lasso RQ (method = "BLqr") with τ = 0.50. In this case we use the following code

R> x = as.matrix(x)
R> fit = Brq(y ~ x, tau = 0.5, method = "BLqr")
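The centering and standardization described above can be sketched in base R; the variable names and dimensions here are stand-ins for the actual data, not the data themselves:

```r
set.seed(7)
x <- matrix(rnorm(97 * 8), 97, 8)  # stand-in for the eight covariates
y <- rnorm(97)                     # stand-in for lpsa

y <- y - mean(y)  # center the response at zero
x <- scale(x)     # standardize predictors to mean 0, sd 1
```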


The posterior summaries can be used to identify the active variables in the regression, for example those whose 95% credible intervals exclude zero.
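One simple rule for flagging active variables from the Gibbs output, sketched here on simulated draws (with a real fit the draws would come from the fitted model object), is to check which 95% credible intervals exclude zero:

```r
set.seed(3)
# Simulated posterior draws: 5000 draws for three coefficients;
# beta2 is centered at zero, i.e. "inactive"
draws <- cbind(beta1 = rnorm(5000, 2, 0.3),
               beta2 = rnorm(5000, 0, 0.3),
               beta3 = rnorm(5000, -1, 0.3))

# 2.5% and 97.5% posterior quantiles per coefficient
ci <- apply(draws, 2, quantile, probs = c(0.025, 0.975))

# Active if the interval lies entirely above or below zero
active <- ci[1, ] > 0 | ci[2, ] < 0
```

Here beta1 and beta3 are flagged as active while beta2 is not, mirroring how the credible intervals in the output above are read.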


The convergence of the corresponding Gibbs sampler is evaluated by generating path plots and marginal posterior histograms of the samples, which provide a graphical check on convergence. They are produced with the following code.

plot(fit, plottype = "trace")

The results of the above code are shown in Figures 4 and 5, respectively. The path plots in Figure 4 show that the generated samples traverse the posterior space rapidly, and the marginal posterior histograms in Figure 5 show that the conditional posterior distributions are in fact the desired stationary univariate normal distributions.

Figure 4: Path plots of the coefficients for the prostate cancer data set when τ = 0.50.

Figure 5: Marginal posterior histograms of the coefficients for the prostate cancer data set when τ = 0.50.

Wheat data

Let’s consider a wheat data set. The data come from the National Wheat Cultivation Development Plan (2017) and consist of 584 observations on 11 variables. The dependent variable is the percentage increase in wheat yield per 2500 m². The covariates are urea fertilizer (U), wheat seed sowing date (Ds), wheat seed sowing rate (Qs), laser leveling technology (LT), compound fertilizer (NPK), seeder technology (SMT), mung bean crop planting (SC), crop herbicide (H), crop high-potassium fertilizer (K), and trace element fertilizer (ME).

The following command gives the posterior distribution of Tobit RQ when τ=0.50.

Brq(y ~ x, tau = 0.5, method = "Btqr")


You can also fit Bayesian lasso Tobit quantile regression and Bayesian adaptive lasso Tobit quantile regression. When τ = 0.50, the function can be used to obtain the posterior means and 95% credible intervals of the Tobit quantile regression coefficients.


Conclusion

In this article, we have illustrated Bayesian coefficient estimation and variable selection in quantile regression (RQ). In addition, we implemented Bayesian Tobit quantile regression with lasso and adaptive lasso penalties. Further modeling features, including summarizing results and plotting path plots, posterior histograms, autocorrelation plots and quantile plots, were also included.

References

Alhamzawi, R., K. Yu, and D. F. Benoit (2012). Bayesian adaptive lasso quantile regression. Statistical Modelling 12 (3), 279–297.

Brownlee, K. A. (1965). Statistical theory and methodology in science and engineering, Volume 150. Wiley New York.

Davino, C., M. Furno, and D. Vistocco (2013). Quantile regression: theory and applications. John Wiley & Sons.
