Bayesian quantile regression, lasso and adaptive lasso Bayesian quantile regression for immunoglobulin, prostate cancer data…

Original link: http://tecdat.cn/?p=22702

Bayesian quantile regression has received much attention in the recent literature. This article implements Bayesian coefficient estimation and variable selection in regression quantiles (RQ), including Bayesian RQ with lasso and adaptive lasso penalties.

Summary

Further functions for summarizing the results and for drawing path plots, posterior histograms, autocorrelation plots, and quantile plots are also included.

Introduction

Regression quantiles (RQ) were proposed by Koenker and Bassett (1978) to model conditional quantiles of an outcome of interest as functions of predictors. Since its introduction, quantile regression has attracted great theoretical interest and has been used heavily in many research fields, such as econometrics, marketing, medicine, ecology, and survival analysis (Neelon et al., 2015; Davino et al., 2013; Hao and Naiman, 2007). Suppose we have a sample of observations $\{(x_i, y_i);\ i = 1, 2, \ldots, n\}$, where $y_i$ is the dependent variable and $x_i$ is the $k$-dimensional vector of covariates.
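To make concrete what a regression quantile estimates, here is a minimal sketch in base R. It assumes the check loss $\rho_\tau(u) = u(\tau - I(u < 0))$ that is standard in the quantile regression literature; the text above does not spell it out, so treat that definition as an assumption. For an intercept-only model, minimizing the summed check loss recovers a sample quantile. The names `check_loss`, `candidates`, and `q_hat`, as well as the simulated data, are hypothetical and used only for illustration.

```r
# Check (pinball) loss: rho_tau(u) = u * (tau - I(u < 0))
check_loss <- function(u, tau) u * (tau - (u < 0))

set.seed(1)
y   <- rnorm(101)   # simulated outcomes, for illustration only
tau <- 0.25

# For an intercept-only model the minimizer is attained at an observed value,
# so it is enough to search over the sorted data points.
candidates <- sort(y)
total_loss <- vapply(candidates,
                     function(q) sum(check_loss(y - q, tau)),
                     numeric(1))
q_hat <- candidates[which.min(total_loss)]

# Compare with the empirical quantile (inverse-CDF definition, type = 1)
q_ref <- quantile(y, probs = tau, type = 1, names = FALSE)
print(c(check_loss_minimizer = q_hat, empirical_quantile = q_ref))
```

Replacing the constant `q` with a linear predictor in the covariates gives the usual regression quantile objective.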

Tobit RQ describes the relationship between a non-negative dependent variable and a vector of covariates. It can be expressed as a quantile regression model in which the dependent variable is not fully observed. There is a considerable literature on the Tobit quantile regression model; see Powell (1986), Portnoy (2003), Portnoy and Lin (2010), and Kozumi and Kobayashi (2011) for an overview. Consider the model

$$ y_i = \max\{y^0,\ y_i^*\}, \qquad y_i^* = x_i'\beta + \varepsilon_i, \qquad i = 1, \ldots, n, $$

where $y_i$ is the observed dependent variable, $y_i^*$ is the corresponding latent (unobserved) dependent variable, and $y^0$ is a known censoring point. It can be shown that the RQ coefficient vector $\beta$ can be consistently estimated by the solution of the following minimization problem

$$ \min_{\beta} \sum_{i=1}^{n} \rho_\tau\bigl(y_i - \max\{y^0,\ x_i'\beta\}\bigr), $$

where $\rho_\tau(u) = u\,(\tau - I(u < 0))$ is the check loss function at quantile level $\tau$.

Yu and Stander (2007) proposed a Bayesian approach to Tobit RQ, using the asymmetric Laplace distribution (ALD) for the errors and the Metropolis-Hastings (MH) method to draw β from its posterior distribution.
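The idea of sampling β under an ALD working likelihood can be sketched in base R as follows. This is an illustration under stated assumptions, not Yu and Stander's exact algorithm: the ALD scale is fixed at 1, the prior on β is flat, the proposal is a simple random walk, and the data are simulated. The function names `log_post` and `mh_tobit_rq` are hypothetical.

```r
# Check loss; with ALD errors (scale fixed at 1, an assumption) the
# log-likelihood is minus the summed check loss, up to a constant.
check_loss <- function(u, tau) u * (tau - (u < 0))

log_post <- function(beta, y, X, tau, y0 = 0) {
  fitted <- pmax(y0, as.vector(X %*% beta))   # censored linear predictor
  -sum(check_loss(y - fitted, tau))           # flat prior on beta (assumption)
}

# Random-walk Metropolis-Hastings over beta
mh_tobit_rq <- function(y, X, tau, y0 = 0, n_iter = 6000, step = 0.1,
                        init = rep(0, ncol(X))) {
  draws <- matrix(NA_real_, n_iter, ncol(X))
  beta  <- init
  lp    <- log_post(beta, y, X, tau, y0)
  n_acc <- 0
  for (s in seq_len(n_iter)) {
    prop    <- beta + rnorm(length(beta), sd = step)
    lp_prop <- log_post(prop, y, X, tau, y0)
    if (log(runif(1)) < lp_prop - lp) {        # accept or reject the proposal
      beta  <- prop
      lp    <- lp_prop
      n_acc <- n_acc + 1
    }
    draws[s, ] <- beta
  }
  list(draws = draws, accept_rate = n_acc / n_iter)
}

# Simulated example: latent median line 1 + 2x, left-censored at y0 = 0
set.seed(123)
n      <- 500
x      <- runif(n, -1, 1)
y_star <- 1 + 2 * x + rnorm(n)
y      <- pmax(0, y_star)
X      <- cbind(1, x)

init <- unname(coef(lm(y ~ x)))      # crude starting values
out  <- mh_tobit_rq(y, X, tau = 0.5, init = init)
post <- out$draws[-(1:1000), ]       # discard the first 1000 draws as burn-in
print(colMeans(post))                # posterior means of intercept and slope
print(out$accept_rate)
```

The step size and burn-in length are arbitrary choices for this toy example and would need tuning on real data.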

Real data example

We consider three real data examples: the IgG data, the prostate cancer data, and the wheat data.

IgG data

This dataset contains serum concentrations of immunoglobulin G (grams per liter) in 298 children aged 6 months to 6 years. It is discussed in detail by Isaacs et al. (1983) and was also used by Yu et al. (2003). To illustrate, a Bayesian quantile regression model can be fitted to this dataset as follows.

fit = rq(`serum concentration` ~ age, tau = 0.5)

The summary function provides posterior estimates and 95% credible intervals.

Plot the data, then superimpose the five fitted RQ lines on the scatterplot.

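R> taus = c(0.05, 0.25, 0.5, 0.75, 0.95)
R> plot(age, `serum concentration`)   # left panel: scatterplot
R> for (i in 1:5) {
+   fit = rq(`serum concentration` ~ age, tau = taus[i])   # linear RQ fit
+   abline(fit, col = i)
+ }
R>
R> plot(age, `serum concentration`)   # right panel: scatterplot
R> for (i in 1:5) {
+   fit = rq(`serum concentration` ~ age + I(age^2), tau = taus[i])   # quadratic RQ fit
+   b = coef(fit)
+   curve(b[1] + b[2] * x + b[3] * x^2, col = i, add = TRUE)
+ }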

Figure 2: Scatterplot and RQ fit of IgG data.

This figure shows a scatterplot of immunoglobulin G for 298 children aged 6 months to 6 years. Superimposed on the plot are the RQ lines (left panel) and RQ curves (right panel) for τ ∈ {0.05, 0.25, 0.50, 0.75, 0.95}.

The plot function can be used to assess the convergence of Gibbs sampling to a stationary distribution. We report only the path plot and posterior histogram of each parameter for τ = 0.50, shown in Figure 3. We use the following code:

plot(fit,"tracehist",D=c(1,2))

The Gibbs sampling output can be summarized graphically with path plots, posterior histograms, and autocorrelation plots, either individually or in combination: path and histogram; path and autocorrelation; histogram and autocorrelation; or path, histogram, and autocorrelation. This function also has an option. In Figure 3, the path plots of the coefficients for the IgG data show that the sampler jumps from one remote region of the posterior space to another in relatively few steps. Furthermore, the histograms show that the marginal densities are close to the expected smooth univariate normal.
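The three diagnostics named above can also be drawn with base R alone, which helps when reading them. The sketch below is not the article's plotting function. It uses a simulated AR(1) series as a stand-in for the posterior draws of one coefficient (an assumption made so that the example is self-contained), and the names `chain`, `acf_out`, and `lag1` are hypothetical.

```r
set.seed(42)
n_draws <- 5000

# Stand-in for one coefficient's draws: AR(1) series with lag-1 correlation 0.5,
# shifted to have mean 1. Real sampler output would be used here instead.
chain <- as.numeric(arima.sim(model = list(ar = 0.5), n = n_draws)) + 1

op <- par(mfrow = c(1, 3))
plot(chain, type = "l", xlab = "Iteration", ylab = "Value",
     main = "Path plot")                        # should look like stable noise
hist(chain, breaks = 30, xlab = "Value",
     main = "Posterior histogram")              # should be smooth and unimodal
acf_out <- acf(chain, lag.max = 30,
               main = "Autocorrelation")        # should decay quickly toward zero
par(op)

lag1 <- acf_out$acf[2]   # element 1 is lag 0, element 2 is lag 1
print(lag1)
```

A path plot that drifts, or an autocorrelation function that decays slowly, suggests running the sampler longer or thinning the draws.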

Figure 3: Path and density plots of the coefficients for the Immunoglobulin G dataset when τ = 0.50.

Prostate Cancer Data

In this subsection, we illustrate the performance of Bayesian quantile regression on the prostate cancer dataset (Stamey et al., 1989). This dataset examines the relationship between the log of prostate-specific antigen (lpsa) and eight covariates in patients awaiting radical prostatectomy.

These covariates are: log cancer volume (lcavol), log prostate weight (lweight), age (age), log amount of benign prostatic hyperplasia (lbph), seminal vesicle invasion (svi), log capsular penetration (lcp), Gleason score (gleason), and percentage of Gleason scores 4 or 5 (pgg45).

In this subsection we assume that the dependent variable (lpsa) has mean zero and the predictors have been standardized to have mean zero. To illustrate, we consider the Bayesian lasso RQ (method="BLqr") when τ = 0.50. In this case we use the following code:

R> x = as.matrix(x)
R> fit = rq(y ~ x, tau = 0.5, method = "BLqr")

The fitted model can then be used to identify the active variables in the regression.
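One common way to flag active variables from posterior output is to check which 95% credible intervals exclude zero. The sketch below illustrates that rule in base R. It is an assumption about how selection could be done, not a description of the article's function, and the draws are synthetic stand-ins with hypothetical names (`x1`, `x2`, `x3`) rather than results for the prostate data.

```r
set.seed(7)
n_draws <- 4000

# Synthetic posterior draws for three coefficients (illustration only)
draws <- cbind(
  x1 = rnorm(n_draws, mean =  0.6, sd = 0.1),
  x2 = rnorm(n_draws, mean =  0.0, sd = 0.1),
  x3 = rnorm(n_draws, mean = -0.5, sd = 0.1)
)

# Equal-tailed 95% credible interval for each coefficient
ci <- t(apply(draws, 2, quantile, probs = c(0.025, 0.975)))

summary_tab <- data.frame(
  post_mean = colMeans(draws),
  lower     = ci[, 1],
  upper     = ci[, 2],
  active    = ci[, 1] > 0 | ci[, 2] < 0   # interval excludes zero
)
print(summary_tab)
```

Under a lasso-type prior, coefficients of irrelevant predictors are shrunk toward zero, so their intervals tend to cover zero and they are flagged as inactive.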

Convergence of the corresponding Gibbs sampler was assessed by examining path plots and marginal posterior histograms of the samples, generated with the following code.

plot(fit, type="trace")

The results of the above code are shown in Figure 4 and Figure 5, respectively. The path diagram in Figure 4 shows that the generated samples traverse the posterior space rapidly, and the histogram of the marginal posterior in Figure 5 shows that the conditional posterior distribution is actually the desired stationary univariate normal.

Wheat data

We now consider a wheat dataset from the National Wheat Planting Development Program (2017). It consists of 584 observations on 11 variables. The dependent variable is the percentage increase in wheat yield per 2500 square meters. The covariates are urea fertilizer (U), wheat seed sowing date (Ds), wheat seed sowing rate (Qs), laser field leveling technique (LT), compound fertilizer application (NPK), planter technique (SMT), mung bean crop planting (SC), crop herbicide (H), crop high-potassium fertilizer (K), and trace element fertilizer (ME).

The command below gives the posterior distribution of Tobit RQ for τ=0.50.

rq(y ~ x, tau = 0.5, method = "Btqr")

The same function also fits Bayesian lasso Tobit quantile regression and Bayesian adaptive lasso Tobit quantile regression. When τ = 0.50, the function can be used to obtain the posterior mean and 95% credible interval for Tobit quantile regression.

Conclusion

In this article, we have illustrated Bayesian coefficient estimation and variable selection in quantile regression (RQ). We have also implemented Bayesian Tobit quantile regression with lasso and adaptive lasso penalties. Further functions are included to summarize results and to draw path plots, posterior histograms, autocorrelation plots, and quantile plots.

References

Alhamzawi, R., K. Yu, and D. F. Benoit (2012). Bayesian adaptive lasso quantile regression. Statistical Modelling 12 (3), 279–297.

Brownlee, K. A. (1965). Statistical theory and methodology in science and engineering, Volume 150. Wiley New York.

Davino, C., M. Furno, and D. Vistocco (2013). Quantile regression: theory and applications. John Wiley & Sons.
