NHANES data (complex survey data) subgroup interaction function 1.4 early adopter version (P for interaction) released — used to generate interaction effect tables with one click

In SCI articles, the interaction effect table (usually Table 5) can enhance the article, increase the credibility of the article, increase the credibility of the results, and can also perform data mining. What is a subgroup? It is usually a special type of group of people, such as men and women, race, etc. That is to say, is the result of your data still reliable when you put it into a special group of people? If your results are stable among various special groups, it means your conclusion is reliable. If the conclusion of the subgroup is contrary to the conclusion of your data, you can use it to make a new topic. You can also compare whether there are any differences between different subgroups, such as the difference between those who have had heart stents and those who have not had heart stents. Many new ideas can be discovered and data mining is easy.

I have already introduced how to make an interaction effect table in my previous article “R Language teaches you how to make an interaction effect table step by step”. You can take a look at it for details. In the previous article “scitb5 function version 1.7 (interaction effect function P for interaction) released – used to generate interaction effect tables with one click”, I released the scitb5 function version 1.7 written by me. The scitb5 function can be used for ordinary weighting data, but it cannot be used for complex weighted data, so I rewrote the subgroup interaction effect function for the NHANES data.
Let me demonstrate it below
Import a NHANES data that I downloaded myself. It should be noted here that the method of importing data is also very important (many people make mistakes here), because I program based on R’s basic data frame data.frame format, and some R packages An error will be reported if the imported data has a format (such as the haven package). A safer way is to use EXCEL to save the data in csv format first, like me, and then import it according to the method below. Or you can use as.data.frame() to force the data into a data frame. Deleting missing values here is just for demonstration convenience and has no practical significance.

library(survey)
bc<-read.csv("E:/r/test/subtext.csv",sep=',',header=TRUE)
bc <- na.omit(bc)
str(bc)

Let me introduce the data, SEQN: serial number, RIAGENDR, # Gender, RIDAGEYR, # Age, RIDRETH1, # Race , DMDMARTL, #marital status,WTINT2YR,WTMEC2YR, #weights,SDMVPSU, #psu,SDMVSTRA, #strata,LBDGLUSI, #blood sugar in mmol, LBDINSI, #insulin (pmmol/L), PHAFSTHR #postprandial blood sugar, LBXGH #glycation Hemoglobin, SPXNFEV1, #FEV1: forced expiratory volume in the first second, SPXNFVC #FVC: forced vital capacity, ml (estimated lung capacity), LBDGLTSI #2-hour postprandial blood sugar, factor.FVC is where I divide vital capacity into 2 categories for convenience for testing.
Convert categorical variables into factors

bc$DMDMARTL<-ifelse(bc$DMDMARTL==1,1,0)
bc$RIAGENDR<-as.factor(bc$RIAGENDR)
bc$RIDRETH1<-as.factor(bc$RIDRETH1)
bc$DMDMARTL<-as.factor(bc$DMDMARTL)
bc$factor.FVC<-as.factor(bc$factor.FVC)

Create a sample survey function

bcSvy2<- svydesign(ids = ~ SDMVPSU, strata = ~ SDMVSTRA, weights = ~ WTMEC2YR,
                   nest=TRUE,data = bc)

Set covariates and interaction variables. It should be noted here that cov1 refers to the covariate, Interaction is the interaction variable, and the interaction variable

cov1<-c("DMDMARTL","RIDRETH1","RIDAGEYR","LBXGH")
Interaction<-c("DMDMARTL","RIDRETH1")

Import function svyscitb5 function

source("E:/r/test/1.4svyscitb5.R")

After successful import, 7 functions should be displayed


The body of the function is

svy.scitb5 (data,x,y,Interaction,cov=NULL,time=NULL,family=NULL,method=NULL,
                     ids=NULL, strata=NULL, weights=NULL, svydstr=NULL)

Let me explain that data is data, and it must be in the form of a data frame. Covariates, in my setting, cov should include Interaction, which is also in line with our habits. Use family=”glm” uniformly, fill in the survey function here in svydstr, currently support logistic regression and linear regression, cox regression is not currently supported. If you have used the scitb5 function I wrote, this function is completely stress-free to use.

Let’s start with a categorical variable. Suppose the variable x I’m studying is race and the y variable is vital capacity. I want to understand the relationship between this layer.

out<-svy.scitb5(data=bc,x="RIAGENDR",y="factor.FVC",Interaction=Interaction,cov = cov1,family="svyglm",
                svydstr=bcSvy2) #x classification, Y classification


I want to explain the above picture. When making categorical variables, you need to set a reference. In the hierarchical comparison of DMDMARTL, RIAGENDR.1_DMDMARTL.0 is recognized as the reference. What does it mean? That is, when RIAGENDR is equal to 1 and DMDMARTL is equal to 0, this subgroup of people is defaulted to be used for reference comparison, and other groups are compared with it. When categorical variables interact with subgroups, it is best not to have too many categories, or else The data will be very large, and if the data cannot be separated into some layers, the data will show missing NA.
Generate forest map with one click

plotforest(out)


If Y is a continuous variable, the generated value here should be β. We can see that the function determines the data type by itself.

out<-svy.scitb5(data=bc,x="LBDINSI",y="SPXNFVC",Interaction=Interaction,cov = cov1,family="svyglm",
                svydstr=bcSvy2) #x is continuous, Y is continuous


Since Y is a continuous variable, it is impossible to draw a forest diagram. Because it has negative values, even if it is drawn, it will be very strange. I will not do it here.

plotforest(out)


Here’s a x-continuous, Y-classified

out<-svy.scitb5(data=bc,x="LBDINSI",y="factor.FVC",Interaction=Interaction,cov = cov1,family="svyglm"
                ,svydstr=bcSvy2) #x continuous, Y classification


Draw a forest map

plotforest(out)


Next, I will introduce the method parameter of the svy.scitb5 function. This parameter is used to calculate the P for interaction value. It has two options (“LRT”, “Wald”). LRT uses the weighted deviation method, and Wald is Wald. For testing, just use LRT for nested models, so the default is LRT. We can try the difference between the two parameters.

out<-svy.scitb5(data=bc,x="LBDINSI",y="factor.FVC",Interaction=Interaction,cov = cov1,family="svyglm",method="Wald",
                svydstr=bcSvy2) #x continuous, Y classification

out<-svy.scitb5(data=bc,x="LBDINSI",y="factor.FVC",Interaction=Interaction,cov = cov1,family="svyglm",method="LRT",
                svydstr=bcSvy2) #x continuous, Y classification



Normally you don’t need to care about this parameter or fill it in. Why do I mention this? Because sometimes the extreme values of your model data are too small and cannot be calculated using the LRT method. Then we need to change it, but we usually ignore it. Let’s use the data of a fan to demonstrate. We won’t explain the meaning of the data here.

fenxi<-read.csv("E:/r/fensi/fenxiR.csv",sep=',',header=TRUE)
fenxi$Y<- as.factor(fenxi$Y)
fenxi$Y<- as.factor(fenxi$X3)
fenxi$Y<- as.factor(fenxi$X5)
fenxi$Y<- as.factor(fenxi$X6)
fenxi$Y<- as.factor(fenxi$X7)
fenxi$Y<- as.factor(fenxi$X8)
fenxi$Y<- as.factor(fenxi$X9)
fenxi$Y<- as.factor(fenxi$X10)
fenxi$Y<- as.factor(fenxi$X11)
fenxi$Y<- as.factor(fenxi$X12)
fenxi$Y<- as.factor(fenxi$X13)
fenxi$Y<- as.factor(fenxi$X1)
fenxi$Y<- as.factor(fenxi$X16)
fenxi$Y<- as.factor(fenxi$X17)
fenxi[,c("Y","X3", "X5", "X6", "X7", "X8",
         "X9", "X10", "X11", "X12", "X13","X1","X16", "X17")] <- lapply(fenxi[,c("Y","X3", "X5", "X6", "X7", "X8",
                                                                                 "X9", "X10", "X11", "X12", "X13", "X1", "X16", "X17")], factor)
nhans<- svydesign(data=fenxi, ids=~SDMVPSU,strata = ~SDMVSTRA,
                  weights=~quanzhong,nest = TRUE)
cov1<-c("X2", "X3","X5", "X6", "X7", "X8",
        "X9", "X10", "X11", "X12", "X13", "X15", "X16", "X17")
Interaction<-c("X3","X5","X6","X7", "X8", "X9", "X10", "X11", "X12", "X13","X16"," X17")

Let’s take a look at the conventional methods

out<-svy.scitb5(fenxi,x="X",y="Y",Interaction=Interaction,cov = cov1,family="svyglm",
                svydstr=nhans)


This time we need to change the method

out<-svy.scitb5(fenxi,x="X",y="Y",Interaction=Interaction,cov = cov1,family="svyglm",method="Wald",
                svydstr=nhans)


I want to say here that these two methods are methods in the survey package. They are not generated by me. I just call them and finally draw the picture.

plotforest(out)


Finally, I would like to make an original statement that the entire code for this function was designed and written by me. You can use and modify it at will, but you must obtain my permission for commercial use.

To obtain the svy.scitb5 function, please read this article

NHANES data (complex survey data) subgroup interaction function 1.4 early adopter version (P for interaction) released-used to generate interaction effect tables with one click