Develop your own R package

This content is excerpted from the R language special section of “Gene Academy VIP Course (Season 2)”.

An R package is a collection of functions, documentation, and data saved in a standard format. Packages let us organize our functions in a well-defined, fully documented way and make it easy to share our code with others. In essence, an R package is also a piece of software: like Python modules, Perl modules, or mobile apps, it is an extension built on a particular programming language or platform. The difference is that an R package is developed in the R language.

R itself is largely made up of packages, and its functionality can be extended by installing additional ones. A fresh R installation already includes a set of base packages such as base, datasets, graphics, stats, and utils, and these are loaded automatically every time you start R.

Figure 1 R base packages
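The base packages described above can also be listed from within R. This is a small base-R sketch; the exact lists depend on your R version and on startup settings such as the "defaultPackages" option.

```r
# Packages attached to the current session (base is always among them)
attached <- .packages()
print(attached)

# Packages R attaches at startup in addition to base; the list comes from
# the "defaultPackages" option (set from R_DEFAULT_PACKAGES if defined)
print(getOption("defaultPackages"))

# Base packages that ship with every R installation, attached or not
base_pkgs <- rownames(installed.packages(priority = "base"))
print(base_pkgs)
```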

This chapter describes in detail how to create your own R package. For more information, see Chapter 22, “Creating Packages”, in the third edition of “R in Action”.

1 Why develop R packages?

Here are some reasons to develop R packages:

1. Code reuse: functions and methods you use repeatedly can be collected in one place and reused across projects;

2. Better understanding of R: learning to develop packages shows you how R works internally, which also makes other packages easier to learn;

3. Self-improvement: developing an R package is systematic training; working a problem through until it is packaged gives you a comprehensive, systematic understanding of it;

4. Self-presentation: sharing a package that solves a problem well lets more users benefit from the tools you have built and showcases your skills;

5. Recognition: releasing an R package is like publishing a work, and through the Internet it can become known throughout the R community;

6. Publishing papers: a package with complete functionality, one that solves a core problem in a field, or one that implements a complete workflow can form the basis of an SCI paper;

7. Income: although R packages are generally distributed free of charge, you can earn income by licensing APIs, providing technical support, writing books, etc.;

8. Giving back to the community: developing R packages is a good way to give back to the R community and contribute to the development of the whole ecosystem.

Figure 2 Hadley Wickham: A man who changed R (https://hadley.nz/)

2 R package structure

R packages can exist in three forms: source code, a compressed bundle, and a binary; Figure 3 compares them. The bundle is simply the source code compressed into a single .tar.gz file, so its contents are essentially the same as the source. When you install a package from source, R downloads the bundle to a temporary directory, builds it (compiling any C, C++, or Fortran code it contains), and installs the result into a library directory. On Windows and macOS, CRAN usually provides pre-built binaries, which are simply unpacked into the library.

Figure 3 R package source code and binary
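Which form install.packages() uses by default, and where installed packages end up, can be checked from R itself. A minimal base-R sketch; the bundle file name in the final comment is a hypothetical placeholder.

```r
# Default package type on this platform: "source" on Linux,
# "win.binary" on Windows, a "mac.binary..." type on macOS
print(.Platform$pkgType)

# Library directories; packages are installed into the first one
# unless install.packages() is given a different lib= argument
libs <- .libPaths()
print(libs)

# Installing from a local source bundle (hypothetical file name):
# install.packages("edatools_0.0.0.9000.tar.gz", repos = NULL, type = "source")
```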

To develop an R package, you first need to understand its structure. R packages follow a fixed layout, which is divided into a few main parts: functions, documentation, data, and descriptive metadata.

Figure 4 Typical R package directory structure

A package exists in two forms: the source form you work on during development, and the final released form, for which the documentation must first be converted (from roxygen comments into Rd help files). The figure above shows the directory structure of a typical R package; each part is described below.

DESCRIPTION file: the package metadata, such as package name, version, authors, description, dependencies, and license. You do not have to write it from scratch: create_package() generates a template, and usethis functions such as use_package() and use_mit_license() add or update fields as the package evolves;

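NAMESPACE file: controls the visibility of functions, that is, which functions are exported and can be used directly by users of the package. It also records which functions and packages the package imports. When you use roxygen2, this file is generated from tags such as @export and @importFrom, so you should not edit it by hand.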

R directory: the R function code, the core of the package. In the source package these are ordinary .R files; when the package is installed, the code is stored in a lazy-load database (.rdb/.rdx files).

man directory: the manual, i.e. help files written in Rd format. Each exported function and each dataset in the data directory should have a corresponding help file. With roxygen2 these files are generated by document(), so you edit the roxygen comments rather than the .Rd files.

data directory: optional; example datasets, usually saved as .rda files (this is what use_data() creates).

inst directory: optional; any other files to install with the package, such as example scripts, extra data, or documentation. Its contents are copied to the top level of the installed package;

inst/extdata directory: raw data files shipped as-is, for example a csv file. After installation they can be located with system.file("extdata", "happiness.csv", package = "edatools").

tests directory: Optional, this directory contains code for testing your package.

vignettes directory: optional; long-form guides (vignettes) to the package;

html directory: documentation in HTML (web page) format.

.gitignore file: lists files that Git should ignore (not track);

.Rbuildignore file: lists files (as regular expressions) that are excluded when the package is built;

.Rproj file: the RStudio project file. Open the package through this file whenever you edit it, because RStudio then enables its package-development tools, such as the Build menu.
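Several of the parts above can be seen in an installed package. The sketch below uses the base package stats as a stand-in, since edatools does not exist yet; it shows that the installed package keeps DESCRIPTION and NAMESPACE, stores R code as .rdb/.rdx files, and exports only part of its namespace. The specific object names (median, print.htest) are simply examples from stats.

```r
# Installed location of a package (stats stands in for your own package)
pkg_path <- system.file(package = "stats")

# Top level of the installed package: DESCRIPTION, NAMESPACE, R/, help/, ...
top_files <- list.files(pkg_path)
print(top_files)

# The R code is stored as a lazy-load database, not as .R source files
r_files <- list.files(file.path(pkg_path, "R"))
print(r_files)

# Metadata read from DESCRIPTION
desc <- packageDescription("stats")
cat(desc$Package, desc$Version, "\n")

# NAMESPACE: exported objects are reachable with stats::name ...
exports <- getNamespaceExports("stats")
"median" %in% exports
# ... while internal objects (e.g. registered S3 methods) need stats:::name
"print.htest" %in% exports
internal_fun <- stats:::print.htest
```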

3 Preparation

Once you are familiar with the package structure, you can start developing. First create the skeleton the package needs, then fill it with content, mainly R functions and their documentation. The related configuration files also need updating after each change. RStudio and the packages below provide a convenient set of package development tools that greatly simplify this work.

RStudio: an integrated development environment (IDE) for R;

devtools: the core package-development toolkit; installing it also installs usethis, roxygen2, testthat, and other tools, and library(devtools) also attaches usethis;

usethis: creates the package skeleton and automates common setup tasks;

roxygen2: generates Rd help files (and the NAMESPACE file) from specially formatted comments in the R code;

tinytex: a lightweight LaTeX distribution, needed to build the PDF manual;

testthat: a unit-testing framework.

#Install and load the packages required for package development
install.packages(c("devtools", "usethis", "roxygen2"))
library(usethis)
library(devtools)
library(roxygen2)


#View version
packageVersion("devtools")
#Check that the build tools (e.g. Rtools on Windows) are available
has_devel()
#> Your system is ready to build packages!

Figure 5 Books on developing R packages

https://r-pkgs.org/setup.html

4 Use git

To make a package easy for users to download and install, it is usually published on CRAN eventually. Because CRAN submission is demanding, during development you can host the package on GitHub first and submit it to CRAN once its functionality is more complete. GitHub also provides version control. Many package developers host their source code on GitHub and submit the source package to CRAN, which then builds the binary packages. RStudio integrates Git, so you only need to install Git separately to use it.

https://geo.uzh.ch/microsite/reproducible_research/post/rr-rstudio-git/

1. Register a GitHub account

https://github.com/

2. Install the git program

https://git-scm.com/downloads

3. Configure Git in RStudio

Figure 6 Configuring git

4. Create an SSH key

Figure 7 Creating a key

5. Add the public key to GitHub

Figure 8 Adding the public key to GitHub

6. Set your Git identity and test the connection

git config --global user.email "you@example.com"
git config --global user.name "yourname"
ssh -T git@github.com

7. Connect the local project to the GitHub repository

git config remote.origin.url git@github.com:wangtong/Rtest2.git
git push --set-upstream origin master

Figure 9 Connecting local projects to github

5 Develop R package

1. Create R package skeleton

You can create it directly from RStudio (File > New Project > New Directory > R Package) or with a function.

Figure 10 Create R package development project

#Create R package project
library(devtools)
create_package("edatools")
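For comparison, base R has its own, more minimal skeleton generator, utils::package.skeleton(). The sketch below uses it only to show the resulting layout (DESCRIPTION, NAMESPACE, R/, man/) without needing usethis; the function n_missing() is a hypothetical stand-in for contents(), and everything is written to a temporary directory. create_package() remains the tool used in this article.

```r
# Hypothetical stand-in function to place in the skeleton
n_missing <- function(data) {
  # number of missing values in each column of a data frame
  sapply(data, function(x) sum(is.na(x)))
}

# Create the skeleton in a temporary directory so nothing is left behind
skel_root <- file.path(tempdir(), "skeleton-demo")
dir.create(skel_root, showWarnings = FALSE)
utils::package.skeleton(name = "edatools", list = "n_missing",
                        environment = environment(), path = skel_root)

# Inspect the generated files
pkg_dir <- file.path(skel_root, "edatools")
skel_files <- list.files(pkg_dir, recursive = TRUE)
print(skel_files)
```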

2. Add function

#Create R/contents.R and open it for editing
use_r("contents")
#Add the function code to R/contents.R
contents <- function(data){
  if(!(is.data.frame(data))){
    stop("You need to input a data frame")
  }
  dataname <- deparse(substitute(data))
  size <- object.size(data)

  # overall summary --------------------------
  pos <- seq_along(data)
  varname <- colnames(data)
  type <- sapply(data, function(x) class(x)[1])
  n_unique <- sapply(data, function(x) length(unique(x)))
  n_miss <- sapply(data, function(x) sum(is.na(x)))
  pct_miss <- n_miss/nrow(data)
  varinfo <- data.frame(
    pos, varname, type,
    n_unique, n_miss, pct_miss
  )
  results <- list(dfname=dataname, size=size,
                  nrow=nrow(data), ncol=ncol(data),
                  varinfo=varinfo)
  class(results) <- c("contents")
  return(results)
}

Use load_all() to load the package code directly into memory. This simulates installing and attaching the package, so you can call the functions right away.

#Loading function
load_all()
#Test function, which can directly output data frame information
contents(mtcars)

3. Add documentation

R help pages are written in Rd, a LaTeX-like markup that R renders as HTML, plain text, or PDF. Because writing Rd by hand is tedious, the roxygen2 package lets you write documentation as special comments that start with “#'”, using tags such as @param and @return to mark each section. roxygen2 then converts these comments into .Rd files in the man directory (the roxygen2 components that perform such conversions are called “roclets”), and R turns the .Rd files into the final PDF or web-page help.

In the package source, the documentation is written together with the function, directly above it.
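The last step of that pipeline, Rd to rendered help, can be tried in base R. The Rd text below is a simplified, hand-written approximation of what roxygen2 might produce for contents(), not its exact output; tools::parse_Rd() and tools::Rd2txt() then render it the way plain-text help is rendered.

```r
# Simplified Rd source, roughly what man/contents.Rd could contain
rd_lines <- c(
  "\\name{contents}",
  "\\alias{contents}",
  "\\title{Description of a data frame}",
  "\\usage{",
  "contents(data)",
  "}",
  "\\arguments{",
  "\\item{data}{a data frame.}",
  "}",
  "\\description{",
  "\\code{contents} describes the contents of a data frame.",
  "}"
)
rd_file <- tempfile(fileext = ".Rd")
writeLines(rd_lines, rd_file)

# Parse the Rd file and render it as plain-text help
rd <- tools::parse_Rd(rd_file)
txt_file <- tempfile(fileext = ".txt")
tools::Rd2txt(rd, out = txt_file)
help_text <- readLines(txt_file)
cat(help_text, sep = "\n")
```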

Figure 11 Tags used by roxygen2

Place the cursor inside the function, open RStudio's Code menu, and choose “Insert Roxygen Skeleton” to generate a documentation skeleton, then fill in the corresponding information.

Figure 12 Generate annotation content skeleton

#' @title Description of a data frame
#' @description
#' \code{contents} describes the contents of a data
#' frame.
#' @param data a data frame.
#' @importFrom utils object.size
#' @return a list with 5 components:
#' \describe{
#' \item{dfname}{name of data frame}
#' \item{nrow}{number of rows}
#' \item{ncol}{number of columns}
#' \item{size}{size of the data frame in bytes}
#' \item{varinfo}{data frame of overall dataset characteristics}
#' }
#'
#' @details
#' For each variable in a data frame, \code{contents} describes
#' the position, type, number of unique values, number of missing
#' values, and percent of missing values.
#'
#' @export
#'
#' @examples
#' df_info <- contents(happiness)
#' df_info
#' plot(df_info)
Place this block directly above the function definition in R/contents.R. This completes the contents() function.

4. Add data

You can save objects from your workspace as example datasets in the package, but every dataset also needs documentation. Datasets are documented in an R file in the R directory (for example R/happiness.R) that contains a roxygen block followed by the dataset name as a string. Raw files provided as-is (in inst/extdata) do not need this documentation.

#Read data
happiness <- readr::read_csv("happiness.csv")
#Generate data set
use_data(happiness)
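use_data() saves the object as data/happiness.rda. The sketch below approximates that with base R's save(), assuming bzip2 compression (usethis's default for use_data()); the three-row data frame is a hypothetical stand-in, not the real happiness data, and everything is written to a temporary directory.

```r
# Hypothetical stand-in for the happiness data
happiness <- data.frame(
  ID = c("a1", "a2", "a3"),
  Age = c(34L, 51L, 27L),
  Happy = factor(c("Agree", "Neutral", "Strongly Agree"))
)

# Approximate use_data(happiness): write data/happiness.rda with bzip2
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE)
rda_path <- file.path(data_dir, "happiness.rda")
save(happiness, file = rda_path, compress = "bzip2")

# Loading the .rda file restores an object with the same name
restored <- new.env()
load(rda_path, envir = restored)
identical(restored$happiness, happiness)

# Report file size and compression type of the .rda file
tools::checkRdaFiles(rda_path)
```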
Then add documentation for the dataset, for example in R/happiness.R:
#' @title
#' Happiness Dataset
#'
#' @description
#' A data frame containing a happiness survey and demographic
#' data. This data is fictitious.
#'
#' @source
#' The data were randomly generated using functions from the
#' \href{https://cran.r-project.org/web/packages/wakefield/index.html}{wakefield}
#' package.
#'
#' @format A data frame with 460 rows and 11 variables:
#' \describe{
#' \item{\code{ID}}{character. A unique identifier.}
#' \item{\code{Date}}{date. Date of the interview.}
#' \item{\code{Sex}}{factor. Sex coded as \code{Male} or \code{Female}.}
#' \item{\code{Race}}{factor. Race coded as an 8-level factor.}
#' \item{\code{Age}}{integer. Age in years.}
#' \item{\code{Education}}{factor. Education coded as a 13-level factor.}
#' \item{\code{Income}}{double. Annual income in US dollars.}
#' \item{\code{IQ}}{double. Adult intelligence quotient. This
#' variable has a large amount of missing data.}
#' \item{\code{Zip}}{character. USPS Zip code.}
#' \item{\code{Children}}{integer. Number of children.}
#' \item{\code{Happy}}{factor. Agreement with the statement
#' "I am happy most of the time", coded as \code{Strongly Disagree},
#' \code{Disagree}, \code{Neutral}, \code{Agree}, or
#' \code{Strongly Agree}.}
#' }
"happiness"

5. Check the content

Use check() to run R CMD check on the package, which reports problems in the code, documentation, and metadata.

check()

6. Generate documentation

If check() reports no problems, generate the documentation with document(). It uses roxygen2 to convert the roxygen comments in the R directory into .Rd files in the man directory and updates the NAMESPACE file.

document()
#> ℹ Updating edatools documentation
#> First time using roxygen2. Upgrading automatically...
#> Setting `RoxygenNote` to "7.2.3"
#> ℹ Loading edatools
#> Writing contents.Rd
#> Writing happiness.Rd

To add a vignette (a long-form guide to the package), use the use_vignette() function.

#Generate vignettes document
use_vignette("happiness")

To give the package itself a help page, create an R file named after the package and document it with a roxygen block followed by NULL. Users who call help() to look up the package then get this information. (Newer versions of roxygen2 recommend documenting the package with the "_PACKAGE" sentinel instead, which usethis::use_package_doc() sets up.)

#Create function
use_r("edatools")
#Add the following help information to R/edatools.R
#' Functions for exploring the contents of a data frame.
#'
#' edatools provides tools for exploring the variables in
#' a data frame.
#'
#' @docType package
#' @name edatools-package
#' @aliases edatools
NULL

7. Use the same process to create R/plot.R and R/print.R (the plot() and print() methods for contents objects) and generate their documentation.

use_r("plot")
use_r("print")
document()
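The article does not show print.R, so here is a hypothetical sketch of what a print method for the "contents" class might look like; the output format is an assumption, not the author's code. With @export on an S3 method, roxygen2 registers it as S3method(print, contents) in NAMESPACE. The demo builds a small "contents" object by hand with the same components contents() returns. plot.R would follow the same pattern with a plot.contents() function.

```r
#' Print a contents object
#'
#' @param x an object of class "contents".
#' @param digits number of decimal places for the percent missing.
#' @param ... further arguments (ignored).
#' @export
print.contents <- function(x, digits = 2, ...) {
  if (!inherits(x, "contents")) stop("Object must be of class 'contents'")
  cat("Data frame:", x$dfname, "\n")
  cat("Size:", format(x$size, units = "auto"), "\n")
  cat("Rows:", x$nrow, " Columns:", x$ncol, "\n\n")
  info <- x$varinfo
  # show the share of missing values as a percentage string
  info$pct_miss <- paste0(round(100 * info$pct_miss, digits), "%")
  print(info, row.names = FALSE)
  invisible(x)  # print methods return their input invisibly
}

# Small "contents" object built by hand for the demo
df <- data.frame(a = c(1, NA, 3), b = c("x", "y", "y"))
obj <- structure(
  list(
    dfname = "df", size = object.size(df),
    nrow = nrow(df), ncol = ncol(df),
    varinfo = data.frame(
      pos = seq_along(df), varname = names(df),
      type = sapply(df, function(v) class(v)[1]),
      n_unique = sapply(df, function(v) length(unique(v))),
      n_miss = sapply(df, function(v) sum(is.na(v))),
      pct_miss = sapply(df, function(v) mean(is.na(v)))
    )
  ),
  class = "contents"
)
print(obj)  # dispatches to print.contents()
```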

8. Edit metadata

After the functions, documentation, and data are in place, update the package metadata: DESCRIPTION, README, NAMESPACE, and so on.

#Add MIT license information
use_mit_license()

This function sets the License field in the DESCRIPTION file and creates the LICENSE and LICENSE.md files;

#Add package dependency information
use_package("ggplot2")

This function adds ggplot2 to the Imports field of the DESCRIPTION file.

You should also edit the Version, Authors@R, Title, and Description fields of DESCRIPTION by hand.
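DESCRIPTION is a plain-text file in DCF ("Field: value") format, so its fields can be checked with base R's read.dcf(). The contents below are an assumed example of how the file might look after create_package(), use_mit_license(), and use_package("ggplot2"); Title, Authors@R, and Description are placeholders you would replace, and the file is written to a temporary directory.

```r
# Assumed DESCRIPTION contents (placeholder values)
desc_lines <- c(
  "Package: edatools",
  "Title: Functions for Exploring Data Frames",
  "Version: 0.0.0.9000",
  'Authors@R: person("First", "Last", email = "first.last@example.com", role = c("aut", "cre"))',
  "Description: Provides tools for exploring the variables in a data frame.",
  "License: MIT + file LICENSE",
  "Encoding: UTF-8",
  "Imports: ggplot2"
)
desc_path <- file.path(tempdir(), "DESCRIPTION")
writeLines(desc_lines, desc_path)

# read.dcf() returns a one-row character matrix, one column per field
desc <- read.dcf(desc_path)
desc[, c("Package", "Version", "License", "Imports")]

# Authors@R holds R code that evaluates to a person object
authors <- eval(parse(text = desc[1, "Authors@R"]))
print(authors)
```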

If you host the code on GitHub, you should also add a README file.

#Generate readme file
use_readme_md()

9. Build the source package

Build the package into a source bundle (a .tar.gz file), for example with build() or RStudio's Build > Build Source Package, so that it can be distributed to others or submitted to CRAN.

Figure 13 Generate source code

10. Install the package

Once all documentation is complete and check() reports no problems, install the package into your library.

#check
check()
#Install
install()

Figure 14 Paid video tutorial

syntaxbug.com © 2021 All Rights Reserved.