R practice: using rWCVP to generate custom checklists

Using rWCVP to generate a custom checklist

  • Introduction
  • 1. List of endemic species
  • 2. List of near-endemic species
    • 2.1 Species occurring in Sierra Leone and another area
    • 2.2 Species occurring in Sierra Leone and adjacent areas
  • 3. Generate custom reports

Introduction

In addition to generating checklists from the World Checklist of Vascular Plants (WCVP), rWCVP lets users adapt the checklist output to produce custom reports. This article demonstrates this by generating a list of species that are endemic (or near-endemic) to Sierra Leone.

In addition to rWCVP, we will use the tidyverse packages for data manipulation and plotting, the gt package for rendering nice tables, and the spdep package for finding neighboring regions.

Preparation

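library(rWCVP)     # WCVP checklists; the rWCVPdata package must also be installed
library(tidyverse) # data manipulation (dplyr) and plotting (ggplot2)
library(gt)        # formatted tables
library(spdep)     # neighbor detection with poly2nb()
library(sf)        # sf_use_s2() and st_bbox(), loaded explicitly for clarity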

This example uses the pipe operator (%>%) and dplyr syntax. If you are not familiar with these, I recommend checking out https://dplyr.tidyverse.org/ and the help pages there.

1. List of endemic species

We first generate a species list for Sierra Leone. Remembering or looking up the appropriate WGSRPD level 3 codes is a pain, so we use get_wgsrpd3_codes("Sierra Leone") to do this work for us.

sl_code = get_wgsrpd3_codes("Sierra Leone")                   # WGSRPD level 3 code(s) for Sierra Leone
sl_species = wcvp_checklist(area = sl_code, synonyms = FALSE) # accepted names only

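Setting synonyms = FALSE keeps only accepted names. The result has nearly 100,000 rows, because each row is one species-area occurrence. The list therefore also includes every other area where a Sierra Leone species occurs.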

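How many species are there in Sierra Leone, and how many are endemic? Because each row is a species-area occurrence, we first keep one row per species with distinct(). We then count by the endemic column. Very simple: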

(endemic_summary = sl_species %>%
    distinct(taxon_name, endemic) %>%
    group_by(endemic) %>%
    summarise(number.of.sp=n()))
# A tibble: 2 × 2
  endemic number.of.sp
  <lgl> <int>
1 FALSE 3303
2 TRUE 45

Easy! For the list of endemic species, we can simply use the endemic column to filter our list, but what about near-endemic species?
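Here is a minimal base-R sketch of this counting and filtering. It runs on a small hypothetical table rather than the real sl_species: the species names and area codes are invented, and only the column names mirror those used above. It swaps unique() and table() in for the dplyr verbs so that it runs without any extra packages.

```r
# Hypothetical toy data: one row per species-area occurrence, like sl_species
toy <- data.frame(
  taxon_name   = c("Species A", "Species B", "Species B", "Species C", "Species C"),
  area_code_l3 = c("SIE", "SIE", "GUI", "SIE", "LBR"),
  endemic      = c(TRUE, FALSE, FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# One row per species, as distinct(taxon_name, endemic) does
per_species <- unique(toy[, c("taxon_name", "endemic")])

# Count species by endemic status, like group_by(endemic) + summarise(n())
endemic_summary <- as.data.frame(table(endemic = per_species$endemic),
                                 responseName = "number.of.sp")
print(endemic_summary)

# The endemic list is simply a filter on the logical endemic column
endemic_list <- per_species$taxon_name[per_species$endemic]
print(endemic_list)
```

On the real data, the equivalent dplyr step would be a filter(endemic) on the distinct species.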

2. List of near-endemic species

Depending on how we define near-endemic, there are two ways to filter the list:

  1. We define near-endemic species as those occurring in Sierra Leone and one other WGSRPD level 3 area. From a data point of view, this means filtering out species that have more than 2 rows in sl_species (since each row is a species-area occurrence).
  2. Alternatively, we could define near-endemic species as those whose ranges straddle the border with a neighboring area, making them functionally endemic. To do this, we need to a) identify the WGSRPD level 3 areas adjacent to Sierra Leone and b) filter our species list accordingly.

2.1 Species occurring in Sierra Leone and another area

We simply remove species with more than two area occurrences (rows) from the Sierra Leone species list.

sl_near_endemics1 = sl_species %>%
  group_by(plant_name_id, taxon_name) %>%
  filter(n() < 3) %>%
  ungroup()


The result has 179 rows. Species that occur in two areas appear twice, once per area, which follows from the definition of near-endemic. We therefore summarise by species:

sl_near1_summary = sl_near_endemics1 %>%
  group_by(taxon_name) %>%
  summarise(number.of.areas = n())   # rows per species = number of areas it occurs in

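This gives 112 species. The per-species count shows which are original endemics (one area, Sierra Leone only) and which are newly added near-endemics (two areas). The numbers are consistent with the 45 endemics found earlier. That is, 45 species with one row plus 67 species with two rows gives 112 species and 45 + 2 × 67 = 179 rows.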
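The same row-counting idea can be sketched in base R on hypothetical toy data. The species are invented. The area codes SIE, GUI, LBR and IVO are assumed here to be the WGSRPD level 3 codes for Sierra Leone, Guinea, Liberia and Ivory Coast. ave() plays the role of n() inside group_by(), and the per-species count separates strict endemics (one area) from near-endemics (two areas).

```r
# Hypothetical toy occurrences: one row per species-area pair
toy <- data.frame(
  taxon_name   = c("Species A", "Species B", "Species B", "Species C", "Species C", "Species C"),
  area_code_l3 = c("SIE", "SIE", "GUI", "SIE", "LBR", "IVO"),
  stringsAsFactors = FALSE
)

# Rows per species, repeated on every row (the base-R analogue of n() in a group)
n_rows <- ave(seq_len(nrow(toy)), toy$taxon_name, FUN = length)

# Keep species with fewer than 3 rows: Sierra Leone only, or Sierra Leone + 1 area
near1 <- toy[n_rows < 3, ]

# Collapse to one row per species, recording how many areas each occupies
counts <- table(near1$taxon_name)
near1_summary <- data.frame(taxon_name = names(counts),
                            number.of.areas = as.integer(counts),
                            stringsAsFactors = FALSE)

# 1 area = strict endemic; 2 areas = near-endemic
near1_summary$status <- ifelse(near1_summary$number.of.areas == 1,
                               "endemic", "near-endemic")
print(near1_summary)
```

Species C occurs in three areas, so it is removed by the n_rows < 3 filter.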

2.2 Species occurring in Sierra Leone and adjacent areas

First, we need to identify which WGSRPD level 3 areas border Sierra Leone.
We could do this by looking at a map, but instead we will do it programmatically. We take the WGSRPD level 3 polygons and find all areas that border each other.

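sf_use_s2(FALSE)   # use planar (GEOS) geometry instead of the s2 spherical engine

area_polygans = rWCVPdata::wgsrpd3        # WGSRPD level 3 polygons (an sf object)
area_neighbors = poly2nb(area_polygans)   # for each polygon, the indices of the polygons it touches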

Note that for this step we have to switch off spherical geometry (the s2 engine) in sf with sf_use_s2(FALSE).

Now that we have a list of neighboring regions, we need to find the regions that border Sierra Leone.

sl_index = which(area_polygans$LEVEL3_COD %in% sl_code)
sl_neighbors_index = area_neighbors[[sl_index]]
sl_plus_neighbors = area_polygans[c(sl_index, sl_neighbors_index),]
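poly2nb() returns an nb object. This is a list with one integer vector per polygon, holding the row indices of that polygon's neighbors. The sketch below mimics that structure with a small hand-made list to show how the indexing above picks out Sierra Leone and its neighbors. The areas and adjacency are hypothetical, not read from the real polygons. The sketch also checks that which() matched exactly one row, because [[ ]] needs a single index.

```r
# Hypothetical stand-ins for area_polygans and the nb object from poly2nb():
# element i of `neighbors` holds the row indices of the areas touching area i
areas <- data.frame(LEVEL3_COD = c("GUI", "SIE", "LBR", "IVO"),
                    stringsAsFactors = FALSE)
neighbors <- list(c(2L, 3L, 4L),  # GUI touches SIE, LBR, IVO
                  c(1L, 3L),      # SIE touches GUI, LBR
                  c(1L, 2L, 4L),  # LBR touches GUI, SIE, IVO
                  c(1L, 3L))      # IVO touches GUI, LBR

# Row index of Sierra Leone; [[ ]] below needs exactly one match
sl_index <- which(areas$LEVEL3_COD %in% "SIE")
stopifnot(length(sl_index) == 1)

# Sierra Leone first, followed by its neighbors
sl_plus <- areas[c(sl_index, neighbors[[sl_index]]), , drop = FALSE]
print(sl_plus$LEVEL3_COD)
```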


Before generating the final checklist, it is worth sanity-checking this automatic neighbor detection by plotting the areas on a map:

bounding_box = st_bbox(sl_plus_neighbors)
xmin = bounding_box["xmin"] - 2
xmax = bounding_box["xmax"] + 2
ymin = bounding_box["ymin"] - 2
ymax = bounding_box["ymax"] + 2

ggplot(area_polygans) +
  geom_sf(fill = "white", colour = "grey") +                                 # all areas as background
  geom_sf(data = sl_plus_neighbors, fill = "#a4dba2", colour = "gray20") +   # Sierra Leone + neighbors
  coord_sf(xlim = c(xmin, xmax), ylim = c(ymin, ymax)) +                     # zoom to the padded bounding box
  geom_sf_label(data = sl_plus_neighbors, aes(label = LEVEL3_NAM)) +         # label with area names
  theme(panel.background = element_rect(fill = "#bfbadb")) +
  xlab(NULL) +
  ylab(NULL)


Of course, we could have identified Guinea and Liberia as the neighboring countries from the map and then looked up their codes with get_wgsrpd3_codes("Liberia") and get_wgsrpd3_codes("Guinea"), but that's not nearly as interesting!

Next, we can identify near-endemic species as those that occur only in Sierra Leone, Guinea or Liberia.

sl_near_endemics2 = sl_species %>%
  group_by(plant_name_id) %>%
  filter(all(area_code_l3 %in% sl_plus_neighbors$LEVEL3_COD)) %>%
  ungroup()
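The all(... %in% ...) test is evaluated once per species, so a species is kept only if every one of its areas is in the allowed set. Below is a base-R sketch on hypothetical toy data. The IDs are invented, and SIE, GUI, LBR and IVO are assumed here to stand for Sierra Leone, Guinea, Liberia and Ivory Coast. It uses ave() to apply all() within each plant_name_id, in place of the dplyr group_by() + filter().

```r
# Hypothetical toy rows: plant_name_id and WGSRPD level 3 area code
toy <- data.frame(
  plant_name_id = c(1, 2, 2, 3, 3, 4, 4, 4),
  area_code_l3  = c("SIE", "SIE", "GUI", "SIE", "IVO", "SIE", "GUI", "LBR"),
  stringsAsFactors = FALSE
)
allowed <- c("SIE", "GUI", "LBR")  # Sierra Leone plus its neighbors

# all() within each species: TRUE on every row of a species whose areas are all allowed
keep <- ave(toy$area_code_l3 %in% allowed, toy$plant_name_id, FUN = all)
near2 <- toy[keep, ]
print(unique(near2$plant_name_id))
```

Species 3 also occurs outside the allowed set, so all of its rows are dropped, including its Sierra Leone row.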


Finally, as in option 1, we keep only species that occur in Sierra Leone plus at most one neighboring area. From the map, some species could well occur around the point where all three countries meet, but for this example we exclude them.

sl_near_endemics2 = sl_near_endemics2 %>%
  group_by(plant_name_id, taxon_name) %>%
  filter(n() < 3) %>%
  ungroup()
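Here is a base-R sketch of this second filter on hypothetical toy rows that have already passed the "only Sierra Leone + neighbors" step (invented IDs, illustrative area codes). It shows which case gets excluded: a species recorded in all three countries has three rows and fails the fewer-than-3 test.

```r
# Hypothetical toy rows that already passed the "only Sierra Leone + neighbors" filter
near2 <- data.frame(
  plant_name_id = c(1, 2, 2, 4, 4, 4),
  area_code_l3  = c("SIE", "SIE", "GUI", "SIE", "GUI", "LBR"),
  stringsAsFactors = FALSE
)

# Rows (areas) per species, repeated on every row of that species
n_rows <- ave(near2$plant_name_id, near2$plant_name_id, FUN = length)

# Fewer than 3 rows: Sierra Leone alone, or Sierra Leone + one neighbor.
# Species 4 occurs in all three countries (the triple-junction case) and is dropped.
near2_final <- near2[n_rows < 3, ]
print(unique(near2_final$plant_name_id))
```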

Again, we collapse the list to one row per species to see how many near-endemic species there actually are:

sl_near2_summary = sl_near_endemics2 %>%
  group_by(taxon_name) %>%
  summarise(number.of.areas = n())   # rows per species = number of areas it occurs in

3. Generate custom reports

Now we can do something fancy: turn our checklist data frame into a formatted report. To do this, we pass it to a template file called "custom_checklist.Rmd", which is stored in the "rmd" subfolder of the rWCVP package folder. We pass the data (along with some other information) as parameters, and specify the output filename with output_file.

library(rmarkdown)

checklist_description = "Checklist of species that are endemic to Sierra Leone (or near-endemic, based on neighboring countries)"
wd = getwd()
# Render the report template shipped with rWCVP, passing our data in as parameters
render(system.file("rmd", "custom_checklist.Rmd", package = "rWCVP"),
       quiet = TRUE,
       params = list(version = "New Phytologist Special Issue",
                     mydata = sl_near_endemics2,
                     description = checklist_description),
       output_file = file.path(wd, "Sierra_Leone_endemics_and_near_endemics.html"))

When I ran this step, it produced an error that I was not able to resolve.
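The following is one cheap first check, offered as a guess rather than a diagnosis of the error above. system.file() returns an empty string when the requested file (or the whole package) cannot be found, and render() would then receive "" instead of a template path. The helper below fails early with a readable message in that case. find_package_file() is a hypothetical name, not part of rWCVP or rmarkdown. If it does find the template, the error lies elsewhere, for example in the parameters passed to it, and this check will not reveal that.

```r
# Hypothetical helper: return the path of a file shipped inside an installed
# package, or stop with a readable message if system.file() finds nothing
find_package_file <- function(package, ...) {
  path <- system.file(..., package = package)
  if (!nzchar(path)) {
    stop(sprintf("File not found in package '%s'; is the package installed and up to date?",
                 package), call. = FALSE)
  }
  path
}

# Usage (requires rWCVP to be installed); then pass `template` as the first
# argument of render() instead of the bare system.file() call:
# template <- find_package_file("rWCVP", "rmd", "custom_checklist.Rmd")
```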

Here is what the result should look like: