California Halibut (Paralichthys californicus) Sitka, AK, USA. by mikecarr via iNaturalist, CC BY 4.0

June 10, 2024

Querying OBIS, GBIF, and FishBase with AphiaIDs

Author

Stephen Formel

What?

I recently attended a meeting to discuss Species Distribution Models, or SDMs. The attendees emphasized the importance of GBIF, OBIS, and FishBase for informing SDMs. One of the challenges that was discussed was how to query taxa consistently and efficiently across these databases.

Here I demonstrate how to use AphiaIDs, the identifier used by the World Register of Marine Species (WoRMS), to query and link the data from OBIS, GBIF, and FishBase. All of the code below is written in R. And a big thank you to Yi-Ming Gan, from the SCAR Antarctic Biodiversity Portal (hosted by Institute of Natural Sciences), for helping me connect the dots!

The basic idea:

  1. Query WoRMS to identify your AphiaID(s).
  2. Query WoRMS to get the equivalent FishBase ID(s).
  3. Query GBIF to get the equivalent identifier(s) from the GBIF taxonomic backbone.
  4. Query OBIS, GBIF, and FishBase for your taxa of interest.

Why?

The data needed to answer a question can be distributed across multiple repositories. Often, the first step to answering a question is to coordinate the data. All of the aforementioned databases are working to resolve this challenge, but it will probably be some time before science completely figures out how to seamlessly serve all data all the time!

How?

We’ll look at mahi-mahi, also known as dorado (Coryphaena hippurus), in the EEZ of Puerto Rico.

By Unknown author - NOAA FishWatch, Public Domain, https://commons.wikimedia.org/w/index.php?curid=22626272

Get your R packages set up

You’ll need to install the packages listed in the code block below if you want to complete the analysis yourself. In some cases, I’ve pointed out which package a function comes from by calling it explicitly. For example, worrms::wm_name2id calls the wm_name2id function from the worrms package. I only did this to make it clear where each function comes from. You can also load these packages with the library function, as I show in the code block below.

Note that while most of the packages can be installed with the install.packages() function, the mregions2 package needs to be installed from GitHub. See here for instructions.
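
Before loading anything, it can help to see which of these packages are still missing. The snippet below is a minimal sketch using only base R. The package list is copied from the code block in this post, and nothing is installed automatically.

```r
# Packages used in this post (copied from the library() calls).
pkgs <- c("dplyr", "mregions2", "sf", "wk", "worrms", "httr2", "tidyr",
          "ggplot2", "rnaturalearth", "rfishbase", "robis", "rgbif",
          "stringr", "ggtext")

# Names of packages that are not found in the current library paths.
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))

if (length(missing_pkgs) == 0) {
  message("All packages are installed.")
} else {
  message("Still to install: ", paste(missing_pkgs, collapse = ", "))
}
```

Anything reported as missing, other than mregions2, should be installable with install.packages().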

library(dplyr)
library(mregions2)
library(sf)
library(wk)
library(worrms)
library(httr2)
library(tidyr)
library(ggplot2)
library(rnaturalearth)
library(rfishbase)
library(robis)
library(rgbif)
library(stringr)
library(ggtext)

Step 1. Query WoRMS to identify your AphiaID(s).

You can search WoRMS through your web browser, using scientific or common names. Here is the landing page for Coryphaena hippurus:

https://www.marinespecies.org/aphia.php?p=taxdetails&id=126846

You can see the AphiaID at the end of the link: 126846. There is also an R package, worrms, that can help execute this query:

AphiaID <- worrms::wm_name2id(name = "Coryphaena hippurus")

Note that we put the AphiaID into an object named AphiaID that we will use below.
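
If you start from a WoRMS link rather than a name, the AphiaID can be pulled out of the URL. This is a small base R sketch. The helper names worms_url and aphia_from_url are made up for this example, and the URL pattern is simply copied from the link above.

```r
# Build the WoRMS taxon page URL for an AphiaID (pattern copied from the link above).
worms_url <- function(aphia_id) {
  paste0("https://www.marinespecies.org/aphia.php?p=taxdetails&id=",
         as.integer(aphia_id))
}

# Extract the AphiaID from such a URL; returns NA where no id parameter is present.
aphia_from_url <- function(url) {
  has_id <- grepl("[?&]id=[0-9]+", url)
  id <- rep(NA_integer_, length(url))
  id[has_id] <- as.integer(sub(".*[?&]id=([0-9]+).*", "\\1", url[has_id]))
  id
}

aphia_from_url(worms_url(126846))
```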

Step 2. Query WoRMS to get the equivalent FishBase ID(s).

Here I show two ways to query the WoRMS API for the FishBase identifier. The first is via the httr2 package. The second is via the worrms package. If you’re not familiar with APIs and how they work, John Waller at GBIF wrote a blog post that’s a good place to start.

library(httr2)

fishbaseID <- request(base_url = 'https://www.marinespecies.org/rest/AphiaExternalIDByAphiaID/') %>% 
  req_url_path_append(AphiaID) %>%
  req_url_query(`type` = 'fishbase') %>% 
  req_perform() %>%
  resp_body_json() %>%
  unlist()

To do the same thing with the worrms package:

library(worrms)

fishbaseID <- wm_external(id = AphiaID, type = "fishbase")

Step 3. Query GBIF to get the equivalent identifier(s) from the GBIF taxonomic backbone.

GBIF aligns the data to a custom taxonomic database known as the GBIF taxonomic backbone. Although most AphiaIDs map to the backbone, I’m not 100% positive that all of them do. This dissonance is a known problem in the world of taxonomic databases, and smart people from all over the world are working on resolving it.

If you only have one species, or aren’t feeling very code savvy, you can also view this information by putting the API call into a web browser:

https://api.gbif.org/v1/species?datasetKey=2d59e5db-57ad-41ff-97d6-11f5fb264527&sourceId=urn:lsid:marinespecies.org:taxname:126846

But if you are feeling code savvy:

library(httr2)

# Build the WoRMS LSID that GBIF stores as the source identifier
sourceId <- paste0('urn:lsid:marinespecies.org:taxname:', AphiaID)

response <- request(base_url = 'https://api.gbif.org/v1/species') %>% 
  req_url_query(`datasetKey` = '2d59e5db-57ad-41ff-97d6-11f5fb264527', 
                `sourceId` = sourceId) %>% 
  req_perform() %>%
  resp_body_json()

GBIF_backboneID <- response$results[[1]]$nubKey
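
Because not every AphiaID is guaranteed to map to the backbone, it may be worth guarding against an empty result before indexing into it. The sketch below uses only base R and mock responses. The helper names make_source_id and get_nub_key are hypothetical, and the mock values are invented. It assumes the parsed response has the same shape as the object returned by resp_body_json() above.

```r
# Build the LSID-style sourceId string for one or more AphiaIDs.
make_source_id <- function(aphia_id) {
  paste0("urn:lsid:marinespecies.org:taxname:", as.integer(aphia_id))
}

# Return the backbone key from a parsed response, or NA if nothing matched.
get_nub_key <- function(response) {
  results <- response$results
  if (length(results) == 0 || is.null(results[[1]]$nubKey)) {
    return(NA_integer_)
  }
  as.integer(results[[1]]$nubKey)
}

# Mock responses standing in for real API output (values are made up).
matched   <- list(results = list(list(nubKey = 123L)))
unmatched <- list(results = list())

make_source_id(126846)
get_nub_key(matched)
get_nub_key(unmatched)
```

An NA here would be a signal to look the taxon up by name in GBIF instead of relying on the AphiaID link.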

Step 4. Query OBIS, GBIF, and FishBase for your taxa of interest.
OBIS

Geometry

Before we can execute our query, we need to create the geometry for the area we wish to search. This is important because without a geographic shape (AKA geometry) we’d be searching for Mahi anywhere on the planet!

We’ll grab an outline shape of the Puerto Rico EEZ from MarineRegions.org using the package mregions2 and turn it into a convex hull to simplify the geometry being queried. Then we’ll turn it into a text string in a standard format, Well-Known Text (WKT), so it can be used in the API queries.

#Make convex hull of PR EEZ from MarineRegions.org.
PR_EEZ <- mregions2::gaz_geometry(x = 33179) %>% 
  sf::st_convex_hull() %>% 
  sf::st_as_text() %>% 
  wk::wkt() %>% 
  wk::wk_orient()  

Note: The GBIF API expects polygon coordinates in counter-clockwise order and reads clockwise WKT as a hole in the geometry, but the sf package outputs clockwise WKT by default. The functions from the wk package are the easiest way I’ve found to reverse the winding order of the WKT.
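
To see what winding order means in practice, here is a small base R sketch that checks the direction of a ring using the signed area (shoelace formula). The function name ring_signed_area and the toy box coordinates are invented for illustration. This is not how wk does it internally, just a way to sanity-check a simple polygon.

```r
# Signed area of a ring via the shoelace formula.
# Positive means counter-clockwise, negative means clockwise.
ring_signed_area <- function(x, y) {
  # Pair each vertex with the next one, wrapping the last back to the first.
  x_next <- c(x[-1], x[1])
  y_next <- c(y[-1], y[1])
  sum(x * y_next - x_next * y) / 2
}

# A toy box (longitude, latitude) listed counter-clockwise and closed.
lon <- c(-68, -65, -65, -68, -68)
lat <- c(17, 17, 19, 19, 17)

ring_signed_area(lon, lat)            # counter-clockwise ring
ring_signed_area(rev(lon), rev(lat))  # same ring, clockwise
```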

Let’s start with OBIS, since it aligns to WoRMS and therefore uses AphiaIDs.

library(robis)

obis_results <- robis::occurrence(taxonid = AphiaID, 
                  geometry = PR_EEZ)

Let’s take a moment to create the citations for the datasets that contributed to this query, and for OBIS. We will publish these citations at the bottom of this post. We’ll follow the guidance in the OBIS Manual:

OBIS_metadata <- obis_results$dataset_id %>% unique() %>% 
  robis::dataset(datasetid = .)

#generate citations

OBIS_citations <- list()

for (i in seq_len(nrow(OBIS_metadata))) {

  OBIS_citations[[i]] <- robis::generate_citation(title = OBIS_metadata[i,]$title,
                           published = OBIS_metadata[i,]$published,
                           url = OBIS_metadata[i,]$url,
                           contacts = OBIS_metadata[i,]$contacts %>% as.data.frame())

}

# Flatten the list of citations into a character vector
OBIS_citations <- unlist(OBIS_citations)

# Make citation for OBIS itself:

date_accessed <- Sys.Date()
query_title <- "Occurrence records of Coryphaena hippurus (Linnaeus, 1758) in the Puerto Rico EEZ "

OBIS_citation <- paste0("OBIS (2024) ", 
                       query_title, 
                       '(Available: Ocean Biodiversity Information System. Intergovernmental Oceanographic Commission of UNESCO. www.obis.org. Accessed:', 
                       date_accessed, 
                       ')'
                       )
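
If you build this kind of citation more than once, a small helper avoids hard-coding the year. This is a sketch, not an official OBIS function. The name make_obis_citation is made up, the year is assumed to be the year of access, and I have assumed a space after "Accessed:".

```r
# Assemble an OBIS-style citation string from a query title and an access date.
make_obis_citation <- function(query_title, date_accessed = Sys.Date()) {
  paste0("OBIS (", format(date_accessed, "%Y"), ") ",
         query_title,
         " (Available: Ocean Biodiversity Information System. ",
         "Intergovernmental Oceanographic Commission of UNESCO. ",
         "www.obis.org. Accessed: ", format(date_accessed, "%Y-%m-%d"), ")")
}

make_obis_citation(
  "Occurrence records of Coryphaena hippurus (Linnaeus, 1758) in the Puerto Rico EEZ",
  as.Date("2024-04-26")
)
```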

Then GBIF…

GBIF data can be accessed through the API without obtaining a DOI, but the best practice is to mint a DOI once you are convinced that your query is final. If you continue to filter your results after downloading, you can always mint a derived dataset DOI.

library(rgbif)

# download without DOI, to explore the data.
gbif_results <- rgbif::occ_data(taxonKey = GBIF_backboneID,
                                geometry = PR_EEZ) %>%
  .[["data"]]

# download with DOI, so I can cite the data.
rgbif::occ_download(
  user = 'sformel',
  email = 'sformel@usgs.gov',
  pwd = rstudioapi::askForPassword(prompt = "GBIF Password"),
  pred_and(pred("taxonKey", GBIF_backboneID),
           pred("geometry", PR_EEZ)
  ))

The message returned by the above code includes a download key and shows the suggested citation. I can also find this information under the downloads linked to my profile on gbif.org. Create the citation to print below:

GBIF_citation <- rgbif::gbif_citation(x = occ_download_meta(key = '0197864-240321170329656'))[['download']]

…and finally FishBase:

Now let’s get some relevant trait info from FishBase. FishBase has tons of information in it! So to make it a bit simpler, I’m only going to grab data about mahi-mahi reproduction.

library(rfishbase)

repro_table <- rfishbase::species_list(SpecCode = fishbaseID) %>%
  reproduction() %>%
  select(Species,
         ReproMode,
         Fertilization,
         Spawning, 
         RepGuild1,
         RepGuild2,
         ParentalCare)

Visualize Results

Let’s map the results, showing which data came from OBIS and which came from GBIF, and display the map alongside select reproductive traits from FishBase.

library(ggplot2)
library(rnaturalearth)

#Get an outline of Puerto Rico for mapping
PR <- rnaturalearth::ne_countries(country = 'Puerto Rico', 
                                  returnclass = 'sf', 
                                  scale = 'large')

# Select needed columns
obis_select <- obis_results %>% 
               select(occurrenceID, 
                      decimalLatitude, 
                      decimalLongitude) %>% 
  mutate(Source = 'OBIS')

gbif_select <- gbif_results %>% 
               select(occurrenceID, 
                      decimalLatitude, 
                      decimalLongitude) %>% 
  mutate(Source = 'GBIF')

# Join Data from GBIF and OBIS
mahi_joined <- rbind(obis_select,
                         gbif_select)

map_plot <- ggplot(PR) +
  geom_sf() +
  geom_point(data = mahi_joined,
             inherit.aes = FALSE,
             aes(x = decimalLongitude, 
                 y = decimalLatitude,
                 color = Source)) +
  
  #everything below here only serves to stylize the plot
  
  scale_color_manual(values = c('orange', 'skyblue')) +
  theme_bw(base_size=14) +
  theme(plot.title = ggtext::element_markdown(hjust = 0.5)) +
  guides(color = guide_legend(override.aes = list(size = 5))) +
  coord_sf(xlim = c(-70, -64)) +
  
  # This looks really complicated, but it's only style. Just making appropriate italics and line breaks.
  labs(title = "Map of _Coryphaena hippurus_ occurrences <br>in the Puerto Rico EEZ. <br>Data sourced from OBIS and GBIF.")
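
Since some datasets are published to both OBIS and GBIF, the combined table may contain the same record twice. The sketch below shows one way to check for that with base R on toy data. All values are invented, and matching on occurrenceID is an assumption: identifiers are not guaranteed to be identical across the two systems.

```r
# Toy stand-ins for obis_select and gbif_select (all values are invented).
obis_toy <- data.frame(occurrenceID = c("a", "b", "c"),
                       decimalLatitude = c(18.1, 18.4, 17.9),
                       decimalLongitude = c(-66.2, -65.8, -67.0),
                       Source = "OBIS")
gbif_toy <- data.frame(occurrenceID = c("c", "d"),
                       decimalLatitude = c(17.9, 18.6),
                       decimalLongitude = c(-67.0, -66.5),
                       Source = "GBIF")

# Stack the two tables, as done with rbind() above.
joined_toy <- rbind(obis_toy, gbif_toy)

# occurrenceIDs that appear in both sources.
shared_ids <- intersect(obis_toy$occurrenceID, gbif_toy$occurrenceID)
shared_ids

# Number of records per source.
table(joined_toy$Source)
```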

And finally, we print the plot and the table!

map_plot

repro_table %>% 
  tidyr::pivot_longer(cols = everything(),
                      names_to = "Term", 
                      values_to = "Value") %>% 
  knitr::kable(caption = "Select reproductive traits for <i>Coryphaena hippurus</i> from FishBase.")

Select reproductive traits for Coryphaena hippurus from FishBase.

Term           Value
Species        Coryphaena hippurus
ReproMode      dioecism
Fertilization  external
Spawning       Variable throughout range
RepGuild1      nonguarders
RepGuild2      open water/substratum egg scatterers
ParentalCare   none

Citations

  • Bakış, Y., Wang, X. FishNet2 Marine Data. Published 2023-12-19. https://fishnet.tulane.edu/ipt/resource?r=fishnet2_obis.
  • Benson, A., Diaz, G. NOAA Southeast Fisheries Science Center (SEFSC) Fisheries Log Book System (FLS) Commercial Pelagic Logbook Data. Published 2021-08-25. https://ipt-obis.gbif.us/resource?r=sefsc_logbook.
  • Garrison, L., Garrison, L., OBIS-SEAMAP. SEFSC Caribbean Survey 1995. Published 2021-09-08. http://ipt.env.duke.edu/resource?r=zd_11.
  • OBIS (2024) Occurrence records of Coryphaena hippurus (Linnaeus, 1758) in the Puerto Rico EEZ (Available: Ocean Biodiversity Information System. Intergovernmental Oceanographic Commission of UNESCO. www.obis.org. Accessed:2024-04-26)
  • GBIF Occurrence Download https://doi.org/10.15468/dl.8sjy3r Accessed from R via rgbif (https://github.com/ropensci/rgbif) on 2024-04-19

Session Info

This lists the R version and the packages I had loaded when I built this post.

sessionInfo() %>% print(locale = FALSE)

R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default


attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ggtext_0.1.2        stringr_1.5.1       rgbif_3.7.9        
 [4] robis_2.11.3        rfishbase_4.1.2     rnaturalearth_0.3.4
 [7] ggplot2_3.5.0       tidyr_1.3.1         httr2_1.0.1        
[10] worrms_0.4.3        wk_0.8.0            sf_1.0-14          
[13] mregions2_1.0.0     dplyr_1.1.4        

loaded via a namespace (and not attached):
 [1] DBI_1.2.2                rlang_1.1.1              magrittr_2.0.3          
 [4] e1071_1.7-13             compiler_4.3.1           roxygen2_7.3.1          
 [7] vctrs_0.6.5              httpcode_0.3.0           pkgconfig_2.0.3         
[10] crayon_1.5.2             fastmap_1.1.1            backports_1.4.1         
[13] dbplyr_2.3.3             mapedit_0.6.0            ellipsis_0.3.2          
[16] utf8_1.2.4               promises_1.2.0.1         rmarkdown_2.24          
[19] markdown_1.7             tzdb_0.4.0               purrr_1.0.1             
[22] bit_4.0.5                rnaturalearthhires_0.2.1 xfun_0.43               
[25] cachem_1.0.8             jsonlite_1.8.8           progress_1.2.3          
[28] later_1.3.1              parallel_4.3.1           prettyunits_1.2.0       
[31] R6_2.5.1                 stringi_1.7.12           rdflib_0.2.8            
[34] Rcpp_1.0.11              knitr_1.46               triebeard_0.4.1         
[37] readr_2.1.4              httpuv_1.6.11            tidyselect_1.2.1        
[40] rstudioapi_0.15.0        yaml_2.3.8               curl_5.2.0              
[43] lattice_0.21-8           tibble_3.2.1             plyr_1.8.9              
[46] shiny_1.7.4.1            withr_3.0.0              askpass_1.2.0           
[49] evaluate_0.23            units_0.8-3              proxy_0.4-27            
[52] xml2_1.3.5               pillar_1.9.0             whisker_0.4.1           
[55] KernSmooth_2.23-22       checkmate_2.3.1          generics_0.1.3          
[58] vroom_1.6.5              sp_2.0-0                 hms_1.1.3               
[61] commonmark_1.9.1         munsell_0.5.1            scales_1.3.0            
[64] xtable_1.8-4             class_7.3-22             glue_1.6.2              
[67] lazyeval_0.2.2           tools_4.3.1              data.table_1.14.8       
[70] fs_1.6.3                 grid_4.3.1               contentid_0.0.17        
[73] crosstalk_1.2.0          urltools_1.7.3           colorspace_2.1-0        
[76] duckdb_0.8.1-1           cli_3.6.2                rappdirs_0.3.3          
[79] fansi_1.0.6              gtable_0.3.4             oai_0.4.0               
[82] digest_0.6.33            redland_1.0.17-18        classInt_0.4-9          
[85] crul_1.4.0               farver_2.1.1             htmlwidgets_1.6.2       
[88] memoise_2.0.1            htmltools_0.5.5          lifecycle_1.0.4         
[91] leaflet_2.2.0            httr_1.4.7               mime_0.12               
[94] gridtext_0.1.5           openssl_2.1.2            bit64_4.0.5