Querying OBIS, GBIF, and Fishbase with AphiaIDs
What?
I recently attended a meeting to discuss Species Distribution Models, or SDMs. The attendees emphasized the importance of GBIF, OBIS, and FishBase for informing SDMs. One of the challenges that was discussed was how to query taxa consistently and efficiently across these databases.
Here I demonstrate how to use AphiaIDs, the identifier used by the World Register of Marine Species (WoRMS), to query and link the data from OBIS, GBIF, and FishBase. All of the code below is written in R. And a big thank you to Yi-Ming Gan, from the SCAR Antarctic Biodiversity Portal (hosted by Institute of Natural Sciences), for helping me connect the dots!
The basic idea:
- Query WoRMS to identify your AphiaID(s).
- Query WoRMS to get the equivalent Fishbase ID(s).
- Query GBIF to get the equivalent identifier(s) from the GBIF taxonomic backbone.
- Query OBIS, GBIF, and Fishbase for your taxa of interest.
Why?
The data needed to answer a question can be distributed across multiple repositories. Often, the first step to answering a question is to coordinate the data. All of the aforementioned databases are working to resolve this challenge, but it will probably be some time before science completely figures out how to seamlessly serve all data all the time!
How?
We’ll look at mahi-mahi, AKA dorado (Coryphaena hippurus), in the EEZ of Puerto Rico.
Get your R packages set up
You’ll need the packages listed in the code block below to be installed if you want to complete the analysis yourself. In some cases, I’ve pointed out which package a function comes from by calling it explicitly. For example, worrms::wm_name2id is calling the wm_name2id function from the worrms package. I only did this to make it clear where the function is coming from. You can also load these packages with the library function, like I show in the code block below.
Note that while most of the packages can be installed with the install.packages() function, the mregions2 package needs to be installed from GitHub. See here for instructions.
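Here is a minimal sketch of one way to check which of these packages still need to be installed before you start. The helper name `find_missing` is my own invention, and the GitHub path `lifewatch/mregions2` is an assumption that you should confirm against the mregions2 installation instructions.

```r
# Packages used in this post (same names as the library() calls in the post).
required_pkgs <- c("dplyr", "mregions2", "sf", "wk", "worrms", "httr2",
                   "tidyr", "ggplot2", "rnaturalearth", "rfishbase",
                   "robis", "rgbif", "stringr", "ggtext")

# Hypothetical helper: return the packages that are not installed yet.
find_missing <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- find_missing(required_pkgs)
print(to_install)

# Uncomment to install. mregions2 is handled separately because it does not
# come from install.packages(); the repository path below is an assumption.
# install.packages(setdiff(to_install, "mregions2"))
# remotes::install_github("lifewatch/mregions2")
```

The install lines are commented out so the check itself has no side effects beyond loading namespaces.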
library(dplyr)
library(mregions2)
library(sf)
library(wk)
library(worrms)
library(httr2)
library(tidyr)
library(ggplot2)
library(rnaturalearth)
library(rfishbase)
library(robis)
library(rgbif)
library(stringr)
library(ggtext)
Step 1. Query WoRMS to identify your AphiaID(s).
You can search WoRMS through your web browser, using scientific or common names. Here is the landing page for Coryphaena hippurus:
https://www.marinespecies.org/aphia.php?p=taxdetails&id=126846
You can see the AphiaID at the end of the link: 126846. There is also an R package, worrms, that can help execute this query:
AphiaID <- worrms::wm_name2id(name = "Coryphaena hippurus")
Note that we put the AphiaID into an object named AphiaID that we will use below.
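If you found your taxon through the web browser instead, the AphiaID can be pulled straight out of the landing-page link. This is only a sketch in base R: `aphia_from_url` is a hypothetical helper, and it assumes the link carries the identifier in an `id=` query parameter, as the one above does.

```r
# Landing page for Coryphaena hippurus on WoRMS (the link shown above).
worms_url <- "https://www.marinespecies.org/aphia.php?p=taxdetails&id=126846"

# Hypothetical helper: extract the numeric `id` query parameter from a WoRMS
# URL. Returns NA when the URL has no id parameter.
aphia_from_url <- function(url) {
  has_id <- grepl("[?&]id=[0-9]+", url)
  id <- sub(".*[?&]id=([0-9]+).*", "\\1", url)
  ifelse(has_id, suppressWarnings(as.integer(id)), NA_integer_)
}

AphiaID_from_url <- aphia_from_url(worms_url)
print(AphiaID_from_url)
```

I stored the result under a different name so it does not overwrite the AphiaID object created with worrms.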
Step 2. Query WoRMS to get the equivalent Fishbase ID(s).
Here I show two ways to query the WoRMS API for the FishBase identifier. The first is via the httr2 package. The second is via the worrms package. If you’re not familiar with APIs and how they work, John Waller at GBIF wrote a blog post that’s a good place to start.
library(httr2)
# Ask the WoRMS REST API for the external (FishBase) ID linked to this AphiaID
fishbaseID <- request(base_url = 'https://www.marinespecies.org/rest/AphiaExternalIDByAphiaID/') %>%
  req_url_path_append(AphiaID) %>%
  req_url_query(`type` = 'fishbase') %>%
  req_perform() %>%
  resp_body_json() %>%
  unlist()
To do the same thing with the worrms package:
library(worrms)
fishbaseID <- wm_external(id = AphiaID, type = "fishbase")
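It can help to see the URL that the httr2 pipeline assembles, since you can paste it into a browser to inspect the raw response. The sketch below rebuilds it with base R; `worms_external_url` is a hypothetical helper, and I am assuming the result should be equivalent to what httr2 sends.

```r
# Hypothetical helper: build the WoRMS REST URL used in the httr2 example.
# as.integer() keeps large IDs from being printed in scientific notation.
worms_external_url <- function(aphia_id, type = "fishbase") {
  paste0("https://www.marinespecies.org/rest/AphiaExternalIDByAphiaID/",
         as.integer(aphia_id), "?type=", type)
}

# 126846 is the AphiaID for Coryphaena hippurus from Step 1.
url <- worms_external_url(126846)
print(url)
```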
Step 3. Query GBIF to get the equivalent identifier(s) from GBIF
GBIF aligns the data to a custom taxonomic database known as the GBIF taxonomic backbone. Although most AphiaIDs map to the backbone, I’m not 100% positive that all of them do. This mismatch between taxonomic databases is a known problem, and smart people from all over the world are working to resolve it.
If you only have one species, or aren’t feeling very code savvy, you can also view this information by putting the API call into a web browser:
But if you are feeling code savvy:
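library(httr2)
# Build the WoRMS LSID for our taxon; GBIF stores it as the sourceId
sourceId <- paste0('urn:lsid:marinespecies.org:taxname:', AphiaID)
# Search the checklist identified by datasetKey for that sourceId
response <- request(base_url = 'https://api.gbif.org/v1/species') %>%
  req_url_query(`datasetKey` = '2d59e5db-57ad-41ff-97d6-11f5fb264527',
                `sourceId` = sourceId) %>%  # use the LSID built above, not a hard-coded one
  req_perform() %>%
  resp_body_json()
# nubKey is the matching key in the GBIF taxonomic backbone
GBIF_backboneID <- response$results[[1]]$nubKey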
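Because not every AphiaID is guaranteed to map to the backbone, it may be worth guarding the last line so that an empty result does not throw an error. This is a sketch only: `get_nub_key` is a hypothetical helper, and the mock lists below are stand-ins shaped like the parsed JSON, with a made-up key value.

```r
# Hypothetical helper: return the backbone key from a parsed GBIF response,
# or NA when the search returned no results or the record has no nubKey.
get_nub_key <- function(response) {
  results <- response$results
  if (length(results) == 0) return(NA_integer_)
  key <- results[[1]]$nubKey
  if (is.null(key)) NA_integer_ else as.integer(key)
}

# Mock responses for illustration; 123 is NOT a real backbone key.
mock_match    <- list(results = list(list(nubKey = 123L)))
mock_no_match <- list(results = list())

print(get_nub_key(mock_match))
print(get_nub_key(mock_no_match))
```

With the real response you would call `get_nub_key(response)` and check for NA before moving on to Step 4.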
Step 4. Query OBIS, GBIF, and FishBase for your taxa of interest.
OBIS
Geometry
Before we can execute our query, we need to create the geometry for the area we wish to search. This is important because without a geographic shape (AKA geometry) we’d be searching for Mahi anywhere on the planet!
We’ll grab an outline shape of the Puerto Rico EEZ from MarineRegions.org using the package mregions2
and turn it into a convex hull to simplify the geometry being queried. Then we’ll turn it into a text string in a standard format, Well-Known Text (WKT), so it can be used in the API queries.
#Make convex hull of PR EEZ from MarineRegions.org.
PR_EEZ <- mregions2::gaz_geometry(x = 33179) %>%
  sf::st_convex_hull() %>%
  sf::st_as_text() %>%
  wk::wkt() %>%
  wk::wk_orient()
Note: The GBIF API reads clockwise WKT as a hole in geometry. But the sf package outputs clockwise WKT by default. The functions from the wk package are the easiest way I’ve found to reverse the winding order of the WKT.
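If winding order is new to you, the toy example below shows the idea with base R. It is not how wk does it internally; it is a sketch that uses the shoelace formula on planar coordinates (x to the right, y up), where a positive signed area means counter-clockwise and a negative one means clockwise. The function names and the square are made up for illustration.

```r
# Signed area of a closed ring via the shoelace formula.
# `ring` is a two-column matrix (x, y) whose last row repeats the first.
# Positive area means counter-clockwise; negative means clockwise.
signed_area <- function(ring) {
  n <- nrow(ring)
  x <- ring[, 1]
  y <- ring[, 2]
  sum(x[-n] * y[-1] - x[-1] * y[-n]) / 2
}

# Reverse the vertex order, which flips the winding direction.
reverse_ring <- function(ring) ring[nrow(ring):1, , drop = FALSE]

# Toy unit square, listed clockwise (illustrative coordinates only).
cw_square <- rbind(c(0, 0), c(0, 1), c(1, 1), c(1, 0), c(0, 0))

print(signed_area(cw_square))                # negative: clockwise
print(signed_area(reverse_ring(cw_square)))  # positive: counter-clockwise
```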
Let’s start with OBIS, since it aligns to WoRMS and therefore uses AphiaIDs.
library(robis)
obis_results <- robis::occurrence(taxonid = AphiaID,
                                  geometry = PR_EEZ)
Let’s take a moment to create the citations for the four datasets that contributed to this query, and for OBIS. We will publish these citations at the bottom of this post. We’ll follow the guidance in the OBIS Manual:
# Look up metadata for each dataset that contributed records
OBIS_metadata <- obis_results$dataset_id %>% unique() %>%
  robis::dataset(datasetid = .)
#generate citations
OBIS_citations <- list()
for(i in 1:nrow(OBIS_metadata)){
  OBIS_citations[[i]] <- robis::generate_citation(title = OBIS_metadata[i,]$title,
                                                  published = OBIS_metadata[i,]$published,
                                                  url = OBIS_metadata[i,]$url,
                                                  contacts = OBIS_metadata[i,]$contacts %>% as.data.frame()) %>%
    unlist()
}
# Make citation for OBIS itself:
date_accessed <- Sys.Date()
query_title <- "Occurrence records of Coryphaena hippurus (Linnaeus, 1758) in the Puerto Rico EEZ "
OBIS_citation <- paste0("OBIS (2024) ",
                        query_title, '(Available: Ocean Biodiversity Information System. Intergovernmental Oceanographic Commission of UNESCO. www.obis.org. Accessed:',
                        date_accessed, ')'
)
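If you expect to rerun this in a later year, the citation string could be wrapped in a small function so the year is not hard-coded. This is a sketch: `make_obis_citation` is a hypothetical helper, and taking the citation year from the access date is my assumption, so check it against the OBIS Manual guidance.

```r
# Hypothetical helper: assemble an OBIS citation string from a query title
# and an access date, taking the year from the date instead of hard-coding it.
make_obis_citation <- function(query_title, date_accessed = Sys.Date()) {
  paste0("OBIS (", format(date_accessed, "%Y"), ") ",
         trimws(query_title),
         " (Available: Ocean Biodiversity Information System. ",
         "Intergovernmental Oceanographic Commission of UNESCO. ",
         "www.obis.org. Accessed: ", format(date_accessed, "%Y-%m-%d"), ")")
}

example_citation <- make_obis_citation(
  "Occurrence records of Coryphaena hippurus (Linnaeus, 1758) in the Puerto Rico EEZ",
  as.Date("2024-04-26")
)
print(example_citation)
```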
Then GBIF…
GBIF data can be accessed through the API without obtaining a DOI, but the best practice is to mint a DOI once you are convinced that your query is final. If you continue to filter your results after downloading, you can always mint a derived dataset DOI.
library(rgbif)
# download without DOI, to explore the data.
gbif_results <- rgbif::occ_data(taxonKey = GBIF_backboneID,
                                geometry = PR_EEZ) %>%
  .[["data"]]
#download with DOI, so I can cite the data.
rgbif::occ_download(
  user = 'sformel',
  email = 'sformel@usgs.gov',
  pwd = rstudioapi::askForPassword(prompt = "GBIF Password"),
  pred_and(pred("taxonKey", GBIF_backboneID),
           pred("geometry", PR_EEZ)
  ))
The message returned by the above code includes a download key and shows the suggested citation. I can also find this information under the downloads linked to my profile on gbif.org. Create the citation to print below:
GBIF_citation <- rgbif::gbif_citation(x = occ_download_meta(key = '0197864-240321170329656'))[['download']]
…and finally FishBase:
Now let’s get some relevant trait info from Fishbase. FishBase has tons of information in it! So to make it a bit simpler, I’m only going to grab data about mahi-mahi reproduction.
library(rfishbase)
# Look up the species by its FishBase SpecCode, then keep a few reproduction fields
repro_table <- rfishbase::species_list(SpecCode = fishbaseID) %>%
  reproduction() %>%
  select(Species,
         ReproMode,
         Fertilization,
         Spawning,
         RepGuild1,
         RepGuild2,
         ParentalCare)
Visualize Results
Let’s map the results, showing which data came from OBIS, which data came from GBIF, and displaying it alongside select reproductive traits from FishBase.
library(ggplot2)
library(rnaturalearth)
#Get an outline of Puerto Rico for mapping
PR <- rnaturalearth::ne_countries(country = 'Puerto Rico',
                                  returnclass = 'sf',
                                  scale = 'large')
# Select needed columns
obis_select <- obis_results %>%
  select(occurrenceID,
         decimalLatitude,
         decimalLongitude) %>%
  mutate(Source = 'OBIS')
gbif_select <- gbif_results %>%
  select(occurrenceID,
         decimalLatitude,
         decimalLongitude) %>%
  mutate(Source = 'GBIF')
# Join Data from GBIF and OBIS
mahi_joined <- rbind(obis_select,
                     gbif_select)
map_plot <- ggplot(PR) +
  geom_sf() +
  geom_point(data = mahi_joined,
             inherit.aes = FALSE,
             aes(x = decimalLongitude,
                 y = decimalLatitude,
                 color = Source)) +
  #everything below here only serves to stylize the plot
  scale_color_manual(values = c('orange', 'skyblue')) +
  theme_bw(base_size=14) +
  theme(plot.title = ggtext::element_markdown(hjust = 0.5)) +
  guides(color = guide_legend(override.aes = list(size = 5))) +
  coord_sf(xlim = c(-64, -70)) +
  # This looks really complicated, but it's only style. Just making appropriate italics and line breaks.
  labs(title = "Map of _Coryphaena hippurus_ occurrences <br>in the Puerto Rico EEZ. <br>Data sourced from OBIS and GBIF.")
And finally, we print the plot and the table!
map_plot
# Reshape the one-row trait table into Term/Value pairs and print it
repro_table %>%
  tidyr::pivot_longer(cols = everything(),
                      names_to = "Term",
                      values_to = "Value") %>%
  knitr::kable(caption = "Select reproductive traits for <i>Coryphaena hippurus</i> from FishBase.")
| Term | Value |
|---|---|
| Species | Coryphaena hippurus |
| ReproMode | dioecism |
| Fertilization | external |
| Spawning | Variable throughout range |
| RepGuild1 | nonguarders |
| RepGuild2 | open water/substratum egg scatterers |
| ParentalCare | none |
Citations
- Bakış, Y., Wang, X. FishNet2 Marine Data. Published 2023-12-19. https://fishnet.tulane.edu/ipt/resource?r=fishnet2_obis.
- Benson, A., Diaz, G. NOAA Southeast Fisheries Science Center (SEFSC) Fisheries Log Book System (FLS) Commercial Pelagic Logbook Data. Published 2021-08-25. https://ipt-obis.gbif.us/resource?r=sefsc_logbook.
- Garrison, L., Garrison, L., OBIS-SEAMAP. SEFSC Caribbean Survey 1995. Published 2021-09-08. http://ipt.env.duke.edu/resource?r=zd_11.
- OBIS (2024) Occurrence records of Coryphaena hippurus (Linnaeus, 1758) in the Puerto Rico EEZ (Available: Ocean Biodiversity Information System. Intergovernmental Oceanographic Commission of UNESCO. www.obis.org. Accessed:2024-04-26)
- GBIF Occurrence Download https://doi.org/10.15468/dl.8sjy3r Accessed from R via rgbif (https://github.com/ropensci/rgbif) on 2024-04-19
Session Info
This tells you the version and packages I had loaded as part of this blog creation.
sessionInfo() %>% print(locale = FALSE)
R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)
Matrix products: default
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggtext_0.1.2 stringr_1.5.1 rgbif_3.7.9
[4] robis_2.11.3 rfishbase_4.1.2 rnaturalearth_0.3.4
[7] ggplot2_3.5.0 tidyr_1.3.1 httr2_1.0.1
[10] worrms_0.4.3 wk_0.8.0 sf_1.0-14
[13] mregions2_1.0.0 dplyr_1.1.4
loaded via a namespace (and not attached):
[1] DBI_1.2.2 rlang_1.1.1 magrittr_2.0.3
[4] e1071_1.7-13 compiler_4.3.1 roxygen2_7.3.1
[7] vctrs_0.6.5 httpcode_0.3.0 pkgconfig_2.0.3
[10] crayon_1.5.2 fastmap_1.1.1 backports_1.4.1
[13] dbplyr_2.3.3 mapedit_0.6.0 ellipsis_0.3.2
[16] utf8_1.2.4 promises_1.2.0.1 rmarkdown_2.24
[19] markdown_1.7 tzdb_0.4.0 purrr_1.0.1
[22] bit_4.0.5 rnaturalearthhires_0.2.1 xfun_0.43
[25] cachem_1.0.8 jsonlite_1.8.8 progress_1.2.3
[28] later_1.3.1 parallel_4.3.1 prettyunits_1.2.0
[31] R6_2.5.1 stringi_1.7.12 rdflib_0.2.8
[34] Rcpp_1.0.11 knitr_1.46 triebeard_0.4.1
[37] readr_2.1.4 httpuv_1.6.11 tidyselect_1.2.1
[40] rstudioapi_0.15.0 yaml_2.3.8 curl_5.2.0
[43] lattice_0.21-8 tibble_3.2.1 plyr_1.8.9
[46] shiny_1.7.4.1 withr_3.0.0 askpass_1.2.0
[49] evaluate_0.23 units_0.8-3 proxy_0.4-27
[52] xml2_1.3.5 pillar_1.9.0 whisker_0.4.1
[55] KernSmooth_2.23-22 checkmate_2.3.1 generics_0.1.3
[58] vroom_1.6.5 sp_2.0-0 hms_1.1.3
[61] commonmark_1.9.1 munsell_0.5.1 scales_1.3.0
[64] xtable_1.8-4 class_7.3-22 glue_1.6.2
[67] lazyeval_0.2.2 tools_4.3.1 data.table_1.14.8
[70] fs_1.6.3 grid_4.3.1 contentid_0.0.17
[73] crosstalk_1.2.0 urltools_1.7.3 colorspace_2.1-0
[76] duckdb_0.8.1-1 cli_3.6.2 rappdirs_0.3.3
[79] fansi_1.0.6 gtable_0.3.4 oai_0.4.0
[82] digest_0.6.33 redland_1.0.17-18 classInt_0.4-9
[85] crul_1.4.0 farver_2.1.1 htmlwidgets_1.6.2
[88] memoise_2.0.1 htmltools_0.5.5 lifecycle_1.0.4
[91] leaflet_2.2.0 httr_1.4.7 mime_0.12
[94] gridtext_0.1.5 openssl_2.1.2 bit64_4.0.5