A database of freshwater fish species of the Amazon Basin | Scientific Data

Information sources

The database results from the transnational collaborative project AmazonFish (ERANetLAC/DCC-0210), whose purpose was to identify and compile all known information sources available on freshwater fish species occurrences for the entire Amazon drainage basin. The original project included researchers from (1) the French Institute for Development (IRD) in France, (2) the Pontificia Universidad Javeriana (PUJ-UNESIS) in Colombia, (3) the Museo de Historia Natural de la Universidad Nacional Mayor de San Marcos (MUSM) in Peru and (4) the Royal Belgian Institute of Natural Sciences in Belgium. The project also benefited from official collaborations with researchers from Brazil (Instituto Nacional de Pesquisas da Amazônia INPA; Universidade Federal de São Paulo UNIFESP; Universidade Federal de Rondônia UNIR; Universidade Federal do Pará UFPA; Universidade Federal do Oeste do Pará UFOPA; Universidade Federal de Mato Grosso UFMT), Colombia (Instituto Alexander von Humboldt IAvH, Universidad Nacional de Colombia UN ICN-MHN, Universidad del Tolima UT-CZUT, Instituto Amazónico de Investigaciones Científicas SINCHI-CIACOL, Instituto para la Investigación y la Preservación del Patrimonio Cultural y Natural del Valle del Cauca INCIVA, Universidad Católica de Oriente UCO), Ecuador (Museo Ecuatoriano de Ciencias Naturales MECN-DP, Instituto Nacional De Biodiversidad INABIO), Bolivia (Universidad Mayor de San Simon UMSS-ULRA, Colección Boliviana de Fauna MNHN–IE UMSA, Universidad Autónoma del Beni CIRA) and Switzerland (Museum d’Histoire Naturelle de Genève, MHNG). In addition to the Neotropical fish taxonomic expertise needed to produce a high-quality database, these partners brought to the project existing fish databases from their own collections and expeditions, as well as a large networking capacity that was essential for identifying and involving other data providers.

To build the AmazonFish database, an inventory of possible data sources was conducted at the start of the project in early 2016, and data from a wide range of sources were compiled and standardized into a single dataset.

The information used includes five source types:

  A. Information extracted from the literature (published articles, books, grey literature)

  B. Data from online biodiversity databases (e.g. GBIF and others)

  C. Data from museum and university collections

  D. Data held or compiled by the project partners (e.g. at the country level)

  E. New data from sampling campaigns organized within the framework of the project

An inventory of all the literature sources (published articles, books, technical reports) existing for the Amazon Basin identified more than 800 documents, which were subsequently analysed; 459 of them provided valuable data on fish species distribution that were not redundant with any official collection. A large amount of data was extracted from the most widely used and frequently updated online biodiversity databases (see details in Table 1). These repositories release biological data under a Creative Commons licence in which the user agrees to acknowledge the data sources. Data from museum and university collections not available through these online facilities were obtained by contacting the curators or researchers in charge and integrating them as official project collaborators (curators and researchers mainly from Brazil, Ecuador and Bolivia). The project partners (Colombia and Peru) compiled data at the country level. For Colombia23,24, the data had previously been published through the GBIF network. For Peru, the AmazonFish project has supported the digitization of the national freshwater fish collections25,26, which is still ongoing (51% of the records have been digitized so far). Finally, supplementary occurrence data were obtained during five sampling campaigns in Brazil, Colombia and Peru that targeted under-sampled areas identified during the project.

Table 1 Online Biodiversity Repository Sources with the complete name, the number of occurrences and institutions, the last consulted date and the internet link.


Species, taxonomy and status

All occurrences not identified to species level were discarded (i.e. occurrences giving only a genus name, commonly abbreviated to sp.; species affinis, commonly abbreviated to sp. aff., aff. or affin.; or species confer, abbreviated to cf.). All scientific names are reported in the database as they appear in each information source and were carefully checked for typing errors and misspellings. Because taxonomy is a ‘moving target’, species names were linked to an internationally accepted standardized name and its associated taxonomic information in order to detect synonymies and provide accepted names. All species names were first searched in FishBase through the ‘rfishbase’ package27 in the R environment28, which makes it easy to obtain valid species names. For species names absent from FishBase, a manual search was carried out in Eschmeyer’s Catalog of Fishes (http://researcharchive.calacademy.org/research/ichthyology/catalog/fishcatmain.asp). This last step made it possible to find valid names and recently described species not yet included in FishBase. The final standardized species list contains 3,366 valid species names, which avoids biases due to synonyms and uncertain identifications (see ‘Technical Validation’). We also integrated all remaining species names, i.e. those not listed in either of the two scientific catalogues, as ‘unknown name at present’ (294 species names).
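The name-handling steps described above can be illustrated with a minimal Python sketch. It is not the project's workflow, which relied on the ‘rfishbase’ R package and a manual search of Eschmeyer’s Catalog of Fishes. The function names, the regular expression and the two lookup dictionaries are assumptions made for illustration, and the taxon names are invented placeholders rather than real species.

```python
import re

# Open-nomenclature qualifiers that mark an identification as not resolved
# to species level (sp., spp., aff., sp. aff., affin., cf.).
OPEN_NOMENCLATURE = re.compile(r"\b(?:sp|spp|aff|affin|cf)\.", re.IGNORECASE)


def is_species_level(name: str) -> bool:
    """Return True for a name with at least genus + epithet and no qualifier."""
    if OPEN_NOMENCLATURE.search(name):
        return False
    return len(name.split()) >= 2


def standardize(name, fishbase, catalog):
    """Resolve a reported name by trying each catalogue in turn.

    Returns (valid_name, source); valid_name is None when neither
    catalogue lists the name.
    """
    key = " ".join(name.split())  # collapse stray whitespace
    for source, lookup in (("FishBase", fishbase), ("Catalog of Fishes", catalog)):
        if key in lookup:
            return lookup[key], source
    return None, "unknown name at present"


# Hypothetical stand-ins for the two catalogues: reported name -> valid name.
# All names below are invented placeholders.
FISHBASE = {
    "Examplus validus": "Examplus validus",
    "Examplus antiquus": "Examplus validus",  # treated here as a synonym
}
CATALOG = {"Examplus novus": "Examplus novus"}  # "recent" name missing above

reported = ["Examplus antiquus", "Examplus sp.", "Examplus cf. validus",
            "Examplus novus", "Examplus ignotus"]
kept = [n for n in reported if is_species_level(n)]
resolved = {n: standardize(n, FISHBASE, CATALOG) for n in kept}
```

In this toy run the two qualified names are dropped, the synonym resolves through the first lookup, the second lookup catches the name missing from the first, and the last name falls through to ‘unknown name at present’.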

A species status (‘native’ or ‘exotic’) and an occurrence species status (‘valid’, ‘to be verified’ or ‘marine’) were assigned to each species. The species status distinguishes ‘native’ from ‘exotic’ species (i.e. non-native species introduced into the Amazon Basin)5, and the occurrence species status is divided into three categories: (1) ‘valid’ (species known to belong to the Amazon Basin); (2) ‘to be verified’ (species whose presence in the Amazon Basin is not certain because of possible misidentification or location errors); and (3) ‘marine’ (species whose primary habitat is not freshwater, based on information available in FishBase or Eschmeyer’s Catalog of Fishes).

At present, the database contains 2,406 ‘native’ and ‘valid’ freshwater fish species, 837 ‘to be verified’ species, 105 ‘marine’, 18 ‘exotic’ and 294 ‘unknown’ species. Only the species considered ‘native’ and ‘valid’, i.e. freshwater species belonging to the Amazon Basin, are counted in the species numbers reported below.
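As a quick consistency check, the counts quoted in this section can be summed. The sketch below assumes that the five groups are mutually exclusive, which the text suggests but does not state explicitly; the dictionary labels are shorthand chosen for this example.

```python
# Species-name counts as reported in the text.
counts = {
    "native and valid": 2406,
    "to be verified": 837,
    "marine": 105,
    "exotic": 18,
    "unknown name at present": 294,
}

# Names resolved in one of the two catalogues (everything except 'unknown').
resolved = sum(n for label, n in counts.items()
               if label != "unknown name at present")
total = sum(counts.values())

print(resolved)  # 3366
print(total)     # 3660
```

Under that assumption, the four resolved groups sum to the 3,366 valid species names reported for the standardized list, and adding the 294 unresolved names gives 3,660 names in total.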

Sub-drainages delineation

The Amazon Basin was defined here as the area of land where precipitation collects and drains off into a common outlet. This de facto excludes the Tocantins basin and Guiana coastal streams (see Fig. 1), but provides an ideal grain for biogeographical and/or macroecological studies of freshwater fishes29.

Fig. 1 (a) Distribution of sampling sites recorded in the AmazonFish database and (b) delimitation and codes of the 144 sub-drainage units (see corresponding names in Online-only Table 1), based on a modified version of HydroBASINS (see Methods). The major tributaries of the Amazon Basin are shown in different colours, with their names in bold.


The hydrological sub-drainage units within the Amazon Basin were delineated using the HydroBASINS framework, a subset of the HydroSHEDS database30. Levels 5 and 6 were combined under an area constraint of >20,000 km², with the exception of sub-drainages located along the river mainstem, where delineation was based on the distance between two main tributaries entering the mainstem. This yielded a total of 144 sub-drainages covering the entire Amazon system (Fig. 1).
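The text does not spell out how the two HydroBASINS levels were combined, so the following Python sketch shows only one possible reading of the area constraint, not the authors' procedure. It assumes that a level-5 unit is split into its level-6 children only when every child exceeds 20,000 km², and is otherwise kept whole. The function name, the record layout and the toy identifiers and areas are all hypothetical, and the special handling of mainstem sub-drainages is not modelled.

```python
from collections import defaultdict

MIN_AREA_KM2 = 20_000  # area constraint quoted in the text


def combine_levels(level6_units):
    """Combine two nested basin levels under a minimum-area rule.

    level6_units: iterable of (unit_id, parent_level5_id, area_km2).
    Returns a dict mapping each retained unit id to its area in km2.
    """
    by_parent = defaultdict(list)
    for unit_id, parent_id, area in level6_units:
        by_parent[parent_id].append((unit_id, area))

    retained = {}
    for parent_id, children in by_parent.items():
        if all(area > MIN_AREA_KM2 for _, area in children):
            # Every child is large enough: keep the finer level-6 units.
            retained.update(children)
        else:
            # At least one child is too small: fall back to the level-5 unit.
            retained[parent_id] = sum(area for _, area in children)
    return retained


# Toy input with invented identifiers and areas.
units = [
    ("6a", "5A", 25_000.0),
    ("6b", "5A", 31_000.0),
    ("6c", "5B", 8_000.0),
    ("6d", "5B", 27_000.0),
]
result = combine_levels(units)
```

With this toy input, parent 5A is split into its two children, while 5B is kept as a single 35,000 km² unit because one of its children falls below the threshold.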