Populating the FIMS spreadsheet
The FIMS spreadsheet template
The SI Barcode Network FIMS spreadsheet template, “SI Barcoding Specimen Spreadsheet.xlsx”, is hosted in the SI Barcode Network GitHub repository at https://github.com/SIBarcodeNetwork/SIBarcodeNetwork. You can also download the spreadsheet directly from here.
It is a good idea to immediately rename this spreadsheet file with the name of your dataset according to the FIMS naming conventions.
Source of columns
The BiSciCol FIMS, which we will be using to store and interface with specimen data, allows projects to completely customize the fields they use, along with the validation rules that accompany those fields. Since the goal of the SI Barcode Network is to get BARCODE keyword sequences into GenBank, we limited the specimen fields to those that will end up in a GenBank record. We also based our fields on DarwinCore terms, which you will notice in the field names. We think that our collection of fields acts as a bridge between permanent collection databases (like NMNH EMu) and GenBank.
Blue column headers are all identifiers of either the specimen voucher, the tissue sample of the voucher, or the DNA extraction from that tissue. Green column headers are all specimen metadata about the collection event or taxonomic identification of the specimen. Finally, red fields will be populated by CBOL staff after the spreadsheet has been completed.
Dark-colored column headers indicate fields that must be filled in before the spreadsheet can be uploaded to the FIMS. Light-colored column headers are not required, but they should still be filled out if the values are known.
- Name of the DNA extraction plate. This will be the same as the FIMS Dataset Code.
- This is the 2D barcode of the storage tube which contains the DNA extract of the specimen. This field will not be populated until after the DNA extraction process is complete.
- This is the plate position of the DNA extraction.
- This is the unique identifier for the tissue sample from which the DNA extraction was taken. This identifier must be unique across all projects. If the tissue is stored in the NMNH Biorepository, please use the Biorepository ID. If the same tissue sample appears in more than one well, append a letter to the identifier to distinguish the copies.
- The type of tissue that the DNA was extracted from.
- This is the unique ID given to the specimen the tissue was taken from. It is constructed from the institutionCode, collectionCode (if one exists), and catalogNumber, joined by colons (:).
- This is the acronym for the institution or repository where the specimen voucher is stored. GRBio.org is a registry for all institution codes, and the institutionCode field will be validated against its list of codes. Vouchers accessioned at the Smithsonian will usually have the institutionCode “USNM”.
- The name or acronym identifying the collection or data set (from the institution/repository listed in institutionCode) from which the record was derived. This collectionCode should be registered, along with the institutionCode, in GRBio.org.
- An identifier (preferably unique) for the record within the data set or collection.
- The full scientific name of the specimen. When forming part of an Identification, this should be the name at the lowest taxonomic rank that can be determined.
- The country or water body from which the specimen voucher was collected. This field will be validated against the INSDC country list (http://www.insdc.org/country.html).
- The full geographic description (within the Country or Ocean) of where the specimen was collected. This will be combined with the countryOrOcean field in the GenBank record.
- The geographic latitude, in decimal degrees, of the geographic center of a Location. Positive values are north of the Equator, negative values are south of it. This field will be validated to be within the range -90.0 to 90.0.
- The geographic longitude, in decimal degrees, of the geographic center of a Location. Positive values are east of the Greenwich Meridian, negative values are west of it. This field will be validated to be within the range -180.0 to 180.0.
- The four-digit year in which the voucher was collected, according to the Common Era Calendar.
- The two-digit numerical month in which the voucher was collected. This will be validated to be in the range from 1 to 12.
- The integer day of the month on which the voucher was collected. This will be validated to be in the range from 1 to 31.
- A list (concatenated and separated by semicolons) of names of people, groups, or organizations responsible for collecting the specimen voucher. The primary collector or observer, especially one who applies a personal identifier (recordNumber), should be listed first. The name format should preferably be Given Name, [space], Last Name.
- A list (concatenated by semicolons) of names of people who assigned the Taxon to the specimen voucher. The name format should preferably be Given Name, [space], Last Name.
- The full scientific name of the kingdom in which the specimen voucher is classified.
- The full scientific name of the phylum in which the specimen voucher is classified.
- The full scientific name of the class in which the specimen voucher is classified.
- The full scientific name of the order in which the specimen voucher is classified.
- The full scientific name of the family in which the specimen voucher is classified.
- The full scientific name of the genus in which the taxon is classified.
- The name of the first or species epithet of the scientificName of the specimen voucher.
- The name of the lowest or terminal infraspecific epithet of the scientificName, excluding any rank designation.
- This is the combined lat_lon field for submission to GenBank. GenBank uses the specific format “d[d.dddd] N|S d[dd.dddd] W|E”. An example of this is “38.891262 N 77.026093 W” for the Smithsonian Natural History Museum.
- We use this field to combine the CountryOrOcean and Locality fields, since they form a single field in GenBank. Typically, the locality terms that follow the standardized country name are listed from least to most specific. An example for a specimen collected on the grounds of the Smithsonian Natural History Museum might be “USA: Washington, DC; Smithsonian Natural History Museum; West Loading Dock”.
We use this field to combine the YearCollected, MonthCollected, and DayCollected fields together, since it is a single field in GenBank. Here are the supported value formats, with examples:
- “DD-Mmm-YYYY”: 01-Jan-2016
- “Mmm-YYYY”: Jan-2016
- “YYYY”: 2016
- “YYYY-MM-DD”: 2016-01-01
- “YYYY-MM”: 2016-01
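The composed fields described above (the specimen identifier and the three combined GenBank fields) can be illustrated with a short Python sketch. This is not how the FIMS itself builds these values. The function names are hypothetical, the ": " separator between country and locality is inferred from the example above, and the coordinates are printed with whatever decimal precision they are given. The date helper emits only the “DD-Mmm-YYYY”, “Mmm-YYYY”, and “YYYY” forms.

```python
# Hypothetical helpers; names and separators are assumptions for illustration.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def specimen_id(institution_code, collection_code, catalog_number):
    """Join institutionCode, collectionCode (if any), and catalogNumber with colons."""
    parts = [institution_code, collection_code, catalog_number]
    return ":".join(str(p) for p in parts if p not in (None, ""))


def genbank_lat_lon(latitude, longitude):
    """Format decimal degrees as 'd.dddd N|S d.dddd W|E'."""
    if not -90.0 <= latitude <= 90.0:
        raise ValueError("latitude must be within -90.0 to 90.0")
    if not -180.0 <= longitude <= 180.0:
        raise ValueError("longitude must be within -180.0 to 180.0")
    ns = "N" if latitude >= 0 else "S"
    ew = "E" if longitude >= 0 else "W"
    return f"{abs(latitude)} {ns} {abs(longitude)} {ew}"


def genbank_country(country_or_ocean, locality=None):
    """Combine country and locality; the ': ' separator is inferred from the example."""
    if locality:
        return f"{country_or_ocean}: {locality}"
    return country_or_ocean


def genbank_collection_date(year, month=None, day=None):
    """Build 'DD-Mmm-YYYY', 'Mmm-YYYY', or 'YYYY' from the available parts."""
    if month is None:
        if day is not None:
            raise ValueError("a day requires a month")
        return f"{year:04d}"
    if not 1 <= month <= 12:
        raise ValueError("month must be within 1 to 12")
    if day is None:
        return f"{MONTHS[month - 1]}-{year:04d}"
    if not 1 <= day <= 31:
        raise ValueError("day must be within 1 to 31")
    return f"{day:02d}-{MONTHS[month - 1]}-{year:04d}"


print(specimen_id("USNM", None, "12345"))
print(genbank_lat_lon(38.891262, -77.026093))
print(genbank_country("USA", "Washington, DC; Smithsonian Natural History Museum"))
print(genbank_collection_date(2016, 1, 1))
```

The range checks mirror the validation rules stated for the latitude, longitude, month, and day fields. The catalog number "12345" is a made-up value, and the English month abbreviations are hard-coded so that the output does not depend on the system locale.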