Populating the GeOMe FIMS spreadsheet
The GeOMe FIMS spreadsheet template
The SI Barcode Network GeOMe FIMS spreadsheet template, “SI Barcoding Specimen Spreadsheet.xlsx”, is hosted on the SI Barcode Network GitHub page at https://github.com/SIBarcodeNetwork/SIBarcodeNetwork. (The direct link to download it is here.) To customize the columns included in your template, create an account on GeOMe; you will then be able to generate a customized template by following this link.
Rename the spreadsheet file right away, using the name of your dataset and following the FIMS naming conventions.
There are four tabs in the spreadsheet: Instructions, Samples, Samples_Fields, and Lists. The Instructions tab explains how to use and populate the spreadsheet. The Samples tab is where your data will be entered. The Samples_Fields tab includes definitions for all the columns on the Samples tab. The Lists tab includes all controlled vocabulary for certain fields.
Source of columns
The GeOMe FIMS, which we will use to store and interface with specimen data, allows projects to fully customize the fields they use, along with the validation rules that accompany those fields. Because the goal of the SI Barcode Network is to get high-quality sequences onto GenBank, we limited the specimen fields to those that will end up in a GenBank record. We also based our fields on Darwin Core terms, as the field names reflect. This subset of fields acts as a bridge between permanent collection databases (such as NMNH EMu) and GenBank.
Red column headers indicate required fields: these must be filled in before the spreadsheet can be uploaded to the FIMS.
Black column headers indicate optional fields, but these should still be filled in whenever the values are known.
- The collector’s specimen number. This number must be unique among the IDs within the expedition. You can use the field number or, if no field number exists, the voucherCatalogNumber.
- The acronym for the institution or repository where the specimen voucher is stored. The GBIF Registry of Scientific Collections is a registry of institution codes, and the institutionCode field will be validated against its list of codes. Vouchers accessioned at the Smithsonian will usually have the institutionCode “USNM”.
- The name, acronym, coden, or initialism identifying the collection or data set from which the record was derived. This collectionCode should be registered, along with the institutionCode, in the GBIF Registry of Scientific Collections.
- An identifier (preferably unique) for the record within the data set or collection.
- The unique ID given to the specimen the tissue was taken from. It is constructed from the institutionCode, collectionCode (if one exists), and catalogNumber, joined by a colon (:).
- The full scientific name of the kingdom in which the taxon is classified.
- The full scientific name of the phylum in which the taxon is classified. The list of phyla allowed in GeOMe is taken from the Catalogue of Life, with ‘Unknown’ added as an acceptable value. See the controlled vocabulary in the Lists tab of the spreadsheet.
- The full scientific name of the class in which the taxon is classified.
- The full scientific name of the order in which the taxon is classified.
- The full scientific name of the family in which the taxon is classified.
- The full scientific name of the genus in which the taxon is classified.
- The specific epithet (the second part of the species name) of the scientificName.
- The full scientific name of the specimen. When forming part of an Identification, this should be the name at the lowest taxonomic rank that can be determined.
- A list (concatenated and separated by the pipe (‘|’) symbol) of names of people, groups, or organizations who assigned the Taxon to the subject.
- A list (concatenated and separated by the pipe (‘|’) symbol) of names of people, groups, or organizations responsible for recording the original Occurrence. The primary collector or observer, especially one who applies a personal identifier (recordNumber), should be listed first.
- The four-digit year in which the voucher was collected, according to the Common Era calendar. (If the value is unknown and will never become available, enter ‘Unknown’; if you do not have the data yet but will in the future, enter ‘TBD’.)
- The two-digit numerical month in which the voucher was collected. Values are validated to fall in the range 1 to 12.
- The integer day of the month on which the voucher was collected. Values are validated to fall in the range 1 to 31.
- The name of the country or major administrative unit in which the Location occurs. This field will be validated against the INSDC country list (http://www.insdc.org/country.html). See controlled vocabulary in the Lists tab of the spreadsheet.
- The specific description of the collection location. Less specific geographic information can be provided in other geographic terms (higherGeography, continent, country, stateProvince, county, municipality, waterBody, island, islandGroup). This term may contain information modified from the original to correct perceived errors or to standardize the description. (If the value is unknown and will never become available, enter ‘Unknown’; if you do not have the data yet but will in the future, enter ‘TBD’.) In the GenBank record, this value will be combined with the countryOrOcean field.
- The geographic latitude (in decimal degrees, using the spatial reference system given in geodeticDatum) of the geographic center of a Location. Positive values are north of the Equator, negative values are south of it. Legal values lie between -90 and 90, inclusive.
- The geographic longitude (in decimal degrees, using the spatial reference system given in geodeticDatum) of the geographic center of a Location. Positive values are east of the Greenwich Meridian, negative values are west of it. Legal values lie between -180 and 180, inclusive.
- A list (concatenated and separated) of the tissue types sampled from this individual, together with any tissue identifiers that were assigned to them.
- The name of the plate (typically a 96-well plate) containing the tissue subsamples that will be consumed in DNA extractions.
- The well location in the tissue plate – formatted as follows: A01, A02, etc.
- The unique identifier for the tissue sample from which the DNA was extracted. This identifier must be unique across all projects; you can use the materialSampleID. If the same tissue sample occupies more than one well, use the format materialSampleID + “.#”, where “#” is the number of the occurrence (e.g. “.1” for the first occurrence, “.2” for the second).
- The 2D barcode of the storage tube containing the DNA extract of the specimen. This field will not be populated until the DNA extraction process is complete.
- BOLD Process IDs are unique codes automatically generated for each new record added to a project within the Barcode of Life Data System (BOLD).
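The specimen identifier described above is built by joining the institutionCode, the collectionCode (when one exists), and the catalogNumber with a colon. The sketch below shows that joining rule; the function name and the example collection code “Ent” are hypothetical, and the FIMS may apply further checks of its own.

```python
def build_voucher_id(institution_code, catalog_number, collection_code=None):
    """Join institutionCode, optional collectionCode, and catalogNumber with ':'.

    Hypothetical helper: a missing or blank collectionCode is simply skipped,
    so the result is either 'INST:COLL:CAT' or 'INST:CAT'.
    """
    parts = (institution_code, collection_code, catalog_number)
    # Convert to strings (catalog numbers may be stored as integers) and trim.
    cleaned = [str(part).strip() for part in parts if part is not None]
    return ":".join(part for part in cleaned if part)


print(build_voucher_id("USNM", "12345", "Ent"))  # USNM:Ent:12345
print(build_voucher_id("USNM", 12345))           # USNM:12345
```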
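Two of the fields above (the people who identified the taxon and the people who recorded the occurrence) hold several names concatenated with the pipe (‘|’) symbol, with the primary collector listed first. A minimal sketch of building and reading such a value follows. The helper names are hypothetical, and writing the separator without surrounding spaces is an assumption, since the documentation does not say whether spaces are allowed around the pipe.

```python
def join_names(names):
    """Concatenate names with '|', keeping their order (primary person first)."""
    cleaned = []
    for name in names:
        name = name.strip()
        if not name:
            continue  # skip blank entries
        if "|" in name:
            # A pipe inside a name would be misread as a separator.
            raise ValueError(f"name {name!r} contains the '|' separator")
        cleaned.append(name)
    return "|".join(cleaned)


def split_names(value):
    """Split a pipe-separated value back into a list of trimmed names."""
    return [name.strip() for name in value.split("|") if name.strip()]


print(join_names(["J. Smith", " A. Lee "]))  # J. Smith|A. Lee
```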
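The collection-date fields described above follow simple rules: a four-digit year (or ‘Unknown’/‘TBD’), a month from 1 to 12, and a day from 1 to 31. The FIMS performs this validation on upload; the sketch below is a hypothetical local pre-check that mirrors only the rules stated here. It assumes values arrive as strings from the spreadsheet and, like the stated rules, does not check calendar validity (for example, February 31 passes).

```python
# Placeholder values the documentation allows for the year field.
PLACEHOLDERS = {"Unknown", "TBD"}


def _in_range(value, low, high):
    """Return True if value parses as an integer within [low, high]."""
    try:
        return low <= int(value) <= high
    except (TypeError, ValueError):
        return False


def validate_collection_date(year, month=None, day=None):
    """Return a list of problems in a year/month/day triple (empty if none)."""
    problems = []
    year = str(year).strip()
    if year not in PLACEHOLDERS and not (len(year) == 4 and year.isdigit()):
        problems.append(f"year {year!r} is not a four-digit year, 'Unknown', or 'TBD'")
    if month not in (None, "") and not _in_range(month, 1, 12):
        problems.append(f"month {month!r} is outside 1-12")
    if day not in (None, "") and not _in_range(day, 1, 31):
        problems.append(f"day {day!r} is outside 1-31")
    return problems


print(validate_collection_date("2019", "07", "15"))  # []
```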
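The latitude and longitude fields above must be decimal degrees, with latitude between -90 and 90 and longitude between -180 and 180 (inclusive). A hypothetical pre-upload check for those ranges might look like this; it does not verify the geodeticDatum or whether the point is plausible for the stated country.

```python
def validate_coordinates(latitude, longitude):
    """Return a list of problems with a decimal-degree coordinate pair."""
    problems = []
    for name, value, limit in (("latitude", latitude, 90), ("longitude", longitude, 180)):
        try:
            number = float(value)
        except (TypeError, ValueError):
            # Degrees-minutes-seconds strings such as "N 38" end up here.
            problems.append(f"{name} {value!r} is not a decimal number")
            continue
        # Inclusive bounds; NaN also fails this comparison.
        if not -limit <= number <= limit:
            problems.append(f"{name} {number} is outside -{limit} to {limit}")
    return problems


print(validate_coordinates("38.8913", "-77.0260"))  # []
```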
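The tissue well field above uses labels such as A01 and A02. The sketch below assumes a standard 96-well plate with rows A to H and columns 01 to 12; the plate description only says this layout is typical, so adjust the pattern for other plate formats. The function names are hypothetical.

```python
import re

# Assumes a standard 96-well plate: rows A-H, zero-padded columns 01-12.
WELL_PATTERN = re.compile(r"([A-H])(0[1-9]|1[0-2])")


def is_valid_well(well):
    """Return True if well is a label such as 'A01' on a 96-well plate."""
    return WELL_PATTERN.fullmatch(well.strip()) is not None


def format_well(row, column):
    """Build a zero-padded well label (e.g. 'H12') from a row letter and column number."""
    label = f"{row.strip().upper()}{int(column):02d}"
    if not is_valid_well(label):
        raise ValueError(f"{label!r} is not a well on a 96-well plate")
    return label


def all_wells():
    """List every well label, row by row: A01 ... A12, B01 ... H12."""
    return [f"{row}{column:02d}" for row in "ABCDEFGH" for column in range(1, 13)]


print(format_well("h", 12))  # H12
```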
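For the tissue identifier described above, a materialSampleID that occupies a single well can be used unchanged, while a sample that appears in several wells gets “.1”, “.2”, and so on in order of occurrence. A minimal sketch of that numbering rule, assuming the IDs are supplied in well order; the helper name and the IDs in the example are hypothetical.

```python
from collections import Counter


def assign_tissue_ids(material_sample_ids):
    """Return tissue IDs for a list of materialSampleIDs given in well order.

    IDs that occur once are kept as-is; repeated IDs receive '.1', '.2', ...
    numbered by order of occurrence.
    """
    totals = Counter(material_sample_ids)
    seen = Counter()
    tissue_ids = []
    for sample_id in material_sample_ids:
        if totals[sample_id] == 1:
            tissue_ids.append(sample_id)
        else:
            seen[sample_id] += 1
            tissue_ids.append(f"{sample_id}.{seen[sample_id]}")
    return tissue_ids


print(assign_tissue_ids(["KB-001", "KB-002", "KB-001"]))
# ['KB-001.1', 'KB-002', 'KB-001.2']
```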