Organizing GenBank Records with BioProjects
What is a BioProject?
From the NCBI BioProject homepage (https://www.ncbi.nlm.nih.gov/bioproject/):
“A BioProject is a collection of biological data related to a single initiative, originating from a single organization or from a consortium. A BioProject record provides users a single place to find links to the diverse data types generated for that project.”
BioProjects grew out of the NCBI Genome Project database, which served solely to organize genome sequences in GenBank. However, it became apparent that this organizational structure could be used to group together entries for several more kinds of data, so BioProjects became a distinct entity in 2011.
BioProjects can have a hierarchical structure, meaning that project-level BioProjects can be organized together under an “umbrella” BioProject.
The Smithsonian Barcoding Network (SIBN) and Global Genome Initiative (GGI) use BioProjects to organize the GenBank sequence records generated by the projects they fund. Each funded project will have its own BioProject, which makes searching easier and tracking progress more convenient. The SIBN BioProject can be found at https://www.ncbi.nlm.nih.gov/bioproject/81359, and the GGI BioProject can be found at https://www.ncbi.nlm.nih.gov/bioproject/384793.
When a GenBank record is added to a BioProject, a link to other records in the same BioProject appears directly on the GenBank record.
Creating a BioProject
You will need to create a BioProject before new GenBank submissions or existing GenBank records can be organized under one.
To create a BioProject, navigate to the new NCBI Submission Portal: https://submit.ncbi.nlm.nih.gov/.
- Click on “Log in” before you get started. Generally, it is easiest to sign in with your Google account, so that you do not have to create yet another username and password to forget.
- After signing in, you should be directed back to the Submission Portal page. In the search box, type “bioproject” and you should see the suggestion “BioProject and BioSample”. Click on this link to get to the BioProject submission tool.
- Feel free to click through the headings on the left of the What You Should Expect section to learn more about the requirements and the submission process. When you’re ready to continue, click the Submit button.
- Click the New Submission button.
- Fill out the Submitter page.
- Select “Targeted Locus (Loci)” for Project Data Type, and “Multispecies” for Sample scope.
- Give a short description for “Multispecies description”.
- The submission portal will create an automated Project Title based on your previous entries, but overwrite this with the title of your project. Give a good description of the project in “Public description”, because this will be front-and-center on the BioProject page. Finally, check the “Yes” box to indicate that this project is part of a larger initiative.
- If this BioProject falls under the SI Barcode Network, then enter “SI Barcode Network” for Initiative description, and “PRJNA81359” for BioProject Accession.
- If this BioProject falls under GGI, then enter “Global Genome Initiative” for Initiative description, and “PRJNA384793” for BioProject Accession.
Leave the rest of the entries on this page blank.
- Enter any links you would like to display as part of your project. Add the Consortium and/or Data provider, if needed.
- If you would like to enter any grants, click the Add grants link to enter the relevant information.
- Skip the BioSample page.
- Add any Publications your project has generated on the Publications page. Don’t worry, you can come back and add publications later.
- Finally, the Overview tab will show all of your entries in one place. This will be your last chance to make any changes before submitting.
- After a few days, you will receive an email from NCBI informing you that your BioProject has been successfully created. Most importantly, they will send your BioProject ID, which you can now add to existing GenBank records or include in new GenBank submissions.
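The initiative values entered in the steps above, and the page address for a BioProject ID once NCBI sends it, can be kept in a small helper. This is a minimal sketch, not part of the Submission Portal: the names `UMBRELLAS`, `initiative_fields`, and `bioproject_url` are hypothetical, and the URL pattern is an assumption inferred from the two umbrella links on this page, where the number after “PRJNA” matches the number in the address.

```python
import re

# Umbrella BioProjects named in the steps above: (Initiative description, BioProject Accession).
UMBRELLAS = {
    "SIBN": ("SI Barcode Network", "PRJNA81359"),
    "GGI": ("Global Genome Initiative", "PRJNA384793"),
}

# Assumption: only NCBI-issued accessions of the form PRJNA<digits> are handled here.
ACCESSION_RE = re.compile(r"PRJNA(\d+)")


def initiative_fields(program):
    """Return the two form values to enter for the given umbrella program."""
    description, accession = UMBRELLAS[program]
    return {
        "Initiative description": description,
        "BioProject Accession": accession,
    }


def bioproject_url(accession):
    """Build the BioProject page address from a PRJNA accession (assumed pattern)."""
    match = ACCESSION_RE.fullmatch(accession.strip())
    if match is None:
        raise ValueError(f"not a PRJNA accession: {accession!r}")
    return f"https://www.ncbi.nlm.nih.gov/bioproject/{match.group(1)}"
```

For example, `bioproject_url("PRJNA81359")` yields the SIBN address given earlier on this page.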
How to update BioProject information
If your BioProject has already been published and you would like to update any of the entries from the BioProject creation process, email the changes you would like to make to firstname.lastname@example.org.
Adding a BioProject to existing GenBank records
Adding a BioProject ID to sequence records that are already published to GenBank is a manual procedure done through email. Email email@example.com, and let them know:
- your BioProject ID, and
- the range of GenBank accessions to which you would like to add the BioProject ID.
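If you send these requests often, the message can be drafted with a short script. The sketch below only builds the email and does not send it. The function name `build_request`, the subject line, and the body wording are hypothetical; the recipient address is passed in by the caller.

```python
from email.message import EmailMessage


def build_request(bioproject_id, first_accession, last_accession, to_address):
    """Draft a request to add a BioProject ID to a range of GenBank accessions."""
    msg = EmailMessage()
    msg["To"] = to_address
    # Hypothetical subject and body wording; adjust to taste.
    msg["Subject"] = f"Add BioProject {bioproject_id} to existing GenBank records"
    msg.set_content(
        f"BioProject ID: {bioproject_id}\n"
        f"GenBank accessions: {first_accession}-{last_accession}\n"
    )
    return msg
```

Printing the returned message shows the headers and body, ready to paste into a mail client.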
Adding a BioProject to new GenBank submissions
Of the several methods for publishing sequences to GenBank (GenBank Submission Portal, BankIt, Sequin, tbl2asn, Geneious, and BOLD), only the GenBank Submission Portal and tbl2asn have methods for adding a BioProject ID to a batch submission.
If submitting COI or rDNA through the GenBank Submission Portal, when creating the source modifier table for upload to the portal, simply add a column containing the PRJNA BioProject number with the column header “Bioproject” (without the quotation marks).
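For a large table, the extra column can be added with a script rather than by hand. This is a minimal sketch, assuming the source modifier table is tab-delimited text with one header row. The function name `add_bioproject_column` and the example column names are illustrative, not taken from the portal.

```python
import csv
import io


def add_bioproject_column(table_text, bioproject_id, header="Bioproject"):
    """Append a BioProject column to a tab-delimited source modifier table."""
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    rows = [row for row in reader if row]  # skip blank lines
    if not rows:
        raise ValueError("source modifier table is empty")
    if header in rows[0]:
        raise ValueError(f"table already has a {header!r} column")

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(rows[0] + [header])   # header row gains the new column name
    for row in rows[1:]:
        writer.writerow(row + [bioproject_id])  # same ID for every sequence
    return out.getvalue()


# Illustrative input: two columns, one sequence; the ID is a placeholder.
example = "Sequence_ID\tOrganism\nseq1\tHomo sapiens\n"
updated = add_bioproject_column(example, "PRJNA000000")
```

The function works on text rather than file paths, so you can read the table from disk, pass its contents through, and write the result wherever the portal upload expects it.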
If submitting through tbl2asn, follow the instructions below to add the BioProject ID.
We are working with the Geneious developers to have BioProjects added to the Submission Details section of the Geneious Prime GenBank Upload Plugin. Currently, if submitting barcodes through the Geneious Prime GenBank Upload Plugin, we recommend submitting the sequences first and treating the sequences as “existing GenBank records” (see above).
According to the tbl2asn instruction manual at https://www.ncbi.nlm.nih.gov/genbank/tbl2asn2/, three files are required to create a submission package: a “template file”, a FASTA file containing nucleotide sequences, and a feature table with annotations. The template file is where we include the BioProject ID for a submission.
To create a GenBank submission template file, go to https://submit.ncbi.nlm.nih.gov/genbank/template/submission/, and fill out the form. The last section of the form is for “BioProject/BioSample Information”, and this is where you will add your BioProject ID.
Press the “Create Template” button to download a “.sbt” file, and bundle that with your other components for the tbl2asn command line utility.
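Once the “.sbt” file sits alongside the FASTA file and feature table, the tbl2asn call can be assembled as follows. This is a sketch under stated assumptions: `tbl2asn` is on your PATH, and the `-t` (template), `-p` (directory of input files), and `-a s` (FASTA set) options behave as described in the tbl2asn manual. The function name and file names are hypothetical, and your submission may need further options from the manual.

```python
def tbl2asn_command(template, input_dir, fasta_set=True):
    """Assemble (but do not run) a tbl2asn command line as a list of arguments."""
    cmd = [
        "tbl2asn",
        "-t", str(template),   # the .sbt template carrying the BioProject ID
        "-p", str(input_dir),  # directory holding the FASTA and feature table files
    ]
    if fasta_set:
        # Assumption: the FASTA file holds a batch of sequences.
        cmd += ["-a", "s"]
    return cmd


# Hypothetical file and directory names.
command = tbl2asn_command("template.sbt", "submission")
```

To run it, pass the list to `subprocess.run(command, check=True)`. The command is built as a list rather than a single string so that paths containing spaces need no shell quoting.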