Full course description
The former National Center for Genome Analysis Support (NCGAS) offers this online workshop on high performance computing (HPC) usage and transcriptome assembly, annotation, and analysis. The workshop consists of discussions, lectures, and hands-on tutorials and activities covering topics important to getting started with constructing and analyzing transcriptomes. While the focus is largely on de novo assembly, genome-guided transcriptome assembly and analysis are also discussed, and demo code is provided. The material covers both the availability and use of HPC resources and the task of assembling a new transcriptome, to provide more comprehensive preparation for this and future bioinformatics tasks. The main case study uses four separate assemblers (Trinity, SOAPdenovo-Trans, Velvet/Oases, and Trans-ABySS) with multiple k-mer sizes, whose outputs are combined and curated with EvidentialGene. An assembly combined across multiple assemblers and parameters is generally considered more robust than the output of a single assembler, and the NCGAS pipeline streamlines the process while allowing customization if desired. Downstream analyses such as differential expression, generating KEGG pathway images, and annotation using Trinotate will also be discussed. Although the material makes heavy use of XSEDE and IU machines, it is transferable to any cluster.
Participants should leave with the following knowledge:
- Familiarity with nationally available compute resources
- An understanding of the differences between VMs, gateways, clusters, and clouds, and the pros and cons of each
- How to run and optimize a job submission on a cluster
- How to manage large data sets and move data between resources
- How to run NCGAS’s transcriptome tools to produce robust transcriptomes
- How to check quality and clean up a de novo transcriptome
- Familiarity with some of the considerations in downstream analyses
- How to get help for both genomic and computational questions
Participants' own data will not be assembled during the workshop; instead, participants will run the entire pipeline on smaller-scale demo data.