Note: This tutorial is a basic introduction to assembling RADseq-style data in ipyrad, written for an in-class demonstration in UNR’s NRES 721. It is not a comprehensive guide to assembling data for research purposes.

Students: if you wish to follow along, please install the latest version of ipyrad by following the conda installation directions in the ipyrad documentation; I wrote this tutorial using v0.9.14. ipyrad is written for Linux and Mac operating systems. If you are running Windows, you can try running ipyrad through your favorite emulator or virtual machine, or you can simply follow along as we move through the tutorial. We’ll return to R for downstream analyses!
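If you want to confirm which ipyrad version your environment has, the minimal Python sketch below reads the installed package metadata with the standard-library importlib.metadata module (Python 3.8+). It assumes ipyrad was installed into the currently active conda environment under the package name "ipyrad"; the helper function and the TUTORIAL_VERSION constant are illustrative names, not part of ipyrad.

```python
from importlib.metadata import PackageNotFoundError, version

# Version this tutorial was written with (illustrative constant, not from ipyrad).
TUTORIAL_VERSION = "0.9.14"


def installed_ipyrad_version():
    """Return the installed ipyrad version string, or None if it is not found."""
    try:
        return version("ipyrad")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_ipyrad_version()
    if found is None:
        print("ipyrad not found; is the right conda environment activated?")
    elif found != TUTORIAL_VERSION:
        print(f"ipyrad {found} installed; tutorial written with {TUTORIAL_VERSION}")
    else:
        print(f"ipyrad {found} matches the tutorial version")
```

A newer version should generally work for this demonstration, but some output details may differ from v0.9.14.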

Introduction

Reduced-representation genomic sequencing (e.g., RADseq) is a popular group of methods for generating large-scale datasets for population genomic and phylogenomic studies, especially for non-model organisms without reference genomes. The sequencing reads generated by these methods come from thousands to millions of different portions of the genome, and there are numerous methods for “assembling” these data. Typically, assembly means clustering sequences (either de novo or against a reference genome), aligning sequences within each cluster, and calling SNPs.
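To make the clustering and SNP-calling steps concrete, here is a toy Python sketch of de novo assembly. It is not how ipyrad implements these steps. It assumes equal-length reads with no indels, so each cluster is already "aligned" column by column, and it uses a simple greedy rule: each read joins the first cluster whose seed read it matches at 85% or more of positions; otherwise it seeds a new locus. The function names and example reads are hypothetical. ipyrad itself hands clustering and alignment to external tools (vsearch and muscle), handles indels, and accounts for sequencing error and read depth before calling bases, so treat this only as an illustration of the idea.

```python
from collections import Counter


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length reads."""
    if len(a) != len(b):
        raise ValueError("toy example assumes equal-length reads")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def greedy_cluster(reads, threshold=0.85):
    """Assign each read to the first cluster whose seed (first read) is at least
    `threshold` identical to it; otherwise start a new cluster (a new locus)."""
    clusters = []  # list of lists; clusters[i][0] is the seed read
    for read in reads:
        for cluster in clusters:
            if identity(cluster[0], read) >= threshold:
                cluster.append(read)
                break
        else:
            clusters.append([read])
    return clusters


def call_snps(cluster):
    """Return {position: base counts} for every column with more than one base.
    With equal-length, indel-free reads the cluster is already aligned."""
    snps = {}
    for pos, column in enumerate(zip(*cluster)):
        counts = Counter(column)
        if len(counts) > 1:
            snps[pos] = counts
    return snps


if __name__ == "__main__":
    reads = [
        "TGCAGGATCCAT",
        "TGCAGGATCCAT",
        "TGCAGGTTCCAT",  # one mismatch vs. the seed: same locus, candidate SNP
        "TGCAACGTAGGC",  # very different: becomes a new locus
    ]
    for i, cluster in enumerate(greedy_cluster(reads)):
        print(f"locus {i}: {len(cluster)} reads, SNPs: {call_snps(cluster)}")
```

Here the third read differs from the first two at one position (11 of 12 bases match), so it clusters with them and position 6 shows up as a candidate SNP; with only three reads, a real pipeline might instead attribute that mismatch to sequencing error. The threshold plays a role similar to ipyrad's clust_threshold parameter: raising it splits reads into more loci, while lowering it lumps more divergent reads together.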

In this tutorial, we’ll practice assembling RADseq-style data. Our objectives are to:

  1. understand the general concepts of how ipyrad assembles data
  2. compare results from a de novo and reference-based assembly
  3. produce output files for use in downstream analyses

To do this, we’ll use data from the patch-nosed salamander (Urspelerpes brucei). This tiny (~25 mm) lungless salamander is endemic to an area of just ~20 km² in Georgia and South Carolina. Aren’t they beautiful?