GotCloud

Genomes on the Cloud, Mapping & Variant Calling Pipelines

This project is maintained by statgen

Welcome to GotCloud.

Genomes on the Cloud (GotCloud) is our set of sequence analysis pipelines, developed to handle the increasing volume of next-generation sequencing and genotyping data.

Also included within GotCloud

See http://gotcloud.org for full GotCloud documentation.

Obtain GotCloud

First, download GotCloud, either by cloning the repository with git or by downloading a release archive.

See GotCloud: Source Releases for more information.

Build GotCloud

Before running GotCloud, you need to compile the source:

$ cd your_downloaded_gotcloud_dir/src && make && cd ..

Test GotCloud

To test GotCloud align, run:

$ ./gotcloud align --test ~/testalign

This creates the output directory ~/testalign, or clears it if it already exists. You can choose a different directory instead.

Test results and a log file are written to that directory.

Results are checked automatically. If any errors occur, GotCloud prints an error message.

To test GotCloud snpcall, run:

$ ./gotcloud snpcall --test ~/testsnp

This creates the output directory ~/testsnp, or clears it if it already exists. You can choose a different directory instead.

Test results and a log file are written to that directory.

Results are checked automatically. If any errors occur, GotCloud prints an error message.

To test GotCloud ldrefine, run:

$ ./gotcloud ldrefine --test ~/testsnp

The ldrefine test can only be run after the snpcall test, and it must use the same directory as the snpcall test.

Test results and a log file are written to that directory.

Results are checked automatically. If any errors occur, GotCloud prints an error message.

To test GotCloud indel, run:

$ ./gotcloud indel --test ~/testindel

This creates the output directory ~/testindel, or clears it if it already exists. You can choose a different directory instead.

Test results and a log file are written to that directory.

Results are checked automatically. If any errors occur, GotCloud prints an error message.
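Because ldrefine depends on the snpcall test output, the four tests have to run in a fixed order. The sketch below wraps them in one shell function. Several names in it are assumptions, not part of GotCloud: `run_gotcloud_tests` is a hypothetical helper, and `GOTCLOUD` is a hypothetical variable pointing at the gotcloud script, defaulting to ./gotcloud in the GotCloud directory. The sketch also assumes that gotcloud exits with a non-zero status when a test fails. The documentation above only states that an error message is printed.

```shell
#!/usr/bin/env bash
# Sketch: run the four GotCloud self-tests in order.
# Assumptions (not from the GotCloud docs): GOTCLOUD points at the gotcloud
# script (default ./gotcloud), and gotcloud exits non-zero when a test fails.
run_gotcloud_tests() {
  local gotcloud="${GOTCLOUD:-./gotcloud}"
  local base="${1:-$HOME}"        # parent directory for the test output dirs
  local snp_dir="$base/testsnp"   # shared by the snpcall and ldrefine tests

  "$gotcloud" align --test "$base/testalign" || return 1
  "$gotcloud" snpcall --test "$snp_dir" || return 1
  # ldrefine reuses the snpcall test results, so it must run after snpcall
  # and point at the same directory.
  "$gotcloud" ldrefine --test "$snp_dir" || return 1
  "$gotcloud" indel --test "$base/testindel" || return 1
  echo "All GotCloud tests finished"
}
```

For example, running `run_gotcloud_tests ~` from the GotCloud directory uses the same ~/testalign, ~/testsnp and ~/testindel directories as the commands above. The function stops at the first failing test, so ldrefine never runs against an incomplete snpcall directory.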

General Help for Variant Calling

Variant Calling requires three types of input files:

  1. a set of BAM files
    • For high-quality SNP calls, the BAM files should already be:
      • duplicate-marked
      • base-quality recalibrated
  2. BAM list file
    • Each line describes a single individual in space-separated columns (at least 2 columns, or 3 when population labels are included):
        [SAMPLE_ID] [COMMA SEPARATED POPULATION LABELS] [BAM_FILE1] [BAM_FILE2] ...
    • 1 or more BAMs are allowed per individual
    • The population column is optional and can be skipped
  3. configuration file
    • Contains run-time options & command line arguments
    • A default configuration is provided
    • The user must specify (as KEY = VALUE pairs):
      • BAM_LIST = # path to the BAM list file
      • OUT_DIR = # output directory
    • Optional settings include:
      • CHRS = # space-separated list of chromosomes; defaults to 1-22 and X
    • Refer to the default configuration & the wiki page for more information on other settings.
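The two text inputs described above can be written with a short shell script. In this minimal sketch, every sample ID, population label, path and the `WORKDIR` variable are hypothetical placeholders. The chromosome names given to CHRS are assumed to match those in your reference and BAM files. Every row includes the optional population column. SAMPLE1 shows two comma-separated labels and two BAM files for one individual.

```shell
#!/usr/bin/env bash
# Sketch: write a BAM list and a minimal configuration file for snpcall.
# All sample IDs, population labels and paths are hypothetical placeholders.
set -e

workdir="${WORKDIR:-$PWD/mycall}"   # hypothetical location for the input files
mkdir -p "$workdir"

# BAM list: [SAMPLE_ID] [COMMA SEPARATED POPULATION LABELS] [BAM_FILE1] [BAM_FILE2] ...
cat > "$workdir/bam.list" <<EOF
SAMPLE1 POP1,POP2 /data/bams/SAMPLE1.lane1.bam /data/bams/SAMPLE1.lane2.bam
SAMPLE2 POP1 /data/bams/SAMPLE2.bam
SAMPLE3 POP2 /data/bams/SAMPLE3.bam
EOF

# Configuration: KEY = VALUE pairs. BAM_LIST and OUT_DIR are required;
# CHRS is optional (without it, chromosomes 1-22 and X are used).
cat > "$workdir/gotcloud.conf" <<EOF
BAM_LIST = $workdir/bam.list
OUT_DIR = $workdir/out
CHRS = 20 21 22
EOF
```

Settings left out of this file fall back to the default configuration.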

Optional input files:

Once these files are configured, run SNP calling to process the data:

   $ {path}gotcloud snpcall --conf {conf_file} \
       --outdir {output_directory} --numjobs {number_of_jobs}

where {number_of_jobs} is the number of threads to use for processing.
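A malformed BAM list is easier to spot before a long snpcall run starts than after it. `check_bam_list` below is a hypothetical helper and not part of GotCloud. It is a sketch that assumes BAM file names end in ".bam". It reports any line that lists no BAM file and any listed BAM file that does not exist. It ignores other fields after the sample ID, such as population labels.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check a BAM list file before launching snpcall.
# check_bam_list is a hypothetical helper, not part of GotCloud. It assumes
# BAM file names end in ".bam" and ignores other fields (e.g. population labels).
check_bam_list() {
  list="$1"
  status=0
  lineno=0
  while IFS= read -r line || [ -n "$line" ]; do
    lineno=$((lineno + 1))
    set -f                        # disable globbing while splitting the line
    set -- $line                  # split on whitespace: $1 is SAMPLE_ID
    set +f
    [ "$#" -eq 0 ] && continue    # ignore blank lines
    sample="$1"
    shift
    nbam=0
    for field in "$@"; do
      case "$field" in
        *.bam)
          nbam=$((nbam + 1))
          if [ ! -f "$field" ]; then
            echo "line $lineno: $sample: BAM file not found: $field" >&2
            status=1
          fi
          ;;
      esac
    done
    if [ "$nbam" -eq 0 ]; then
      echo "line $lineno: $sample: no BAM file listed" >&2
      status=1
    fi
  done < "$list"
  return "$status"
}
```

A possible usage, with hypothetical file names, launches snpcall only when the check passes:

   $ check_bam_list bam.list && ./gotcloud snpcall --conf gotcloud.conf --outdir out --numjobs 4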

Authors and Contributors

GotCloud has been developed and maintained within the Center for Statistical Genetics at the University of Michigan. The following people have worked on and contributed to GotCloud:

The group's GitHub account is @statgen.

Support or Contact

Having trouble with GotCloud?