Htsflow by arnaudceol

Primary analyses can be submitted on a single sample or on a group of samples. A primary analysis consists of filtering, quality control, and alignment of the reads from a specific sample to the reference genome. The sequencing technologies currently accepted by HTS-flow are:

RNA-Seq,
ChIP-Seq,
DNaseI-Seq,
BS-Seq.

Several options are applicable to primary analyses:

Remove Bad Reads: removal of reads that have been flagged as bad by the sequencer (grep -A 3 '^@.* [^:]*:N:[^:]*:' | grep -v -- '^--$' | sed 's/ [0-9]:N:[0-9]*:[A-Z]*$//g').
Trimming: trim the reads from the 5' end where the nucleotide quality Q is below 20. The Phred quality score Q is logarithmically related to the base-calling error probability P (Q = -10 log10 P), so Q = 20 corresponds to an error probability of 1%. Use this option when quality may degrade strongly at the 5' ends of the reads.
Masking: replace nucleotides with N anywhere along the read if their quality score Q is below 20.
Program: tophat/bwa/bismarck - tophat is used for aligning RNA-Seq reads, bwa for ChIP-Seq and DNaseI-Seq, bismarck for BS-Seq.
Alignment Options: this field is used to change the options passed to the aligner.
Paired: for both tophat and bwa, applies the set of options that handle paired-end reads.
Remove Duplicates: the final alignment file is processed to remove PCR duplicates (reads that align to the same genomic location).
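
The read-level options above (Remove Bad Reads, Trimming, Masking) can be illustrated with a minimal Python sketch. This is not HTS-flow's implementation: the function names are hypothetical, qualities are assumed to be Phred+33 encoded, and trimming is interpreted as dropping leading 5' bases while their quality stays below the threshold. The read filter mirrors the grep/sed command shown for Remove Bad Reads, which keeps records whose header carries the filter flag N and strips the trailing header comment.

```python
import re

PHRED_OFFSET = 33  # assumption: Sanger / Illumina 1.8+ quality encoding
MIN_Q = 20         # threshold quoted in the option descriptions

# Same patterns as the grep and sed expressions in "Remove Bad Reads".
PASS_HEADER = re.compile(r"^@.* [^:]*:N:[^:]*:")
HEADER_COMMENT = re.compile(r" [0-9]:N:[0-9]*:[A-Z]*$")


def phred_to_error_prob(q):
    """Convert a Phred score Q to an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def decode_qualities(qual, offset=PHRED_OFFSET):
    """Turn a FASTQ quality string into a list of integer Phred scores."""
    return [ord(ch) - offset for ch in qual]


def keep_passing_reads(fastq_lines):
    """Yield 4-line FASTQ records whose header has filter flag N.

    The trailing header comment (e.g. ' 1:N:0:ATCACG') is stripped.
    """
    for i in range(0, len(fastq_lines) - 3, 4):
        header, seq, plus, qual = fastq_lines[i:i + 4]
        if PASS_HEADER.match(header):
            yield HEADER_COMMENT.sub("", header), seq, plus, qual


def trim_5prime(seq, qual, min_q=MIN_Q):
    """Drop leading (5') bases while their quality is below min_q."""
    scores = decode_qualities(qual)
    start = 0
    while start < len(scores) and scores[start] < min_q:
        start += 1
    return seq[start:], qual[start:]


def mask_low_quality(seq, qual, min_q=MIN_Q):
    """Replace every base whose quality is below min_q with N."""
    scores = decode_qualities(qual)
    return "".join(b if q >= min_q else "N" for b, q in zip(seq, scores))


# Toy data: '#' is Q2, '+' is Q10, '5' is Q20, 'I' is Q40 under Phred+33.
records = [
    "@r1 1:N:0:ATCACG", "ACGTACGT", "+", "#+5IIII#",
    "@r2 1:Y:0:ATCACG", "TTTTTTTT", "+", "IIIIIIII",
]
kept = list(keep_passing_reads(records))
trimmed_seq, trimmed_qual = trim_5prime("ACGTACGT", "#+5IIII#")
masked_seq = mask_low_quality("ACGTACGT", "#+5IIII#")
```

In this toy input only r1 is kept, trimming removes the two low-quality bases at the 5' end, and masking additionally replaces the low-quality last base with N while leaving the read length unchanged.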

Merging primary analysis

It is possible to select a group of aligned samples and pool their reads to obtain a merged alignment file. To be merged, the samples need to be aligned to the same reference genome. This function is available from the primary analysis page, with the following parameters.

[Figure: Primary Analysis]

[Figure: Merging primary analysis]
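
The merging precondition described above (all samples must be aligned to the same reference genome) can be sketched as a simple check. The AlignedSample record, its fields, and the example genome names are hypothetical and only serve the illustration. How HTS-flow actually stores samples or pools the reads is not described here.

```python
from collections import namedtuple

# Hypothetical record for an aligned sample; not an HTS-flow data structure.
AlignedSample = namedtuple("AlignedSample", ["name", "genome", "alignment_path"])


def check_mergeable(samples):
    """Return the shared reference genome, or raise if merging is not allowed."""
    if len(samples) < 2:
        raise ValueError("need at least two aligned samples to merge")
    genomes = {s.genome for s in samples}
    if len(genomes) != 1:
        raise ValueError(
            "samples are aligned to different genomes: %s" % ", ".join(sorted(genomes))
        )
    return genomes.pop()


# Example with made-up sample names and paths.
same = [
    AlignedSample("s1", "mm9", "s1.bam"),
    AlignedSample("s2", "mm9", "s2.bam"),
]
mixed = same + [AlignedSample("s3", "hg19", "s3.bam")]

shared_genome = check_mergeable(same)
try:
    check_mergeable(mixed)
    mixed_ok = True
except ValueError:
    mixed_ok = False
```

Running the check before pooling makes the failure explicit instead of producing a merged alignment file with inconsistent coordinates.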