Htsflow by arnaudceol

HTS-flow (High-Throughput Sequencing flow) provides a framework for the management and analysis of NGS data.

HTS-flow is based on a combination of a MySQL database, a PHP web interface and several NGS analysis modules. It allows labs generating NGS samples to analyze sequencing data and to manage the increasing size of their data repository.

HTS-flow facilitates the reproducibility and traceability of the analyses by avoiding manual, error-prone execution of a set of standard NGS tools.

The technical documentation (wiki) is available at: https://github.com/arnaudceol/htsflow/wiki.

The core of HTS-flow is a MySQL database with three main “entities”: sample descriptions, primary analyses, and secondary analyses.

A primary analysis is performed on each type of raw data and consists of quality control, filtering, and alignment.
Higher-level (secondary) analyses can then be performed on a group of samples, according to the data type and user needs: peak calling, differential peak calling, and saturation analysis for ChIP-seq; absolute and differential expression quantification for RNA-seq; determination of absolute and relative methylation levels and identification of differentially methylated regions for high-throughput DNA methylation data; and analysis of 4sU-seq and RNA-seq time-course data.
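The relationship between the three entities described above can be sketched in a few lines of Python. This is only an illustration of the data model, not the actual HTS-flow schema: the class names, field names, and method identifiers are hypothetical, and only two of the data types listed above are included.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical identifiers for the secondary analyses named in the text,
# grouped by data type. These are illustrative, not HTS-flow's own names.
SECONDARY_BY_TYPE = {
    "ChIP-seq": {"peak_calling", "differential_peak_calling", "saturation"},
    "RNA-seq": {"expression_quantification", "differential_expression"},
}


@dataclass
class Sample:
    """Description of one sequenced sample."""
    sample_id: str
    seq_method: str  # e.g. "ChIP-seq" or "RNA-seq"


@dataclass
class PrimaryAnalysis:
    """Per-sample processing of raw data."""
    sample: Sample
    steps: List[str] = field(
        default_factory=lambda: ["quality_control", "filtering", "alignment"]
    )


@dataclass
class SecondaryAnalysis:
    """Higher-level analysis over a group of primary analyses."""
    method: str
    inputs: List[PrimaryAnalysis]

    def __post_init__(self) -> None:
        if not self.inputs:
            raise ValueError("a secondary analysis needs at least one input")
        # The method must be one that is defined for every input's data type.
        for primary in self.inputs:
            data_type = primary.sample.seq_method
            if self.method not in SECONDARY_BY_TYPE.get(data_type, set()):
                raise ValueError(
                    f"{self.method!r} is not defined for {data_type!r} data"
                )
```

In this sketch a secondary analysis never points at samples directly; it refers to primary analyses, which in turn refer to samples. That chain is one way to keep every result traceable back to the raw data it came from.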

The analyses rely on predefined, easily customizable modular scripts to invoke the most common analysis steps.

A PHP-based web interface allows users to run the primary and secondary analyses and to follow their progress. The user can decide which steps of an analysis to perform and can directly modify their settings, e.g. the maximum number of mismatches allowed during alignment.
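As a rough illustration of step selection with per-step settings, the sketch below merges user overrides into default settings and keeps only the selected steps. It is written under assumptions: the function `build_plan`, the setting names, and the default values are hypothetical placeholders, not taken from HTS-flow.

```python
from typing import Any, Dict, List, Optional, Tuple

# Fixed execution order of the primary-analysis steps named in the text.
STEP_ORDER = ["quality_control", "filtering", "alignment"]

# Placeholder defaults; the real setting names and values may differ.
DEFAULT_SETTINGS: Dict[str, Dict[str, Any]] = {
    "quality_control": {},
    "filtering": {"min_quality": 20},
    "alignment": {"max_mismatches": 2},
}


def build_plan(
    selected: List[str],
    overrides: Optional[Dict[str, Dict[str, Any]]] = None,
) -> List[Tuple[str, Dict[str, Any]]]:
    """Return the selected steps, in execution order, with merged settings."""
    overrides = overrides or {}
    unknown = (set(selected) | set(overrides)) - set(STEP_ORDER)
    if unknown:
        raise ValueError(f"unknown steps: {sorted(unknown)}")

    plan = []
    for step in STEP_ORDER:
        if step not in selected:
            continue  # overrides for unselected steps are simply ignored
        settings = dict(DEFAULT_SETTINGS[step])  # copy, keep defaults intact
        settings.update(overrides.get(step, {}))
        plan.append((step, settings))
    return plan
```

For example, `build_plan(["alignment", "quality_control"], {"alignment": {"max_mismatches": 1}})` would skip filtering and run quality control followed by alignment with at most one mismatch.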

Analyses can run on different types of clusters or on a single laptop or desktop.

The results of the analyses are saved in RDS format, which can be loaded in R, as well as in genome-browser-ready formats (bigWig, BED). HTS-flow allows those results to be loaded directly into the Integrated Genome Browser.
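To give an idea of what a genome-browser-ready file looks like, the sketch below serializes a list of regions as BED text: tab-separated lines with chromosome, 0-based start, exclusive end, and a name. The function `to_bed` and the example region are hypothetical and only illustrate the format; they are not part of HTS-flow.

```python
from typing import Iterable, Tuple

Region = Tuple[str, int, int, str]  # (chromosome, start, end, name)


def to_bed(regions: Iterable[Region]) -> str:
    """Serialize regions as 4-column BED text (0-based start, exclusive end)."""
    lines = []
    for chrom, start, end, name in regions:
        if start < 0 or end <= start:
            raise ValueError(f"invalid interval for {name!r}: {start}-{end}")
        lines.append(f"{chrom}\t{start}\t{end}\t{name}\n")
    return "".join(lines)


if __name__ == "__main__":
    # A single made-up region, printed as one BED line.
    print(to_bed([("chr1", 100, 200, "peak_1")]), end="")
```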