Methods
Single-read sequences were analyzed for quality and overrepresented adapter sequences with theFastQC tool. Quality filtering and adapter trimming were performed with the Trimmomatic tool v0.39. Read mapping on rat genome Rnor6 was performed with the STAR local alignment tool version 2.6.1. CAGE tag start sites (CTSS) for each sample were imputed using the PromoterPipeline aggregation script from C1 CAGE protocol. Peak clustering was performed with Decomposition-based Peak Identification (DPI1) tool. Bidirectional enhancers were identified using the pipeline by Andersson et al, 2014. Statistical significance of differential expression of TSS and CTSS peaks was calculated using the DeSeq2 tool.
Contents
Quality Control
For adapter clipping we used maximum mismatch count of 2, palindrome threshold of 30 and simple threshold of 10. For sliding window quality trimming we used window size of 4 nucleotides and average PHRED score over the window was required not less than 15. Leading and trailing nucleotides of each read were removed if their quality was less than 3. All reads with length less than 36 were discarded.
Genome, Annotation and Indexing
As a reference genome we used Rattus norvegicus Rnor_6.0 (build NCBI:GCA_000001895.4) “top level” assembly acquired from Ensembl v93. As a source of annotated spice junctions for genome indexing we used Ensembl v93 R. norvegicus genome annotation.
Read Mapping
For local alignment of the reads onto Rattus norvegicus genome Rnor_6.0 (build NCBI:GCA_000001895.4) “top level” assembly we initially allowed for multimap reads to map up to 20 loci. We also required minimum splice alignment overhang of 8 nt for de novo splice junctions (--alignSJoverhangMin 8) and 1 nt for splice junctions used from genome annotation (--alignSJDBoverhangMin 1), spliced alignments were filtered using output splice junctions (--outFilterType BySJout), for maximum number of mismatches per pair relative to read length default value of 0.04 was used (--outFilterMismatchNoverReadLmax 0.04) and minimum intron length was set to 20 nt (--alignIntronMin 20). [34]
CTSS Aggregation
All alignments were filtered for absence of “read unmapped” flag (flag value 4). Alignments that passed filtering were aggregated with PromoterPipeline tool version 2015.05.16, and resulting CTSS were sorted by chromosome and coordinate.
TSS Peaks
We clustered CTSS peaks with DPI1 in SPI mode in all samples of SOL and EDL muscles, respectively. Output peaks were filtered by maximum CTSS counts over the peak and maximum TPM normalized CTSS counts per peak. Thus, three sets of output peaks were obtained: unfiltered, ‘permissive’ (ctssMaxCounts3) and ‘robust’ (ctssMaxCounts11, ctssMaxTpm1).
Enhancers
We identified putative enhancers as bidirectional promoters using the Enhancers software [Andersson et al, 2014; doi:10.1038/nature12787]. To avoid false identification of intergenic enhancers we masked all loci proximate or including known TSS (+- 500 nt) and exons (+- 200 nt), according to genome annotation. Enhancers were searched among ‘permissive’ sets of TSS peaks of fast (EDL) and slow (soleus) muscle CTSS.
Annotation of TSS peaks to genes and enhancers
We annotated all TSS peaks to corresponding enhancers by the criteria of stradless coordinate intersection. To keep consistency with enhancers, which are extended by 200nt by the enhancers pipeline, we extended gene regions by 200nt from 3’ and 5’ ends before intersecting them strand-wise with the permissive set of TSS peaks. Intersected peaks were annotated to corresponding genes.
Differentially expressed peaks, genes and enhancers
We used DeSEQ2 software to estimate log2 fold change (LFC) of expression of each peak between control and case samples as well as P value as probability of rejecting the null hypothesis which stated that mean expression of a peak is equal between the case and control samples. The P values were adjusted using Bonferroni-Hochberg false discovery rate method. The set of differentially expressed peaks for each case was identified by applying the cutoffs of |LFC| > 1.25 and Padj < 0.05. We considered an enhancer or a gene differentially expressed if at least one of the peaks within the region of interest was identified as differentially expressed.
Gene set enrichment analysis (GSEA)
We performed gene set enrichment analysis using PANTHER Slim GO biological process database and webserver using Fisher exact test and Bonferroni P-value adjustment. For observatory GSEA analysis (ST11) we also used REVIGO web service with allowed similarity 0.5 (small set of results) against R. norvegicus GO database.