Difference between revisions of "Merging CAGE experiments"
Ivan@dote.ru (talk | contribs) (→Further directions) |
Ivan@dote.ru (talk | contribs) |
||
Line 3: | Line 3: | ||
===Problem statement=== | ===Problem statement=== | ||
+ | Transcription of genes begins at genomic positions called transcription start sites (TSS). | ||
+ | CAGE is a high-throughput transcriptome analysis technique that can identify active TSSs at single-base resolution, together with their relative activities. | ||
+ | CAGE studies have shown that different sets of TSSs can be active under different conditions, and that transcription can start from several closely spaced TSSs within a single promoter. | ||
+ | This complicates the comparative analysis of CAGE experiments carried out under different conditions. | ||
+ | We have developed a method that combines independent CAGE experiments into a pooled set of TSSs with accurately defined boundaries. | ||
+ | Iterative application of this method to a large set of CAGE experiments allows the construction of a reference TSS set. | ||
+ | Such a reference set makes it easy to compare TSS activities across experiments and to identify previously unknown TSSs in incoming data. | ||
+ | |||
+ | ===Algorithm overview=== | ||
+ | |||
[[File:CAGE_merging_simple_overlap_strategy.png|Simple overlap strategy]] | [[File:CAGE_merging_simple_overlap_strategy.png|Simple overlap strategy]] | ||
[[File:CAGE_merging_overhangs.png|CAGE peak overhangs]] | [[File:CAGE_merging_overhangs.png|CAGE peak overhangs]] | ||
Line 9: | Line 19: | ||
[[File:CAGE_merging_merge_reference_peaks.png|Merge reference peaks]] | [[File:CAGE_merging_merge_reference_peaks.png|Merge reference peaks]] | ||
− | + | ||
[[File: Algorithm_segments.png|Segmentation of CAGE peaks]] | [[File: Algorithm_segments.png|Segmentation of CAGE peaks]] | ||
Revision as of 15:34, 4 March 2021
Merging CAGE experiments. This page describes the problem of merging independent CAGE-seq experiments and approaches to solving it.
Problem statement
Transcription of genes begins at genomic positions called transcription start sites (TSS). CAGE is a high-throughput transcriptome analysis technique that can identify active TSSs at single-base resolution, together with their relative activities. CAGE studies have shown that different sets of TSSs can be active under different conditions, and that transcription can start from several closely spaced TSSs within a single promoter. This complicates the comparative analysis of CAGE experiments carried out under different conditions. We have developed a method that combines independent CAGE experiments into a pooled set of TSSs with accurately defined boundaries. Iterative application of this method to a large set of CAGE experiments allows the construction of a reference TSS set. Such a reference set makes it easy to compare TSS activities across experiments and to identify previously unknown TSSs in incoming data.
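To make the idea of pooling and iterative reference construction concrete, here is a minimal Python sketch of a naive overlap-based merge. It is not the method developed on this page: the peak representation (chromosome, strand, half-open start and end) and the function names `merge_overlapping`, `build_reference` and `novel_peaks` are assumptions chosen for illustration only.

```python
from typing import Iterable, List, Tuple

# Hypothetical peak representation: (chrom, strand, start, end), half-open [start, end).
Peak = Tuple[str, str, int, int]


def merge_overlapping(peaks: Iterable[Peak]) -> List[Peak]:
    """Fuse peaks that overlap on the same chromosome and strand."""
    merged: List[Peak] = []
    for chrom, strand, start, end in sorted(peaks):
        if merged:
            m_chrom, m_strand, m_start, m_end = merged[-1]
            # Overlap test for half-open intervals on the same chrom/strand.
            if (chrom, strand) == (m_chrom, m_strand) and start < m_end:
                merged[-1] = (chrom, strand, m_start, max(m_end, end))
                continue
        merged.append((chrom, strand, start, end))
    return merged


def build_reference(experiments: Iterable[Iterable[Peak]]) -> List[Peak]:
    """Fold experiments into a pooled reference set, one experiment at a time."""
    reference: List[Peak] = []
    for peaks in experiments:
        reference = merge_overlapping(reference + list(peaks))
    return reference


def novel_peaks(reference: Iterable[Peak], incoming: Iterable[Peak]) -> List[Peak]:
    """Return incoming peaks that overlap no reference peak."""
    ref = list(reference)
    return [
        (c, s, a, b)
        for c, s, a, b in incoming
        if not any(c == rc and s == rs and a < rb and ra < b for rc, rs, ra, rb in ref)
    ]


exp_a = [("chr1", "+", 100, 110), ("chr1", "+", 200, 205)]
exp_b = [("chr1", "+", 105, 120), ("chr1", "-", 100, 110)]
exp_c = [("chr1", "+", 118, 125), ("chr2", "+", 5, 6)]

reference = build_reference([exp_a, exp_b, exp_c])
unknown = novel_peaks(reference, [("chr1", "+", 300, 310), ("chr1", "+", 120, 122)])
```

Note that with plain overlap merging, the pooled peak on chr1 "+" grows from 100-110 to 100-125 as experiments are added, so boundaries depend on chaining between neighbouring peaks. A method that aims for accurately defined boundaries needs more than this baseline.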