ChIP-seq | ATAC-seq | Data analysis pipeline

ENCODE already has very mature pipelines; you just need to learn how to use them.

ENCODE-DCC

The whole workflow is also documented in great detail:

ChIP-seq Data Standards and Processing Pipeline

ATAC-seq Data Standards and Prototype Processing Pipeline

Is there a more reliable pipeline than this? I'd bet there isn't.

Installation issues:

grep: this version of PCRE is compiled without UTF support

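Replacing grep -P with grep -E fixes it (see the reference link), as long as the pattern doesn't rely on Perl-only syntax such as \d or lookarounds.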
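The switch is not purely mechanical: PCRE shortcuts have to be rewritten in POSIX ERE syntax. The pattern below is a made-up example (the exact pattern in the installer isn't shown here), and the function name find_versions is hypothetical; it just shows \d turning into [0-9].

```shell
# Hypothetical pattern: pull version strings such as "v1.2" out of text.
# PCRE form (fails on a grep whose PCRE lacks UTF support):
#   grep -oP 'v\d+\.\d+'
# POSIX ERE form: \d becomes the bracket expression [0-9].
find_versions() {
  grep -oE 'v[0-9]+\.[0-9]+'
}

# Usage:
# printf 'pipeline v1.2 and v10.3\n' | find_versions   # prints v1.2 and v10.3, one per line
```

Lookarounds such as (?=...) have no ERE equivalent, so a pattern that uses them needs a different approach (e.g. awk or sed) rather than a straight swap to -E.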

Older versions of conda use source activate; newer ones (4.4 and later) use conda activate.

source activate encode-chip-seq-pipeline

source deactivate
conda deactivate

source activate encode-atac-seq-pipeline
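If the same script has to run on machines with different conda versions, the choice can be made from the version string. This is a minimal sketch assuming the 4.4 cutoff for conda activate; the helper name activate_cmd is hypothetical, and it relies on GNU sort -V for version comparison.

```shell
# Choose the activation command from the output of `conda --version`.
# Assumption: `conda activate` exists from conda 4.4 onward; older
# releases only understand `source activate`.
activate_cmd() {
  local version=${1#conda }   # "conda 4.3.30" -> "4.3.30"
  local lowest
  # sort -V orders version strings numerically; keep the smaller one
  lowest=$(printf '%s\n%s\n' "$version" "4.4" | sort -V | head -n 1)
  if [ "$lowest" = "4.4" ]; then
    echo "conda activate"     # version >= 4.4
  else
    echo "source activate"    # version < 4.4
  fi
}

# Usage (2>&1 in case the version is printed to stderr):
# $(activate_cmd "$(conda --version 2>&1)") encode-chip-seq-pipeline
```

The usage line assumes conda's shell hook is already set up; otherwise conda activate is not available as a shell function and fails regardless of version.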

The control problem in ChIP-seq:

Guide: Getting Started with ChIP-Seq

Save money: ChIP-seq won't need a control anymore! Replacing the ChIP-seq control with machine learning

suinleelab/AIControl.jl

Principles of ChIP-seq analysis

Positive and negative controls in ChIP-seq

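ATAC-seq (use absolute paths everywhere, otherwise the files can't be found; also specify an output directory, otherwise the results go to your home directory by default. Both happen because a PBS job starts in your home directory, not in the directory you submitted it from.)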

echo "caper run /home/lizhixin/softwares/atac-seq-pipeline/atac.wdl -i /home/lizhixin/project2/ATAC-seq/ENCC/test.full.json --out-dir /home/lizhixin/project2/ATAC-seq/ENCC/result" | qsub -V -N ATAC -q large -l nodes=1:ppn=12,walltime=84:00:00,mem=120gb
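To avoid typing the absolute paths by hand, they can be expanded from the submission directory before the command is piped to qsub. This is a sketch; the helper names abs_path and make_caper_cmd are hypothetical, and abs_path only prefixes $PWD (it does not resolve .. or symlinks).

```shell
# Hypothetical helper: make a path absolute by prefixing the current
# directory, so the PBS job (which starts in $HOME) can still find it.
abs_path() {
  case "$1" in
    /*) printf '%s\n' "$1" ;;         # already absolute
    *)  printf '%s\n' "$PWD/$1" ;;    # relative to where we submit from
  esac
}

# Build the caper command line with absolute paths and an explicit --out-dir.
make_caper_cmd() {
  local wdl json outdir
  wdl=$(abs_path "$1")
  json=$(abs_path "$2")
  outdir=$(abs_path "$3")
  printf 'caper run %s -i %s --out-dir %s\n' "$wdl" "$json" "$outdir"
}

# Usage, from the project directory (file names are placeholders):
# make_caper_cmd atac.wdl test.full.json result |
#   qsub -V -N ATAC -q large -l nodes=1:ppn=12,walltime=84:00:00,mem=120gb
```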

The pipeline's PBS parallelization isn't very flexible; its parallel execution is really aimed at the cloud.

ChIP-seq practice data

download.list

ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP017/SRP017311/SRR620204/SRR620204.sra
ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP017/SRP017311/SRR620205/SRR620205.sra
ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP017/SRP017311/SRR620206/SRR620206.sra
ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP017/SRP017311/SRR620207/SRR620207.sra
ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP017/SRP017311/SRR620208/SRR620208.sra
ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP017/SRP017311/SRR620209/SRR620209.sra
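The six URLs follow one pattern, so download.list can be regenerated from the run accessions. The layout below is copied from the list above; the function name make_download_list is hypothetical.

```shell
# Rebuild download.list for runs SRR620204..SRR620209 of study SRP017311.
make_download_list() {
  local base="ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP017/SRP017311"
  local i run
  for i in $(seq 620204 620209); do
    run="SRR$i"
    printf '%s/%s/%s.sra\n' "$base" "$run" "$run"
  done
}

# Usage:
# make_download_list > download.list
# wget -i download.list
# If the sra-instant FTP path is no longer served, sra-tools' prefetch
# (e.g. `prefetch SRR620204`) is an alternative way to fetch the runs.
```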

fastq-dump

for id in *.sra; do ~/biosoft/sratoolkit/sratoolkit.2.6.3-centos_linux64/bin/fastq-dump --split-3 "$id"; done
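With --split-3, paired-end runs come out as <run>_1.fastq and <run>_2.fastq, while reads without a mate go to <run>.fastq. A small check like the one below (hypothetical helper fastq_layout) tells you which layout each run has before choosing pipeline settings.

```shell
# Report the layout fastq-dump --split-3 produced for one run accession.
# A paired run may also have a <run>.fastq of orphan reads; it is still
# reported as paired because the _1/_2 pair is checked first.
fastq_layout() {
  local run=$1
  if [ -e "${run}_1.fastq" ] && [ -e "${run}_2.fastq" ]; then
    echo paired
  elif [ -e "${run}.fastq" ]; then
    echo single
  else
    echo missing
  fi
}

# Usage:
# for f in *.sra; do echo "${f%.sra}: $(fastq_layout "${f%.sra}")"; done
```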

Please find attached two sets of ENCC enhancers. Basically,

"encc-enhancer.bed" is enhancers defined with H3K27ac & H3K4me1 activity
"encc-enhancer-atac.bed" is enhancers defined with H3K27ac & H3K4me1 activity as well as open chromatin (ATAC-seq) signal summits.

The procedure to produce these files is as follows:

Process the ATAC-seq and ChIP-seq data with the standard ENCODE pipelines (https://github.com/ENCODE-DCC/chip-seq-pipeline, https://github.com/kundajelab/atac_dnase_pipelines) => narrowPeak files for ATAC-seq and ChIP-seq
Following http://compbio.mit.edu/ChromHMM/ChromHMM_manual.pdf, binarize the peak files and run ChromHMM on the histone marks and ATAC-seq
Examine the resulting emission probability matrix (see emission.png): segment types E2 and E3 look the most like enhancers (accessible or not)
Extract E2+E3 as encc-enhancer.bed
To get enhancers of uniform length, further overlap encc-enhancer.bed with the ATAC-seq summits (a summit is the base with the highest signal in a peak) and use +/-500 bp around each ATAC-seq summit as an enhancer => encc-enhancer-atac.bed
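The last two steps can be sketched with awk and bedtools. This is not the original script: it assumes ChromHMM's segments BED carries the state label (E1, E2, ...) in column 4, and that the ATAC-seq peaks are in narrowPeak format, where column 10 is the summit offset from the peak start. Function names and file names are hypothetical, and the window is taken as [summit-500, summit+500).

```shell
# Keep ChromHMM segments labelled E2 or E3 (reads BED from stdin).
select_enhancer_states() {
  awk 'BEGIN{OFS="\t"} $4 == "E2" || $4 == "E3" {print $1, $2, $3, $4}'
}

# Turn each narrowPeak record into a 1-bp BED interval at its summit.
summit_points() {
  awk 'BEGIN{OFS="\t"} {s = $2 + $10; print $1, s, s + 1, $4}'
}

# Expand 1-bp summits to [summit - w, summit + w), clamped at 0 (w defaults to 500).
expand_window() {
  awk -v w="${1:-500}" 'BEGIN{OFS="\t"} {
    start = $2 - w; if (start < 0) start = 0
    print $1, start, $2 + w, $4
  }'
}

# Usage (needs bedtools; -u keeps each summit that overlaps any enhancer once):
# select_enhancer_states < CELL_N_segments.bed > encc-enhancer.bed
# summit_points < atac.narrowPeak \
#   | bedtools intersect -u -a stdin -b encc-enhancer.bed \
#   | expand_window 500 > encc-enhancer-atac.bed
```

Overlap is done on the 1-bp summit rather than the whole peak, following the wording of the step above; several summits inside one enhancer would each yield their own window.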

Some of the tutorials from 生信技能树 are very practical and work wonders for beginners; recommended.

References:

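Learn ChIP-seq analysis in one article (Part 1)

Learn ChIP-seq analysis in one article (Part 2)

Part 10: Integrated analysis of ATAC-Seq, ChIP-Seq and RNA-Seq (final installment of the series, with a table of contents)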

