ENCODE already has very mature pipelines; knowing how to use them is enough.
ENCODE-DCC
The whole workflow is also documented in great detail:
ChIP-seq Data Standards and Processing Pipeline
ATAC-seq Data Standards and Prototype Processing Pipeline
Is there a more reliable pipeline than this? I would venture to say there is not.
Installation issues:
grep: this version of PCRE is compiled without UTF support
Changing grep -P to grep -E fixes this, as long as the pattern does not depend on Perl-only syntax; see the reference link.
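As a minimal sketch of what the switch can involve: Perl-only shorthand such as \d has no meaning in extended regular expressions, so it has to be spelled out as a bracket expression. The sample lines and the pattern below are hypothetical; the grep call that actually fails inside the pipeline is not shown in these notes.

```shell
# Hypothetical helper: the Perl-style call would be  grep -oP 'read_\d+'
# Under -E the \d shorthand is rewritten as [0-9].
find_reads() {
  grep -oE 'read_[0-9]+'
}

# Illustrative input only.
printf 'read_12\nread_x\nread_345\n' | find_reads
```

Patterns that use lookarounds or lazy quantifiers cannot be translated this mechanically and would need a different rewrite.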
Older versions of conda use source activate; newer versions (4.4 and later) use conda activate
source activate encode-chip-seq-pipeline
source deactivate
conda deactivate
source activate encode-atac-seq-pipeline
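The choice between the two activation commands can be sketched as a small version check. This is only an illustration: the helper name conda_activate_cmd is hypothetical, and the cutoff assumes that conda activate became available in conda 4.4.

```shell
# Hypothetical helper: print the activation command for a conda version string.
# Assumption: "conda activate" exists from conda 4.4 onward.
conda_activate_cmd() {
  version="$1"
  major="${version%%.*}"   # text before the first dot
  rest="${version#*.}"     # text after the first dot
  minor="${rest%%.*}"      # second version component
  if [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -ge 4 ]; }; then
    echo "conda activate"
  else
    echo "source activate"
  fi
}

conda_activate_cmd 4.3.30
conda_activate_cmd 4.6.14
```

The version string would normally come from conda --version; how that output is parsed on a given installation is left open here.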
The ChIP-seq control question:
Guide: Getting Started with ChIP-Seq
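Saving money: no more controls needed for ChIP-seq! Replacing the ChIP-seq control with machine learning
ATAC-seq (use absolute paths throughout, otherwise files will not be found; also specify the output directory, otherwise the output goes to the home directory by default)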
echo "caper run /home/lizhixin/softwares/atac-seq-pipeline/atac.wdl -i /home/lizhixin/project2/ATAC-seq/ENCC/test.full.json --out-dir /home/lizhixin/project2/ATAC-seq/ENCC/result" | qsub -V -N ATAC -q large -l nodes=1:ppn=12,walltime=84:00:00,mem=120gb
Parallelization under PBS is not very flexible; the pipeline's parallelism is mainly designed for the cloud.
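Since relative paths were the main stumbling block, one way to guard against them is to expand every path before building the caper command. The sketch below is an assumption-laden illustration: to_abs and build_caper_cmd are hypothetical helper names, and only the command-line paths are handled, not the file paths written inside the input JSON.

```shell
# Hypothetical helper: turn a relative path into an absolute one based on $PWD.
to_abs() {
  case "$1" in
    /*) printf '%s\n' "$1" ;;             # already absolute
    *)  printf '%s/%s\n' "$PWD" "$1" ;;   # prefix the current directory
  esac
}

# Hypothetical helper: print a caper command with absolute paths.
# Arguments: WDL file, input JSON, output directory.
build_caper_cmd() {
  printf 'caper run %s -i %s --out-dir %s\n' \
    "$(to_abs "$1")" "$(to_abs "$2")" "$(to_abs "$3")"
}

# Prints the command only; it could then be piped to qsub as in the line above.
build_caper_cmd atac.wdl test.full.json result
```

Printing the command instead of running it keeps the PBS submission step unchanged and makes the final paths easy to inspect first.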
ChIP-seq practice data
download.list
ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP017/SRP017311/SRR620204/SRR620204.sra
ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP017/SRP017311/SRR620205/SRR620205.sra
ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP017/SRP017311/SRR620206/SRR620206.sra
ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP017/SRP017311/SRR620207/SRR620207.sra
ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP017/SRP017311/SRR620208/SRR620208.sra
ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP017/SRP017311/SRR620209/SRR620209.sra
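A dry-run sketch of how download.list might be consumed: it prints one accession and one wget command per URL without downloading anything. The helper name plan_downloads is hypothetical, the choice of wget is an assumption (the notes do not say which downloader was used), and whether these FTP paths are still served has not been checked here.

```shell
# Hypothetical helper: list "accession<TAB>download command" for each URL
# in a whitespace-separated list file. Nothing is downloaded.
plan_downloads() {
  tr -s ' \t' '\n\n' < "$1" | while IFS= read -r url || [ -n "$url" ]; do
    [ -n "$url" ] || continue                 # skip empty lines
    accession="$(basename "$url" .sra)"       # e.g. SRR620204
    printf '%s\t%s\n' "$accession" "wget -c $url"
  done
}

# Only runs if the list file is present in the current directory.
if [ -f download.list ]; then
  plan_downloads download.list
fi
```

The second column can be reviewed and then executed, for example by piping it through cut -f2 into sh.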
fastq-dump
ls *sra | while read -r id; do ~/biosoft/sratoolkit/sratoolkit.2.6.3-centos_linux64/bin/fastq-dump --split-3 "$id"; done
Please find attached two sets of ENCC enhancers. Basically,
- "encc-enhancer.bed" is enhancers defined with H3K27ac & H3K4me1 activity
- "encc-enhancer-atac.bed" is enhancers defined with H3K27ac & H3K4me1 activity as well as open chromatin (ATAC-seq) signal summits.
The procedure to produce these files is as follows:
- Process ATAC-seq and ChIP-seq data using the standard ENCODE pipelines (https://github.com/ENCODE-DCC/chip-seq-pipeline https://github.com/kundajelab/atac_dnase_pipelines) => Get narrowpeaks for ATAC-seq and ChIP-seq
- Following http://compbio.mit.edu/ChromHMM/ChromHMM_manual.pdf, binarize the peak files and run ChromHMM on histone marks and ATAC-seq
- Examine the resulting emission probability matrix (see emission.png); segment types E2 and E3 look most like enhancers (accessible or not)
- Extract E2+E3 as encc-enhancer.bed
- To get enhancers of uniform length, further overlap encc-enhancer.bed with ATAC-seq summits (the summit is the base with the highest signal in a peak) and use +/-500 bp around each ATAC-seq summit as the enhancer. => encc-enhancer-atac.bed
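The last two steps of the procedure above can be sketched with awk. This is not the script that produced the attached files; it assumes that the ChromHMM segmentation is a 4-column BED with the state label (E1, E2, ...) in column 4, and that the ATAC-seq peaks are in narrowPeak format, where column 10 holds the summit offset from the peak start. Both function names are hypothetical, and the window is taken as the half-open interval [summit-500, summit+500).

```shell
# Hypothetical helper: keep only segments labelled E2 or E3.
# Input: ChromHMM segments BED (chrom, start, end, state).
extract_enhancer_states() {
  awk 'BEGIN { FS = OFS = "\t" }
       $4 == "E2" || $4 == "E3" { print $1, $2, $3, $4 }' "$1"
}

# Hypothetical helper: fixed-width windows around narrowPeak summits.
# Input: narrowPeak file; optional second argument is the flank size (default 500).
summit_windows() {
  awk -v flank="${2:-500}" 'BEGIN { FS = OFS = "\t" }
       $10 >= 0 {                      # -1 means no summit was called
         summit = $2 + $10             # absolute summit position
         start = summit - flank
         if (start < 0) start = 0      # clip at the chromosome start
         print $1, start, summit + flank
       }' "$1"
}

# Example usage with hypothetical file names:
#   extract_enhancer_states segments.bed > encc-enhancer.bed
#   summit_windows atac.narrowPeak > atac-summit-windows.bed
```

The overlap between summits and the E2+E3 segments is left out of this sketch; an interval tool such as bedtools intersect would be one option for that step, though the email does not say which tool was used.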
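Some of the tutorials from 生信技能树 (Bioinformatics Skill Tree) are very practical and work wonders for beginners; recommended.
Reference links:
Learn ChIP-seq analysis in one article (Part 1)
Learn ChIP-seq analysis in one article (Part 2)
Part 10: Integrated analysis of ATAC-Seq, ChIP-Seq, and RNA-Seq (end of this series, table of contents included)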