ChIP-seq | ATAC-seq | Data analysis pipelines

ENCODE already maintains very mature pipelines; knowing how to run them is enough.

ENCODE-DCC

The whole workflow is also documented in great detail:

ChIP-seq Data Standards and Processing Pipeline

ATAC-seq Data Standards and Prototype Processing Pipeline

Is there a more dependable pipeline than this? I would say no.

Installation issues:

grep: this version of PCRE is compiled without UTF support

Changing grep -P to grep -E in the offending command resolves this error (see the reference link).
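A plain swap works when the pattern only uses syntax that both regex flavors share. If the pattern relies on Perl-only shorthand such as \d, that part has to be rewritten as well. The pattern below is a made-up example for illustration, not the one used in the pipeline's install scripts:

```shell
# Hypothetical pattern, for illustration only.
# With PCRE this could be written as: grep -P '^chr\d+$'
# ERE has no \d, so a bracket expression is used instead:
printf 'chr1\nchrX\nchr22\n' | grep -E '^chr[0-9]+$'
```

Here only chr1 and chr22 match, because chrX has no digits after the prefix.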

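Older conda versions use source activate; newer versions use conda activate.

source activate encode-chip-seq-pipeline   # newer conda: conda activate encode-chip-seq-pipeline
source deactivate                          # newer conda: conda deactivate
source activate encode-atac-seq-pipeline   # newer conda: conda activate encode-atac-seq-pipeline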

  

The control question in ChIP-seq:

Guide: Getting Started with ChIP-Seq

Save money: no more control experiments for ChIP-seq! Replacing the ChIP-seq control with machine learning

suinleelab/AIControl.jl

Principles of ChIP-seq analysis

The yin and yang of ChIP-seq: positive and negative controls

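ATAC-seq (use absolute paths everywhere, otherwise the files are not found; also set an output directory, otherwise the results go to the home directory by default)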

echo "caper run /home/lizhixin/softwares/atac-seq-pipeline/atac.wdl -i /home/lizhixin/project2/ATAC-seq/ENCC/test.full.json --out-dir /home/lizhixin/project2/ATAC-seq/ENCC/result" | qsub -V -N ATAC -q large -l nodes=1:ppn=12,walltime=84:00:00,mem=120gb
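Since relative paths are the usual cause of the file-not-found problem, one option is to resolve every path before building the command. This is a minimal sketch, assuming a POSIX shell and paths without spaces. abs_path and caper_cmd are hypothetical helper names; only caper run, -i and --out-dir come from the command above.

```shell
# Hypothetical helper: turn a relative path into an absolute one.
# It only prefixes the current directory; it does not resolve ".." or symlinks.
abs_path() {
    case "$1" in
        /*) printf '%s\n' "$1" ;;
        *)  printf '%s/%s\n' "$PWD" "$1" ;;
    esac
}

# Hypothetical helper: print a caper command line with absolute paths.
# Arguments: WDL file, input JSON, output directory.
caper_cmd() {
    printf 'caper run %s -i %s --out-dir %s\n' \
        "$(abs_path "$1")" "$(abs_path "$2")" "$(abs_path "$3")"
}

# Print the command for files in the current directory.
caper_cmd atac.wdl test.full.json result
```

The printed line can then be piped to qsub with the same options as above, in place of the hand-written echo.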

   

Parallelization on PBS is not very flexible with this pipeline; its parallelism mainly targets cloud backends.

ChIP-seq practice data

download.list

ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP017/SRP017311/SRR620204/SRR620204.sra
ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP017/SRP017311/SRR620205/SRR620205.sra
ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP017/SRP017311/SRR620206/SRR620206.sra
ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP017/SRP017311/SRR620207/SRR620207.sra
ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP017/SRP017311/SRR620208/SRR620208.sra
ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP017/SRP017311/SRR620209/SRR620209.sra
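The run accessions can be pulled out of download.list so that the same list also works with accession-based download tools. This is a small sketch; accessions_from_list is a hypothetical helper name, and it assumes every URL ends in <accession>.sra as in the list above.

```shell
# Hypothetical helper: read URLs on stdin, print one accession per line.
# Lines that do not end in .sra (for example blank lines) are skipped.
accessions_from_list() {
    sed -E -n 's#^.*/([^/]+)\.sra$#\1#p'
}

# Example with the first URL from download.list:
printf '%s\n' 'ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP017/SRP017311/SRR620204/SRR620204.sra' | accessions_from_list
```

With the real file this would be run as accessions_from_list < download.list. The URLs themselves can be fetched with wget -i download.list. If those FTP paths are no longer served, the prefetch tool from the SRA Toolkit accepts accessions such as SRR620204 directly.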

fastq-dump

for id in *.sra; do ~/biosoft/sratoolkit/sratoolkit.2.6.3-centos_linux64/bin/fastq-dump --split-3 "$id"; done

Please find attached two sets of ENCC enhancers. Basically,

  • "encc-enhancer.bed" is enhancers defined with H3K27ac & H3K4me1 activity
  • "encc-enhancer-atac.bed" is enhancers defined with H3K27ac & H3K4me1 activity as well as open chromatin (ATAC-seq) signal summits.

The procedure to produce these files is as follows:

  1. Process the ATAC-seq and ChIP-seq data with the standard ENCODE pipelines (https://github.com/ENCODE-DCC/chip-seq-pipeline https://github.com/kundajelab/atac_dnase_pipelines) => narrowPeak files for ATAC-seq and ChIP-seq
  2. Following http://compbio.mit.edu/ChromHMM/ChromHMM_manual.pdf, binarize the peak files and run ChromHMM on the histone marks and ATAC-seq
  3. Examine the resulting emission probability matrix (see emission.png); segment types E2 and E3 look most like enhancers (accessible or not)
  4. Extract E2+E3 as encc-enhancer.bed
  5. To get enhancers of uniform length, overlap encc-enhancer.bed with the ATAC-seq summits (a summit is the base with the highest signal in a peak) and use +/-500 bp around each ATAC-seq summit as the enhancer => encc-enhancer-atac.bed
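The windowing part of step 5 can be sketched with awk. It assumes the ATAC-seq peaks are in ENCODE narrowPeak format, where column 10 holds the summit offset from the peak start (-1 when no summit was called). summit_windows is a hypothetical helper name, and the overlap with encc-enhancer.bed is not shown here.

```shell
# Hypothetical helper: read narrowPeak on stdin, print a BED window of
# +/- HALF bp around each summit (HALF defaults to 500).
# Output columns: chrom, start, end, peak name. Intervals are half-open,
# so the window is [summit - HALF, summit + HALF) and start is clamped at 0.
summit_windows() {
    awk -v half="${1:-500}" 'BEGIN { FS = OFS = "\t" }
        $10 >= 0 {
            summit = $2 + $10            # absolute summit position (0-based)
            start = summit - half
            if (start < 0) start = 0     # do not run off the chromosome start
            print $1, start, summit + half, $4
        }'
}

# Example: a peak at chr1:1000-2000 with its summit 300 bp into the peak.
printf 'chr1\t1000\t2000\tpeak1\t100\t.\t5.0\t10.0\t8.0\t300\n' | summit_windows 500
```

On a real file this would be run as summit_windows 500 < atac.narrowPeak > windows.bed, where both file names are placeholders. Windows near a chromosome start come out shorter than 1000 bp because of the clamping, and the end coordinate is not checked against chromosome length.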

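Some of the tutorials from 生信技能树 (Bioinformatics Skill Tree) are very practical and work remarkably well for getting started; recommended.

References:

Learn ChIP-seq analysis in one article (part 1)

Learn ChIP-seq analysis in one article (part 2)

Part 10: Integrated analysis of ATAC-Seq, ChIP-Seq and RNA-Seq (end of the series, table of contents included)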

Reposted from www.cnblogs.com/leezx/p/11927288.html