Using LoRDEC

Article link: https://blog.csdn.net/u010608296/article/details/102535069

Reference

Laurent Bouri, Dominique Lavenier. Evaluation of long read error correction software. [Research
Report] RR-9028, INRIA Rennes - Bretagne Atlantique; GenScale. 2017. <hal-01463694>

2.6 LoRDEC
Introduction
LoRDEC is a hybrid corrector: it uses a de Bruijn graph built from short reads to correct long reads.
Website : http://www.atgc-montpellier.fr/lordec/
Installation
LoRDEC is available on Linux and requires CMake 2.6+ and GCC 4.7+.
Download LoRDEC and the GATB library (http://gatb-core.gforge.inria.fr/):

$ wget http://www.atgc-montpellier.fr/download/sources/lordec/LoRDEC-0.6.tar.gz
$ tar zxvf LoRDEC-0.6.tar.gz
$ cd LoRDEC-0.6
$ wget https://github.com/GATB/gatb-core/releases/download/v1.1.0/gatb-core-1.1.0-bin-Linux.tar.gz
$ tar zxvf gatb-core-1.1.0-bin-Linux.tar.gz

Set the variable GATB_VER in the Makefile to 1.1.0, then build LoRDEC:

$ make
$ cd ..

Input data
LoRDEC requires both the short reads and the long reads in FASTA or FASTQ format.

Pipeline
Run the long read error correction with the binary "lordec-correct":

$ lordec-correct -2 Illumina.fasta -k 19 -s 3 -i pacbio.fasta -o pacbio-corrected.fasta

• 2 : File of short reads
• k : Size of the k-mers used in the de Bruijn graph
• s : Abundance threshold above which a k-mer is considered correct
• i : Input file (long reads)
• o : Output file (corrected long reads)
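
To make the roles of k and s concrete, here is a minimal Python sketch of abundance filtering. It is not LoRDEC's implementation (LoRDEC relies on the GATB library); the function name `solid_kmers` and the toy reads are assumptions made for illustration, and the sketch ignores reverse complements, which real k-mer counters usually merge.

```python
from collections import Counter

def solid_kmers(reads, k, s):
    """Return the set of k-mers seen at least s times across all reads."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        # Slide a window of width k over the read.
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    # Keep only k-mers whose abundance reaches the threshold s.
    return {kmer for kmer, n in counts.items() if n >= s}

# Toy short reads: the last one has an error at its fourth base (G instead of T).
reads = ["ACGTAC", "ACGTAC", "ACGTAC", "ACGGAC"]
print(sorted(solid_kmers(reads, k=3, s=3)))
# ['ACG', 'CGT', 'GTA', 'TAC']
```

The k-mers introduced by the erroneous read (CGG, GGA, GAC) occur only once and are discarded, which is why a higher s makes the graph stricter and a lower s makes it more permissive.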

A series of steps is then performed in order to correct the long reads:
1. Construction of a de Bruijn graph from the short reads
2. Removal of k-mers that occur fewer than s times
3. Selection of an optimal path in the graph by computing the edit distance between the path and a region of the long read.
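
The path selection in step 3 can be illustrated with a small Python sketch. It assumes the candidate paths have already been spelled out as strings; how LoRDEC actually enumerates and bounds paths in the graph is not described here, so `best_path` and the toy sequences are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # match or substitute
        prev = curr
    return prev[-1]

def best_path(region, candidates):
    """Pick the candidate path sequence closest to the long-read region."""
    return min(candidates, key=lambda path: edit_distance(path, region))

# Toy long-read region with one spurious inserted G, and three candidate paths.
region = "ACGGTAC"
candidates = ["ACGTAC", "ACCTTAC", "TTTTTTT"]
print(best_path(region, candidates))
# ACGTAC
```

Here "ACGTAC" is at edit distance 1 from the region (one deleted G), while the other candidates are further away, so it would replace the region in the corrected read.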
Output data
The corrected sequences are written to the output file given after the "-o" parameter. This FASTA file contains the long reads: corrected regions appear in uppercase letters, while uncorrected regions appear in lowercase letters. LoRDEC can also remove the uncorrected regions at the beginning and end of each long read (lordec-trim), or keep only the corrected regions (lordec-trim-split).

$ lordec-trim -i fichier_reads.fasta -o fichier_trim.fasta

• i : Corrected reads file
• o : Output file
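
The effect of end trimming can be pictured with a short Python sketch based only on the uppercase/lowercase convention described above. It is an illustration of the idea, not the code of lordec-trim; the function name `trim_uncorrected_ends` and the example read are assumptions.

```python
def trim_uncorrected_ends(seq):
    """Drop lowercase (uncorrected) bases from both ends of a corrected read."""
    start, end = 0, len(seq)
    # Skip the uncorrected prefix.
    while start < end and seq[start].islower():
        start += 1
    # Skip the uncorrected suffix.
    while end > start and seq[end - 1].islower():
        end -= 1
    return seq[start:end]

print(trim_uncorrected_ends("acgTTGCAaaGGCtt"))
# TTGCAaaGGC
```

Note that the uncorrected stretch in the middle of the read ("aa") is kept: only the two ends are removed.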

$ lordec-trim-split -i fichier_reads.fasta -o fichier_trim_split.fasta

• i : Corrected reads file
• o : Output file
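
Keeping only the corrected regions amounts to splitting each read at its lowercase stretches. The Python sketch below shows that idea under the same uppercase/lowercase convention; it does not reproduce how lordec-trim-split names or writes the resulting records, and `corrected_segments` is a hypothetical helper.

```python
import re

def corrected_segments(seq):
    """Return the maximal uppercase (corrected) runs of a corrected read."""
    return re.findall(r"[A-Z]+", seq)

print(corrected_segments("acgTTGCAaaGGCtt"))
# ['TTGCA', 'GGC']
```

Each returned segment would become a separate, fully corrected sequence, at the cost of shorter reads.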
