Home > Genomics > Main text

BEDTOOLS


Tag: genomics, bedtools

pybedtools

Python version of Bedtools to deal with bed format files.

read a bed file into class object

from pybedtools import BedTool

bed12 = '/Share/home/zhangqf7/gongjing/zebrafish/data/reference/gtf/xiongtl/Delete_coding_danRer_merge_transcript.200.bed'

BED = BedTool(bed12)

fields of each entry

chrom, start, end, name, score, strand attribute can be retrieved directly. All columns following 6 columns are stored in fields as element of list object.

class Interval(__builtin__.object)
 |  Class to represent a genomic interval.
 |
 |  Constructor::
 |
 |      Interval(chrom, start, end, name=".", score=".", strand=".", otherfields=None)
>>> i = BED[0]
>>>
>>> i.name
u'TCONS_00052481_- gene_id "XLOC_019131"; transcript_id "TCONS_00052481"; exon_number "8"; oId "CPAT16329"; tss_id "TSS29925";'
>>> i.chrom
u'NC_007118.6'
>>> i.start
71143916
>>> i.end
71166706
>>> i.score
u'.'
>>> i.strand
u'-'
>>> i.thickStart  # Not exactly same as definition in bed12 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'pybedtools.cbedtools.Interval' object has no attribute 'thickStart'

# list all fields and parse
>>> i.fields
[u'NC_007118.6', u'71143916', u'71166706', u'TCONS_00052481_- gene_id "XLOC_019131"; transcript_id "TCONS_00052481"; exon_number "8"; oId "CPAT16329"; tss_id "TSS29925";', u'.', u'-', u'.', u'.', u'.', u'8', u'888,101,130,221,36,107,1981,24', u'0,1065,1568,2078,3812,16675,18264,22766']

BEDtools

bed file region extend biostar

bedtools slop -i all_peaks2 -g /Share/home/zhangqf/database/GenomeAnnotation/size/hg38.genome.size -b 50 > all_peaks.ext50

Tool:    bedtools slop (aka slopBed)
Version: v2.26.0
Summary: Add requested base pairs of "slop" to each feature.

Usage:   bedtools slop [OPTIONS] -i <bed/gff/vcf> -g <genome> [-b <int> or (-l and -r)]

Options:
        -b      Increase the BED/GFF/VCF entry -b base pairs in each direction.
                - (Integer) or (Float, e.g. 0.1) if used with -pct.

        -l      The number of base pairs to subtract from the start coordinate.
                - (Integer) or (Float, e.g. 0.1) if used with -pct.

        -r      The number of base pairs to add to the end coordinate.
                - (Integer) or (Float, e.g. 0.1) if used with -pct.

        -s      Define -l and -r based on strand.
                E.g. if used, -l 500 for a negative-stranded feature,
                it will add 500 bp downstream.  Default = false.

        -pct    Define -l and -r as a fraction of the feature's length.
                E.g. if used on a 1000bp feature, -l 0.50,
                will add 500 bp "upstream".  Default = false.

        -header Print the header from the input file prior to results.

shuffleBed: segmentation fault, due to region restriction, also see bedtools google group discussion

# foreground bed file with regions
# columns: tx, start, end, tx, UTR3_start, UTR3_end, strand, element, *
$ cat window-anno_utr3.bed6
NM_001002873	2822	2851	NM_001002873	2845	2868	-	utr3	*

# run shuffleBed, cause error
# shuffle within 3UTR context
[zhangting@appa1 Figure2B-Dreme]$ 
shuffleBed -i window-anno_utr3.bed6 -chrom -g transcript-length.txt -incl utr3.bed > test.bed

/software/biosoft/software/MeRIP-PF/tools/BEDTools-Version-2.16.2/bin/shuffleBed: 
line 2: 19639 Segmentation fault      
(core dumped) ${0%/*}/bedtools shuffle "$@"

# 报错原因:
# 想要shuffle的region是2822-2851(长度=29),但是限定的3UTR区域是2845-2868(长度=23),
# 所以限定区域的长度小于原始的,在这个限制下不可能找到满足条件的shuffle region,所以报错。

If you link this blog, please refer to this page, thanks!
Post link:https://tsinghua-gongjing.github.io/posts/bedtools.html

Previous: 期望、方差、协方差、相关系数