Python version of Bedtools to deal with bed format files.
from pybedtools import BedTool
bed12 = '/Share/home/zhangqf7/gongjing/zebrafish/data/reference/gtf/xiongtl/Delete_coding_danRer_merge_transcript.200.bed'
BED = BedTool(bed12)
chrom, start, end, name, score, strand attribute can be retrieved directly. All columns following 6 columns are stored in fields as element of list object.
class Interval(__builtin__.object)
| Class to represent a genomic interval.
|
| Constructor::
|
| Interval(chrom, start, end, name=".", score=".", strand=".", otherfields=None)
>>> i = BED[0]
>>>
>>> i.name
u'TCONS_00052481_- gene_id "XLOC_019131"; transcript_id "TCONS_00052481"; exon_number "8"; oId "CPAT16329"; tss_id "TSS29925";'
>>> i.chrom
u'NC_007118.6'
>>> i.start
71143916
>>> i.end
71166706
>>> i.score
u'.'
>>> i.strand
u'-'
>>> i.thickStart # Not exactly same as definition in bed12
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'pybedtools.cbedtools.Interval' object has no attribute 'thickStart'
# list all fields and parse
>>> i.fields
[u'NC_007118.6', u'71143916', u'71166706', u'TCONS_00052481_- gene_id "XLOC_019131"; transcript_id "TCONS_00052481"; exon_number "8"; oId "CPAT16329"; tss_id "TSS29925";', u'.', u'-', u'.', u'.', u'.', u'8', u'888,101,130,221,36,107,1981,24', u'0,1065,1568,2078,3812,16675,18264,22766']
bedtools slop -i all_peaks2 -g /Share/home/zhangqf/database/GenomeAnnotation/size/hg38.genome.size -b 50 > all_peaks.ext50
Tool: bedtools slop (aka slopBed)
Version: v2.26.0
Summary: Add requested base pairs of "slop" to each feature.
Usage: bedtools slop [OPTIONS] -i <bed/gff/vcf> -g <genome> [-b <int> or (-l and -r)]
Options:
-b Increase the BED/GFF/VCF entry -b base pairs in each direction.
- (Integer) or (Float, e.g. 0.1) if used with -pct.
-l The number of base pairs to subtract from the start coordinate.
- (Integer) or (Float, e.g. 0.1) if used with -pct.
-r The number of base pairs to add to the end coordinate.
- (Integer) or (Float, e.g. 0.1) if used with -pct.
-s Define -l and -r based on strand.
E.g. if used, -l 500 for a negative-stranded feature,
it will add 500 bp downstream. Default = false.
-pct Define -l and -r as a fraction of the feature's length.
E.g. if used on a 1000bp feature, -l 0.50,
will add 500 bp "upstream". Default = false.
-header Print the header from the input file prior to results.
# foreground bed file with regions
# columns: tx, start, end, tx, UTR3_start, UTR3_end, strand, element, *
$ cat window-anno_utr3.bed6
NM_001002873 2822 2851 NM_001002873 2845 2868 - utr3 *
# run shuffleBed, cause error
# shuffle within 3UTR context
[zhangting@appa1 Figure2B-Dreme]$
shuffleBed -i window-anno_utr3.bed6 -chrom -g transcript-length.txt -incl utr3.bed > test.bed
/software/biosoft/software/MeRIP-PF/tools/BEDTools-Version-2.16.2/bin/shuffleBed:
line 2: 19639 Segmentation fault
(core dumped) ${0%/*}/bedtools shuffle "$@"
# 报错原因:
# 想要shuffle的region是2822-2851(长度=29),但是限定的3UTR区域是2845-2868(长度=23),
# 所以限定区域的长度小于原始的,在这个限制下不可能找到满足条件的shuffle region,所以报错。