tagtab script
With the tagtab script provided with this package, we generate a summary of tag counts in flanking and central nucleosome regions, LNCaP_H3K4me2.xls. This file is generated from the sequence tag bed files and the mononucleosome bed file using the tagtab script:
tagtab LNCaP_DHT_peak.bed LNCaP_H3K4me2_veh.bed,LNCaP_H3K4me2_DHT.bed -o LNCaP_H3K4me2.xls
LNCaP_DHT_mono_peak.bed specifies the nucleosome positions, and the sequence tag positions are specified by LNCaP_H3K4me2_veh.bed and LNCaP_H3K4me2_DHT.bed. All files MUST be sorted by genomic coordinate; chromosomes may appear in any order.
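Because unsorted input is a common source of silent errors, it can help to check the sorting requirement before running tagtab. The following is a minimal sketch, not part of this package: check_bed_sorted is a hypothetical helper. It checks that start coordinates never decrease within a chromosome and, as an added assumption, that each chromosome forms one contiguous block.

```python
from typing import Iterable, List, Tuple


def check_bed_sorted(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, message) pairs for ordering problems in a BED file.

    Rule checked: within each chromosome, start coordinates never decrease.
    Chromosomes may appear in any order, but (assumption) each chromosome is
    expected to form a single contiguous block of lines.
    """
    problems: List[Tuple[int, str]] = []
    seen_chroms = set()
    current_chrom = None
    last_start = -1
    for lineno, line in enumerate(lines, start=1):
        # Skip blank lines and UCSC-style header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()  # BED is tab-delimited; split() also tolerates spaces
        chrom, start = fields[0], int(fields[1])
        if chrom != current_chrom:
            if chrom in seen_chroms:
                problems.append((lineno, f"{chrom} reappears after another chromosome"))
            seen_chroms.add(chrom)
            current_chrom = chrom
        elif start < last_start:
            problems.append((lineno, f"{chrom}:{start} comes after {chrom}:{last_start}"))
        last_start = start
    return problems
```

For example, passing an open file handle for LNCaP_H3K4me2_veh.bed returns an empty list when the file is correctly sorted.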
The generated file, LNCaP_H3K4me2.xls, is a table with the following format:
chrom | start | end | veh_nuc | veh_lin | DHT_4h_nuc | DHT_4h_lin |
---|---|---|---|---|---|---|
chr1 | 828981 | 829526 | 53 | 27 | 108 | 81 |
chr1 | 829141 | 829711 | 36 | 30 | 110 | 48 |
chr1 | 830376 | 830861 | 59 | 17 | 159 | 56 |
chr1 | 830661 | 831231 | 48 | 31 | 125 | 110 |
chr1 | 831031 | 831581 | 31 | 4 | 44 | 8 |
chr1 | 842141 | 842721 | 17 | 15 | 47 | 36 |
The chrom, start, and end columns contain the genomic locations of nucleosome pairs whose midpoints are separated by 250 to 450 bp. The midpoint separation range may also be set with the optional parameters --minsep and --maxsep. Of the remaining columns, those ending in _nuc contain the counts of tags that fall in the nucleosome regions, while those ending in _lin contain the tag counts for the regions between nucleosome pairs.
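To make the meaning of the _nuc and _lin columns concrete, here is a minimal, hypothetical sketch of how such counts could be computed for one chromosome. It is not tagtab's code. It assumes each tag has been reduced to a single coordinate, that nucleosomes are BED-style half-open intervals, and that every pair of nucleosomes (not only adjacent ones) whose midpoints are minsep to maxsep bp apart is reported. The function names are illustrative only.

```python
from bisect import bisect_left
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), half-open like BED


def pair_nucleosomes(nucs: List[Interval], minsep: int = 250,
                     maxsep: int = 450) -> List[Tuple[Interval, Interval]]:
    """Pair nucleosomes whose midpoints are between minsep and maxsep bp apart."""
    by_mid = sorted(nucs, key=lambda n: (n[0] + n[1]) / 2)
    mids = [(s + e) / 2 for s, e in by_mid]
    pairs = []
    for i, left in enumerate(by_mid):
        for j in range(i + 1, len(by_mid)):
            sep = mids[j] - mids[i]
            if sep > maxsep:
                break  # midpoints are sorted, so no later nucleosome qualifies
            if sep >= minsep:
                pairs.append((left, by_mid[j]))
    return pairs


def count_in(sorted_tags: List[int], start: int, end: int) -> int:
    """Number of tag positions p with start <= p < end."""
    return bisect_left(sorted_tags, end) - bisect_left(sorted_tags, start)


def nuc_lin_counts(pair: Tuple[Interval, Interval],
                   sorted_tags: List[int]) -> Tuple[int, int]:
    """Return (tags in both flanking nucleosomes, tags between them)."""
    (s1, e1), (s2, e2) = pair
    # If the two nucleosomes overlap, tags in the overlap are counted twice.
    nuc = count_in(sorted_tags, s1, e1) + count_in(sorted_tags, s2, e2)
    lin = count_in(sorted_tags, e1, s2) if e1 < s2 else 0
    return nuc, lin
```

Running nuc_lin_counts once per tag file (e.g. vehicle and DHT) for each pair would give one _nuc/_lin column pair per condition, matching the layout of the table above.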
binoch script
To discover DNA sequence motifs that are localized near the midpoint between paired nucleosomes with high nucleosome stabilization-destabilization (NSD) scores, we run the binoch script:
binoch LNCaP_H3K4me2.xls -a pos -g hg18 --nuc=veh_nuc,DHT_4h_nuc --lin=veh_lin,DHT_4h_lin -o LNCaP_H3K4me2_DHT_pos.txt
The -a pos option selects the motif position analysis. The genome version is specified by the -g hg18 option. The columns containing tag counts in the flanking nucleosome regions are specified by their header labels with --nuc=veh_nuc,DHT_4h_nuc. Similarly, the columns containing tag counts in the region between flanking nucleosomes are specified with --lin=veh_lin,DHT_4h_lin. The --nuc and --lin options take comma-delimited lists that must be ordered so that each column of flanking nucleosome counts is matched by the column of central-region counts at the same position in the other list.
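The matching rule for --nuc and --lin is positional, so the two lists must have the same length. A minimal sketch of that check follows, assuming the .xls header has already been read into a list of column names; pair_columns is a hypothetical helper, not part of binoch. When --lin is omitted, each --nuc column is paired with None, reflecting the nuc-only scoring described further below.

```python
from typing import List, Optional, Tuple


def pair_columns(header: List[str], nuc: str,
                 lin: Optional[str] = None) -> List[Tuple[str, Optional[str]]]:
    """Match comma-delimited --nuc and --lin column lists position by position."""
    nuc_cols = [c.strip() for c in nuc.split(",")]
    lin_cols: List[Optional[str]] = (
        [c.strip() for c in lin.split(",")] if lin else [None] * len(nuc_cols)
    )
    if len(nuc_cols) != len(lin_cols):
        raise ValueError(
            f"--nuc lists {len(nuc_cols)} columns but --lin lists {len(lin_cols)}"
        )
    # Every named column must exist in the table header.
    for col in nuc_cols + [c for c in lin_cols if c is not None]:
        if col not in header:
            raise ValueError(f"column {col!r} not found in header")
    return list(zip(nuc_cols, lin_cols))
```

With the header of LNCaP_H3K4me2.xls and the options used above, this yields the pairs (veh_nuc, veh_lin) and (DHT_4h_nuc, DHT_4h_lin).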
The output from this analysis is in the format:
ID | sym | consensus | numhits | mean | cutoff | zscore | pval |
---|---|---|---|---|---|---|---|
M00447 | AR | AGTAC.T.WTGTTCT | 166 | -0.12 | 6.31 | -5.06 | 2.11e-07 |
M01012 | HNF3 | .....TGTTTR....... | 701 | -0.06 | 5.40 | -4.86 | 6.01e-07 |
M00481 | AR | GG.ACA...TGT.CT | 387 | -0.07 | 5.00 | -4.82 | 7.09e-07 |
M00956 | AR | ......GG.AC....TGTTCT.... | 232 | -0.09 | 5.44 | -4.75 | 1.00e-06 |
M00953 | AR | ......GG.ACA..GTGTTCT.... | 233 | -0.09 | 4.91 | -4.60 | 2.16e-06 |
The columns have the following meanings:
binoch script
To discover DNA sequence motifs that are enriched in high NSD-scoring regions relative to regions with neutral NSD scores, we run the binoch script:
binoch LNCaP_H3K4me2.xls -a enrich -n 0.01 -g hg18 --nuc=veh_nuc,DHT_4h_nuc --lin=veh_lin,DHT_4h_lin -o LNCaP_H3K4me2_DHT_binom.txt
The -a enrich option selects the motif enrichment analysis. The -n 0.01 option specifies the fraction of all paired nucleosome regions used in the comparison of top NSD-scoring and neutral NSD-scoring regions. The remaining options are as before. The output from this analysis is in the format:
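The sketch below illustrates the general shape of such a comparison; it is hypothetical and not taken from binoch. It assumes that "neutral" regions are those whose scores lie closest to the median score, that the foreground and background sets have the same size n, and that enrichment is assessed with a one-sided binomial test of the foreground hit count against the background hit rate. Both helper names are illustrative, and the test is not guaranteed to reproduce the pval column shown below.

```python
import math
from typing import List, Sequence, Tuple


def split_top_and_neutral(scores: Sequence[float],
                          ntop: float) -> Tuple[List[int], List[int]]:
    """Indices of the top-scoring regions and an equally sized 'neutral' set.

    Hypothetical: 'neutral' means the remaining regions closest to the median.
    """
    if not 0 < ntop <= 0.5:
        raise ValueError("ntop should be between 0 and 0.5")
    n = max(1, int(len(scores) * ntop))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top, rest = order[:n], order[n:]
    median = sorted(scores)[len(scores) // 2]
    neutral = sorted(rest, key=lambda i: abs(scores[i] - median))[:n]
    return top, neutral


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed in log space for stability."""
    if k <= 0:
        return 1.0
    if k > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
        for i in range(k, n + 1)
    ]
    m = max(log_terms)  # factor out the largest term to avoid underflow
    return math.exp(m) * sum(math.exp(t - m) for t in log_terms)
```

Under these assumptions, the first row of the table below would correspond to binom_upper_tail(257, 862, 141 / 862).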
ID | sym | consensus | numhits_fg | numhits_bg | n | cutoff | pval |
---|---|---|---|---|---|---|---|
M00953 | AR | ......GG.ACA..GTGTTCT.... | 257 | 141 | 862 | 5 | 6.65e-23 |
M00481 | AR | GG.ACA...TGT.CT | 441 | 332 | 862 | 5 | 3.52e-14 |
M00956 | AR | ......GG.AC....TGTTCT.... | 319 | 219 | 862 | 5 | 3.76e-14 |
M00954 | PR | .......G.A.....TGTTCT.... | 440 | 337 | 862 | 5 | 8.06e-13 |
M01162 | OG-2 | TAATTG | 643 | 552 | 862 | 5 | 2.14e-11 |
The columns have the following meanings:
binoch analysis. These can be specified using the -m option:
binoch script
To use:
binoch [options] TABLEFILENAME
TABLEFILENAME can be either a .xls file or a .bed file.
Regions in .xls files are assumed to contain the paired nucleosome positions and are trimmed and padded in the position-based analysis. .xls files can be generated with the tagtab script from nucleosome position .bed files and mapped ChIP-seq histone modification .bed files. Regions in .bed files are analysed with no sequence splicing in the position-based analysis.
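The exact trimming and padding rules are not documented here. As a hypothetical sketch of the general idea behind a midpoint-centred position analysis, the helpers below place a window of SPAN bp (compare the -w SPAN option) on a region's midpoint and report motif hit offsets relative to that midpoint. The function names and the integer-midpoint convention are assumptions, not binoch's implementation.

```python
from typing import List, Tuple


def centered_window(start: int, end: int, span: int) -> Tuple[int, int]:
    """Half-open window of `span` bp centred on the region midpoint, clipped at 0."""
    mid = (start + end) // 2
    lo = mid - span // 2
    return max(0, lo), lo + span


def offsets_from_midpoint(start: int, end: int, hits: List[int],
                          span: int) -> List[int]:
    """Signed distances of motif hits from the midpoint, keeping |d| <= span // 2."""
    mid = (start + end) // 2
    half = span // 2
    return [h - mid for h in hits if abs(h - mid) <= half]
```

For the first region in the example table (chr1:828981-829526), the midpoint is 829253, so a 200 bp window spans 829153-829353; a histogram of such offsets pooled over many regions is one way to see whether a motif concentrates near the centre.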
If the --lin option is unspecified, scores will be computed from the --nuc columns only.
Type binoch -h for information on the options.
Options:
-h, --help
Show this help message and exit.
--nuc=NUC
Comma-delimited list of fields, specified as names in the header of the .xls input file, for tag counts in flanking nucleosomes under control and treatment conditions.
--lin=LIN
Comma-delimited list of fields for tag counts in central regions under control and treatment conditions.
--annot=ANNOT
Comma-delimited list of fields for annotation of regions.
-a ANALYSIS, --analysis=ANALYSIS
ANALYSIS can be pos for central position-based analysis, or enrich for enrichment analysis comparing the number of motifs found in high- or low-scoring regions with neutral regions.
-b, --bias
Nucleosome GC content bias correction. True or False (default: False). Only works with the .xls input file format.
-g GENOME, --genome=GENOME
Version of the genome under examination, e.g. hg17, hg18, hg19, mm7, mm8, mm9. The user may also specify a custom genome sequence by setting GENOME to 'user'.
-c CONS, --cons=CONS
If CONS is set, phastCons conservation scores are used to filter out sequence that does not meet this threshold at single-bp resolution. CONS should be set between 0 and 1.
-p, --print
Print only the NSD score for each region, using the first nucleosome label as treatment.
-n NTOP, --ntop=NTOP
The fraction of top-scoring regions to be used in the analysis (between 0 and 0.5; 0.01 suggested). If an .xls table is provided, this fraction is based on the NSD score. For a bed file it is based on the val field. All regions are used by default.
-f FIELD, --field=FIELD
Limit the analysis to regions where the fields in this list lie between the bounds set by the -l and -u lists, respectively. For example, if the input .xls file has a field TSS giving the distance from the site to the nearest TSS, filter out regions within 1 kb of the TSS using -f TSS -l 1000 -u 1e10.
-l LOW, --low=LOW
Lower bound(s), used with the -f option.
-u UP, --up=UP
Upper bound(s), used with the -f option.
-w SPAN, --width=SPAN
Span of the window used in the position-based motif analysis (-a pos option).
-o OUTPUT, --output=OUTPUT
Name of the output file.
--markov=MARKOV
Markov background model: 0, 1 or 2 (default: 1).
-m KNOWN_MOTIFS, --known-motifs=KNOWN_MOTIFS
Name of the input XML file containing known motifs to be scanned against ChIP regions. Provide the full path to the motif .xml file, e.g. $HOME/local/lib/motifIO/PWM/transfac.xml
-t TEST, --test=TEST
Comma-delimited list of motif names to test.