BINOCh
Binding Inference from Nucleosome Occupancy Changes

Sample H3K4me2 ChIP-seq nucleosome positioning data

Here we provide a sample data set from our paper:

"Nucleosome dynamics define transcriptional enhancers" Housheng Hansen He, Clifford A Meyer, et al, Nature Genetics, 42, p343–347, 2010.

To characterize the pattern of nucleosome positioning at enhancers, we used nucleosome-resolution ChIP-seq of H3K4me2 in the prostate cancer cell line LNCaP in response to stimulation by the AR agonist 5α-dihydrotestosterone (DHT). H3K4me2 nucleosome positioning data were generated for the vehicle control condition and the 4 hour DHT condition. Here we provide the bed files of Illumina sequence tags mapped to human reference genome hg18 for the vehicle condition and the 4 hour DHT condition. Using the 4 hour DHT bed file, we discover nucleosome positions with the NPS nucleosome position discovery software.

Data preparation using the tagtab script

Using these data and the tagtab script provided with this package, we generate a summary of tag counts in flanking and central nucleosome regions, LNCaP_H3K4me2.xls. This file can be generated from the tag bed files and the mononucleosome bed file using the tagtab script:
    tagtab LNCaP_DHT_peak.bed LNCaP_H3K4me2_veh.bed,LNCaP_H3K4me2_DHT.bed -o LNCaP_H3K4me2.xls
Here the first argument (the mononucleosome peak .bed file) specifies the nucleosome positions, and the sequence tag positions are specified by LNCaP_H3K4me2_veh.bed and LNCaP_H3K4me2_DHT.bed. All files MUST be sorted by genomic coordinate; chromosomes may appear in any order. The file generated, LNCaP_H3K4me2.xls, is a table having the format:
    chrom  start   end     veh_nuc  veh_lin  DHT_4h_nuc  DHT_4h_lin
    chr1   828981  829526  53       27       108         81
    chr1   829141  829711  36       30       110         48
    chr1   830376  830861  59       17       159         56
    chr1   830661  831231  48       31       125         110
    chr1   831031  831581  31       4        44          8
    chr1   842141  842721  17       15       47          36

The columns with headers chrom, start, and end contain the genomic locations of nucleosome pairs with midpoints separated by 250 to 450 base pairs. The midpoint separation limits may also be set using the optional parameters --minsep and --maxsep. Of the remaining columns, those with headers ending in _nuc contain the counts of tags that fall in the nucleosome regions, while those ending in _lin contain the tag counts for the regions between nucleosome pairs.
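
As an illustration of the column naming convention, the following Python sketch reads a table of this form and groups the count columns by their _nuc and _lin suffixes. It assumes the table is whitespace-delimited with a single header row, as in the sample shown in this section. The function names read_tagtab_table and split_count_columns are hypothetical and are not part of the BINOCh package.

```python
import io


def read_tagtab_table(handle):
    """Parse a whitespace-delimited tagtab table into (header, rows).

    Assumption: one header row, then one paired-nucleosome region per line.
    """
    header = handle.readline().split()
    rows = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        row = dict(zip(header, fields))
        # Coordinates and tag counts are integers; chrom stays a string.
        for key in header:
            if key != "chrom":
                row[key] = int(row[key])
        rows.append(row)
    return header, rows


def split_count_columns(header):
    """Return (nuc_columns, lin_columns), selected by header suffix."""
    nuc = [name for name in header if name.endswith("_nuc")]
    lin = [name for name in header if name.endswith("_lin")]
    return nuc, lin


# Two rows copied from the sample table, held in memory for the demo.
sample = io.StringIO(
    "chrom start end veh_nuc veh_lin DHT_4h_nuc DHT_4h_lin\n"
    "chr1 828981 829526 53 27 108 81\n"
    "chr1 829141 829711 36 30 110 48\n"
)
header, rows = read_tagtab_table(sample)
nuc_cols, lin_cols = split_count_columns(header)
print(nuc_cols)
print(lin_cols)
print(rows[0]["end"] - rows[0]["start"])  # width of the first region in bp
```

To read a real file, pass an open file handle (for example open("LNCaP_H3K4me2.xls")) in place of the in-memory sample.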

DNA motif identification using the positional analysis method of the binoch script

To discover DNA sequence motifs that are localized near the midpoint between paired nucleosomes with high nucleosome stabilization-destabilization (NSD) scores, we run the binoch script:

    binoch LNCaP_H3K4me2.xls -a pos -g hg18 --nuc=veh_nuc,DHT_4h_nuc --lin=veh_lin,DHT_4h_lin -o LNCaP_H3K4me2_DHT_pos.txt
Here the -a pos option selects the motif position analysis. The genome version is specified by the -g hg18 option. The columns containing tag counts in the flanking nucleosome regions are specified by their header labels: --nuc=veh_nuc,DHT_4h_nuc. Similarly, the columns containing tag counts in the region between flanking nucleosomes are specified by the option --lin=veh_lin,DHT_4h_lin. The --nuc and --lin options are comma-delimited lists that must be ordered so that each column of flanking nucleosome counts is matched by the corresponding column of tag counts from the central region.
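
The ordering rule for --nuc and --lin can be illustrated with a short Python sketch. It is not taken from the binoch source: the function name pair_count_columns is hypothetical, and the check shown here is only one plausible way to validate that the two lists line up position by position.

```python
def pair_count_columns(nuc_arg, lin_arg=None):
    """Pair each --nuc column name with the --lin name at the same position.

    Hypothetical helper for illustration; not part of BINOCh.
    """
    nuc = [name.strip() for name in nuc_arg.split(",") if name.strip()]
    if lin_arg is None:
        # No linker columns given: each nuc column stands alone.
        return [(name, None) for name in nuc]
    lin = [name.strip() for name in lin_arg.split(",") if name.strip()]
    if len(nuc) != len(lin):
        raise ValueError(
            "--nuc and --lin must list the same number of columns "
            f"(got {len(nuc)} and {len(lin)})"
        )
    return list(zip(nuc, lin))


pairs = pair_count_columns("veh_nuc,DHT_4h_nuc", "veh_lin,DHT_4h_lin")
for nuc_name, lin_name in pairs:
    print(nuc_name, "<->", lin_name)
```

With the sample options, veh_nuc is paired with veh_lin and DHT_4h_nuc with DHT_4h_lin. Swapping the order in only one of the two lists would silently pair the vehicle nucleosome counts with the DHT central-region counts, which is why the ordering matters.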

The output from this analysis is in the format:

    ID      sym   consensus                  numhits  mean   cutoff  zscore  pval
    M00447  AR    AGTAC.T.WTGTTCT            166      -0.12  6.31    -5.06   2.11e-07
    M01012  HNF3  .....TGTTTR.......         701      -0.06  5.40    -4.86   6.01e-07
    M00481  AR    GG.ACA...TGT.CT            387      -0.07  5.00    -4.82   7.09e-07
    M00956  AR    ......GG.AC....TGTTCT....  232      -0.09  5.44    -4.75   1.00e-06
    M00953  AR    ......GG.ACA..GTGTTCT....  233      -0.09  4.91    -4.60   2.16e-06

The columns have the following meanings:

DNA motif identification using the enrichment analysis method of the binoch script

To discover DNA sequence motifs that are enriched in the high NSD-scoring regions relative to regions with neutral NSD scores, we run the binoch script:

    binoch LNCaP_H3K4me2.xls -a enrich -n 0.01 -g hg18 --nuc=veh_nuc,DHT_4h_nuc --lin=veh_lin,DHT_4h_lin -o LNCaP_H3K4me2_DHT_binom.txt
Here the -a enrich option selects the motif enrichment analysis. The -n 0.01 option specifies the fraction of the total number of paired nucleosome regions that is to be used in the comparison of top NSD-scoring and neutral NSD-scoring regions. The remaining options are as before.

The output from this analysis is in the format:

    ID sym consensus numhits_fg numhits_bg n cutoff pval
    M00953 AR ......GG.ACA..GTGTTCT.... 257 141 862 5 6.65e-23
    M00481 AR GG.ACA...TGT.CT 441 332 862 5 3.52e-14
    M00956 AR ......GG.AC....TGTTCT.... 319 219 862 5 3.76e-14
    M00954 PR .......G.A.....TGTTCT.... 440 337 862 5 8.06e-13
    M01162 OG-2 TAATTG 643 552 862 5 2.14e-11
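
To work with the enrichment output programmatically, a sketch along the following lines may help. It assumes the output file is whitespace-delimited with a single header row and that no field contains internal whitespace, as in the rows shown in this section. The function name read_enrich_table is hypothetical, and the foreground/background hit ratio is a simple descriptive figure added for illustration; it is not one of the statistics reported by binoch.

```python
import io


def read_enrich_table(handle):
    """Parse binoch enrichment output into a list of record dicts.

    Assumption: whitespace-delimited columns and one header row.
    """
    header = handle.readline().split()
    records = []
    for line in handle:
        fields = line.split()
        if len(fields) != len(header):
            continue  # skip blank or malformed lines
        rec = dict(zip(header, fields))
        for key in ("numhits_fg", "numhits_bg", "n"):
            rec[key] = int(rec[key])
        rec["cutoff"] = float(rec["cutoff"])
        rec["pval"] = float(rec["pval"])
        records.append(rec)
    return records


# Two rows copied from the sample output, held in memory for the demo.
sample = io.StringIO(
    "ID sym consensus numhits_fg numhits_bg n cutoff pval\n"
    "M01162 OG-2 TAATTG 643 552 862 5 2.14e-11\n"
    "M00953 AR ......GG.ACA..GTGTTCT.... 257 141 862 5 6.65e-23\n"
)
records = read_enrich_table(sample)
records.sort(key=lambda rec: rec["pval"])  # most significant motif first

for rec in records:
    fg, bg = rec["numhits_fg"], rec["numhits_bg"]
    # Illustrative ratio only; guard against an empty background count.
    ratio = fg / bg if bg else float("inf")
    print(rec["ID"], rec["sym"], f"{ratio:.2f}", rec["pval"])
```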

The columns have the following meanings:

Transcription factor motif libraries available with BINOCh

Several motif libraries are available for binoch analysis. These can be specified using the -m option:

Options available when using the binoch script

To use:

binoch [options] TABLEFILENAME

TABLEFILENAME can be either a .xls file or a .bed file. Regions in .xls files are assumed to contain the paired nucleosome positions and are trimmed and padded in the position-based analysis. .xls files can be generated using the tagtab script from nucleosome position .bed files and mapped ChIP-seq histone modification .bed files. Regions in .bed files are analysed with no sequence splicing in the position-based analysis. If the --lin option is unspecified, scores will be computed from the --nuc columns only.

Type binoch -h for information on the options.

Options:

Registration is simple:

Note: If you don't want to receive any email from the group, please remember to set the 'Delivery' type of your account to 'No Email'.
