BINOCh
Binding Inference from Nucleosome Occupancy Changes

How should nucleosome positions be defined? Should they be based on the MNase data from the treatment condition, the control condition, or both?

Transcription factor binding tends to stabilize the nucleosomes flanking the binding site. Before binding, these flanking nucleosomes may not be detectable by nucleosome analysis software, so nucleosome positions from the treatment condition should be included. We have observed, however, that most nucleosome positions do not change between treatment and control conditions. Control and treatment reads can therefore be combined to increase sequencing coverage and, with it, the sensitivity of nucleosome detection.
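As a toy illustration of why pooling helps, the sketch below bins MNase fragment centers from both conditions together, so each bin receives the reads of both libraries. The input format (lists of fragment-center coordinates on one chromosome), the 10-base bin size and the function name pooled_coverage are hypothetical; this is not how BINOCh or any particular nucleosome caller computes coverage.

```python
from collections import Counter

def pooled_coverage(control_centers, treatment_centers, bin_size=10):
    """Count MNase fragment centers per bin after pooling two conditions.

    Both inputs are iterables of integer genomic positions (fragment
    centers) on one chromosome (hypothetical input format). Returns a
    Counter mapping each bin's start coordinate to the number of pooled
    fragment centers that fall in it.
    """
    counts = Counter()
    for centers in (control_centers, treatment_centers):
        for pos in centers:
            # Assign each fragment center to the bin containing it.
            counts[(pos // bin_size) * bin_size] += 1
    return counts

control = [1003, 1007, 1150]
treatment = [1001, 1005, 1152, 1158]
cov = pooled_coverage(control, treatment)
print(cov[1000], cov[1150])  # 4 3
```

Neither condition alone has more than two reads in either bin; pooling gives four and three, the kind of gain that helps a caller detect weakly supported nucleosome positions.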

A motif of interest has a significant p-value in the position-based analysis (binoch -a pos), but the same motif does not have a significant p-value in the enrichment analysis (binoch -a enrich). Which analysis should be believed?

The position analysis establishes the statistical significance of binding relative to the positions of the flanking nucleosomes. If a motif is not significant by this criterion, it is not associated with binding-induced nucleosome stabilization. The position analysis may, however, also detect motifs that are abundant in the system under both control and treatment conditions, so a motif that is significant only in this analysis is not necessarily associated with changing nucleosome occupancy. The enrichment test (binoch -a enrich) establishes whether a motif is enriched in sites with large NSD scores relative to sites with neutral scores. Significance by both criteria indicates that the motif lies in a probable binding site relative to nucleosomes and is associated with changing nucleosome occupancy. A motif that is significant only by the enrichment criterion may be an artifact.
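The reasoning above can be summarized as a small decision function. This is a hypothetical helper, not part of BINOCh: it assumes the two p-values have already been taken from the binoch -a pos and binoch -a enrich output, and the 0.05 threshold is an assumed default with no multiple-testing correction.

```python
def interpret_motif(p_pos, p_enrich, alpha=0.05):
    """Summarize a motif from its two BINOCh p-values (hypothetical helper).

    p_pos:    p-value from the position analysis (binoch -a pos)
    p_enrich: p-value from the enrichment analysis (binoch -a enrich)
    alpha:    significance threshold; 0.05 is an assumption, not a
              BINOCh setting
    """
    pos_sig = p_pos < alpha
    enrich_sig = p_enrich < alpha
    if pos_sig and enrich_sig:
        return "probable binding site associated with changing occupancy"
    if pos_sig:
        # Positioned relative to nucleosomes, but the change in
        # occupancy is not supported by the enrichment test.
        return "positioned relative to nucleosomes; may be abundant in both conditions"
    if enrich_sig:
        return "enrichment only; possibly an artifact"
    return "not associated with binding-induced nucleosome stabilization"

print(interpret_motif(0.001, 0.4))
```

For the situation in the question, interpret_motif(0.001, 0.4) falls into the position-only branch.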

Is there any reason to exclude promoter regions from the analysis? How can this be done in BINOCh?

Promoter regions are dynamic and active regions of the genome, often with strongly positioned nucleosomes. Transcription factors that bind to promoters may also differ from those that bind elsewhere in the genome. As a result, the signal in the NSD score profile for an enhancer-targeting factor may be lost if enhancer and promoter regions are assessed in the same analysis. BINOCh allows the user to include or exclude any set of genomic regions. To exclude a set of regions, the user provides a .bed file, called test.bed here. When tagtab is run with the option -a test.bed, the positions of the regions in test.bed relative to the paired nucleosome regions are annotated. Nucleosome pair regions near one of these regions, for example within 2000 bases, can then be excluded by running binoch with the options -f test.bed -l 2000 -u 9e9.
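The exclusion step amounts to a distance filter. A minimal sketch, assuming BED-style half-open intervals on a single chromosome and assuming that -l and -u act as lower and upper bounds on the distance to the nearest test.bed region (our reading of the example above, not documented BINOCh behavior); the function names are hypothetical.

```python
import bisect

def distance_to_nearest(start, end, regions):
    """Gap in bases between [start, end) and the nearest interval in regions.

    regions: sorted, non-overlapping (start, end) half-open intervals on
    the same chromosome. Returns 0 on overlap, inf if regions is empty.
    """
    starts = [s for s, _ in regions]
    i = bisect.bisect_left(starts, end)  # first region starting at/after end
    best = float("inf")
    # Only the region just left of `end` and the one at/after it can be nearest.
    for j in (i - 1, i):
        if 0 <= j < len(regions):
            s, e = regions[j]
            if s < end and start < e:
                return 0
            gap = s - end if s >= end else start - e
            best = min(best, gap)
    return best

def keep_regions(pairs, exclude, lower=2000, upper=9e9):
    """Keep pair regions whose distance to `exclude` lies in [lower, upper]."""
    kept = []
    for start, end in pairs:
        d = distance_to_nearest(start, end, exclude)
        # No exclusion region on this chromosome: treat as far away and keep.
        if d == float("inf") or lower <= d <= upper:
            kept.append((start, end))
    return kept

promoters = [(10000, 11000), (50000, 51000)]  # stand-in for test.bed
pairs = [(8500, 8800), (5000, 5300), (11500, 11800), (30000, 30300)]
print(keep_regions(pairs, promoters))  # [(5000, 5300), (30000, 30300)]
```

Building the sorted start list once, rather than on every call, would be faster on real data; it is rebuilt here to keep the sketch short.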

How should the NSD score cutoff used in the analysis be chosen?

We have observed that NSD scores closely follow a normal distribution with few outliers. This distribution includes scores that reflect changes in nucleosome occupancy due to transcription factor binding as well as fluctuations due to noise. Nevertheless, sites with high NSD scores tend to be enriched for transcription factor binding and can be used to assess which factors are key to the regulation of a system. As a rough guideline, based on binding patterns observed in a limited number of systems, we recommend using the top 10% of paired nucleosome regions in the analyses.
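Selecting the top 10% can be done empirically, by ranking the scores, or, given the near-normal shape described above, from a normal distribution fitted to the scores. A minimal Python sketch of both, assuming the NSD scores are available as a dict from a hypothetical region identifier to score; neither function is part of BINOCh.

```python
from statistics import NormalDist

def top_fraction(scores, fraction=0.10):
    """Return the identifiers of the highest-scoring `fraction` of regions.

    scores: dict mapping a region identifier to its NSD score.
    The number kept is fraction * len(scores) rounded to the nearest
    integer, with at least one region always returned.
    """
    k = max(1, round(fraction * len(scores)))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]

def normal_cutoff(values, fraction=0.10):
    """Score above which `fraction` of a normal fitted to `values` lies."""
    dist = NormalDist.from_samples(values)  # sample mean and standard deviation
    return dist.inv_cdf(1 - fraction)

scores = {f"pair{i}": float(i) for i in range(20)}
print(top_fraction(scores))  # ['pair19', 'pair18']
```

The empirical ranking is robust to the few outliers mentioned above; the fitted cutoff (mean plus about 1.28 standard deviations for 10%) is only as good as the normal approximation.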