Statistical inference on three-dimensional structure of genome by truncated Poisson architecture model.

*(English)* Zbl 1351.92036
Choudhary, Pankaj K. (ed.) et al., Ordered data analysis, modeling and health research methods. In honor of H. N. Nagaraja’s 60th birthday. Selected papers based on the presentations at the international conference, Austin, TX, USA, March 7–9, 2014. Cham: Springer (ISBN 978-3-319-25431-9/hbk; 978-3-319-25433-3/ebook). Springer Proceedings in Mathematics & Statistics 149, 245–261 (2015).

Summary: In recent years, next generation sequencing technology, coupled with an assay that is capable of detecting genome-wide chromatin interactions, has produced a massive amount of data and led to a greater understanding of long-range, or spatial, gene regulation mechanisms. Hence, the traditional one-dimensional linear view of a genome, which is especially prevalent in statistical and mathematical modeling, is inadequate in many genomic studies. Instead, it is essential, in studying genomic functions, to estimate the three-dimensional (3D) structure of a genome. The availability of genome-wide interaction data necessitates the development of analytical methods to recover the underlying 3D spatial chromatin structure, but challenges abound. One particular issue is the excess of zeros, especially with higher resolution, or inter-chromosomal, data. This leads to questions concerning the appropriateness of using the Poisson distribution to model such data. In this article, we introduce a truncated Poisson architecture model (tPAM) to directly model sequencing counts with many zeros. We carried out an extensive simulation study to evaluate tPAM and to compare its performance with an existing method that uses the Poisson distribution to model the counts. We applied tPAM to reconstruct the underlying 3D structures of two data sets, one of human and one of mouse, to demonstrate its utility. The analysis of the human data set considered chromosomes 14 and 22 jointly, thereby illustrating tPAM’s capability of analyzing inter-chromosomal data. On the other hand, the mouse analysis was focused on a region on chromosome 2 to evaluate tPAM’s performance for recovering structure with loci in different topologically associated domains.
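As a rough illustration of the kind of distribution the summary refers to, the sketch below evaluates a zero-truncated Poisson log-likelihood. The summary does not say how tPAM truncates the Poisson distribution or how the rate depends on the 3D structure, so the zero truncation, the single shared rate `lam`, and all function names here are assumptions made for illustration only, not the authors' model.

```python
import math


def zero_truncated_poisson_logpmf(k, lam):
    """Log-probability of a count k >= 1 under a Poisson(lam) conditioned on k > 0.

    Textbook zero-truncated form (an assumption, not taken from the paper):
        P(K = k | K > 0) = lam**k * exp(-lam) / (k! * (1 - exp(-lam)))
    """
    if k < 1:
        raise ValueError("zero-truncated Poisson is defined for k >= 1")
    if lam <= 0:
        raise ValueError("rate lam must be positive")
    # -expm1(-lam) equals 1 - exp(-lam) and stays accurate for small lam.
    return k * math.log(lam) - lam - math.lgamma(k + 1) - math.log(-math.expm1(-lam))


def zero_truncated_poisson_mean(lam):
    """Mean of the zero-truncated Poisson: lam / (1 - exp(-lam))."""
    return lam / -math.expm1(-lam)


def log_likelihood(counts, lam):
    """Sum of log-probabilities over the nonzero counts; zeros are skipped."""
    return sum(zero_truncated_poisson_logpmf(k, lam) for k in counts if k > 0)


# Hypothetical interaction counts, invented for this example.
example_counts = [0, 0, 3, 1, 0, 7, 2, 0, 0, 1]
print(log_likelihood(example_counts, lam=2.5))
```

In a structure-recovery setting the rate would presumably vary per pair of loci rather than being a single constant; the constant rate is used here only to keep the sketch short.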

For the entire collection see [Zbl 1337.92005].


##### MSC:

| Code | Description |
|---|---|
| 92D20 | Protein sequences, DNA sequences |
| 92B15 | General biostatistics |
| 62P10 | Applications of statistics to biology and medical sciences; meta analysis |
| 62F07 | Statistical ranking and selection procedures |

##### Software:

BayesDA

*J. Park* and *S. Lin*, in: Ordered data analysis, modeling and health research methods. In honor of H. N. Nagaraja's 60th birthday. Selected papers based on the presentations at the international conference, Austin, TX, USA, March 7–9, 2014. Cham: Springer. 245–261 (2015; Zbl 1351.92036)

