ChromGene: gene-based modeling of epigenomic data

Illustration of ChromGene output.

ChromGene is a method for gene-based modeling of epigenomic data. 

Access the ChromGene project on GitHub
Citation:
Jaroszewicz A, Ernst J.
ChromGene: gene-based modeling of epigenomic data.
Genome Biology, 24:203, 2023.

CSREP: A framework for group-wise summarization and comparison of chromatin state annotations

Illustration of CSREP output.

CSREP is a framework for group-wise summarization and comparison of chromatin state annotations.

Access the CSREP project on GitHub
Citation:
Vu H, Koch Z, Fiziev P, Ernst J.
A framework for group-wise summarization and comparison of chromatin state annotations.
Bioinformatics, 39:btac722, 2023.

LECIF: Learning Evidence of Conservation from Integrated Functional genomic annotations

Illustration of LECIF output.

LECIF is a supervised machine learning method that learns a genome-wide score of evidence for conservation at the functional genomics level.

Access the LECIF project on GitHub

Citation:
Kwon SB, Ernst J.
Learning a genome-wide score of human-mouse conservation at the functional genomics level.
Nature Communications, 12:2495, 2021.

CNEP: Constrained Non-Exonic Predictor

Illustration of CNEP output.

CNEP is software for predicting constrained non-exonic bases from large-scale epigenomic and transcription factor binding data.

Access the CNEP project on GitHub

Citation:
Grujic O, Phung TN, Kwon SB, Arneson A, Lee Y, Lohmueller KE, Ernst J.
Identification and characterization of constrained non-exonic bases lacking predictive epigenomic and transcription factor binding annotations.
Nature Communications, 11:6168, 2020.

χ-CNN: integrative approach for fine-mapping chromatin interactions

Illustration of χ-CNN output.

χ-CNN is software that integrates epigenomic and transcription factor binding data with more coarsely mapped chromatin interactions to fine-map the most likely sources of interactions.

Access the χ-CNN project on GitHub

Citation:
Jaroszewicz A, Ernst J.
An Integrative Approach for Fine-Mapping Chromatin Interactions.
Bioinformatics, 36:1704-1711, 2020.

ConsHMM: systematic discovery of conservation states and single-nucleotide genome annotation

 ConsHMM Atlas: conservation state annotations for major genomes and human genetic variation

ConsHMM is software for discovering conservation states and annotating genomes at single-nucleotide resolution based on them.

Access the ConsHMM project on GitHub

Citation:
Arneson A, Ernst J.
Systematic discovery of conservation states for single-nucleotide annotation of the human genome.
Communications Biology, 2:248, 2019.

ChromTime: modeling spatio-temporal dynamics of chromatin marks

Illustration of ChromTime output.

ChromTime is software for modeling the spatio-temporal dynamics of chromatin marks, allowing systematic detection of expansions, contractions, and steady regions of chromatin marks over time.

Access the ChromTime project on GitHub

Citation:
Fiziev P, Ernst J.
ChromTime: modeling spatio-temporal dynamics of chromatin marks.
Genome Biology, 19:109, 2018.

SHARPR: Systematic High-resolution Activation and Repression Profiling with Reporter-tiling

Example of SHARPR tiling.

SHARPR is software for analyzing Massively Parallel Reporter Assay tiling designs, allowing high-resolution mapping of activating and repressive nucleotides across thousands of regulatory regions.

Citation:
Ernst J, Melnikov A, Zhang X, Wang L, Rogov P, Mikkelsen T, Kellis M.
Genome-scale high-resolution mapping of activating and repressive nucleotides in regulatory regions.
Nature Biotechnology, 34:1180-1190, 2016.

ChromImpute: Large-scale epigenome imputation

Example of ChromImpute output.

ChromImpute is software for large-scale systematic epigenome imputation. ChromImpute takes an existing compendium of epigenomic data and uses it to predict signal tracks for mark-sample combinations not experimentally mapped, or to generate a potentially more robust version of data sets that have been mapped experimentally. ChromImpute bases its predictions on features from signal tracks of other marks that have been mapped in the target sample and of the target mark in other samples, with these features combined using an ensemble of regression trees.

Citation:
Ernst J, Kellis M.
Large-scale imputation of epigenomic datasets for systematic annotation of diverse human tissues.
Nature Biotechnology, 33:364-376, 2015.

  • ChromImpute software (v1.0.5; version log)
  • ChromImpute manual
  • Example data: chr21 only of the Roadmap Epigenomics compendium (~1GB)
     
  • Quick instructions on running ChromImpute on the example data (chr21 of eight primary marks from the Roadmap Epigenomics project):
    1. Install Java 1.6 or later if not already installed.
    2. Unzip the file ChromImpute.zip.
    3. Unzip the file EXAMPLE.zip and place in the ChromImpute directory.
    4. From a command line go to the directory in which ChromImpute.jar is installed.
    5. To try out ChromImpute by imputing H3K9ac for sample E034 (Primary T cells from peripheral blood) based on pre-computed predictors, enter the command:
    java -mx4000M -jar ChromImpute.jar Apply EXAMPLE/CONVERTEDDATADIR EXAMPLE/DISTANCEDIR EXAMPLE/PREDICTORDIR EXAMPLE/tier1_samplemarktable.txt EXAMPLE/hg19sizes_chr21.txt EXAMPLE/OUTPUTDATA E034 H3K9ac
    In about 20 minutes this will generate a chr21_impute_E034_H3K9ac.wig.gz file in the directory EXAMPLE/OUTPUTDATA.
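
    For a quick look at the result, the compressed output can be streamed without unpacking it to disk. The helper below is only a minimal sketch: preview_wig is a name introduced here for illustration, not part of ChromImpute, and it relies only on the standard gzip and head utilities.

```shell
#!/bin/sh
# preview_wig is a hypothetical helper name used for this sketch only.
# Usage: preview_wig FILE.wig.gz [NUMBER_OF_LINES]
preview_wig() {
    file="$1"
    lines="${2:-5}"   # show the first 5 lines unless told otherwise
    if [ ! -f "$file" ]; then
        echo "missing file: $file" >&2
        return 1
    fi
    # Decompress to standard output and keep only the first lines.
    gzip -dc "$file" | head -n "$lines"
}
```

    For the example run this could be called as: preview_wig EXAMPLE/OUTPUTDATA/chr21_impute_E034_H3K9ac.wig.gz 5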
     
  • In general the following main steps are applied to generate an imputation. The manual provides more detail and discusses additional options including parallelization options to make some steps more efficient.
    1. If the input signal data is not already available at the desired resolution (25bp by default), use the Convert command to convert the data to the desired resolution. The provided example data is already at the desired resolution, but here is an example of a command that could be used to convert the data if unconverted data were provided:
    java -mx4000M -jar ChromImpute.jar Convert EXAMPLE/INPUTDATADIR EXAMPLE/tier1_samplemarktable.txt EXAMPLE/hg19sizes_chr21.txt EXAMPLE/CONVERTEDDATADIR
    The data in the INPUTDATADIR directory should be in .bedgraph or .wig format. Each file is listed as an entry in the sample-mark table (tier1_samplemarktable.txt in the command above). The file hg19sizes_chr21.txt specifies the chromosome(s) to include and their lengths, and the output is written to the CONVERTEDDATADIR directory.

    2. Global distance between datasets should be computed with the ComputeGlobalDist command. For generating the distances included in the example data the following command was run:
    java -mx4000M -jar ChromImpute.jar ComputeGlobalDist EXAMPLE/CONVERTEDDATADIR EXAMPLE/tier1_samplemarktable.txt EXAMPLE/hg19sizes_chr21.txt EXAMPLE/DISTANCEDIR

    3. Generate the features for the training with the GenerateTrainData command. This is done separately for each mark of interest. For generating the H3K9ac training data for the example data this was done with the command:
    java -mx4000M -jar ChromImpute.jar GenerateTrainData EXAMPLE/CONVERTEDDATADIR EXAMPLE/DISTANCEDIR EXAMPLE/tier1_samplemarktable.txt EXAMPLE/hg19sizes_chr21.txt EXAMPLE/TRAINDATA H3K9ac

    4. Generate the trained predictors for a specific mark in a specific sample type of interest with the Train command. For generating the predictors for imputing H3K9ac in E034 for the example data this was done with the command:
    java -mx4000M -jar ChromImpute.jar Train EXAMPLE/TRAINDATA EXAMPLE/tier1_samplemarktable.txt EXAMPLE/PREDICTORDIR E034 H3K9ac

    5. Generate the imputed signal track with the Apply command for the desired mark in the desired sample. To generate the imputed signal track for H3K9ac for sample E034 the command is:
    java -mx4000M -jar ChromImpute.jar Apply EXAMPLE/CONVERTEDDATADIR EXAMPLE/DISTANCEDIR EXAMPLE/PREDICTORDIR EXAMPLE/tier1_samplemarktable.txt EXAMPLE/hg19sizes_chr21.txt EXAMPLE/OUTPUTDATA E034 H3K9ac
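
    The steps above can be chained in a small shell wrapper. The sketch below is only an illustration assembled from the example commands: the names run_step, impute, JAVA_CMD, EXAMPLE_DIR, DRY_RUN, and DO_CONVERT are introduced here and are not part of ChromImpute. By default it only prints the commands (a dry run) so they can be checked before anything is executed, and it skips Convert because the example data is already converted.

```shell
#!/bin/sh
# Illustrative wrapper around the example commands; all variable and function
# names here are assumptions of this sketch, not part of ChromImpute.
JAVA_CMD="java -mx4000M -jar ChromImpute.jar"
EXAMPLE_DIR="EXAMPLE"
DRY_RUN="${DRY_RUN:-1}"        # 1 = print commands only, 0 = also execute them
DO_CONVERT="${DO_CONVERT:-0}"  # 1 = run Convert first (for unconverted input)

# Print one ChromImpute command and, unless in dry-run mode, execute it.
run_step() {
    echo "$JAVA_CMD $*"
    if [ "$DRY_RUN" = "0" ]; then
        $JAVA_CMD "$@"
    fi
}

# Run the steps in order for one sample/mark pair, stopping at the first failure.
impute() {
    sample="$1"
    mark="$2"
    table="$EXAMPLE_DIR/tier1_samplemarktable.txt"
    sizes="$EXAMPLE_DIR/hg19sizes_chr21.txt"
    converted="$EXAMPLE_DIR/CONVERTEDDATADIR"
    dist="$EXAMPLE_DIR/DISTANCEDIR"
    train="$EXAMPLE_DIR/TRAINDATA"
    pred="$EXAMPLE_DIR/PREDICTORDIR"
    outdir="$EXAMPLE_DIR/OUTPUTDATA"

    if [ "$DO_CONVERT" = "1" ]; then
        run_step Convert "$EXAMPLE_DIR/INPUTDATADIR" "$table" "$sizes" "$converted" || return 1
    fi
    run_step ComputeGlobalDist "$converted" "$table" "$sizes" "$dist" || return 1
    run_step GenerateTrainData "$converted" "$dist" "$table" "$sizes" "$train" "$mark" || return 1
    run_step Train "$train" "$table" "$pred" "$sample" "$mark" || return 1
    run_step Apply "$converted" "$dist" "$pred" "$table" "$sizes" "$outdir" "$sample" "$mark"
}
```

    After sourcing the script, calling impute E034 H3K9ac prints the commands for the example pair; setting DRY_RUN=0 before sourcing runs them as well, and DO_CONVERT=1 adds the Convert step for data that is not yet at the desired resolution.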
     
  • The observed compendium of data and imputed signal data, peak calls on imputed data, and chromatin states based on imputed data (hg19) can be found linked from Integrative Analysis of 111 reference human epigenomes.
  • The full Roadmap Epigenomics observed data, already converted into a form that can be run in ChromImpute, together with the necessary files, can be found on the Integrative Analysis of 111 reference human epigenomes page.
  • ChromImpute is described in:
    Ernst J, Kellis M. Large-scale imputation of epigenomic datasets for systematic annotation of diverse human tissues. Nature Biotechnology, 33:364-376, 2015.
  • Subscribe to a mailing list for announcements of new versions.
  • ChromImpute source code is available on GitHub.
  • Please contact Jason Ernst at jason.ernst@ucla.edu with any questions, comments, or bug reports.
  • Funding for ChromImpute provided by NSF CAREER Award #1254200 and an Alfred P. Sloan Fellowship to J.E. and by NIH grants RC1HG005334 and R01HG004037 to M.K.

ChromHMM: Chromatin state discovery and characterization

Example of ChromHMM output.

ChromHMM is software for learning and characterizing chromatin states. ChromHMM can integrate multiple chromatin datasets, such as ChIP-seq data of various histone modifications, to discover de novo the major recurring combinatorial and spatial patterns of marks. ChromHMM is based on a multivariate Hidden Markov Model that explicitly models the presence or absence of each chromatin mark. The resulting model can then be used to systematically annotate a genome in one or more cell types. By automatically computing state enrichments for large-scale functional and annotation datasets, ChromHMM facilitates the biological characterization of each state. ChromHMM also produces files with genome-wide maps of chromatin state annotations that can be directly visualized in a genome browser.

Citation:
Ernst J, Kellis M.
ChromHMM: automating chromatin-state discovery and characterization.
Nature Methods, 9:215-216, 2012.

View a list of papers using ChromHMM

DREM: Dynamic Regulatory Events Miner

Illustration of DREM output.

The Dynamic Regulatory Events Miner (DREM) allows one to model, analyze, and visualize transcriptional gene regulation dynamics. DREM takes as input time series gene expression data and static or dynamic transcription factor-gene interaction data (e.g. ChIP-seq, ChIP-chip data), and produces as output a dynamic regulatory map. The dynamic regulatory map highlights major bifurcation events in the time series expression data and the transcription factors potentially responsible for them.

Access the DREM project

Citation:
Ernst J, Vainas O, Harbison CT, Simon I, Bar-Joseph Z.
Reconstructing dynamic regulatory maps.
Nature-EMBO Molecular Systems Biology, 3:74, 2007.

View a list of papers using DREM

STEM: Short Time-series Expression Miner

One of several illustrations of STEM output.

The Short Time-series Expression Miner (STEM) is a Java program for clustering, comparing, and visualizing short time series gene expression data from microarray experiments (~8 time points or fewer). STEM allows researchers to identify significant temporal expression profiles and the genes associated with these profiles, and to compare the behavior of these genes across multiple conditions. STEM is fully integrated with the Gene Ontology (GO) database, supporting GO category gene enrichment analyses for sets of genes having the same temporal expression pattern. STEM can also determine and visualize the behavior of genes belonging to a given GO category or user-defined gene set, identifying which temporal expression profiles were enriched for these genes. (Note: While STEM is designed primarily to analyze data from short time course experiments, it can be used to analyze data from any small set of experiments that can naturally be ordered sequentially, including dose response experiments.)

Access the STEM project

Citation:
Ernst J, Bar-Joseph Z.
STEM: a tool for the analysis of short time series gene expression data.
BMC Bioinformatics, 7:191, 2006.

View a list of papers using STEM

SEREND: SEmi-supervised REgulatory Network Discoverer

Illustration of SEREND method.

The SEmi-supervised REgulatory Network Discoverer (SEREND) is a semi-supervised learning method that uses a curated database of verified transcription factor-gene interactions, DNA sequence binding motifs, and a compendium of gene expression data to make thousands of new predictions about transcription factor-gene interactions, including whether the transcription factor activates or represses the gene.

Access the SEREND project

Citation:
Ernst J, Beg QK, Kay KA, Balazsi G, Oltvai ZN, Bar-Joseph Z.
A Semi-Supervised Method for Predicting Transcription Factor-Gene Interactions in Escherichia coli.
PLoS Computational Biology, 4:e1000044, 2008.