Authors: Jianhai Zhang, Jordan Hayes, Le Zhang, Bing Yang, Wolf B. Frommer, Julia Bailey-Serres, and Thomas Girke

1 Summary

This Shiny app is an integrated implementation of the R/Bioconductor package spatialHeatmap. It is designed for interactively visualizing numeric values from biological assays (e.g. RNA-seq, microarray, qPCR) on an anatomical aSVG image. First, the core feature "Spatial Heatmap" maps numeric values of target assayed items (genes, proteins, metabolites, etc.) to matching spatial features (cells, tissues, organs, etc.) in the aSVG image. In this process, the values are translated into colors, which are then used to paint the spatial features. The resulting images are called spatial heatmaps (SHMs). Second, the nearest neighbors of each target assayed item, i.e. the items with the most similar expression profiles, are selected independently for each target. All target items and their nearest neighbors are hierarchically clustered and visualized as an interactive matrix heatmap. Third, network modules are identified internally, and the module containing a target item is displayed as an interactive network graph.
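The translation of values into colors can be pictured with a minimal Python sketch. It assumes a linear ramp between two hypothetical end colors (yellow to red) and equal-width bins between the minimum and maximum value; the helper names `values_to_colors` and `interpolate_hex` are illustrative only, and the app's actual color scheme and binning are controlled by its own parameters.

```python
def interpolate_hex(low_hex: str, high_hex: str, frac: float) -> str:
    """Linearly interpolate between two '#RRGGBB' colors; frac lies in [0, 1]."""
    low = [int(low_hex[i:i + 2], 16) for i in (1, 3, 5)]
    high = [int(high_hex[i:i + 2], 16) for i in (1, 3, 5)]
    rgb = [round(a + (b - a) * frac) for a, b in zip(low, high)]
    return "#{:02X}{:02X}{:02X}".format(*rgb)


def values_to_colors(values: dict, low: str = "#FFFF00", high: str = "#FF0000",
                     n_bins: int = 10) -> dict:
    """Hypothetical helper: map each spatial feature's value to a ramp color.

    Values are cut into n_bins equal-width bins between the minimum and the
    maximum; every bin receives one color on the low-to-high ramp.
    """
    vmin, vmax = min(values.values()), max(values.values())
    span = (vmax - vmin) or 1.0          # all values equal -> avoid dividing by 0
    steps = max(n_bins - 1, 1)           # number of intervals on the color ramp
    colors = {}
    for feature, v in values.items():
        # Bin index in 0..n_bins-1; the maximum value falls into the last bin.
        idx = min(int((v - vmin) / span * n_bins), n_bins - 1)
        colors[feature] = interpolate_hex(low, high, idx / steps)
    return colors


# Hypothetical expression values of one gene in three spatial features.
expr = {"root": 2.0, "stem": 5.0, "leaf": 8.0}
print(values_to_colors(expr))
# {'root': '#FFFF00', 'stem': '#FF7100', 'leaf': '#FF0000'}
```

Binning the values, rather than giving every value its own shade, keeps the number of distinct colors small so that features with similar values are easy to compare by eye.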

In addition to the primary visualization, the Spatial Enrichment (SE) module is specialized in identifying spatial feature-specific genes in RNA-seq count data.

This app is a general visualization tool and is not limited to biological data. Operations in this app are expected to follow the order in which the panels appear; if errors arise, refresh the webpage.

For a quick test, select one example under "Data & aSVGs" on the Landing Page, or select "customData", download the example data and aSVG, and upload them to the respective fields.

2 Input

Step 1: choose custom or default data sets. There are pre-configured examples (e.g. mouse_Merkin) for demonstration, and "customData" allows users to upload their own aSVG file(s) and data matrix.

Step 2: upload custom data
2A: upload a formatted data matrix. Upload the data matrix as a tabular file, where target samples should have matching spatial features (shapes) in the aSVG(s). Note that the file name should not contain parentheses. E.g. arab_expr_example.txt is expected while arab_expr_example.txt(1).txt will raise errors. The separator in the tabular file can only be one of tab, space, comma, or semicolon.
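The two file requirements in 2A can be checked locally before uploading. The sketch below is a hypothetical Python pre-check, not part of spatialHeatmap: `check_file_name` rejects parentheses in the file name, and `guess_separator` picks whichever of the four allowed characters occurs most often in the header line. The latter is only a heuristic; for example, a header whose sample names contain spaces may be misclassified.

```python
# Separators allowed in the uploaded tabular file.
ALLOWED_SEPARATORS = {"\t": "tab", " ": "space", ",": "comma", ";": "semicolon"}


def check_file_name(name: str) -> bool:
    """Return True if the file name contains no parentheses."""
    return "(" not in name and ")" not in name


def guess_separator(header_line: str) -> str:
    """Hypothetical heuristic: return the allowed separator that occurs most
    often in the header line; ties resolve in the order tab, space, comma,
    semicolon."""
    counts = {sep: header_line.count(sep) for sep in ALLOWED_SEPARATORS}
    best = max(counts, key=counts.get)
    if counts[best] == 0:
        raise ValueError("no tab, space, comma, or semicolon found in header")
    return best


print(check_file_name("arab_expr_example.txt"))          # True
print(check_file_name("arab_expr_example.txt(1).txt"))   # False
print(ALLOWED_SEPARATORS[guess_separator("gene\troot__control\tleaf__control")])  # tab
```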
2B: are genes in rows or columns? Specify whether genes are in rows or columns. In a data matrix where row names are gene IDs and column names are samples/conditions, the column names MUST follow this naming scheme: 1) a sample name is followed by a double underscore and then the condition, e.g. in "root_pGL2__hypoxia", "root_pGL2" is the sample (atrichoblast epidermis) and "hypoxia" is the condition; 2) "__" is a reserved separator, so it cannot be used within sample or condition identifiers; 3) each column name must be unique. To achieve this naming format, simple sample/condition names can be edited in a regular text editor or Excel, while complex ones can be generated with the function filter_data in spatialHeatmap. One column of metadata (e.g. gene annotation) can optionally be appended after the sample/condition columns; its column name must not include "__". Only values of samples that have matching feature counterparts in the aSVG are translated to colors in spatial heatmaps.

In the case of spatiotemporal data, there are three factors: samples, conditions, and time points. The naming scheme is slightly different and includes three options: 1) combine samples and conditions into the composite factor sample-condition, then concatenate this new factor and the times with a double underscore in between, i.e. "sampleCondition__time"; 2) combine samples and times into the composite factor sample-time, then concatenate this new factor and the conditions with a double underscore in between, i.e. "sampleTime__condition"; or 3) combine all three factors into the composite factor "sampleTimeCondition" without a double underscore.
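The naming rules in 2B can be validated with a short Python sketch before uploading. The helpers `parse_column` and `check_columns` are hypothetical and cover the two-part "sample__condition" scheme only: they require that "__" splits each data column name into exactly two non-empty parts, that all names are unique, and that an optional trailing metadata column contains no "__".

```python
SEP = "__"  # reserved separator between sample and condition


def parse_column(name: str) -> tuple:
    """Split a 'sample__condition' column name into (sample, condition)."""
    parts = name.split(SEP)
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"{name!r} does not follow the 'sample__condition' scheme")
    return parts[0], parts[1]


def check_columns(columns: list, has_metadata: bool = False) -> list:
    """Validate column names; an optional trailing metadata column is skipped.

    Returns the parsed (sample, condition) pairs of the data columns.
    """
    if len(set(columns)) != len(columns):
        raise ValueError("column names must be unique")
    if has_metadata and SEP in columns[-1]:
        raise ValueError("the metadata column name must not contain '__'")
    data_cols = columns[:-1] if has_metadata else columns
    return [parse_column(c) for c in data_cols]


cols = ["root_pGL2__hypoxia", "root_pGL2__control", "annotation"]
print(check_columns(cols, has_metadata=True))
# [('root_pGL2', 'hypoxia'), ('root_pGL2', 'control')]
```

Spatiotemporal names built with options 1 and 2 ("sampleCondition__time", "sampleTime__condition") also contain exactly one "__", so the same check applies to them; option 3 names contain no "__" and would be rejected by this two-part check.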

Step 3: upload custom aSVG(s)
3A: upload one aSVG file. Upload a user-generated aSVG file, i.e. an SVG image in which spatial features are annotated with unique identifiers. An example aSVG is downloadable below. Note that the aSVG file name should not contain parentheses. E.g. "arabidopsis.thaliana_root.cross_shm.svg" is expected while "arabidopsis.thaliana_root.cross_shm(1).svg" will cause errors.

3B (optional): upload multiple aSVG files. Upload more than one aSVG, such as aSVGs representing organs at different growth stages. The order of the aSVGs is indicated by the suffixes "_shm1", "_shm2", etc., e.g. "arabidopsis.thaliana_organ_shm1.svg", "arabidopsis.thaliana_organ_shm2.svg". The spatial heatmap will then be a composite image including all aSVGs. This step takes precedence over "Step 3A". A pre-uploaded example is "growthStage_Mustroph" under "Step 1".
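The suffix convention in 3B determines the order of the aSVGs in the composite image. As a hypothetical illustration, the Python helper `order_asvgs` below sorts file names by the number in their "_shmN.svg" suffix; it assumes the number is compared as an integer (so "_shm10" comes after "_shm2"), which is an assumption rather than documented app behavior.

```python
import re

# Matches the ordering suffix, e.g. "_shm2.svg" -> "2".
SHM_SUFFIX = re.compile(r"_shm(\d+)\.svg$")


def order_asvgs(file_names: list) -> list:
    """Return aSVG file names sorted by the integer in their '_shmN.svg' suffix."""
    keyed = []
    for name in file_names:
        match = SHM_SUFFIX.search(name)
        if match is None:
            raise ValueError(f"{name!r} lacks a '_shmN.svg' suffix")
        keyed.append((int(match.group(1)), name))
    return [name for _, name in sorted(keyed)]


files = ["arabidopsis.thaliana_organ_shm2.svg",
         "arabidopsis.thaliana_organ_shm10.svg",
         "arabidopsis.thaliana_organ_shm1.svg"]
print(order_asvgs(files))
```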

More details about how to set up the aSVG file and data matrix are provided in the package vignette and the SVG tutorial. The example aSVG files and formatted data matrices are provided on the landing page and can be uploaded directly for testing after selecting "customData".

Additional files
Upload a config file (optional). Optionally upload a custom configuration file in YAML format, in which customized default parameters such as the color scheme and title size are set. A specialized YAML editor, e.g. onlineyamltools, is recommended for editing this file.

Upload batched data and aSVGs in two separate tar files (optional). If there is a large number of data and aSVG files to visualize, they can be compressed into two separate tar files and uploaded in a batch. See the function "write_hdf5" for details.

3 Matrix Heatmap

For each selected (target) gene, the nearest neighbors in the data matrix are selected independently using a correlation or distance measure. All target genes and their nearest neighbors are then hierarchically clustered and visualized in an interactive matrix heatmap, where target genes are labeled by black lines. The interactive features include 1) hovering the mouse over a cell to see its row/column labels and value, and 2) drawing a rectangle to zoom in and double-clicking to zoom out.
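The neighbor selection step can be sketched in plain Python. This example assumes Pearson correlation as the similarity measure (one of the correlation/distance options mentioned above) and ranks candidates by the highest correlation; the function names and data are hypothetical, and the subsequent hierarchical clustering and heatmap drawing are not shown.

```python
from math import sqrt


def pearson(x: list, y: list) -> float:
    """Pearson correlation coefficient; returns 0.0 for a constant profile."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def nearest_neighbors(matrix: dict, target: str, n: int) -> list:
    """Return the n genes with the highest correlation to the target profile."""
    ref = matrix[target]
    scores = {g: pearson(ref, prof) for g, prof in matrix.items() if g != target}
    return sorted(scores, key=scores.get, reverse=True)[:n]


def select_genes(matrix: dict, targets: list, n: int) -> list:
    """Targets plus the neighbors of each target, each selected independently."""
    selected = list(targets)
    for t in targets:
        for g in nearest_neighbors(matrix, t, n):
            if g not in selected:
                selected.append(g)
    return selected


# Hypothetical expression profiles across four samples.
matrix = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8],
          "g3": [4, 3, 2, 1], "g4": [1, 3, 2, 4]}
print(nearest_neighbors(matrix, "g1", 2))   # ['g2', 'g4']
```

Because each target is processed on its own, two targets can share neighbors; `select_genes` keeps only one copy of each gene before the combined set is clustered.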

4 Network

This section applies network analysis with WGCNA (Peter Langfelder and Horvath 2008; P. Langfelder and Horvath 2012; Peter Langfelder, Zhang, and Steve Horvath 2016) to the subset matrix shown in the Matrix Heatmap. Briefly, a correlation or distance matrix is computed on all genes in the Matrix Heatmap and transformed sequentially into an adjacency matrix and a topological overlap matrix (TOM), both of which quantify coexpression similarity. Network modules, i.e. clusters of genes with highly similar coexpression profiles, are then identified by hierarchically clustering the TOM-based dissimilarity matrix 1-TOM. Finally, the module containing a target gene is displayed as an interactive network graph. Since this is a coexpression analysis, the data should contain at least 5 samples/conditions; otherwise, the resulting modules are not reliable. Refer to the package vignette for details.
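The adjacency and TOM steps can be written out explicitly. The Python sketch below assumes WGCNA's unsigned formulation: adjacency a_ij = |cor_ij|^beta, connectivity k_i = sum over u != i of a_iu, and TOM_ij = (sum over u != i,j of a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij), with TOM_ii = 1. The default beta = 6 mirrors WGCNA's usual soft-thresholding power for unsigned networks and may differ from what the app uses; module detection by clustering 1-TOM is not shown.

```python
def adjacency(cor: list, beta: float = 6) -> list:
    """Unsigned soft-threshold adjacency |cor|^beta with a unit diagonal."""
    n = len(cor)
    return [[1.0 if i == j else abs(cor[i][j]) ** beta for j in range(n)]
            for i in range(n)]


def tom(adj: list) -> list:
    """Topological overlap matrix of a symmetric adjacency matrix."""
    n = len(adj)
    # Connectivity k_i: sum of adjacencies to all other genes.
    k = [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]
    out = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Neighbors shared by i and j (u ranges over all other genes).
            shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
            out[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
    return out


def dissimilarity(tom_matrix: list) -> list:
    """1 - TOM, the input to hierarchical clustering for module detection."""
    return [[1 - v for v in row] for row in tom_matrix]


# Hypothetical correlation matrix of three genes.
cor = [[1.0, 0.9, 0.1],
       [0.9, 1.0, 0.2],
       [0.1, 0.2, 1.0]]
diss = dissimilarity(tom(adjacency(cor, beta=1)))
print([[round(v, 3) for v in row] for row in diss])
```

Genes 1 and 2, which are strongly correlated, get the smallest dissimilarity, so they would be merged first when 1-TOM is clustered.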

5 Spatial Enrichment

Spatial Enrichment is specialized in detecting spatial feature-specific genes (SFSGs) in RNA-seq count data. Basically, the app compares a target feature with each reference feature using edgeR (McCarthy et al. 2012), DESeq2 (Love, Huber, and Anders 2014), limma (Ritchie et al. 2015), and distinct (Tiberi et al., n.d.). The target and reference features, as well as the tools, are selected by users. If more than one tool is selected, the overlap of the SFSGs identified by all selected tools is presented. These SFSGs can be selected for visualization in spatial heatmaps.
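The way per-comparison and per-tool results could be combined is illustrated by the Python sketch below. It assumes, as a simplification, that a tool's SFSGs are the genes called up-regulated in the target against every selected reference feature, and that the final result is the intersection across tools. The helper names and gene sets are hypothetical; no edgeR, DESeq2, limma, or distinct calls are made, and the app's actual significance thresholds are not modeled.

```python
def intersect_all(gene_sets: list) -> set:
    """Intersection of a list of gene collections (empty list -> empty set)."""
    sets = [set(genes) for genes in gene_sets]
    return set.intersection(*sets) if sets else set()


def sfsgs_for_tool(up_in_target: dict) -> set:
    """Genes up-regulated in the target versus every reference feature.

    up_in_target maps each reference feature to the set of genes that one
    tool calls significantly higher in the target than in that reference.
    """
    return intersect_all(list(up_in_target.values()))


def overlap_across_tools(per_tool: dict) -> set:
    """SFSGs shared by all selected tools."""
    return intersect_all(list(per_tool.values()))


# Hypothetical results with target feature "root" and references "leaf", "stem".
edger = sfsgs_for_tool({"leaf": {"g1", "g2", "g3"}, "stem": {"g1", "g2"}})
deseq2 = sfsgs_for_tool({"leaf": {"g1", "g3"}, "stem": {"g1", "g3"}})
print(sorted(overlap_across_tools({"edgeR": edger, "DESeq2": deseq2})))  # ['g1']
```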

6 Other Information

Data Source
brain_Prudencio: RNA-seq, Prudencio et al. (2015), accessed through the R package ExpressionAtlas (Keays 2019).
mouse_Merkin: RNA-seq, Merkin et al. (2012), accessed through the R package ExpressionAtlas (Keays 2019).
chicken_Cardoso.Moreira: RNA-seq, Cardoso-Moreira et al. (2019), accessed through the R package ExpressionAtlas (Keays 2019).
shoot_Mustroph/organ_Mustroph/root_Mustroph/shootRoot_Mustroph/rootRootTip_Mustroph: microarray, Mustroph et al. (2009), accessed through the R package GEOquery (S. Davis and Meltzer 2007).
spatiotemporal_Narsai: RNA-seq, Narsai et al. (2017), accessed through the R package ExpressionAtlas (Keays 2019).
growthStage_Mustroph: random data.
map_Census: Bureau (2018).

Image Source
brain_Prudencio/mouse_Merkin/chicken_Cardoso.Moreira: Expression Atlas-EMBL-EBI
shoot_Mustroph/organ_Mustroph/root_Mustroph/shootRoot_Mustroph/rootRootTip_Mustroph/growthStage_Mustroph: Mustroph et al. (2009).
spatiotemporal_Narsai: Narsai et al. (2017).
map_Census: Trip8.Co (2018).

Funding source: This project has been funded by NSF awards PGRP-1546879, PGRP-1810468, and PGRP-1936492.

References

Bureau, U.S. Census. 2018. “Annual Population Estimates, Estimated Components of Resident Population Change, and Rates of the Components of Resident Population Change for the United States, States, and Puerto Rico: April 1, 2010 to July 1, 2018.” https://www.census.gov/data/datasets/time-series/demo/popest/2010s-state-total.html.

Cardoso-Moreira, Margarida, Jean Halbert, Delphine Valloton, Britta Velten, Chunyan Chen, Yi Shao, Angélica Liechti, et al. 2019. “Gene Expression Across Mammalian Organ Development.” Nature 571 (7766): 505–9.

Davis, Sean, and Paul Meltzer. 2007. “GEOquery: A Bridge Between the Gene Expression Omnibus (GEO) and BioConductor.” Bioinformatics 23 (14): 1846–7.

Keays, Maria. 2019. ExpressionAtlas: Download Datasets from EMBL-EBI Expression Atlas.

Langfelder, P., and S. Horvath. 2012. “Fast R Functions for Robust Correlations and Hierarchical Clustering.” J. Stat. Softw. 46 (11). https://www.ncbi.nlm.nih.gov/pubmed/23050260.

Langfelder, Peter, and Steve Horvath. 2008. “WGCNA: An R Package for Weighted Correlation Network Analysis.” BMC Bioinformatics 9 (December): 559.

Langfelder, Peter, Bin Zhang, and with contributions from Steve Horvath. 2016. DynamicTreeCut: Methods for Detection of Clusters in Hierarchical Clustering Dendrograms. https://CRAN.R-project.org/package=dynamicTreeCut.

Love, Michael I., Wolfgang Huber, and Simon Anders. 2014. “Moderated Estimation of Fold Change and Dispersion for RNA-Seq Data with DESeq2.” Genome Biology 15 (12): 550. doi:10.1186/s13059-014-0550-8.

McCarthy, Davis J., Yunshun Chen, and Gordon K. Smyth. 2012. “Differential Expression Analysis of Multifactor RNA-Seq Experiments with Respect to Biological Variation.” Nucleic Acids Research 40 (10): 4288–97.

Merkin, Jason, Caitlin Russell, Ping Chen, and Christopher B Burge. 2012. “Evolutionary Dynamics of Gene and Isoform Regulation in Mammalian Tissues.” Science 338 (6114): 1593–9.

Mustroph, Angelika, M Eugenia Zanetti, Charles J H Jang, Hans E Holtan, Peter P Repetti, David W Galbraith, Thomas Girke, and Julia Bailey-Serres. 2009. “Profiling Translatomes of Discrete Cell Populations Resolves Altered Cellular Priorities During Hypoxia in Arabidopsis.” Proc Natl Acad Sci U S A 106 (44): 18843–8.

Narsai, Reena, David Secco, Matthew D Schultz, Joseph R Ecker, Ryan Lister, and James Whelan. 2017. “Dynamic and Rapid Changes in the Transcriptome and Epigenome During Germination and in Developing Rice (Oryza Sativa) Coleoptiles Under Anoxia and Re-Oxygenation.” Plant J. 89 (4): 805–24.

Prudencio, Mercedes, Veronique V Belzil, Ranjan Batra, Christian A Ross, Tania F Gendron, Luc J Pregent, Melissa E Murray, et al. 2015. “Distinct Brain Transcriptome Profiles in C9orf72-Associated and Sporadic ALS.” Nat. Neurosci. 18 (8): 1175–82.

Ritchie, Matthew E, Belinda Phipson, Di Wu, Yifang Hu, Charity W Law, Wei Shi, and Gordon K Smyth. 2015. “Limma Powers Differential Expression Analyses for RNA-sequencing and Microarray Studies.” Nucleic Acids Res. 43 (7): e47.

Tiberi, Simone, Helena L Crowell, Lukas M Weber, Pantelis Samartsidis, and Mark D Robinson. n.d. “Distinct: A Novel Approach to Differential Distribution Analyses.” BioRxiv.