Introduction

Methrix provides a set of functions that allow easy importing of various flavors of bedgraphs generated by methylation callers, and allow many downstream analyses to be performed on large matrices.

This vignette describes the basic usage of the package for processing several large bedgraph files in R. In addition, a detailed example of a complete data analysis, with steps from reading in the data to annotation and differential methylation calling, can be found in our WGBS best practices workflow.

Overview of the package functions and their usage

Installation

NOTE

Installing from Bioconductor requires the latest Bioconductor and R versions. This restriction is imposed by the Bioconductor community and can cause package incompatibilities with earlier versions of R (e.g., R < 4.0). In that case, installing from GitHub might be easier, since it is much more lenient with regard to versions.
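A minimal installation sketch, assuming the standard BiocManager route for Bioconductor packages. The GitHub repository path `CompEpigen/methrix` in the commented fallback is an assumption and should be checked against the URL field of the package DESCRIPTION.

```r
# Install BiocManager first if needed, then methrix from Bioconductor
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
if (!requireNamespace("methrix", quietly = TRUE)) {
  BiocManager::install("methrix")
}

# Fallback for older R / Bioconductor versions (not run here): install from
# GitHub. The repository path "CompEpigen/methrix" is an ASSUMPTION; verify it
# against the package DESCRIPTION before use.
# install.packages("remotes")
# remotes::install_github("CompEpigen/methrix")

library(methrix)
```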

Reading bedgraph files

The read_bedgraphs function is a versatile bedgraph reader intended to import bedgraph files generated by virtually any methylation calling program. It requires the user to provide column indices for chromosome names, start positions, and other required fields. Presets are also available to import bedgraphs from the most common programs, such as Bismark, MethylDackel, and MethylcTools.
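The sketch below reads the example bedgraphs bundled with methrix by giving the column indices explicitly. It assumes that BSgenome.Hsapiens.UCSC.hg19 is installed, that only the bundled example files match the `bedGraph.gz` pattern, and that those files hold chromosome, start, methylated count, and unmethylated count in columns 1 to 4 (1-based coordinates). The single-column `sample_anno` table is an illustrative placeholder for real pheno-data.

```r
library(methrix)
library(BSgenome.Hsapiens.UCSC.hg19)  # reference genome; must be installed

# Example bedgraphs shipped with methrix (assumed to be the only matches)
bdg_files <- list.files(
  path = system.file("extdata", package = "methrix"),
  pattern = "bedGraph\\.gz$",
  full.names = TRUE
)

# Illustrative sample annotation; row names become the sample names
sample_ids <- sub("\\.bedGraph\\.gz$", "", basename(bdg_files))
sample_anno <- data.frame(row.names = sample_ids, sample = sample_ids)

# Genome-wide CpG loci of the reference, used to add CpGs missing from the files
hg19_cpgs <- suppressWarnings(
  methrix::extract_CPGs(ref_genome = "BSgenome.Hsapiens.UCSC.hg19")
)

# Assumed column layout of the example files: chr, start, M count, U count
meth <- methrix::read_bedgraphs(
  files = bdg_files,
  ref_cpgs = hg19_cpgs,
  chr_idx = 1,
  start_idx = 2,
  M_idx = 3,
  U_idx = 4,
  stranded = FALSE,
  zero_based = FALSE,
  collapse_strands = FALSE,
  coldata = sample_anno
)
meth
```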

The read_bedgraphs function reads in the bedgraphs, adds CpGs missing from the reference set, and creates methylation and coverage matrices. Once the process is complete, it returns an object of class methrix, which inherits from the SummarizedExperiment class. A methrix object contains ‘methylation’ and ‘coverage’ matrices (either in memory or as on-disk HDF5 arrays) along with pheno-data and other basic information. This object can be passed to all downstream functions for further analysis.
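Because methrix extends SummarizedExperiment, the generic SummarizedExperiment accessors should work on it. A short sketch, assuming the small example object `methrix_data` that ships with the package:

```r
library(methrix)
library(SummarizedExperiment)

# Small example methrix object shipped with the package
data("methrix_data", package = "methrix")

# methrix inherits from SummarizedExperiment, so the generic accessors apply
is(methrix_data, "SummarizedExperiment")
dim(methrix_data)            # number of CpG loci x number of samples
colData(methrix_data)        # pheno-data (sample annotation)
head(rowData(methrix_data))  # CpG loci
assayNames(methrix_data)     # names of the methylation and coverage assays
```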

Note: Use the pipeline argument if your bedgraphs were generated with “Bismark”, “MethylDackel”, or “MethylcTools”. This automatically determines the file format for you, so you don't have to specify chr_idx, start_idx, and the other index arguments.
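A sketch of reading MethylDackel output with the preset instead of explicit indices. The directory `path/to/methyldackel_output` and the file-name pattern are hypothetical placeholders, and the hg19 reference is assumed; the call only runs when matching files exist.

```r
library(methrix)

# Hypothetical directory of MethylDackel bedGraph output; adjust to your data
dackel_files <- list.files(
  path = "path/to/methyldackel_output",
  pattern = "\\.bedGraph(\\.gz)?$",
  full.names = TRUE
)

if (length(dackel_files) > 0) {
  hg19_cpgs <- suppressWarnings(
    methrix::extract_CPGs(ref_genome = "BSgenome.Hsapiens.UCSC.hg19")
  )
  # The preset supplies the MethylDackel column layout, so chr_idx,
  # start_idx, M_idx, U_idx, etc. are not passed here
  meth <- methrix::read_bedgraphs(
    files = dackel_files,
    pipeline = "MethylDackel",
    ref_cpgs = hg19_cpgs
  )
}
```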

HTML QC report

Get basic summary statistics of the methrix object with the methrix_report function, which produces an interactive HTML report.

Click here for an example report.
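A minimal sketch of generating the report for the bundled example object. The temporary output directory is an arbitrary choice, and rendering is assumed to need rmarkdown and pandoc to be available.

```r
library(methrix)
data("methrix_data", package = "methrix")

# Write the report to a temporary directory (any writable path works)
report_dir <- file.path(tempdir(), "methrix_report")
dir.create(report_dir, showWarnings = FALSE, recursive = TRUE)

# Render the interactive HTML QC report
methrix::methrix_report(meth = methrix_data, output_dir = report_dir)

# List the generated HTML file(s)
list.files(report_dir, pattern = "\\.html$")
```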

Basic operations

Extract methylation/coverage matrices
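The matrices can be pulled out with get_matrix, where type = "M" selects methylation (beta) values and type = "C" selects coverage. A short sketch on the bundled example object; with add_loci = TRUE the locus columns are prepended to the sample columns.

```r
library(methrix)
data("methrix_data", package = "methrix")

# Methylation (beta) values with the CpG locus columns prepended
meth_mat <- methrix::get_matrix(m = methrix_data, type = "M", add_loci = TRUE)
head(meth_mat)

# Coverage values as a plain loci x samples matrix
cov_mat <- methrix::get_matrix(m = methrix_data, type = "C")
head(cov_mat)
```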

SessionInfo

sessionInfo()
#> R version 4.2.0 RC (2022-04-19 r82224)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 20.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.15-bioc/R/lib/libRblas.so
#> LAPACK: /home/biocbuild/bbs-3.15-bioc/R/lib/libRlapack.so
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] bsseq_1.32.0                         MafDb.1Kgenomes.phase3.hs37d5_3.10.0
#>  [3] GenomicScores_2.8.0                  BSgenome.Hsapiens.UCSC.hg19_1.4.3   
#>  [5] BSgenome_1.64.0                      rtracklayer_1.56.0                  
#>  [7] Biostrings_2.64.0                    XVector_0.36.0                      
#>  [9] methrix_1.10.0                       SummarizedExperiment_1.26.0         
#> [11] Biobase_2.56.0                       GenomicRanges_1.48.0                
#> [13] GenomeInfoDb_1.32.0                  IRanges_2.30.0                      
#> [15] S4Vectors_0.34.0                     BiocGenerics_0.42.0                 
#> [17] MatrixGenerics_1.8.0                 matrixStats_0.62.0                  
#> [19] data.table_1.14.2                   
#> 
#> loaded via a namespace (and not attached):
#>   [1] colorspace_2.0-3              rjson_0.2.21                 
#>   [3] ellipsis_0.3.2                farver_2.1.0                 
#>   [5] bit64_4.0.5                   interactiveDisplayBase_1.34.0
#>   [7] AnnotationDbi_1.58.0          fansi_1.0.3                  
#>   [9] R.methodsS3_1.8.1             sparseMatrixStats_1.8.0      
#>  [11] cachem_1.0.6                  knitr_1.38                   
#>  [13] jsonlite_1.8.0                Rsamtools_2.12.0             
#>  [15] dbplyr_2.1.1                  png_0.1-7                    
#>  [17] R.oo_1.24.0                   shiny_1.7.1                  
#>  [19] HDF5Array_1.24.0              BiocManager_1.30.17          
#>  [21] compiler_4.2.0                httr_1.4.2                   
#>  [23] assertthat_0.2.1              Matrix_1.4-1                 
#>  [25] fastmap_1.1.0                 limma_3.52.0                 
#>  [27] cli_3.3.0                     later_1.3.0                  
#>  [29] htmltools_0.5.2               tools_4.2.0                  
#>  [31] gtable_0.3.0                  glue_1.6.2                   
#>  [33] GenomeInfoDbData_1.2.8        dplyr_1.0.8                  
#>  [35] rappdirs_0.3.3                Rcpp_1.0.8.3                 
#>  [37] jquerylib_0.1.4               vctrs_0.4.1                  
#>  [39] rhdf5filters_1.8.0            DelayedMatrixStats_1.18.0    
#>  [41] xfun_0.30                     stringr_1.4.0                
#>  [43] mime_0.12                     lifecycle_1.0.1              
#>  [45] restfulr_0.0.13               gtools_3.9.2                 
#>  [47] XML_3.99-0.9                  AnnotationHub_3.4.0          
#>  [49] zlibbioc_1.42.0               scales_1.2.0                 
#>  [51] promises_1.2.0.1              parallel_4.2.0               
#>  [53] rhdf5_2.40.0                  RColorBrewer_1.1-3           
#>  [55] yaml_2.3.5                    curl_4.3.2                   
#>  [57] memoise_2.0.1                 ggplot2_3.3.5                
#>  [59] sass_0.4.1                    stringi_1.7.6                
#>  [61] RSQLite_2.2.12                BiocVersion_3.15.2           
#>  [63] highr_0.9                     BiocIO_1.6.0                 
#>  [65] permute_0.9-7                 filelock_1.0.2               
#>  [67] BiocParallel_1.30.0           rlang_1.0.2                  
#>  [69] pkgconfig_2.0.3               bitops_1.0-7                 
#>  [71] evaluate_0.15                 lattice_0.20-45              
#>  [73] purrr_0.3.4                   Rhdf5lib_1.18.0              
#>  [75] GenomicAlignments_1.32.0      labeling_0.4.2               
#>  [77] bit_4.0.4                     tidyselect_1.1.2             
#>  [79] magrittr_2.0.3                R6_2.5.1                     
#>  [81] generics_0.1.2                DelayedArray_0.22.0          
#>  [83] DBI_1.1.2                     pillar_1.7.0                 
#>  [85] KEGGREST_1.36.0               RCurl_1.98-1.6               
#>  [87] tibble_3.1.6                  crayon_1.5.1                 
#>  [89] utf8_1.2.2                    BiocFileCache_2.4.0          
#>  [91] rmarkdown_2.14                locfit_1.5-9.5               
#>  [93] grid_4.2.0                    blob_1.2.3                   
#>  [95] digest_0.6.29                 xtable_1.8-4                 
#>  [97] httpuv_1.6.5                  R.utils_2.11.0               
#>  [99] munsell_0.5.0                 bslib_0.3.1