Tutorial 1: Cross-modality representation of spatial ATAC mouse embryo samples

The spatial ATAC mouse embryo samples measured the spatial ATAC-seq profiles of samples from E12.5, E13.5 and E15.5.

[1]:
import warnings
import pandas as pd
import numpy as np
import scanpy as sc
import episcanpy.api as epi
from PRESENT import PRESENT_function

warnings.filterwarnings("ignore")
sc.set_figure_params(dpi=80, figsize=(4,4), facecolor="white")

Analyze E15.5-S1 mouse embryo sample

[2]:
adata = sc.read_h5ad("Data/E15_5_S1.h5ad")
print(adata)
AnnData object with n_obs × n_vars = 3274 × 248047
    obs: 'nCount_peaks', 'nFeature_peaks', 'nCount_RNA', 'nFeature_RNA', 'total', 'duplicate', 'chimeric', 'unmapped', 'lowmapq', 'mitochondrial', 'nonprimary', 'passed_filters', 'is__cell_barcode', 'excluded_reason', 'TSS_fragments', 'DNase_sensitive_region_fragments', 'enhancer_region_fragments', 'promoter_region_fragments', 'on_target_fragments', 'blacklist_region_fragments', 'peak_region_fragments', 'peak_region_cutsites', 'y', 'x', 'pixel_y', 'pixel_x', 'section', 'sample', 'age', 'clusters', 'annotated_integrated_cl'
    obsm: 'spatial'

Run PRESENT model

[3]:
adata = PRESENT_function(
    spatial_key = "spatial", ## obsm key under which to load the spatial matrix of spots
    adata_atac = adata, ## The ATAC raw fragment count matrix of spots in anndata.AnnData format
    peak_min_cells_fraction = 0.005, ## Minimum fraction of cells accessible required for a peak to pass filtering
    nclusters = adata.obs["clusters"].nunique(),
    device = "cuda" ## Device used for training: cuda or cpu
)
print(adata)
Loading data and parameters...
Input data has been loaded
Computing METIS partitioning...
Done!
Model training:  26%|██████▌                  | 26/100 [02:35<07:22,  5.98s/it, NLL_loss=0.0881, BNN_loss=0.102, MSE_loss=0.483, ES counter=20, ES patience=20]
Early stop the training process
 ImplicitModificationWarning:/home/lizhen/miniconda3/envs/PRESENT/lib/python3.9/site-packages/anndata/_core/anndata.py:121: Transforming to str index.
 TqdmWarning:/home/lizhen/miniconda3/envs/PRESENT/lib/python3.9/site-packages/tqdm/auto.py:21: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
Succeed to find 10 clusters at resolution 1.031
AnnData object with n_obs × n_vars = 3274 × 50
    obs: 'nCount_peaks', 'nFeature_peaks', 'nCount_RNA', 'nFeature_RNA', 'total', 'duplicate', 'chimeric', 'unmapped', 'lowmapq', 'mitochondrial', 'nonprimary', 'passed_filters', 'is__cell_barcode', 'excluded_reason', 'TSS_fragments', 'DNase_sensitive_region_fragments', 'enhancer_region_fragments', 'promoter_region_fragments', 'on_target_fragments', 'blacklist_region_fragments', 'peak_region_fragments', 'peak_region_cutsites', 'y', 'x', 'pixel_y', 'pixel_x', 'section', 'sample', 'age', 'clusters', 'annotated_integrated_cl', 'nb_features', 'leiden', 'LeidenClusters'
    uns: 'neighbors', 'leiden'
    obsm: 'spatial', 'embeddings'
    obsp: 'distances', 'connectivities'

Visualization

[4]:
sc.pp.neighbors(adata, use_rep="embeddings")
sc.tl.umap(adata)
sc.pl.umap(adata, color=["clusters", "LeidenClusters"], wspace=0.5)
_images/Tutorial1_spatialATAC-seq_representation_MouseEmbryo_8_0.png
[5]:
sc.pl.embedding(adata, basis="spatial", color=["clusters", "LeidenClusters"], wspace=0.5)
_images/Tutorial1_spatialATAC-seq_representation_MouseEmbryo_9_0.png

Analyze E12.5-S2 mouse embryo sample with a snATAC-seq mouse embryo reference data

Compared with the mouse embryo E15.5-S1 sample, E12.5-S2 sample suffers from low sequencing depth and signal-to-noise ratio. Here we illustrate the analysis with a mouse embryo snATAC-seq data consisting of cells from E12, E13 and E15

[6]:
adata = sc.read_h5ad("Data/E12_5_S2.h5ad")
print(adata)
AnnData object with n_obs × n_vars = 2234 × 248047
    obs: 'nCount_peaks', 'nFeature_peaks', 'nCount_RNA', 'nFeature_RNA', 'total', 'duplicate', 'chimeric', 'unmapped', 'lowmapq', 'mitochondrial', 'nonprimary', 'passed_filters', 'is__cell_barcode', 'excluded_reason', 'TSS_fragments', 'DNase_sensitive_region_fragments', 'enhancer_region_fragments', 'promoter_region_fragments', 'on_target_fragments', 'blacklist_region_fragments', 'peak_region_fragments', 'peak_region_cutsites', 'y', 'x', 'pixel_y', 'pixel_x', 'section', 'sample', 'age', 'clusters', 'annotated_integrated_cl'
    obsm: 'spatial'

Before fed into the model, we transform the format of region name into chromatin:start-end for further peak merging

[7]:
rdata = sc.read_h5ad("Data/rdata_snATAC_MouseEmbryo.h5ad")
## Transform the name of region into chromatin:start-end format
regions = []
for region in rdata.var_names:
    name, start, end = region.split("-")
    regions.append(name + ":" + start + "-" + end)
rdata.var.index = regions
print(rdata)
AnnData object with n_obs × n_vars = 1879 × 248047
    obs: 'nCount_peaks', 'nFeature_peaks', 'total', 'duplicate', 'chimeric', 'unmapped', 'lowmapq', 'mitochondrial', 'nonprimary', 'passed_filters', 'is__cell_barcode', 'excluded_reason', 'TSS_fragments', 'DNase_sensitive_region_fragments', 'enhancer_region_fragments', 'promoter_region_fragments', 'on_target_fragments', 'blacklist_region_fragments', 'peak_region_fragments', 'peak_region_cutsites', 'sample', 'logunique', 'seurat_clusters', 'annotated_integrated_cl'

Run PRESENT model

[8]:
adata = PRESENT_function(
    spatial_key = "spatial", ## obsm key under which to load the spatial matrix of spots
    adata_atac = adata, ## The ATAC raw fragment count matrix of spots in anndata.AnnData format
    rdata_atac = rdata, ## The ATAC raw fragment counts of reference data in anndata.AnnData format
    rdata_atac_anno="annotated_integrated_cl", ## rdata_atac.obs key under which to load the annotation
    peak_min_cells_fraction = 0.005, ## Minimum fraction of cells accessible required for a peak to pass filtering
    device = "cuda" ## Device used for training: cuda or cpu
)
print(adata)
Loading data and parameters...
Peak merged
Peak filtered
Matrix merged
Input data has been loaded
Construct pseudo bulk reference data...
DownSample 1 times to get enough pseudo bulks!
Finished, bulk_data's shape: (30, 72451)
Computing METIS partitioning...
Done!
Model training:  30%|███████▌                 | 30/100 [01:36<03:44,  3.20s/it, NLL_loss=0.069, BNN_loss=0.0519, MSE_loss=0.496, ES counter=20, ES patience=20]
Early stop the training process
 ImplicitModificationWarning:/home/lizhen/miniconda3/envs/PRESENT/lib/python3.9/site-packages/anndata/_core/anndata.py:121: Transforming to str index.
Succeed to find 10 clusters at resolution 0.750
AnnData object with n_obs × n_vars = 2234 × 50
    obs: 'nCount_peaks', 'nFeature_peaks', 'nCount_RNA', 'nFeature_RNA', 'total', 'duplicate', 'chimeric', 'unmapped', 'lowmapq', 'mitochondrial', 'nonprimary', 'passed_filters', 'is__cell_barcode', 'excluded_reason', 'TSS_fragments', 'DNase_sensitive_region_fragments', 'enhancer_region_fragments', 'promoter_region_fragments', 'on_target_fragments', 'blacklist_region_fragments', 'peak_region_fragments', 'peak_region_cutsites', 'y', 'x', 'pixel_y', 'pixel_x', 'section', 'sample', 'age', 'clusters', 'annotated_integrated_cl', 'nb_features', 'leiden', 'LeidenClusters'
    uns: 'neighbors', 'leiden'
    obsm: 'spatial', 'embeddings'
    obsp: 'distances', 'connectivities'

Visualization

[9]:
sc.pp.neighbors(adata, use_rep="embeddings")
sc.tl.umap(adata)
sc.pl.umap(adata, color=["clusters", "LeidenClusters"], wspace=0.5)
_images/Tutorial1_spatialATAC-seq_representation_MouseEmbryo_18_0.png
[10]:
sc.pl.embedding(adata, basis="spatial", color=["clusters", "LeidenClusters"], wspace=0.5)
_images/Tutorial1_spatialATAC-seq_representation_MouseEmbryo_19_0.png
[ ]: