{ "cells": [ { "cell_type": "markdown", "id": "16770537", "metadata": {}, "source": [ "## Quantify gene imputation accuracy" ] }, { "cell_type": "markdown", "id": "3fc9b6cf", "metadata": {}, "source": [ "### Gene Expression Evaluation\n", "\n", "The module also provides gene expression evaluation metrics with rasterization support for both 2D and 3D data.\n", "\n", "#### Gene Expression Metrics\n", "\n", "- **Masked MSE/MAE**: Mean Squared/Absolute Error on regions with expression\n", "- **Cosine Similarity**: Mean cosine similarity (masked and unmasked)\n", "- **Spearman Correlation**: Mean Spearman correlation (masked and unmasked)\n", "- **Pearson Correlation**: Mean Pearson correlation (masked and unmasked)\n", "- **IoU Mask**: Intersection over Union of expression masks" ] }, { "cell_type": "code", "execution_count": 1, "id": "3286eecb", "metadata": {}, "outputs": [], "source": [ "import unist\n", "import scanpy as sc" ] }, { "cell_type": "markdown", "id": "3e791130", "metadata": {}, "source": [ "### 2D" ] }, { "cell_type": "code", "execution_count": 2, "id": "9232c500", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "AnnData object with n_obs × n_vars = 8382 × 17649\n", " obs: 'area', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'n_counts'\n", " obsm: 'X_pca', 'bbox', 'spatial'" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## read in data\n", "\n", "adata_path = '/Users/shuilan/Library/CloudStorage/OneDrive-InsideMDAnderson/SUICA/data/preprocessed_data/slice_440.h5ad'\n", "gt = sc.read(adata_path)\n", "\n", "gt" ] }, { "cell_type": "markdown", "id": "c838c91b", "metadata": {}, "source": [ "This result was obtained from Gene Imputation Module:\n", "\n", "https://unist-tutorial.readthedocs.io/en/latest/gene_imputation.html" ] }, { "cell_type": "code", "execution_count": null, "id": "ce7e4a82", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "AnnData object with n_obs × n_vars = 16764 × 1\n", " obsm: 'fitted_embd', 'reconstructed_raw', 'spatial', 'spatial_normalized'" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res = sc.read('/Users/shuilan/Library/CloudStorage/OneDrive-InsideMDAnderson/SUICA/data/preprocessed_data/reconstructed-original.h5ad')\n", "res" ] }, { "cell_type": "code", "execution_count": 9, "id": "db0b9144", "metadata": {}, "outputs": [], "source": [ "### put the gene expression data into res.X\n", "\n", "import anndata as ad\n", "from scipy import sparse\n", "\n", "res_new = ad.AnnData(\n", " X=sparse.csr_matrix(res.obsm[\"reconstructed_raw\"]),\n", " obs=res.obs.copy(),\n", " obsm=res.obsm.copy()\n", ")\n", "res_new.var_names = gt.var_names.copy()" ] }, { "cell_type": "code", "execution_count": 11, "id": "dbc645eb", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "AnnData object with n_obs × n_vars = 16764 × 17649\n", " obsm: 'fitted_embd', 'reconstructed_raw', 'spatial', 'spatial_normalized'" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "res_new" ] }, { "cell_type": "code", "execution_count": 5, "id": "5bab6708", "metadata": {}, "outputs": [], "source": [ "from unist.metrics import evaluate_gene_expression_2d" ] }, { "cell_type": "markdown", "id": "52e3f45a", "metadata": {}, "source": [ "#### 10 micronmeters resolution" ] }, { "cell_type": "markdown", "id": "0223262e", "metadata": {}, "source": [ "##### All genes" ] }, { "cell_type": "code", "execution_count": null, "id": "1680f2b9", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "MSE (masked): 3.8159\n", "Pearson (masked): 0.2770\n" ] } ], "source": [ "metrics_2d = evaluate_gene_expression_2d(\n", " gt, res_new,\n", " bin_size=20.0, # corresponding to 10 micronmeters resolution\n", " true_coords_key=\"spatial\",\n", " pred_coords_key=\"spatial\"\n", ")\n", "\n", "print(f\"MSE (masked): {metrics_2d['MSE_masked']:.4f}\")\n", "print(f\"Pearson (masked): {metrics_2d['Pearson_masked']:.4f}\")" ] }, { "cell_type": "code", "execution_count": 7, "id": "b2336a10", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'bins': 33741,\n", " 'genes': 17649,\n", " 'bin_size': 20.0,\n", " 'IoU_mask': 0.3986876827615719,\n", " 'MSE_masked': 3.8158515045623447,\n", " 'MAE_masked': 1.234025159761077,\n", " 'Cosine_unmasked': 0.19846487198672683,\n", " 'Cosine_masked': 0.6738546518423421,\n", " 'Spearman_unmasked': 0.1768471134755618,\n", " 'Spearman_masked': 0.23622424796329358,\n", " 'Pearson_unmasked': 0.18163100396300597,\n", " 'Pearson_masked': 0.27697253250828124}" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "metrics_2d" ] }, { "cell_type": "markdown", "id": "3f06bc7d", "metadata": {}, "source": [ "#### Specific genes" ] }, { "cell_type": "code", "execution_count": 16, "id": "022818a2", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1.7261713809819303 0.8778019861304599 -0.10784411082508727\n", "Myl2 1.3379105355218053\n", "Myl7 2.304951690893242\n", "Tnnt2 0.9730411383810393\n" ] } ], "source": [ "from unist.metrics import evaluate_specific_genes_2d\n", "\n", "result = evaluate_specific_genes_2d(\n", " gt,\n", " res_new,\n", " gene_names=[\"Myl2\", \"Myl7\", \"Tnnt2\"], # or just \"Myl2\"\n", " bin_size=20.0,\n", " true_coords_key=\"spatial\",\n", " pred_coords_key=\"spatial\",\n", ")\n", "\n", "# overall metrics(for all designated genes)\n", "overall = result[\"overall\"]\n", "print(overall[\"MAE_masked\"], overall[\"Cosine_masked\"], overall[\"Spearman_masked\"])\n", "\n", "# per_gene\n", "if \"per_gene\" in result:\n", " for gene, metrics in result[\"per_gene\"].items():\n", " print(gene, metrics[\"MAE_masked\"])" ] }, { "cell_type": "markdown", "id": "718e95d0", "metadata": {}, "source": [ "### 3D" ] }, { "cell_type": "markdown", "id": "b3e65c01", "metadata": {}, "source": [ "This step is computationally intensive; we recommend running it on a machine with sufficient memory.\n", "\n", "Here I test on a subset of the volume which contains 4 slices and 1 gene." ] }, { "cell_type": "code", "execution_count": 4, "id": "7d746059", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "AnnData object with n_obs × n_vars = 624829 × 28943\n", " obs: 'cell_ID_mask', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'n_reads', 'reads_per_counts', 'n_joined', 'exact_entropy', 'theoretical_entropy', 'exact_compression', 'theoretical_compression', 'n_counts', 'annotation', 'annotation_key', 'n_section'\n", " var: 'n_cells_by_counts', 'mean_counts', 'total_counts', 'pct_dropout_by_counts'\n", " uns: 'log1p'\n", " obsm: 'spatial', 'spatial_3d_aligned', 'spatial_3d_aligned_rot-12'\n", " layers: 'raw'" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## read in OpenST data\n", "gt_3D = sc.read('/Users/shuilan/Library/CloudStorage/OneDrive-InsideMDAnderson/3D spatial transcriptomics/data/3D/a_chopped.h5ad')\n", "gt_3D" ] }, { "cell_type": "code", "execution_count": 6, "id": "7d130780", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 57.97101449, 86.95652174, 115.94202899, 144.92753623,\n", " 173.91304348, 202.89855072, 260.86956522, 318.84057971,\n", " 492.75362319, 521.73913043, 550.72463768, 666.66666667,\n", " 695.65217391, 724.63768116, 753.62318841, 811.5942029 ,\n", " 956.52173913, 985.50724638, 1043.47826087])" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "\n", "np.unique(gt_3D.obsm['spatial_3d_aligned_rot-12'][:, 2])" ] }, { "cell_type": "code", "execution_count": 7, "id": "0b9c83a1", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "AnnData object with n_obs × n_vars = 129203 × 28943\n", " obs: 'cell_ID_mask', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'n_reads', 'reads_per_counts', 'n_joined', 'exact_entropy', 'theoretical_entropy', 'exact_compression', 'theoretical_compression', 'n_counts', 'annotation', 'annotation_key', 'n_section'\n", " var: 'n_cells_by_counts', 'mean_counts', 'total_counts', 'pct_dropout_by_counts'\n", " uns: 'log1p'\n", " obsm: 'spatial', 'spatial_3d_aligned', 'spatial_3d_aligned_rot-12'\n", " layers: 'raw'" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "coords = gt_3D.obsm[\"spatial_3d_aligned_rot-12\"]\n", "z_vals = coords[:, 2]\n", "target_slices = np.sort(np.unique(z_vals))[:4]\n", "mask = np.isin(z_vals, target_slices)\n", "\n", "gt_3D_4slices = gt_3D[mask].copy()\n", "gt_3D_4slices\n" ] }, { "cell_type": "code", "execution_count": 14, "id": "9090d387", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "AnnData object with n_obs × n_vars = 129203 × 1\n", " obs: 'cell_ID_mask', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'n_reads', 'reads_per_counts', 'n_joined', 'exact_entropy', 'theoretical_entropy', 'exact_compression', 'theoretical_compression', 'n_counts', 'annotation', 'annotation_key', 'n_section'\n", " var: 'n_cells_by_counts', 'mean_counts', 'total_counts', 'pct_dropout_by_counts'\n", " uns: 'log1p'\n", " obsm: 'spatial', 'spatial_3d_aligned', 'spatial_3d_aligned_rot-12'\n", " layers: 'raw'" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gt_3D_4slices = gt_3D_4slices[:, [\"LYZ\"]].copy()\n", "gt_3D_4slices" ] }, { "cell_type": "code", "execution_count": 8, "id": "e9be33fd", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "AnnData object with n_obs × n_vars = 831406 × 1\n", " obsm: 'fitted_embd', 'reconstructed_raw', 'spatial', 'spatial_normalized'" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## This is the reconstructed 3D data with one gene (LYZ)\n", "res_3d = sc.read('/Users/shuilan/Library/CloudStorage/OneDrive-InsideMDAnderson/3D spatial transcriptomics/data/gene_expression/LYZ_3d-reconstructed-custom-3d-original-coords.h5ad')\n", "res_3d" ] }, { "cell_type": "code", "execution_count": null, "id": "8d6810c3", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 57.97101449, 87.83488112, 117.69870859, 147.56257523,\n", " 177.42644186, 207.29030849, 237.15415554, 267.01800259,\n", " 296.88186922, 326.74571627, 356.6095829 , 386.47342995,\n", " 416.33728679, 446.20114363, 476.06499558, 505.92885242,\n", " 535.79271048, 565.6565661 , 595.52042294, 625.38427978,\n", " 655.24813173, 685.11198857, 714.97584541, 744.83969246,\n", " 774.70355909, 804.56740614, 834.43127277, 864.29511982,\n", " 894.15896688, 924.02283351, 953.88670014, 983.75056677,\n", " 1013.61439424, 1043.47826087])" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.unique(res_3d.obsm['spatial'][:, 2]) ## densified z-axis" ] }, { "cell_type": "code", "execution_count": 9, "id": "60ad9145", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "AnnData object with n_obs × n_vars = 126919 × 1\n", " obsm: 'fitted_embd', 'reconstructed_raw', 'spatial', 'spatial_normalized'" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "coords = res_3d.obsm[\"spatial\"]\n", "z_vals = coords[:, 2]\n", "target_slices = np.sort(np.unique(z_vals))[:4]\n", "mask = np.isin(z_vals, target_slices)\n", "\n", "res_3d_4slices = res_3d[mask].copy()\n", "res_3d_4slices" ] }, { "cell_type": "code", "execution_count": 11, "id": "ee35153c", "metadata": {}, "outputs": [], "source": [ "from scipy import sparse\n", "res_3d_4slices.X = sparse.csr_matrix(res_3d_4slices.obsm[\"reconstructed_raw\"])" ] }, { "cell_type": "code", "execution_count": 12, "id": "09f085de", "metadata": {}, "outputs": [], "source": [ "res_3d_4slices.var_names = [\"LYZ\"]" ] }, { "cell_type": "code", "execution_count": null, "id": "bf860d26", "metadata": {}, "outputs": [], "source": [ "from unist.metrics import evaluate_specific_genes_3d\n", "\n", "result = evaluate_specific_genes_3d(\n", " gt_3D_4slices,\n", " res_3d_4slices,\n", " gene_names=[\"LYZ\"],\n", " bin_size=60.0, # corresponding to 20 micronmeters resolution to reduce computational cost\n", " true_coords_key=\"spatial_3d_aligned_rot-12\",\n", " pred_coords_key=\"spatial\",\n", ")" ] }, { "cell_type": "code", "execution_count": 18, "id": "f1d0c489", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "MSE (masked): 0.1200\n", "MAE (masked): 0.2656\n" ] } ], "source": [ "overall = result[\"overall\"]\n", "\n", "print(f\"MSE (masked): {overall['MSE_masked']:.4f}\")\n", "print(f\"MAE (masked): {overall['MAE_masked']:.4f}\")" ] } ], "metadata": { "kernelspec": { "display_name": "unist", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.19" } }, "nbformat": 4, "nbformat_minor": 5 }