Matching Observations to Simulations

Author

NumCosmo developers

Purpose

This document provides a guide to using NumCosmo’s sky_match module, which employs the SkyMatch class to match objects in the sky.

The SkyMatch class is designed to associate detected objects, such as galaxy clusters or halos, with observational data. A common application is matching cluster detections to halos in simulations. This process enables the evaluation of completeness and purity in observations.

In this tutorial, we generate mock data for galaxy clusters and halos, then use the SkyMatch class to perform matching and analyze the results.

Defining Theoretical Models

Cosmological Model

To begin, we define a cosmological model using NumCosmo’s background cosmology models. We also create an Nc.Distance object to compute the cosmological distances required for the weak lensing analysis.

For this example, we consider a Lambda Cold Dark Matter (\(\Lambda\)CDM) cosmology with the following parameters:

  • \(\Omega_{c0} = 0.25\): Cold dark matter density parameter at present
  • \(\Omega_{b0} = 0.05\): Baryon density parameter at present
  • \(\Omega_{k0} = 0\): Curvature density parameter at present
  • \(H_0 = 70.0\): Hubble constant in km/s/Mpc
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from astropy.table import Table
import pandas as pd
from IPython.display import HTML

from numcosmo_py import Nc, Ncm
from numcosmo_py.sky_match import (
    BestCandidates,
    Coordinates,
    DistanceMethod,
    SelectionCriteria,
    SkyMatch,
    SkyMatchResult,
)

Ncm.cfg_init()

Omega_b = 0.05
Omega_c = 0.25
Omega_k = 0.0
H0 = 70.0

# Create a cosmology object
cosmo = Nc.HICosmoDEXcdm.new()
cosmo.omega_x2omega_k()
cosmo["Omegab"] = Omega_b
cosmo["Omegac"] = Omega_c
cosmo["Omegak"] = Omega_k
cosmo["H0"] = H0
cosmo["w"] = -1.0

dist = Nc.Distance.new(100.0)
dist.compute_inv_comoving(True)
dist.prepare(cosmo)

# Lets fix the numpy seed to get reproducible results
np.random.seed(74682)

Generating Mock Data

To illustrate the matching process, we generate simulated mock data for galaxy clusters and halos.
While these configurations do not represent typical observational distributions, they are chosen to ensure the figures remain clear and readable.

We use a simple random distribution to assign positions, redshifts, and masses to both clusters and halos:

  • Positions:
    • Defined by right ascension (RA) and declination (Dec) in degrees.
    • RA is uniformly sampled in the range \([-10^\circ, 10^\circ]\).
    • Dec is sampled uniformly in \(\sin(\text{Dec})\) and converted back to degrees to ensure uniform coverage on the sphere.
  • Redshifts:
    • Uniformly distributed between 0.2 and 0.5.
  • Masses:
    • Sampled uniformly in \(\log_{10} M_{\odot}\) within the range \([13, 15]\).
    • The additional 100 halos are sampled uniformly in \(\log_{10} M_{\odot}\) within the range \([10, 13]\).

To demonstrate the matching process, we generate:

  • 100 galaxy clusters, each with assigned positions, redshifts, and masses.
  • 200 halos, with two different placement strategies:
    1. First 100 halos are placed within 5 Mpc of a cluster’s position (in each dimension), simulating a correlated distribution.
    2. Remaining 100 halos are distributed randomly to provide a broader test case.

This setup ensures that some halos are closely associated with clusters, while others are randomly placed to create a more comprehensive matching scenario.

# Constants
CLUSTER_LENGTH = 100
HALO_LENGTH = 200

RA_MIN, RA_MAX = -10.0, 10.0
DEC_MIN, DEC_MAX = -10.0, 10.0
Z_MIN, Z_MAX = 0.2, 0.5
LOGM_MIN, LOGM_MAX = 13.0, 15.0  # Mass in log10 solar masses
LOGM_ADD_HALO_MIN, LOGM_ADD_HALO_MAX = 10.0, 13.0

# Generate cluster positions, redshifts, and masses
cluster_ra = np.random.uniform(RA_MIN, RA_MAX, CLUSTER_LENGTH)
cluster_sin_dec = np.random.uniform(
    np.sin(np.radians(DEC_MIN)), np.sin(np.radians(DEC_MAX)), CLUSTER_LENGTH
)
cluster_dec = np.degrees(np.arcsin(cluster_sin_dec))
cluster_z = np.random.uniform(Z_MIN, Z_MAX, CLUSTER_LENGTH)
cluster_logm = np.random.uniform(LOGM_MIN, LOGM_MAX, CLUSTER_LENGTH)
# Let's compute the cluster radii, and the 3D positions
cluster_r = np.array(dist.comoving_array(cosmo, cluster_z)) * cosmo.RH_Mpc()
cluster_x1 = (
    cluster_r * np.cos(np.radians(cluster_dec)) * np.cos(np.radians(cluster_ra))
)
cluster_x2 = (
    cluster_r * np.cos(np.radians(cluster_dec)) * np.sin(np.radians(cluster_ra))
)
cluster_x3 = cluster_r * np.sin(np.radians(cluster_dec))

# Generate halo positions, redshifts, and masses
# Lets first sample a halo < 5.0 Mpc from the cluster in each dimension
D_DIM = 5.0

halo_x1 = cluster_x1 + np.random.uniform(-D_DIM, D_DIM, CLUSTER_LENGTH)
halo_x2 = cluster_x2 + np.random.uniform(-D_DIM, D_DIM, CLUSTER_LENGTH)
halo_x3 = cluster_x3 + np.random.uniform(-D_DIM, D_DIM, CLUSTER_LENGTH)
halo_ra = np.degrees(np.arctan2(halo_x2, halo_x1))
halo_dec = np.degrees(np.arcsin(halo_x3 / cluster_r))
halo_r = np.sqrt(halo_x1**2 + halo_x2**2 + halo_x3**2)
halo_z = [dist.inv_comoving(cosmo, r / cosmo.RH_Mpc()) for r in halo_r]

# Now for the halo masses we use the cluster's masses added a Gaussian noise
halo_logm = cluster_logm + np.random.normal(0, 1.0, CLUSTER_LENGTH)

# Finally we add 100 more halos randomly
DELTA_OBJECTS = HALO_LENGTH - CLUSTER_LENGTH
halo_ra = np.append(halo_ra, np.random.uniform(RA_MIN, RA_MAX, DELTA_OBJECTS))
halo_dec = np.append(halo_dec, np.random.uniform(DEC_MIN, DEC_MAX, DELTA_OBJECTS))
halo_z = np.append(halo_z, np.random.uniform(Z_MIN, Z_MAX, DELTA_OBJECTS))
halo_logm = np.append(
    halo_logm, np.random.uniform(LOGM_ADD_HALO_MIN, LOGM_ADD_HALO_MAX, DELTA_OBJECTS)
)

Visualizing Cluster and Halo Distributions

We plot the generated cluster and halo positions, with the dot sizes proportional to the object’s \(\log_{10} M_{\odot}\).

Code
# Scale marker sizes with log mass
def scale_marker_size(log_mass, base_size=20, scale_factor=30):
    arr = scale_factor * (log_mass - LOGM_MIN)
    return base_size + np.where(arr < 0, 0, arr)


cluster_sizes = scale_marker_size(cluster_logm)
halo_sizes = scale_marker_size(halo_logm)

fig, ax = plt.subplots(figsize=(10, 8))

# Scatter plot for clusters (fixed red color)
ax.scatter(
    cluster_ra,
    cluster_dec,
    c=cluster_z,
    cmap="Reds",
    s=cluster_sizes,
    label="Clusters",
    alpha=0.7,
    vmin=0.0,
)

# Scatter plot for halos (color varies with z, from light to dark blue)
halo_scatter = ax.scatter(
    halo_ra,
    halo_dec,
    c=halo_z,
    cmap="Blues",
    s=halo_sizes,
    label="Halos",
    alpha=0.7,
    vmin=0.0,
)

# Colorbar to indicate redshift values
color_bar = plt.colorbar(halo_scatter, ax=ax, label="Redshift (z)")

ax.set_xlabel("Right Ascension (RA) [°]")
ax.set_ylabel("Declination (Dec) [°]")
ax.set_xlim(RA_MIN, RA_MAX)
ax.set_ylim(DEC_MIN, DEC_MAX)
ax.legend()
plt.show()
Figure 1: Cluster (red) and halo (blue) positions in the sky with sizes proportional to log mass. The redshift ranges from 0.2 to 0.5 and is indicated on the color intensity scale.

Creating the Astropy Table

The SkyMatch class requires two astropy.table.Table objects, one for the query catalog and one for the match catalog. These tables typically do not use the same column names expected by SkyMatch, so a sky_match.Coordinates object is needed to map the table column names to their respective properties.

# Create the astropy table for clusters
clusters = Table(
    [cluster_ra, cluster_dec, cluster_z, cluster_logm],
    names=["cluster_RA", "cluster_DEC", "cluster_z", "cluster_logM"],
)

# Create the astropy table for halos
halos = Table(
    [halo_ra, halo_dec, halo_z, halo_logm],
    names=["halo_RA", "halo_DEC", "halo_z", "halo_logM"],
)

# Create the Coordinates object for clusters
cluster_coords = Coordinates(RA="cluster_RA", DEC="cluster_DEC", z="cluster_z")

# Create the Coordinates object for halos
halo_coords = Coordinates(RA="halo_RA", DEC="halo_DEC", z="halo_z")

The Coordinates objects map the column names in the tables to their respective properties (RA, DEC, and z for both clusters and halos).

Viewing the Tables:

You can view the generated clusters and halos by converting them to Pandas DataFrames:

Code
def show_pandas(df: pd.DataFrame):
    return HTML(df.to_html(float_format="%.2f", max_rows=10))


show_pandas(clusters.to_pandas())
cluster_RA cluster_DEC cluster_z cluster_logM
0 -5.65 -0.33 0.43 13.83
1 6.51 -8.81 0.44 13.20
2 7.07 9.85 0.25 13.55
3 4.02 6.92 0.45 13.32
4 8.27 -7.72 0.36 13.32
... ... ... ... ...
95 7.20 -3.60 0.34 14.49
96 5.83 -4.58 0.34 14.68
97 9.50 -8.67 0.36 13.13
98 8.32 -0.42 0.40 14.17
99 -4.51 9.00 0.26 14.53
Table 1: The generated clusters and halos in the sky.

The generated clusters DataFrame:

Code
show_pandas(halos.to_pandas())
halo_RA halo_DEC halo_z halo_logM
0 -5.74 -0.34 0.43 13.32
1 6.54 -8.89 0.44 12.52
2 6.95 9.90 0.25 13.46
3 4.09 7.03 0.45 14.19
4 8.33 -7.90 0.37 14.40
... ... ... ... ...
195 5.26 -4.95 0.47 12.31
196 -6.97 -6.53 0.29 10.08
197 9.19 5.17 0.35 12.28
198 3.62 1.21 0.29 10.13
199 8.42 -5.69 0.22 10.91
Table 2: The generated clusters in the sky.

Sky Match Process

The matching process involves iterating over the query object and searching for corresponding entries in the match object. We are going to use the clusters and halos objects as the query and match catalogs, respectively.

# Create the SkyMatch object
sm = SkyMatch(clusters, cluster_coords, halos, halo_coords)

3D Matching

When both catalogs contain spectral redshift information, one can use the match_3d method to perform a 3D matching. Note that the match_3d method is cosmologically dependent.

# Perform 3D matching
# We are keeping the number of nearest neighbors to 6
result = sm.match_3d(cosmo, n_nearest_neighbours=6)

The result is a SkyMatchResult object, which holds the matched indices, distances, and nearest neighbours indices. We can view the result by converting it to a table:

Code
def convert_table_multi_column(table):
    table["ID_matched"] = [np.array2string(i) for i in table["ID_matched"]]
    table["RA_matched"] = [
        np.array2string(ra, precision=2) for ra in table["RA_matched"]
    ]
    table["DEC_matched"] = [
        np.array2string(dec, precision=2) for dec in table["DEC_matched"]
    ]
    if "distances" in table.columns:
        table["distances"] = [
            np.array2string(dist, precision=2) for dist in table["distances"]
        ]
    table["z_matched"] = [np.array2string(z, precision=2) for z in table["z_matched"]]

    return table


show_pandas(convert_table_multi_column(result.to_table_complete()).to_pandas())
ID RA DEC z ID_matched distances RA_matched DEC_matched z_matched
0 0 -5.65 -0.33 0.43 [  0 114  36  79 131  92] [ 2.08 40.88 67.68 68.16 77.03 81.73] [-5.74 -6.59 -5.59 -7.9  -6.73 -7.31] [-0.34 -1.01 -3.33 -2.76 -1.2  -2.55] [0.43 0.41 0.45 0.44 0.39 0.39]
1 1 6.51 -8.81 0.44 [  1 117  41 118  61 151] [ 1.9  40.79 51.49 51.58 67.76 68.2 ] [6.54 5.69 8.95 6.23 4.37 8.03] [-8.89 -7.89 -8.65 -9.37 -7.85 -6.22] [0.44 0.42 0.43 0.41 0.47 0.42]
2 2 7.07 9.85 0.25 [  2   8 134 132  74 147] [  3.16  50.88  82.02  99.73 113.07 114.74] [6.95 7.24 9.73 3.42 0.84 6.24] [9.9  6.28 9.55 8.7  9.75 6.7 ] [0.25 0.24 0.22 0.21 0.28 0.29]
3 3 4.02 6.92 0.45 [  3 162  35 127 119  24] [ 2.76 34.3  54.73 59.46 68.58 70.5 ] [4.09 2.68 3.43 6.75 3.95 6.74] [7.03 6.1  4.41 7.3  5.09 7.48] [0.45 0.45 0.44 0.44 0.42 0.48]
4 4 8.27 -7.72 0.36 [  4  97 121  78 133 186] [ 4.42 28.05 49.32 50.06 56.08 63.79] [8.33 9.53 7.92 7.17 6.64 5.34] [-7.9  -8.6  -6.36 -6.75 -7.96 -7.69] [0.37 0.36 0.39 0.39 0.39 0.34]
... ... ... ... ... ... ... ... ... ...
95 95 7.20 -3.60 0.34 [ 95  96 156 128  56 108] [ 1.19 31.23 40.06 50.12 50.5  58.09] [7.13 5.88 5.54 4.42 8.28 4.89] [-3.61 -4.75 -4.59 -2.88 -5.78 -5.82] [0.34 0.34 0.35 0.34 0.33 0.33]
96 96 5.83 -4.58 0.34 [ 96 156  95 108 128 186] [ 4.23 18.07 28.64 34.38 39.19 55.4 ] [5.88 5.54 7.13 4.89 4.42 5.34] [-4.75 -4.59 -3.61 -5.82 -2.88 -7.69] [0.34 0.35 0.34 0.33 0.34 0.34]
97 97 9.50 -8.67 0.36 [ 97   4 121  78 133 186] [ 2.15 26.81 70.44 73.53 75.93 81.74] [9.53 8.33 7.92 7.17 6.64 5.34] [-8.6  -7.9  -6.36 -6.75 -7.96 -7.69] [0.36 0.37 0.39 0.39 0.39 0.34]
98 98 8.32 -0.42 0.40 [ 98  53 146 125 101 178] [ 2.56 59.58 75.74 80.03 91.   97.93] [8.22 7.97 6.09 6.32 6.31 5.81] [-0.37  2.18 -0.85 -3.88 -1.69  1.42] [0.4  0.42 0.37 0.41 0.45 0.36]
99 99 -4.51 9.00 0.26 [ 99 103  34  15 115  71] [ 3.04 23.76 50.96 56.61 64.76 71.03] [-4.51 -3.03 -2.8  -6.73 -8.48 -5.  ] [9.13 8.76 7.14 8.78 7.55 9.58] [0.26 0.26 0.24 0.24 0.25 0.23]
Table 3: The result of the 3D matching.

Filtering Results

We can filter the results based on certain criteria, such as distance threshold or redshift threshold. For example, we can filter the results to keep only those with a distance less than 60 Mpc:

# Filter the results to keep only those with a distance less than 60 Mpc
mask = result.filter_mask_by_distance(60.0)

The resulting table can then be converted to a Pandas DataFrame:

Code
show_pandas(convert_table_multi_column(result.to_table_complete(mask=mask)).to_pandas())
ID RA DEC z ID_matched distances RA_matched DEC_matched z_matched
0 0 -5.65 -0.33 0.43 [  0 114] [ 2.08 40.88] [-5.74 -6.59] [-0.34 -1.01] [0.43 0.41]
1 1 6.51 -8.81 0.44 [  1 117  41 118] [ 1.9  40.79 51.49 51.58] [6.54 5.69 8.95 6.23] [-8.89 -7.89 -8.65 -9.37] [0.44 0.42 0.43 0.41]
2 2 7.07 9.85 0.25 [2 8] [ 3.16 50.88] [6.95 7.24] [9.9  6.28] [0.25 0.24]
3 3 4.02 6.92 0.45 [  3 162  35 127] [ 2.76 34.3  54.73 59.46] [4.09 2.68 3.43 6.75] [7.03 6.1  4.41 7.3 ] [0.45 0.45 0.44 0.44]
4 4 8.27 -7.72 0.36 [  4  97 121  78 133] [ 4.42 28.05 49.32 50.06 56.08] [8.33 9.53 7.92 7.17 6.64] [-7.9  -8.6  -6.36 -6.75 -7.96] [0.37 0.36 0.39 0.39 0.39]
... ... ... ... ... ... ... ... ... ...
95 95 7.20 -3.60 0.34 [ 95  96 156 128  56 108] [ 1.19 31.23 40.06 50.12 50.5  58.09] [7.13 5.88 5.54 4.42 8.28 4.89] [-3.61 -4.75 -4.59 -2.88 -5.78 -5.82] [0.34 0.34 0.35 0.34 0.33 0.33]
96 96 5.83 -4.58 0.34 [ 96 156  95 108 128 186] [ 4.23 18.07 28.64 34.38 39.19 55.4 ] [5.88 5.54 7.13 4.89 4.42 5.34] [-4.75 -4.59 -3.61 -5.82 -2.88 -7.69] [0.34 0.35 0.34 0.33 0.34 0.34]
97 97 9.50 -8.67 0.36 [97  4] [ 2.15 26.81] [9.53 8.33] [-8.6 -7.9] [0.36 0.37]
98 98 8.32 -0.42 0.40 [98 53] [ 2.56 59.58] [8.22 7.97] [-0.37  2.18] [0.4  0.42]
99 99 -4.51 9.00 0.26 [ 99 103  34  15] [ 3.04 23.76 50.96 56.61] [-4.51 -3.03 -2.8  -6.73] [9.13 8.76 7.14 8.78] [0.26 0.26 0.24 0.24]
Table 4: The filtered result of the 3D matching.

We can also visualize the filtered results, using the same plot characteristics as in the previous section:

Code
# Plot the filtered results


cluster_sizes = scale_marker_size(cluster_logm)
halo_sizes = scale_marker_size(halo_logm)


def plot_mask(mask):
    fig, ax = plt.subplots(figsize=(10, 8))

    # Scatter plot for clusters (fixed red color)
    ax.scatter(
        cluster_ra,
        cluster_dec,
        c=cluster_z,
        cmap="Reds",
        s=cluster_sizes,
        label="Clusters",
        alpha=0.7,
        vmin=0.0,
    )

    # Scatter plot for halos (color varies with z, from light to dark blue)
    halo_scatter = ax.scatter(
        halo_ra,
        halo_dec,
        c=halo_z,
        cmap="Blues",
        s=halo_sizes,
        label="Halos",
        alpha=0.7,
        vmin=0.0,
    )

    # Now we iterate over the filtered results plotting a line connecting the cluster to all matched halos
    for i, (idx, m) in enumerate(zip(result.nearest_neighbours_indices, mask.array)):
        for halo_i in idx[m]:
            ax.plot(
                [cluster_ra[i], halo_ra[halo_i]],
                [cluster_dec[i], halo_dec[halo_i]],
                color="black",
                alpha=0.5,
            )

    ax.legend()

    # Colorbar to indicate redshift values
    color_bar = plt.colorbar(halo_scatter, ax=ax, label="Redshift (z)")

    ax.set_xlabel("RA")
    ax.set_ylabel("Dec")

    return fig


plot_mask(mask)
plt.show()
Figure 2: The 3D matched clusters

Best Match

We can also use the select_best method to find the best match for each query object. This method returns the index of the best match, the distance between the query object and the best match, and the nearest neighbour index. There are different selection_criteria that can be used to select the best match, such as distance, redshift proximity, or more massive. Here we apply the default selection criteria, which is distance.

# Perform best match

best = result.select_best(mask=mask)

The result is a BestCandidates object, which holds the indices of the best candidates. We can view the result by converting it to a table:

Code
show_pandas(convert_table_multi_column(result.to_table_best(best)).to_pandas())
ID RA DEC z ID_matched RA_matched DEC_matched z_matched
0 0 -5.65 -0.33 0.43 0 -5.74 -0.34 0.43
1 1 6.51 -8.81 0.44 1 6.54 -8.89 0.44
2 2 7.07 9.85 0.25 2 6.95 9.9 0.25
3 3 4.02 6.92 0.45 3 4.09 7.03 0.45
4 4 8.27 -7.72 0.36 4 8.33 -7.9 0.37
... ... ... ... ... ... ... ... ...
95 95 7.20 -3.60 0.34 95 7.13 -3.61 0.34
96 96 5.83 -4.58 0.34 96 5.88 -4.75 0.34
97 97 9.50 -8.67 0.36 97 9.53 -8.6 0.36
98 98 8.32 -0.42 0.40 98 8.22 -0.37 0.4
99 99 -4.51 9.00 0.26 99 -4.51 9.13 0.26
Table 5: The best matched clusters

We can also visualize the best matches, using the same plot characteristics as in the previous section:

Code
# Plot the best matches


def plot_best(best):
    fig, ax = plt.subplots(figsize=(10, 8))

    # Scatter plot for clusters (fixed red color)
    ax.scatter(
        cluster_ra,
        cluster_dec,
        c=cluster_z,
        cmap="Reds",
        s=cluster_sizes,
        label="Clusters",
        alpha=0.7,
        vmin=0.0,
    )

    # Scatter plot for halos (color varies with z, from light to dark blue)
    halo_scatter = ax.scatter(
        halo_ra,
        halo_dec,
        c=halo_z,
        cmap="Blues",
        s=halo_sizes,
        label="Halos",
        alpha=0.7,
        vmin=0.0,
    )

    # Now we iterate over the best matches plotting a line connecting the cluster to the best halos
    for cluster_i, halo_i in zip(best.query_indices, best.indices):
        ax.plot(
            [cluster_ra[cluster_i], halo_ra[halo_i]],
            [cluster_dec[cluster_i], halo_dec[halo_i]],
            color="black",
            alpha=0.5,
        )

    ax.legend()

    # Colorbar to indicate redshift values
    color_bar = plt.colorbar(halo_scatter, ax=ax, label="Redshift (z)")

    ax.set_xlabel("RA")
    ax.set_ylabel("Dec")
    return fig


plot_best(best)
plt.show()
Figure 3: The 3D best matched clusters

Cross Match

Notice that the select_best method only returns the best match for each query object. That means that the same halo can be matched to multiple clusters. To find unique matches, we need to perform a cross match. To do that we first invert the SkyMatch object and then perform the best match.

# Perform cross match

cross = sm.invert_query_match()

cross_result = cross.match_3d(cosmo, n_nearest_neighbours=6)
cross_mask = cross_result.filter_mask_by_distance(60.0)

cross_best = cross_result.select_best(mask=cross_mask)

cross_match = best.get_cross_match_indices(cross_best)

Finally, we plot the cross matches:

Code
# Plot the cross matches


def plot_cross(cross_match):
    fig, ax = plt.subplots(figsize=(10, 8))

    # Scatter plot for clusters (fixed red color)
    ax.scatter(
        cluster_ra,
        cluster_dec,
        c=cluster_z,
        cmap="Reds",
        s=cluster_sizes,
        label="Clusters",
        alpha=0.7,
        vmin=0.0,
    )

    # Scatter plot for halos (color varies with z, from light to dark blue)
    halo_scatter = ax.scatter(
        halo_ra,
        halo_dec,
        c=halo_z,
        cmap="Blues",
        s=halo_sizes,
        label="Halos",
        alpha=0.7,
        vmin=0.0,
    )

    # Now we iterate over the cross matches plotting a line connecting the cluster to the best halos
    for cluster_i, halo_i in cross_match.items():
        ax.plot(
            [cluster_ra[cluster_i], halo_ra[halo_i]],
            [cluster_dec[cluster_i], halo_dec[halo_i]],
            color="black",
            alpha=0.5,
        )

    ax.legend()

    # Colorbar to indicate redshift values
    color_bar = plt.colorbar(halo_scatter, ax=ax, label="Redshift (z)")

    ax.set_xlabel("RA")
    ax.set_ylabel("Dec")
    return fig


plot_cross(cross_match)
plt.show()
Figure 4: The 3D cross matched clusters

2D Sky Match

The SkyMatch2D class functions similarly to the SkyMatch class but performs a 2D match instead of a 3D one.
In practice, it matches clusters to halos within the same plane of the sky, which is particularly useful when redshift measurement errors are significant.

In a 2D match, the angular separation between clusters and halos is first computed and then converted to a physical distance. This conversion can be performed using different reference redshifts: the cluster redshift, the halo redshift, or the minimum/maximum of the two. The choice is controlled by the distance_method parameter, which accepts the following options:

  • DistanceMethod.ANGULAR_SEPARATION (computes only the angular separation)
  • DistanceMethod.QUERY_RADIUS (uses the cluster redshift)
  • DistanceMethod.MATCH_RADIUS (uses the halo redshift)
  • DistanceMethod.MIN_RADIUS (uses the minimum redshift)
  • DistanceMethod.MAX_RADIUS (uses the maximum redshift)

The 2D matching process is performed using the match_2d method. Below is an example of how it works:

# Perform 2D matching  
# Keeping the number of nearest neighbors to 6  
result = sm.match_2d(
    cosmo, n_nearest_neighbours=6, distance_method=DistanceMethod.QUERY_RADIUS
)

Here, result contains the distances between clusters and halos, computed using the cluster redshifts. The table below shows these distances in mega-parsecs (Mpc):

Code
show_pandas(convert_table_multi_column(result.to_table_complete()).to_pandas())
ID RA DEC z ID_matched distances RA_matched DEC_matched z_matched
0 0 -5.65 -0.33 0.43 [  0 114  44 131  65 170] [ 1.93 23.4  27.67 27.94 29.18 31.85] [-5.74 -6.59 -4.67 -6.73 -6.32 -4.21] [-0.34 -1.01  0.63 -1.2   0.95 -0.98] [0.43 0.41 0.31 0.39 0.34 0.48]
1 1 6.51 -8.81 0.44 [  1 118  10 133   9 117] [ 1.8  12.73 14.35 17.69 24.31 25.24] [6.54 6.23 6.71 6.64 5.63 5.69] [-8.89 -9.37 -9.48 -7.96 -8.   -7.89] [0.44 0.41 0.27 0.39 0.23 0.42]
2 2 7.07 9.85 0.25 [  2  18  24 127 134 147] [ 1.86 29.94 33.46 36.02 36.98 45.58] [6.95 6.25 6.74 6.75 9.73 6.24] [9.9  7.87 7.48 7.3  9.55 6.7 ] [0.25 0.5  0.48 0.44 0.22 0.29]
3 3 4.02 6.92 0.45 [  3 168 162 119 132  43] [ 2.75 30.82 32.45 38.02 38.99 39.98] [4.09 5.15 2.68 3.95 3.42 2.66] [7.03 5.96 6.1  5.09 8.7  5.56] [0.45 0.37 0.45 0.42 0.21 0.31]
4 4 8.27 -7.72 0.36 [  4  41 121  78  97 151] [ 3.43 20.87 25.67 26.7  27.79 27.82] [8.33 8.95 7.92 7.17 9.53 8.03] [-7.9  -8.65 -6.36 -6.75 -8.6  -6.22] [0.37 0.43 0.39 0.39 0.36 0.42]
... ... ... ... ... ... ... ... ... ...
95 95 7.20 -3.60 0.34 [ 95 125  86  96   7 156] [ 1.17 15.98 21.03 30.41 31.74 33.68] [7.13 6.32 8.39 5.88 8.83 5.54] [-3.61 -3.88 -3.79 -4.75 -4.4  -4.59] [0.34 0.41 0.49 0.34 0.25 0.35]
96 96 5.83 -4.58 0.34 [ 96 156 195 125 144 169] [ 3.13  5.14 11.89 14.94 22.19 22.48] [5.88 5.54 5.26 6.32 5.23 5.38] [-4.75 -4.59 -4.95 -3.88 -5.69 -5.78] [0.34 0.35 0.47 0.41 0.3  0.41]
97 97 9.50 -8.67 0.36 [ 97  41   4 121 151  10] [ 1.23  9.84 25.23 50.61 51.73 52.19] [9.53 8.95 8.33 7.92 8.03 6.71] [-8.6  -8.65 -7.9  -6.36 -6.22 -9.48] [0.36 0.43 0.37 0.39 0.42 0.27]
98 98 8.32 -0.42 0.40 [ 98 143 146 101  57  53] [ 2.26 30.23 44.09 46.1  50.76 50.94] [8.22 9.55 6.09 6.31 9.23 7.97] [-0.37  0.53 -0.85 -1.69  2.03  2.18] [0.4  0.31 0.37 0.45 0.32 0.42]
99 99 -4.51 9.00 0.26 [ 99  71 160 150 140 145] [ 1.8  10.88 12.67 13.61 15.13 16.91] [-4.51 -5.   -3.64 -3.73 -3.88 -3.57] [9.13 9.58 9.17 9.55 9.85 8.28] [0.26 0.23 0.43 0.33 0.34 0.23]
Table 6: The distances between clusters and halos computed using the cluster redshifts.

Since this method assumes both objects share the same redshift (equal to the cluster redshift), the computed distances are generally smaller than their 3D counterparts. In 3D matching, the full spatial separation is considered, whereas in 2D matching, the calculated distance serves as a lower bound rather than the true three-dimensional separation. See Table 3 for comparison.

If we had used halo redshifts instead, we would still obtain a lower bound on the distance, but this time computed assuming the halos’ redshifts rather than the clusters’.

Filtering Results

To refine the matching process, we apply filters based on distance and redshift criteria.

Since the computed distances provide only a lower bound—meaning the actual 3D separations could be larger, we set a distance threshold of 60 Mpc to exclude pairs that are likely too far apart.

Additionally, we apply a redshift filter, retaining only pairs where the redshift difference is below 0.02, ensuring the matches remain within a physically relevant range.

# Apply a distance filter to retain only matches within 60 Mpc
mask = result.filter_mask_by_distance(60.0)

# Further refine by keeping only matches with redshift < 0.02
mask = result.filter_mask_by_redshift_proximity(0.02, mask=mask)

These filters help isolate the most relevant cluster-halo associations for further analysis.

The filtered table is shown below:

Code
show_pandas(convert_table_multi_column(result.to_table_complete(mask)).to_pandas())
ID RA DEC z ID_matched distances RA_matched DEC_matched z_matched
0 0 -5.65 -0.33 0.43 [  0 114 131 170] [ 1.93 23.4  27.94 31.85] [-5.74 -6.59 -6.73 -4.21] [-0.34 -1.01 -1.2  -0.98] [0.43 0.41 0.39 0.48]
1 1 6.51 -8.81 0.44 [  1 118 133 117] [ 1.8  12.73 17.69 25.24] [6.54 6.23 6.64 5.69] [-8.89 -9.37 -7.96 -7.89] [0.44 0.41 0.39 0.42]
2 2 7.07 9.85 0.25 [  2 134 147] [ 1.86 36.98 45.58] [6.95 9.73 6.24] [9.9  9.55 6.7 ] [0.25 0.22 0.29]
3 3 4.02 6.92 0.45 [  3 162 119] [ 2.75 32.45 38.02] [4.09 2.68 3.95] [7.03 6.1  5.09] [0.45 0.45 0.42]
4 4 8.27 -7.72 0.36 [  4 121  78  97 151] [ 3.43 25.67 26.7  27.79 27.82] [8.33 7.92 7.17 9.53 8.03] [-7.9  -6.36 -6.75 -8.6  -6.22] [0.37 0.39 0.39 0.36 0.42]
... ... ... ... ... ... ... ... ... ...
95 95 7.20 -3.60 0.34 [ 95  96 156] [ 1.17 30.41 33.68] [7.13 5.88 5.54] [-3.61 -4.75 -4.59] [0.34 0.34 0.35]
96 96 5.83 -4.58 0.34 [ 96 156 144] [ 3.13  5.14 22.19] [5.88 5.54 5.23] [-4.75 -4.59 -5.69] [0.34 0.35 0.3 ]
97 97 9.50 -8.67 0.36 [ 97   4 121] [ 1.23 25.23 50.61] [9.53 8.33 7.92] [-8.6  -7.9  -6.36] [0.36 0.37 0.39]
98 98 8.32 -0.42 0.40 [ 98 146 101  53] [ 2.26 44.09 46.1  50.94] [8.22 6.09 6.31 7.97] [-0.37 -0.85 -1.69  2.18] [0.4  0.37 0.45 0.42]
99 99 -4.51 9.00 0.26 [ 99  71 145] [ 1.8  10.88 16.91] [-4.51 -5.   -3.57] [9.13 9.58 8.28] [0.26 0.23 0.23]
Table 7: Filtered distances between clusters and halos computed using cluster redshifts.

The plot below illustrates the filtered results:

Code
plot_mask(mask)
plt.show()
Figure 5: The filtered distances between clusters and halos computed using the cluster redshifts.

The refined selection highlights cluster-halo pairs that are both spatially and redshift-wise well-matched within the relevant observational range.

Best Match Selection

To further refine the associations, we use the best_match method to identify the most suitable halo for each cluster.

We specify the selection_criteria as MORE_MASSIVE, meaning we select the halo with the largest halo_logM value as the best match.

# Find the best match for each cluster
best = result.select_best(
    selection_criteria=SelectionCriteria.MORE_MASSIVE,
    mask=mask,
    more_massive_column="halo_logM",
)

This results in a BestCandidates object containing the indices of the most likely halo match for each cluster. The table below presents the selected best matches, along with the corresponding cluster and halo masses:

Code
show_pandas(
    convert_table_multi_column(
        result.to_table_best(
            best,
            query_properties={"cluster_logM": "cluster_logM"},
            match_properties={"halo_logM": "halo_logM"},
        )
    ).to_pandas()
)
ID RA DEC z ID_matched RA_matched DEC_matched z_matched cluster_logM halo_logM
0 0 -5.65 -0.33 0.43 0 -5.74 -0.34 0.43 13.83 13.32
1 1 6.51 -8.81 0.44 118 6.23 -9.37 0.41 13.20 12.56
2 2 7.07 9.85 0.25 2 6.95 9.9 0.25 13.55 13.46
3 3 4.02 6.92 0.45 3 4.09 7.03 0.45 13.32 14.19
4 4 8.27 -7.72 0.36 78 7.17 -6.75 0.39 13.32 15.36
... ... ... ... ... ... ... ... ... ... ...
95 95 7.20 -3.60 0.34 96 5.88 -4.75 0.34 14.49 14.90
96 96 5.83 -4.58 0.34 96 5.88 -4.75 0.34 14.68 14.90
97 97 9.50 -8.67 0.36 4 8.33 -7.9 0.37 13.13 14.40
98 98 8.32 -0.42 0.40 98 8.22 -0.37 0.4 14.17 14.42
99 99 -4.51 9.00 0.26 71 -5. 9.58 0.23 14.53 13.48
Table 8: Best matches between clusters and halos computed using cluster redshifts.

The plot below visualizes the best matches:

Code
plot_best(best)
plt.show()
Figure 6: Best matches between clusters and halos computed using cluster redshifts.

This final selection ensures that each cluster is associated with the most massive halo within the given constraints, optimizing the matching process.

Cross Matching

To improve the accuracy of our cluster-halo associations, we use the cross_match method to identify the best match for each halo.
Previously, in Table 8, we selected the best match for each cluster. However, since we did not use actual distances, some incorrect matches may have been included.
By applying cross_match from the halo perspective, we refine the matching process.

# Find the best match for each halo

cross = sm.invert_query_match()

cross_result = cross.match_2d(cosmo, n_nearest_neighbours=6)

cross_mask = cross_result.filter_mask_by_distance(60.0)
cross_mask = cross_result.filter_mask_by_redshift_proximity(0.02, mask=cross_mask)

cross_best = cross_result.select_best(
    selection_criteria=SelectionCriteria.REDSHIFT_PROXIMITY, mask=cross_mask
)

cross_match = best.get_cross_match_indices(cross_best)

This results in a BestCandidates object containing the indices of the most probable cluster match for each halo.
The table below presents the selected best matches along with the corresponding cluster and halo masses:

Code
show_pandas(
    convert_table_multi_column(
        cross_result.to_table_best(
            cross_best,
            query_properties={"halo_logM": "halo_logM"},
            match_properties={"cluster_logM": "cluster_logM"},
        )
    ).to_pandas()
)
ID RA DEC z ID_matched RA_matched DEC_matched z_matched halo_logM cluster_logM
0 0 -5.74 -0.34 0.43 0 -5.65 -0.33 0.43 13.32 13.83
1 1 6.54 -8.89 0.44 1 6.51 -8.81 0.44 12.52 13.20
2 2 6.95 9.90 0.25 2 7.07 9.85 0.25 13.46 13.55
3 3 4.09 7.03 0.45 3 4.02 6.92 0.45 14.19 13.32
4 4 8.33 -7.90 0.37 4 8.27 -7.72 0.36 14.40 13.32
... ... ... ... ... ... ... ... ... ... ...
184 194 -9.49 7.31 0.34 67 -8.75 7.69 0.33 12.92 13.70
185 196 -6.97 -6.53 0.29 19 -7.42 -7.53 0.3 10.08 13.85
186 197 9.19 5.17 0.35 57 9.15 1.98 0.32 12.28 13.95
187 198 3.62 1.21 0.29 80 2.87 2.41 0.28 10.13 13.65
188 199 8.42 -5.69 0.22 7 8.69 -4.16 0.25 10.91 14.92
Table 9: Cross matches between clusters and halos computed using cluster redshifts.

The following plot visualizes the cross matches:

Code
plot_cross(cross_match)
plt.show()
Figure 7: Cross matches between clusters and halos computed using cluster redshifts.

In the final step, we assess the best match for each cluster by inspecting the cross_match dictionary, which stores the indices of the best matches.
We then evaluate the total number of matches, as well as the proportion of correct and incorrect assignments.

Code
# Convert dictionary to DataFrame for better visualization
cross_match_df = pd.DataFrame(
    list(cross_match.items()), columns=["Cluster ID", "Matched Halo ID"]
)

# Compute correctness
wrong = cross_match_df["Cluster ID"] != cross_match_df["Matched Halo ID"]
right = ~wrong  # Logical NOT for correct matches

# Add correctness column
cross_match_df["Correct Match"] = right

cross_match_df
Cluster ID Matched Halo ID Correct Match
0 0 0 True
1 1 118 False
2 2 2 True
3 3 3 True
4 5 5 True
... ... ... ...
63 90 90 True
64 92 92 True
65 94 94 True
66 96 96 True
67 98 98 True

68 rows × 3 columns

Table 10: Match statistics between clusters and halos computed using cluster redshifts.

The match statistics are summarized below:

Code
# Display match statistics
HTML(
    pd.DataFrame(
        {
            "Total Matches": [len(cross_match)],
            "Correct Matches": [right.sum()],
            "Wrong Matches": [wrong.sum()],
        }
    ).to_html(index=False)
)
Total Matches Correct Matches Wrong Matches
68 66 2
Table 11: Match statistics between clusters and halos computed using cluster redshifts.

In summary, we have successfully matched the clusters with the halos, keeping in mind that the artificial dataset was intentionally generated with increased errors to enhance visibility. As a result, some incorrect or suboptimal matches were expected.

Summary

This tutorial demonstrated NumCosmo’s SkyMatch class as a robust tool for cross-matching galaxy clusters with simulated dark matter halos. Using mock datasets, 100 clusters and 200 halos (100 spatially correlated with clusters, 100 randomly distributed), we showcased two key methodologies:
- match_3d: Achieves precise associations using 3D positional and redshift data, ideal for spectroscopic surveys.
- match_2d: Relies on angular proximity, critical for photometric surveys or datasets with redshift uncertainties.

Cross-validation via cross_match and statistical analysis revealed that correlated halos (positioned near clusters) were reliably matched, while mismatches arose primarily from random halos or positional noise. This underscores the importance of probabilistic matching (match_2d) and systematic validation in real-world scenarios, where overlapping structures and observational errors are common.

By quantifying completeness (successful cluster-halo matches) and purity (minimizing false associations), the SkyMatch framework enables rigorous evaluation of observational catalogs. While simplified mock data omitted complexities like mass-observable scatter or projection effects, the workflow mirrors steps essential for cosmological analyses, such as calibrating selection functions for cluster abundance studies.

Tools like SkyMatch bridge simulations and observations, allowing to diagnose catalog reliability and refine strategies for upcoming surveys (e.g., LSST, Euclid), where accurate matching will be crucial for constraining dark energy and structure growth.