Inspecting Data Objects
Introduction
In this example, we show how to inspect observational data objects available in NumCosmo. This is useful before building a likelihood or adapting an existing fitting example to a different dataset.
A pure Python companion script is also provided with this example as inspect_data_objects.py.
We first inspect the NumCosmo source tree to identify the available data-object files. Then we inspect one selected data source file and finally create a concrete DataHubble object from Python to examine its runtime properties.
The source-tree inspection gives a broad map of available data types. The runtime inspection shows what a concrete object exposes through methods such as list_properties() and get_property().
Inspecting the Data Source Files
NumCosmo data objects are implemented in the source tree under:
numcosmo/data/
Files named nc_data_*.h and nc_data_*.c define the main data-object types used by NumCosmo. For example, files such as nc_data_hubble.h, nc_data_bao.h, nc_data_snia.h, and nc_data_cmb_dist_priors.h correspond to different families of observational data.
In this first step, we inspect the source tree and create a compact table of available data-type files.
from pathlib import Path
import os
import re

import pandas as pd
from IPython.display import HTML, display

from numcosmo_py import Nc, Ncm

# Initialize the library.
Ncm.cfg_init()

def show_pandas(df: pd.DataFrame):
    """
    Display a Pandas DataFrame as an HTML table.
    """
    return HTML(df.to_html(index=False, max_rows=20))


def show_numeric_pandas(df: pd.DataFrame):
    """
    Display a numeric Pandas DataFrame with compact floating-point formatting.
    """
    return HTML(df.to_html(index=False, max_rows=20, float_format="%.4f"))

def find_numcosmo_data_dir():
    """
    Try to find the NumCosmo source data directory.

    The preferred method is to set the environment variable:

        NUMCOSMO_SOURCE_DIR=/path/to/NumCosmo

    If this variable is not set, we try to find the repository root
    by walking upward from the current working directory.
    """
    candidates = []

    env_source_dir = os.environ.get("NUMCOSMO_SOURCE_DIR")
    if env_source_dir is not None:
        candidates.append(Path(env_source_dir).expanduser().resolve())

    cwd = Path.cwd().resolve()
    for parent in [cwd] + list(cwd.parents):
        candidates.append(parent)

    for candidate in candidates:
        data_dir = candidate / "numcosmo" / "data"
        if data_dir.is_dir() and any(data_dir.glob("nc_data*.h")):
            return data_dir

    return None

def read_text(path):
    return path.read_text(encoding="utf-8", errors="replace")


def family_from_header_name(header_name):
    """
    Infer a broad data family from the header filename.

    Examples
    --------
    nc_data_bao.h -> bao
    nc_data_bao_a.h -> bao
    nc_data_hubble.h -> hubble
    nc_data_hubble_bao.h -> hubble
    nc_data_cluster_ncount.h -> cluster
    nc_data_cmb_dist_priors.h -> cmb
    nc_data_snia_cov.h -> snia
    """
    stem = header_name.removesuffix(".h")
    if not stem.startswith("nc_data_"):
        return "other"

    rest = stem.removeprefix("nc_data_")
    return rest.split("_")[0]

def extract_concrete_data_type(header_text):
    """
    Extract the public C/GObject data type from declarations such as:

        G_DECLARE_FINAL_TYPE (NcDataHubble, ...)
        G_DECLARE_DERIVABLE_TYPE (NcDataBaoA, ...)

    Some base or factory-style files may not declare a concrete type.
    In those cases, this function returns None.
    """
    pattern = re.compile(
        r"G_DECLARE_(?:FINAL|DERIVABLE)_TYPE\s*\(\s*(NcData[A-Za-z0-9_]*)\s*,",
        re.MULTILINE,
    )
    match = pattern.search(header_text)
    if match:
        return match.group(1)

    return None

def analyze_data_header(header_path):
    """
    Analyze one nc_data_*.h file.
    """
    source_path = header_path.with_suffix(".c")
    header_text = read_text(header_path)

    return {
        "family": family_from_header_name(header_path.name),
        "header": header_path.name,
        "source": source_path.name if source_path.exists() else None,
        "concrete_data_type": extract_concrete_data_type(header_text),
    }

def data_type_overview_dataframe(data_dir):
    """
    Return a compact overview of the NumCosmo data-type source files as a DataFrame.
    """
    headers = sorted(data_dir.glob("nc_data*.h"))
    items = [analyze_data_header(header) for header in headers]
    items.sort(key=lambda item: (item["family"], item["header"]))

    rows = []
    for item in items:
        rows.append(
            {
                "Family": item["family"],
                "Header": item["header"],
                "Source": item["source"] if item["source"] is not None else "-",
                "Concrete data type": (
                    item["concrete_data_type"]
                    if item["concrete_data_type"] is not None
                    else "-"
                ),
            }
        )

    return pd.DataFrame(rows)


data_dir = find_numcosmo_data_dir()

The helper code above searches for the NumCosmo source directory and defines a compact summary of the available data-object files.
if data_dir is None:
    print("Could not find the NumCosmo source data directory.")
    print()
    print("If you are running this example from outside the NumCosmo repository,")
    print("set NUMCOSMO_SOURCE_DIR to the NumCosmo source path.")
    print()
    print("For example:")
    print("  export NUMCOSMO_SOURCE_DIR=$HOME/dev/NumCosmo")
else:
    print("Using NumCosmo data directory:")
    print(f"  {data_dir}")

Using NumCosmo data directory:
/home/runner/work/NumCosmo/NumCosmo/numcosmo/data
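The upward search can be tried without a NumCosmo checkout by building a throwaway directory tree. The following is a minimal sketch that repeats only the walk-up part of the lookup; the helper name find_data_dir and the miniature tree are hypothetical and exist only for this illustration.

```python
import tempfile
from pathlib import Path


def find_data_dir(start: Path):
    """Walk upward from `start` looking for numcosmo/data with nc_data*.h files."""
    for candidate in [start] + list(start.parents):
        data_dir = candidate / "numcosmo" / "data"
        if data_dir.is_dir() and any(data_dir.glob("nc_data*.h")):
            return data_dir
    return None


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()

    # Hypothetical miniature checkout: one empty header and a nested work directory.
    (root / "numcosmo" / "data").mkdir(parents=True)
    (root / "numcosmo" / "data" / "nc_data_hubble.h").write_text("")
    workdir = root / "docs" / "examples"
    workdir.mkdir(parents=True)

    # Searching from the nested directory should reach the repository root.
    found = find_data_dir(workdir)
    found_relative = found.relative_to(root).as_posix()

print(found_relative)
```

Starting from a nested directory, the search stops at the first ancestor that contains numcosmo/data with at least one matching header.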
if data_dir is not None:
    data_type_overview = data_type_overview_dataframe(data_dir)
    display(show_pandas(data_type_overview))

| Family | Header | Source | Concrete data type |
|---|---|---|---|
| bao | nc_data_bao.h | nc_data_bao.c | - |
| bao | nc_data_bao_a.h | nc_data_bao_a.c | NcDataBaoA |
| bao | nc_data_bao_dhr_dar.h | nc_data_bao_dhr_dar.c | NcDataBaoDHrDAr |
| bao | nc_data_bao_dmr_hr.h | nc_data_bao_dmr_hr.c | NcDataBaoDMrHr |
| bao | nc_data_bao_dtr_dhr.h | nc_data_bao_dtr_dhr.c | NcDataBaoDtrDHr |
| bao | nc_data_bao_dv.h | nc_data_bao_dv.c | NcDataBaoDV |
| bao | nc_data_bao_dvdv.h | nc_data_bao_dvdv.c | NcDataBaoDVDV |
| bao | nc_data_bao_dvr_dtdh.h | nc_data_bao_dvr_dtdh.c | NcDataBaoDvrDtDh |
| bao | nc_data_bao_empirical_fit.h | nc_data_bao_empirical_fit.c | NcDataBaoEmpiricalFit |
| bao | nc_data_bao_empirical_fit_2d.h | nc_data_bao_empirical_fit_2d.c | NcDataBaoEmpiricalFit2d |
| ... | ... | ... | ... |
| cmb | nc_data_cmb.h | nc_data_cmb.c | - |
| cmb | nc_data_cmb_dist_priors.h | nc_data_cmb_dist_priors.c | - |
| cmb | nc_data_cmb_shift_param.h | nc_data_cmb_shift_param.c | - |
| dist | nc_data_dist_mu.h | nc_data_dist_mu.c | - |
| hubble | nc_data_hubble.h | nc_data_hubble.c | NcDataHubble |
| hubble | nc_data_hubble_bao.h | nc_data_hubble_bao.c | - |
| planck | nc_data_planck_lkl.h | nc_data_planck_lkl.c | NcDataPlanckLKL |
| snia | nc_data_snia.h | nc_data_snia.c | - |
| snia | nc_data_snia_cov.h | nc_data_snia_cov.c | NcDataSNIACov |
| xcor | nc_data_xcor.h | nc_data_xcor.c | NcDataXcor |
The table gives a first map of the available data-object families and their corresponding source files. For example, the hubble family contains the Hubble expansion-rate data object, while the bao family contains several BAO-related data formats.
A dash in the Concrete data type column does not mean that the file is incomplete or unused. It only means that this simple inspection code did not detect a concrete GObject data class declared directly in that header.
Some files, such as nc_data_bao.h, can still be important because they may define shared dataset identifiers, factory functions, or common infrastructure for a whole data family. More specific files, such as nc_data_bao_a.h or nc_data_bao_dv.h, may then define concrete data classes for particular formats.
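To see why a dash can appear, it helps to run the detection pattern on two small header fragments. This is only a sketch: both fragments and the type names NcDataExample and NcDataLegacy are hypothetical, and the second fragment assumes that a header may declare its type without the G_DECLARE_* macros.

```python
import re

# Same pattern as in the overview code: only G_DECLARE_* declarations are detected.
DECLARE_RE = re.compile(
    r"G_DECLARE_(?:FINAL|DERIVABLE)_TYPE\s*\(\s*(NcData[A-Za-z0-9_]*)\s*,"
)

# Hypothetical header fragments, written for illustration only.
modern_header = """
G_DECLARE_FINAL_TYPE (NcDataExample, nc_data_example, NC, DATA_EXAMPLE, NcmDataGaussDiag)
"""
legacy_header = """
typedef struct _NcDataLegacy NcDataLegacy;
GType nc_data_legacy_get_type (void) G_GNUC_CONST;
"""


def detect_type(header_text):
    """Return the declared type name, or None when no G_DECLARE_* macro is found."""
    match = DECLARE_RE.search(header_text)
    return match.group(1) if match else None


print(detect_type(modern_header))  # NcDataExample
print(detect_type(legacy_header))  # None
```

The second fragment still describes a type, but the simple pattern does not see it, which is the situation the dash represents in the table.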
Inspecting a Selected Data Source File
After listing the available data-type files, we can choose one of them for a more focused inspection. Here we choose:
nc_data_hubble.h
This file corresponds to the Hubble expansion-rate data object.
The next code block prints documentation-relevant information: the data family, the corresponding source file, the concrete data type when detected, dataset identifier enum types when present, the enum members when present, and GObject property names found in the source file.
It intentionally avoids lower-level C details such as public C functions or internal GObject declarations.
def remove_c_comments(text):
    """
    Remove C and C++-style comments before parsing enums.
    """
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.DOTALL)
    text = re.sub(r"//.*?$", "", text, flags=re.MULTILINE)
    return text

def extract_dataset_enums(header_text):
    """
    Extract dataset identifier enums such as:

        NcDataHubbleId
        NcDataBaoId
        NcDataSNIAId

    Returns a list of dictionaries with enum names and members.
    """
    text = remove_c_comments(header_text)

    pattern = re.compile(
        r"typedef\s+enum(?:\s+[A-Za-z0-9_]+)?\s*\{"
        r"(?P<body>.*?)"
        r"\}\s*(?P<name>NcData[A-Za-z0-9_]*Id)\s*;",
        re.DOTALL,
    )

    enums = []
    for match in pattern.finditer(text):
        body = match.group("body")
        enum_name = match.group("name")

        members = []
        for raw_item in body.split(","):
            item = raw_item.strip()
            if not item:
                continue
            # Drop explicit values such as "NAME = 0".
            item = item.split("=")[0].strip()
            if item.startswith("NC_DATA_"):
                members.append(item)

        enums.append(
            {
                "name": enum_name,
                "members": members,
            }
        )

    return enums

def common_prefix_ending_in_underscore(names):
    """
    Find a readable common prefix among enum members.

    Example
    -------
    NC_DATA_HUBBLE_GOMEZ_VALENT_COMP2018
    NC_DATA_HUBBLE_RIESS2018

    gives:

    NC_DATA_HUBBLE_
    """
    if not names:
        return ""

    prefix = names[0]
    for name in names[1:]:
        while not name.startswith(prefix):
            prefix = prefix[:-1]
            if not prefix:
                return ""

    if "_" in prefix:
        return prefix[: prefix.rfind("_") + 1]

    return ""

def extract_gobject_properties(source_text):
    """
    Extract property names from g_param_spec_* calls.

    These names often correspond to properties accessible in Python through:

        data.get_property("property-name")
    """
    pattern = re.compile(
        r"g_param_spec_[A-Za-z0-9_]+\s*\(\s*\"([^\"]+)\"",
        re.MULTILINE,
    )
    return sorted(set(pattern.findall(source_text)))

def analyze_selected_data_header(data_dir, header_name):
    """
    Analyze one selected nc_data_*.h file and its corresponding .c file.
    """
    header_path = data_dir / header_name
    if not header_path.exists():
        raise FileNotFoundError(f"Header not found: {header_path}")

    source_path = header_path.with_suffix(".c")
    header_text = read_text(header_path)
    source_text = read_text(source_path) if source_path.exists() else ""

    return {
        "family": family_from_header_name(header_path.name),
        "header": header_path.name,
        "source": source_path.name if source_path.exists() else None,
        "concrete_data_type": extract_concrete_data_type(header_text),
        "dataset_enums": extract_dataset_enums(header_text),
        "properties": extract_gobject_properties(source_text),
    }

def selected_header_summary_dataframe(item):
    """
    Return a one-row summary table for a selected data header.
    """
    return pd.DataFrame(
        [
            {
                "Family": item["family"],
                "Header": item["header"],
                "Source": item["source"] if item["source"] is not None else "-",
                "Concrete data type": (
                    item["concrete_data_type"]
                    if item["concrete_data_type"] is not None
                    else "-"
                ),
            }
        ]
    )

def dataset_enum_dataframe(item):
    """
    Return a table with dataset identifier enum members.
    """
    rows = []
    for enum in item["dataset_enums"]:
        prefix = common_prefix_ending_in_underscore(enum["members"])
        for member in enum["members"]:
            short_name = member.removeprefix(prefix) if prefix else member
            rows.append(
                {
                    "Enum type": enum["name"],
                    "Python member": short_name,
                    "C enum member": member,
                }
            )

    return pd.DataFrame(rows)

def gobject_properties_dataframe(item):
    """
    Return a table with GObject property names found in the source file.
    """
    return pd.DataFrame(
        [
            {
                "Property": prop,
            }
            for prop in item["properties"]
        ]
    )

Now we inspect the selected Hubble data source file.
selected_header = "nc_data_hubble.h"

if data_dir is None:
    selected_item = None
    print("Skipping source-file inspection because the NumCosmo source directory was not found.")
else:
    selected_item = analyze_selected_data_header(data_dir, selected_header)

if selected_item is not None:
    display(show_pandas(selected_header_summary_dataframe(selected_item)))

| Family | Header | Source | Concrete data type |
|---|---|---|---|
| hubble | nc_data_hubble.h | nc_data_hubble.c | NcDataHubble |
if selected_item is not None:
    enum_df = dataset_enum_dataframe(selected_item)
    if len(enum_df) > 0:
        display(show_pandas(enum_df))
    else:
        print("No dataset identifier enum members were detected.")

| Enum type | Python member | C enum member |
|---|---|---|
| NcDataHubbleId | SIMON2005 | NC_DATA_HUBBLE_SIMON2005 |
| NcDataHubbleId | CABRE | NC_DATA_HUBBLE_CABRE |
| NcDataHubbleId | STERN2009 | NC_DATA_HUBBLE_STERN2009 |
| NcDataHubbleId | MORESCO2012_BC03 | NC_DATA_HUBBLE_MORESCO2012_BC03 |
| NcDataHubbleId | MORESCO2012_MASTRO | NC_DATA_HUBBLE_MORESCO2012_MASTRO |
| NcDataHubbleId | MORESCO2015 | NC_DATA_HUBBLE_MORESCO2015 |
| NcDataHubbleId | MORESCO2016_DR9_BC03 | NC_DATA_HUBBLE_MORESCO2016_DR9_BC03 |
| NcDataHubbleId | MORESCO2016_DR9_MASTRO | NC_DATA_HUBBLE_MORESCO2016_DR9_MASTRO |
| NcDataHubbleId | BUSCA2013_BAO_WMAP | NC_DATA_HUBBLE_BUSCA2013_BAO_WMAP |
| NcDataHubbleId | RIESS2008_HST | NC_DATA_HUBBLE_RIESS2008_HST |
| NcDataHubbleId | ZHANG2012 | NC_DATA_HUBBLE_ZHANG2012 |
| NcDataHubbleId | RIESS2016_HST_WFC3 | NC_DATA_HUBBLE_RIESS2016_HST_WFC3 |
| NcDataHubbleId | RATSIMBAZAFY2017 | NC_DATA_HUBBLE_RATSIMBAZAFY2017 |
| NcDataHubbleId | GOMEZ_VALENT_COMP2018 | NC_DATA_HUBBLE_GOMEZ_VALENT_COMP2018 |
| NcDataHubbleId | RIESS2018 | NC_DATA_HUBBLE_RIESS2018 |
| NcDataHubbleId | BORGHI2022 | NC_DATA_HUBBLE_BORGHI2022 |
| NcDataHubbleId | JIAO2023 | NC_DATA_HUBBLE_JIAO2023 |
| NcDataHubbleId | JIMENEZ2023 | NC_DATA_HUBBLE_JIMENEZ2023 |
| NcDataHubbleId | TOMASETTI2023 | NC_DATA_HUBBLE_TOMASETTI2023 |
| NcDataHubbleId | NSAMPLES | NC_DATA_HUBBLE_NSAMPLES |
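The relationship between the two name columns in the table above can be written as a small string transformation. This is a sketch of the naming pattern visible in the table, not a NumCosmo API: the function python_enum_path is hypothetical, the C prefix is passed in explicitly, and the introspected Nc module remains the authority on which members actually exist.

```python
def python_enum_path(c_member, enum_type, c_prefix, namespace="Nc"):
    """
    Build the Python attribute path suggested by a C enum member.

    Example: NC_DATA_HUBBLE_RIESS2018 in NcDataHubbleId
             -> Nc.DataHubbleId.RIESS2018
    """
    if not c_member.startswith(c_prefix):
        raise ValueError(f"{c_member} does not start with {c_prefix}")

    # The Python class name drops the namespace prefix from the C type name.
    class_name = enum_type.removeprefix(namespace)
    short_name = c_member.removeprefix(c_prefix)
    return f"{namespace}.{class_name}.{short_name}"


path = python_enum_path(
    "NC_DATA_HUBBLE_GOMEZ_VALENT_COMP2018",
    "NcDataHubbleId",
    "NC_DATA_HUBBLE_",
)
print(path)  # Nc.DataHubbleId.GOMEZ_VALENT_COMP2018
```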
if selected_item is not None:
    prop_df = gobject_properties_dataframe(selected_item)
    if len(prop_df) > 0:
        display(show_pandas(prop_df))
    else:
        print("No GObject property names were detected.")

| Property |
|---|
| z |
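The property scan only sees names that are written literally in g_param_spec_* calls of the inspected file. The sketch below applies the same pattern to a short C fragment; the fragment is hypothetical and written for illustration, not copied from NumCosmo.

```python
import re

# Same pattern as extract_gobject_properties(): capture the first string argument.
PARAM_RE = re.compile(r"g_param_spec_[A-Za-z0-9_]+\s*\(\s*\"([^\"]+)\"")

# Hypothetical C fragment, for illustration only.
c_source = '''
g_object_class_install_property (object_class, PROP_Z,
                                 g_param_spec_object ("z",
                                                      NULL,
                                                      "Data redshift",
                                                      NCM_TYPE_VECTOR,
                                                      G_PARAM_READWRITE));
g_object_class_install_property (object_class, PROP_N,
                                 g_param_spec_uint ("n-points", NULL, "Data size",
                                                    0, G_MAXUINT, 0, G_PARAM_READWRITE));
'''

found = sorted(set(PARAM_RE.findall(c_source)))
print(found)  # ['n-points', 'z']
```

Properties installed in other source files, for example in a parent class, are not visible to a scan of a single .c file.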
When the selected file does not show a concrete data type, this does not necessarily mean that the file is unused. It may define common infrastructure for a data family rather than a directly instantiable data object.
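For a Python user, the most important item is the dataset identifier enum. In this case, the enum tells us which built-in Hubble datasets can be passed to the constructor. The final entry, NSAMPLES, does not appear among the Python enum members listed in the next section; it appears to be a count marker on the C side rather than a dataset. For example, if GOMEZ_VALENT_COMP2018 appears among the identifier members, it corresponds to the Python enum member:

Nc.DataHubbleId.GOMEZ_VALENT_COMP2018

Runtime Inspection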
The source-tree inspection is useful to understand how data objects are organized in the NumCosmo source code. However, once a concrete data object has been created, the most direct way to inspect it from Python is to use its runtime properties.
We first list the available members of the DataHubbleId enum as exposed in Python.
def list_python_enum_members(enum_cls):
    """
    List enum-like members exposed through Python/GObject introspection.
    """
    members = []
    for name in sorted(dir(enum_cls)):
        if name.isupper() and not name.startswith("_"):
            members.append(name)
    return members


hubble_id_members = pd.DataFrame(
    [
        {
            "Python member": name,
        }
        for name in list_python_enum_members(Nc.DataHubbleId)
    ]
)

display(show_pandas(hubble_id_members))

| Python member |
|---|
| BORGHI2022 |
| BUSCA2013_BAO_WMAP |
| CABRE |
| GOMEZ_VALENT_COMP2018 |
| JIAO2023 |
| JIMENEZ2023 |
| MORESCO2012_BC03 |
| MORESCO2012_MASTRO |
| MORESCO2015 |
| MORESCO2016_DR9_BC03 |
| MORESCO2016_DR9_MASTRO |
| RATSIMBAZAFY2017 |
| RIESS2008_HST |
| RIESS2016_HST_WFC3 |
| RIESS2018 |
| SIMON2005 |
| STERN2009 |
| TOMASETTI2023 |
| ZHANG2012 |
We now instantiate one concrete Hubble expansion-rate dataset.
data = Nc.DataHubble.new_from_id(Nc.DataHubbleId.GOMEZ_VALENT_COMP2018)

The object data contains the observational data and metadata associated with the selected Hubble-rate compilation.
A first useful check is to inspect the Python type of the object.
print(type(data))

<class 'gi.repository.NumCosmo.DataHubble'>
The method list_properties() lists the GObject properties exposed by the object. These are the properties that can usually be accessed from Python with get_property().
runtime_properties = pd.DataFrame(
    [
        {
            "Property": prop.name,
            "Value type": prop.value_type.name,
        }
        for prop in data.list_properties()
    ]
)

display(show_pandas(runtime_properties))

| Property | Value type |
|---|---|
| name | gchararray |
| desc | gchararray |
| long-desc | gchararray |
| init | gboolean |
| bootstrap | NcmBootstrap |
| n-points | guint |
| w-mean | gboolean |
| mean | NcmVector |
| sigma | NcmVector |
| z | NcmVector |
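The runtime list is longer than the single z property found when scanning nc_data_hubble.c. A plausible explanation, assumed here rather than verified against the source, is that the remaining properties are declared by parent classes and inherited by DataHubble. The sketch below simply compares the two lists, copied from the tables on this page.

```python
# Property found by scanning nc_data_hubble.c (from the source-file table).
source_scan = {"z"}

# Properties reported by list_properties() (from the runtime table).
runtime = [
    "name", "desc", "long-desc", "init", "bootstrap",
    "n-points", "w-mean", "mean", "sigma", "z",
]

# Names visible at runtime but absent from the scanned file; these are the
# candidates for properties defined elsewhere (assumption: in parent classes).
not_in_source = [name for name in runtime if name not in source_scan]
print(not_in_source)
```

Nine of the ten runtime properties fall in this group, so a source scan of one file should be read as a lower bound on what the object exposes.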
Many NumCosmo data objects provide a short textual description through the desc property.
print(data.get_property("desc"))

Gomez-Valent 2018 -- arXiv:1802.01505
The number of data points is available through the n-points property.
n_points = data.get_property("n-points")
print(f"Number of points: {n_points}")

Number of points: 31
Accessing Numerical Arrays
For this Hubble-rate dataset, the redshifts, measured values, and uncertainties are exposed through the properties z, mean, and sigma.
z = data.get_property("z")
mean = data.get_property("mean")
sigma = data.get_property("sigma")

These properties are NumCosmo vector objects. Individual entries can be accessed with .get(i).
print("First redshift:", z.get(0))
print("First H(z) value:", mean.get(0))
print("First uncertainty:", sigma.get(0))

First redshift: 0.07
First H(z) value: 69.0
First uncertainty: 19.6
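For further processing it is often convenient to copy a vector into a plain Python list. The sketch below relies only on .get(i) plus a .len() method; that the NumCosmo vector exposes .len() is an assumption here, and FakeVector is a hypothetical stand-in so that the example runs without NumCosmo installed.

```python
def vector_to_list(vec):
    """Copy a vector-like object into a Python list using len() and get(i)."""
    return [vec.get(i) for i in range(vec.len())]


class FakeVector:
    """Hypothetical stand-in that mimics only the two methods used above."""

    def __init__(self, values):
        self._values = list(values)

    def len(self):
        return len(self._values)

    def get(self, i):
        return self._values[i]


z_values = vector_to_list(FakeVector([0.07, 0.09, 0.12]))
print(z_values)  # [0.07, 0.09, 0.12]
```

With a real data object, the same helper would be called as vector_to_list(z), provided the assumption about .len() holds.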
We can display a small table with the first few data points.
hubble_preview = pd.DataFrame(
    [
        {
            "i": i,
            "z": z.get(i),
            "H(z)": mean.get(i),
            "sigma": sigma.get(i),
        }
        for i in range(min(5, n_points))
    ]
)

display(show_numeric_pandas(hubble_preview))

| i | z | H(z) | sigma |
|---|---|---|---|
| 0 | 0.0700 | 69.0000 | 19.6000 |
| 1 | 0.0900 | 69.0000 | 12.0000 |
| 2 | 0.1200 | 68.6000 | 26.2000 |
| 3 | 0.1700 | 83.0000 | 8.0000 |
| 4 | 0.1791 | 75.0000 | 4.0000 |
This gives a quick preview of the dataset before using it in a fit.
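A preview is also a good moment for a few sanity checks. The sketch below uses the five rows shown above, typed in by hand, and checks that the redshifts are non-decreasing, that all uncertainties are positive, and which point has the largest relative uncertainty. The choice of checks is ours and is not prescribed by NumCosmo.

```python
# (z, H(z), sigma) rows copied from the preview table above.
preview = [
    (0.0700, 69.0, 19.6),
    (0.0900, 69.0, 12.0),
    (0.1200, 68.6, 26.2),
    (0.1700, 83.0, 8.0),
    (0.1791, 75.0, 4.0),
]

redshifts = [row[0] for row in preview]

# Redshifts should be ordered and uncertainties strictly positive.
z_sorted = all(a <= b for a, b in zip(redshifts, redshifts[1:]))
sigma_positive = all(sigma > 0 for _, _, sigma in preview)

# Relative uncertainty sigma / H(z) for each point.
relative_error = [sigma / h for _, h, sigma in preview]
largest = max(range(len(preview)), key=lambda i: relative_error[i])

print("z sorted:", z_sorted)
print("sigma positive:", sigma_positive)
print("largest relative uncertainty at i =", largest)
```

For these five rows the least constrained point is i = 2, where the uncertainty is about 38% of the measured value.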
Helper Function
When exploring data objects, it is useful to define a small helper function.
def inspect_data_object(data):
    """
    Display basic runtime information about a NumCosmo data object.
    """
    print("=" * 80)
    print("Runtime data object inspection")
    print("=" * 80)
    print()

    print("Object type:")
    print(type(data))
    print()

    property_names = [prop.name for prop in data.list_properties()]

    if "desc" in property_names:
        print("Description:")
        print(data.get_property("desc"))
        print()

    if "n-points" in property_names:
        print("Number of points:")
        print(data.get_property("n-points"))
        print()

    runtime_properties = pd.DataFrame(
        [
            {
                "Property": prop.name,
                "Value type": prop.value_type.name,
            }
            for prop in data.list_properties()
        ]
    )

    display(show_pandas(runtime_properties))

We can then call the helper on the Hubble data object.
inspect_data_object(data)

================================================================================
Runtime data object inspection
================================================================================
Object type:
<class 'gi.repository.NumCosmo.DataHubble'>
Description:
Gomez-Valent 2018 -- arXiv:1802.01505
Number of points:
31
| Property | Value type |
|---|---|
| name | gchararray |
| desc | gchararray |
| long-desc | gchararray |
| init | gboolean |
| bootstrap | NcmBootstrap |
| n-points | guint |
| w-mean | gboolean |
| mean | NcmVector |
| sigma | NcmVector |
| z | NcmVector |
This helper does not replace the documentation for each data class, but it provides a useful first overview.
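When the information is needed programmatically, for example to compare several data objects, a variant that returns a dictionary instead of printing may be easier to reuse. The following is a minimal sketch under that assumption; summarize_data_object and StubData are hypothetical names, and the stub only mimics list_properties() and get_property() so that the example runs without NumCosmo.

```python
from types import SimpleNamespace


def summarize_data_object(data):
    """Collect basic runtime information about a data object into a dict."""
    names = [prop.name for prop in data.list_properties()]
    summary = {"type": type(data).__name__, "properties": names}

    # Only read optional metadata when the object actually exposes it.
    for key in ("desc", "n-points"):
        if key in names:
            summary[key] = data.get_property(key)

    return summary


class StubData:
    """Hypothetical stand-in exposing only the two methods used above."""

    _values = {"desc": "stub dataset", "n-points": 3, "z": None}

    def list_properties(self):
        return [SimpleNamespace(name=name) for name in self._values]

    def get_property(self, name):
        return self._values[name]


summary = summarize_data_object(StubData())
print(summary)
```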
General Strategy
When working with NumCosmo data objects, a useful workflow is:
- Look at the available data-type source files under numcosmo/data/.
- Choose the data family relevant to the example, such as Hubble, BAO, CMB, or supernova data.
- Inspect the corresponding source file to find the dataset identifier enum, if available.
- Instantiate one concrete data object from Python.
- Use list_properties() to see what the object exposes at runtime.
- Use get_property() to access metadata and numerical arrays.
- Print a small preview before using the data in a fit.
For example, the Hubble case follows this pattern.
data = Nc.DataHubble.new_from_id(Nc.DataHubbleId.GOMEZ_VALENT_COMP2018)

print(type(data))
print(data.get_property("desc"))
print(data.get_property("n-points"))

for prop in data.list_properties():
    print(prop.name)

<class 'gi.repository.NumCosmo.DataHubble'>
Gomez-Valent 2018 -- arXiv:1802.01505
31
name
desc
long-desc
init
bootstrap
n-points
w-mean
mean
sigma
z
In fitting examples, it is usually better to keep the main text focused on the cosmological model, likelihood, sampler, and results. Instead of explaining the internal structure of the data object in every fitting example, those examples can briefly mention the dataset being used and point to this page for details.