Running stellar spectrum retrievals with exotune

Now that you have installed stctm and set up your environment, you are ready to run a retrieval on stellar spectra!

You can fit any stellar spectrum (loaded into the code as a StellSpec object) assuming it can be represented by a linear combination of up to three components: the photosphere, cooler regions (spots), and hotter regions (faculae). The code allows you to fit the contributions of spots and/or faculae, vary or fix the temperatures of the heterogeneity components, fit the surface gravity used for photosphere and/or heterogeneity models, and control priors on all fitted parameters. Gaussian and uniform priors are available, including log-uniform priors for heterogeneity covering fractions. Posterior samples, model spectra, diagnostic statistics, and publication-ready plots are generated.

The analysis also supports parallel processing, allowing you to run the retrieval efficiently on multiple cores or large clusters, which is especially useful for high-resolution or large-coverage spectral datasets.

The data

The example dataset provided under example/observations/ for exotune is a spectrum of TRAPPIST-1 (out-of-transit) observed with JWST NIRSpec/PRISM (Piaulet-Ghorayeb et al., 2025). The units are microns for wavelength and \(10^{-10}\) erg/cm²/\(\mu m\) for flux.

Your data file will need to be readable using one of the supported formats in stctm/pystellspec.py (see the StellSpec class definition below). The formats implemented so far are basic or MR_csv— but you can add a new handler if necessary by extending the StellSpec constructor.

For reference, here is how StellSpec objects are initialized from a table in stctm/pystellspec.py:

elif inputtype=='MR_csv':
    inputtable["yval"] = inputtable["spec"]*1e10
    inputtable["waveMin"] = inputtable["wave_low"]
    inputtable["waveMax"] = inputtable["wave_high"]
    inputtable["yerrLow"] = inputtable["err"]*1e10
    inputtable["yerrUpp"] = inputtable["err"]*1e10
    inputtable["wave"] = 0.5*(inputtable["waveMin"] + inputtable["waveMax"])
    super(StellSpec,self).__init__(inputtable)
    for c in self.colnames:
        setattr(self, c, self[c])

To use a custom format, add a new elif inputtype=='myformat': block, and transform your table columns accordingly, making sure that the yval (flux value in \(10^{-10}\) erg/cm²/\(\mu m\)), yerrLow and yerrUpp (lower and upper flux errors) are specified, as well as each bin’s central wavelength (wave in microns) and the minimum/maximum (wavelength edges) for each bin: waveMin and waveMax.
You may read your data using any compatible astropy table format (see the Astropy Table I/O documentation).
Ensure your wavelength array, waveMin, and waveMax are in microns (convert if necessary).

The recommended approach is to provide your spectrum file in one of these formats and then instantiate your object by specifying the format (via the spec_format parameter in your .toml file, see below).

In summary, as long as your input file fits one of these supported formats or you extend StellSpec for your format, you will be able to analyze your observed stellar spectrum with exotune.

Setting up an exotune retrieval: Run instructions

The .toml file structure largely mirrors that of TLS retrievals (see Running your first TLS retrieval for folder layout). The key exotune-specific modifications are:

The analysis folder is exotune_analysis/any_analysis_folder_name/. Create a new folder for each project under exotune_analysis/, e.g., template_exotune_analysis/.
The default TOML file is template_ini_exotune.toml which you can use as a starting point to build your own.
Results are saved in exotune_results/, a sibling folder to exotune_analysis/. Each run produces a subfolder in exotune_results/.

Ensure all environment variables and paths are correctly set (refer to comments at the top of the run script for requirements), copied here:

# make sure your environment variables are set up (see example below)
# os.environ['CRDS_SERVER_URL'] = "https://jwst-crds.stsci.edu"
# os.environ['CRDS_PATH'] = "/home/caroline/crds_cache"
# os.environ['PYSYN_CDBS'] = "/home/caroline/trds"

Example command to run exotune using the CLI helper, after navigating to your analysis folder:

stctm_exotune template_ini_exotune.toml

Inputs can be edited in the TOML file or passed as command-line arguments. For instance, to switch off fitspot:

stctm_exotune template_ini_exotune.toml -fitspot=0

Modifying the TOML file to make it your own

The sections and parameters below correspond directly to the exotune .toml file.

Choosing inputs and starting format

Contrary to stctm, for exotune you can either start from a pre-processed existing flux-calibrated stellar spectrum or from a time series of spectra. If the latter is the starting point, then exotune will proceed to some pre-processing steps to compute the “master” out-of-transit stellar spectrum given the pre-processing options detailed in the subsection below.

Under [choice_of_inputs]:

[choice_of_inputs]

label_obs = "test_visit"
start_from_timeseries = false
save_median_spectrum = false
path_save_median_spectrum = "../../observations/planetname/planet_outoftransit_spectrum.csv"
path_to_stellar_spec_ts = ""

path_to_spec = "../../observations/TRAPPIST_1_NIRSpec/exotune_templatespectrum.csv"
spec_format = "basic"

stmodfile = "../../R10000_model_grids/TRAPPIST_1_pymsg.h5"

label_obs: Short label for this dataset for tracking results and outputs.
start_from_timeseries: true if starting from a time series of spectra, false for a pre-processed single spectrum file (see below for pre-processing).
save_median_spectrum: If starting from a timeseries, set true to save the computed median spectrum.
path_save_median_spectrum: Output path for the median spectrum CSV (used only if save_median_spectrum is true).
path_to_stellar_spec_ts: Path to the time series file (used if start_from_timeseries is true).
path_to_spec: Path to single spectrum file (used if start_from_timeseries is false).
spec_format: Spectrum format string for loading into StellSpec. See pystellspec.py for supported formats or to add a custom format.
stmodfile: Path to the stellar models grid file (HDF5).

Preprocessing options

Under [preprocessing]:

[preprocessing]

optimize_param = false
obsmaskpattern= "nomask"
kern_size = 19

jd_range_mask = []
wave_range_mask = []

optimize_param: True to only preprocess and visualize (no MCMC, just diagnostic plots).
obsmaskpattern: Label used for the specific mask pattern (will be used as a string when saving the run).
kern_size: Kernel size for median filtering the plotted light curve (for visualization only).
jd_range_mask: Custom time-domain mask. To make sure that some intervals of time are ignored, e.g. in-transit, or during a stellar flare, enter their time stamps as [[start1,end1],[start2,end2],...].
wave_range_mask: Custom wavelength-domain mask, same format as above.

Saving options

Under [saving_options]:

[saving_options]
save_fit = true
res_suffix = "test_for_GitHub"

save_fit: True to save results in the output directory after completion.
res_suffix: Suffix tagging the output files for identification; change for each new run.

Stellar parameters

Under [stellar_params]:

[stellar_params]
Teffstar = 2566.0
feh = 0.040
loggstar = 5.2396

logg_phot_source = "value"
logg_phot_value = 2.5

Teffstar: Effective temperature of the star in Kelvin.
feh: Metallicity [Fe/H] in dex.
loggstar: Surface gravity log(g) in cgs.
logg_phot_source: value to use logg_phot_value for the photosphere log(g) default, loggstar to use the star value.

Reading in the grid of stellar models

Under [stellar models]:

[stellar_models]

label_grid = "PHOENIX_TRAPPIST_1"
logg_range = [2.5,5.5]
loggstep = 0.1
Teff_range = "default"
Teffstep = 20.0
resPower_target = 10000
wave_range = [0.2,5.4]

label_grid: Name/label of the stellar model grid (used as a string to save the run).

At this stage, refer to your create_fixedR_grid_pymsg_template.py file (or the equivalent file you used to create your grid of stellar models). In that file, you will find the setup of the grid in a block such as:

# range of params for the grid

logg_range = [2.5,5.5]
Teff_range = [np.max([param["Tphot"]-1000, 2300.]), param["Tphot"]+1000.]
loggstep = 0.1 #cgs
Teffstep = 20. #K
resPower_target = 10000
wv_min_um = 0.2
wv_max_um = 5.4

Returning to the .toml file: * logg_range: Range of log(g) covered in the grid(format [minlogg,maxlogg]). * loggstep: Grid step in log(g). * Teff_range: Temperature range; default uses values calculated from Teffstar: it assumes the default grid calculation setup, with`` min = np.max([Teffstar-1000, 2300.]) `` and max=Teffstar+1000. * Teffstep: Grid step in temperature. * resPower_target: Resolving power at which the grid was created. * wave_range: Wavelength range for fitting (microns, [min,max] format).

MCMC sampling parameters

Under [MCMC params]:

[MCMC_params]
parallel = true
ncpu = 30

nsteps=3000
frac_burnin = 0.6

fitspot = true
fitfac = true

fitThet = true
fitTphot = true

fitlogg_phot = true
fitlogg_het = true

fitFscale = true
fiterrInfl = true

parallel: Whether to use multiprocessing (true recommended).
ncpu: Number of CPUs for parallel MCMC run.
nsteps: Number of steps for each MCMC chain (5000+ recommended for analysis).
frac_burnin: Fraction of chain steps discarded as burn-in (e.g., 0.6).
fitspot / fitfac: Whether to fit spot/faculae covering fractions.
fitThet / fitTphot: Whether to fit spots/faculae/photosphere temperature.
fitlogg_phot / fitlogg_het: Whether to fit log(g) for photosphere and/or heterogeneity.
fitFscale: Fit a flux scaling factor to match observed/model spectra.
fiterrInfl: Fit an error inflation factor to relax the provided data error bars if model/data mismatch is large.

Priors on the fitted parameters

Under [priors]:

[priors]
gaussparanames = "Tphot"
hyperp_gausspriors = [2566,70]

fitLogfSpotFac = [0,0]
hyperp_logpriors = [-5,0]

gaussparanames: List of parameters to apply a Gaussian prior (e.g. ["Tphot","ffac").
hyperp_gausspriors: Mean and std for each Gaussian prior. For multiple parameters: e.g. [[mean1,std1],[mean2,std2]]
fitLogfSpotFac: Specifies if spot/faculae priors are uniform in linear (toggle 0) or log space (toggle 1).
hyperp_logpriors: Bounds for log-priors ([lowerBound,upperBound]).

Beyond the flexibility provided in the .toml file, you can look up the logic in get_param_priors() in stctm/exotune_utilities.py.

Plotting

Under [plotting]:

[plotting]
pad = 0.25
target_resP = 300

pad: Padding in microns to adjust spectra plot axis boundaries.
target_resP: Resolving power model spectra are downgraded to when plotted.

Post-processing

By default, exotune generates and saves the following files to a custom directory created under exotune_results/, starting with the prefix “fit”. If only preprocessing is run (i.e., optimize_param = True), or if the starting point is a time-series of spectra, the results from this preprocessing step will be in a folder starting with “preprocessOnly”.

Inputs and recordkeeping:

Copy of run script, TOML file, and exotune_utilities.py used
Figure of the fitted spectrum
defaultparams CSV file with fit initial values

Pre-processing steps:

select_time: Median-filtered light curve marking masked intervals used in generating the median spectrum
select_wave: Median spectrum before masking, with masked wavelength intervals shaded
get_fscale: Initial model/data comparison used for scavenging the starting value of Fscale

Outputs (CSV files):

pandas: Fitted parameters from chain, with log-likelihood and log-probability
bestfit: Best-fit value (maximum likelihood), max-probability, and percentiles for quoting
bestfit_stats: Model comparison statistics: best-fit model index, reduced chi-squared, and BIC
fixedR_1_2_3_sigma: Model spectra at the plotted resolving power (default target_resP) for max-likelihood, max-probability, and percentile intervals
blobs_1_2_3_sigma: Model spectra integrated in observed data bins for max-likelihood, max-probability, and percentiles

Calculated models:

NPY file containing “blobs”: the series of models from MCMC sampling

Diagnostics figures:

chainplot: Chain plots, before and after burn-in
bestfit_model: Plot of the best-fit model over data

Publication-ready figures:

resP..._1_2_3_sigma: Fitted spectra with 1/2/3 sigma intervals at high resolution (resolving power target_resP), log or lin wavelength axis
combo_resP..._1_2_3_sigma: Top: fitted spectrum and intervals; Bottom: marginalized posterior distributions for component parameters
1_2_3_sigma: Fitted spectrum with intervals using data bin integration
Corner plot of post-burnin samples

Please let me know (or create a pull request!) if there are additional outputs that would be useful defaults.