Additional Tools
On the analysis system, a dedicated conda environment is available with the required packages preinstalled. To activate the environment, open a terminal, activate the base environment, then activate scd-reduction-tools.
source /opt/anaconda/bin/activate
conda activate scd-reduction-tools
The Spyder scientific environment can be used for scripting.
spyder
Local environment instructions
1. Install Miniconda. Miniconda is a minimal installation of conda and Python that allows other packages to be installed with conda install.
2. Open a local terminal to interact with conda.
Windows: open the Anaconda PowerShell Prompt.
macOS/Linux: source Miniconda.
source ~/miniconda3/bin/activate
3. Download the nxs tutorial with neutron data.
Option #1:
Install git in the (base) environment.
conda install git
Clone the remote repository nxs-computational-tutorial.
git clone https://github.com/zjmorgan/nxs-computational-tutorial.git
Option #2:
Navigate to the nxs-computational-tutorial repository page.
Click on Code / Download ZIP.
Unzip the folder in the download directory.
4. Change the directory of the terminal to the download directory.
cd ~/Downloads
5. Change directory to the tutorial.
cd nxs-computational-tutorial
6. Create the (nxs) environment from the environment file; the required packages are installed automatically.
conda env create --file environment.yml
7. Activate the environment with the installed packages.
conda activate nxs
8. Test the installation.
python
import lmfit
exit()
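The same check can also be run as a single shell command, a minimal alternative to the interactive session above:
python -c "import lmfit; print(lmfit.__version__)"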
9. Open the tutorial Jupyter Notebook in a web browser.
jupyter notebook tutorial2/fit_optimize.ipynb
Introduction to fitting and optimization
In this tutorial, two libraries and several optimization routines are used to fit a single-crystal peak.
scipy.optimize.curve_fit
scipy.optimize.least_squares
lmfit
The overview is as follows:
Loading and plotting data
Defining and fitting a model
Inspecting and interpreting output
First, begin by loading the required libraries.
import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize
import lmfit
import h5py
Loading data with h5py
Load the Mantid workspace with histogrammed counts vs. \(d\)-spacing bin edges. The values contain the counts and the errors contain the uncertainties.
f = h5py.File('../data/elastic.nxs', mode='r')
ws = f['mantid_workspace_1/workspace/']
d_spacing_bin_edges = ws['axis1'][()]
counts = ws['values'][()].flatten()
errors = ws['errors'][()].flatten()
f.close()
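If the internal layout of a NeXus file is not known in advance, h5py can walk the hierarchy before any paths are hard-coded. A minimal sketch, assuming the same elastic.nxs file:
# print every dataset path and its shape
def show(name, obj):
    if isinstance(obj, h5py.Dataset):
        print(name, obj.shape)

with h5py.File('../data/elastic.nxs', mode='r') as f:
    f.visititems(show)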
Plotting the data with matplotlib
It is necessary to calculate the \(d\)-spacing bin centers from the bin edges. The number of histogram bins is one less than the number of edges.
Matplotlib can appropriately plot error bars.
d_spacing = 0.5*(d_spacing_bin_edges[1:]+d_spacing_bin_edges[:-1])
fig, ax = plt.subplots(1,1)
ax.errorbar(d_spacing, counts, yerr=errors, fmt='.', label='data')
ax.legend(shadow=True)
ax.minorticks_on()
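Axis labels make the plot easier to interpret. The \(d\)-spacing unit below is assumed to be angstroms, which is typical for a diffraction workspace but not stated in the file:
ax.set_xlabel(r'$d$-spacing ($\AA$)')  # unit is an assumption
ax.set_ylabel('counts')
fig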
Defining a model as a function and guessing initial parameters
The peak can be modeled as a Gaussian peak on a linear sloping background.
Gaussian peak:
\[A\exp\left[-\frac{(d-\mu)^2}{2\sigma^2}\right]\]
\(A=\text{amplitude}\)
\(\mu=\text{mean}\)
\(\sigma=\text{standard deviation}\)
Sloping background:
\[B+cd\]
\(B=\text{intercept}\)
\(c=\text{slope}\)
It is illustrative to guess the initial parameters. This is often a helpful starting point for fitting a model. This initial guess is labeled as p0.
def model(d, A, mu, sigma, B, c):
    return A*np.exp(-0.5*(d-mu)**2/sigma**2)+B+c*d
A, mu, sigma, B, c = 800, 16, 0.3, 100, 0
p0 = (A, mu, sigma, B, c)
ax.plot(d_spacing, model(d_spacing, *p0), label='guess')
ax.legend(shadow=True)
fig
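Rather than reading the guess off the plot, rough estimates can also be derived from the data itself. The heuristics below (low-percentile background, peak center at the maximum, width from the full width at half maximum) are illustrative assumptions, not part of the tutorial:
B0 = np.percentile(counts, 10)          # background from a low quantile
i_max = np.argmax(counts)
mu0 = d_spacing[i_max]                  # peak center at the maximum
A0 = counts[i_max]-B0                   # peak height above background
above = d_spacing[counts > B0+0.5*A0]   # points above half maximum
fwhm = above.max()-above.min()
sigma0 = fwhm/(2*np.sqrt(2*np.log(2)))  # convert FWHM to standard deviation
p0_auto = (A0, mu0, sigma0, B0, 0)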
Fitting the model using scipy.optimize.curve_fit
Since the model can be expressed as \(y=f(d,\text{parameters})\), curve_fit is a good candidate. Given the model function, the initial guess, and the independent and dependent variables, it returns the optimized parameters popt and the covariance matrix of the parameters pcov.
Since the experimental uncertainties are available and correctly scaled, they can be incorporated into the fit. Here, the Levenberg-Marquardt method is used.
popt, pcov = scipy.optimize.curve_fit(model, d_spacing, counts, p0=p0,
                                      sigma=errors, absolute_sigma=True,
                                      method='lm')
ax.plot(d_spacing, model(d_spacing, *popt), label='fit')
ax.legend(shadow=True)
fig
Plotting the residuals
It is good practice to plot the residuals of the final fit to check its quality. Systematic trends in the residuals suggest the model is not correct; if the model describes the data well, the residuals should fluctuate about zero within the experimental uncertainty.
def residual(d, counts, A, mu, sigma, B, c):
    return counts-model(d, A, mu, sigma, B, c)
ax.plot(d_spacing, residual(d_spacing, counts, *popt), label='residual')
ax.legend(shadow=True)
fig
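A complementary check is to normalize the residuals by the experimental uncertainties. If the model and the error estimates are both sound, roughly 68% of the normalized residuals should fall within ±1. A minimal sketch:
norm = residual(d_spacing, counts, *popt)/errors
fig2, ax2 = plt.subplots(1,1)
ax2.axhline(0, color='k', lw=1)
ax2.plot(d_spacing, norm, '.')
ax2.set_ylabel('(data-fit)/error')
print('fraction within 1-std: {:.2f}'.format(np.mean(np.abs(norm) < 1)))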
Calculating errors on fitted parameters
The covariance matrix provides the correlations between fitted parameters.
The diagonal of the matrix corresponds to the variances of the individual parameters; taking the square root gives the uncertainties (standard deviations) \(\epsilon_i=\sqrt{C_{ii}}\).
perr = np.sqrt(np.diag(pcov))
print('Fitted uncertainties as 1-std using curve_fit')
print('A = {:8.3f} ± {:5.3f}'.format(popt[0],perr[0]))
print('mu = {:8.3f} ± {:5.3f}'.format(popt[1],perr[1]))
print('sigma = {:8.3f} ± {:5.3f}'.format(popt[2],perr[2]))
print('B = {:8.3f} ± {:5.3f}'.format(popt[3],perr[3]))
print('c = {:8.3f} ± {:5.3f}'.format(popt[4],perr[4]))
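The off-diagonal elements of pcov quantify how strongly pairs of parameters are correlated. Normalizing by the standard deviations gives the correlation matrix, with entries between -1 and 1; a minimal sketch:
corr = pcov/np.outer(perr, perr)  # C_ij/(eps_i*eps_j)
print(np.round(corr, 2))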
Fitting the model using scipy.optimize.least_squares
It is also possible to define the least squares problem equivalently to the curve fitting problem. The objective function to minimize is the \(\chi^2\) statistic, which accounts for the residuals weighted by the experimental uncertainties. That is,
\[\chi^2=\sum_i\left[\frac{y_i-f(d_i,\text{parameters})}{\sigma_i}\right]^2,\]
which is the sum of squared weighted deviations \([y_i-f(d_i,\text{parameters})]/\sigma_i\). Levenberg-Marquardt can be used again, and the returned solution contains the fitted parameters sol.x and the weighted Jacobian sol.jac. The covariance matrix is constructed from the Jacobian according to
\[\boldsymbol{C}=(\boldsymbol{J}^\top\boldsymbol{W}\boldsymbol{J})^{-1},\]
which contains the weight matrix \(\boldsymbol{W}=\operatorname{diag}(1/\sigma_i^2)\). Since sol.jac is already weighted, this reduces to \((\boldsymbol{J}^\top\boldsymbol{J})^{-1}\) in the code below.
def weighted_deviations(x, d, counts, errors):
    A, mu, sigma, B, c = x
    return residual(d, counts, A, mu, sigma, B, c)/errors
sol = scipy.optimize.least_squares(weighted_deviations, x0=p0,
                                   args=(d_spacing, counts, errors),
                                   method='lm')
vals = sol.x
J = sol.jac
cov = np.linalg.inv(J.T.dot(J))
err = np.sqrt(np.diag(cov))
print('Fitted uncertainties as 1-std using least_squares')
print('A = {:8.3f} ± {:5.3f}'.format(vals[0],err[0]))
print('mu = {:8.3f} ± {:5.3f}'.format(vals[1],err[1]))
print('sigma = {:8.3f} ± {:5.3f}'.format(vals[2],err[2]))
print('B = {:8.3f} ± {:5.3f}'.format(vals[3],err[3]))
print('c = {:8.3f} ± {:5.3f}'.format(vals[4],err[4]))
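Because the two formulations minimize the same \(\chi^2\), the parameters and uncertainties from curve_fit and least_squares should agree to numerical precision. A quick consistency check:
print(np.allclose(popt, vals, rtol=1e-4))  # fitted parameters agree
print(np.allclose(perr, err, rtol=1e-4))   # 1-std uncertainties agree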
Calculating standard errors
It is often useful to distinguish the standard deviation \(\epsilon_j\) of a parameter from the standard error \(\hat{\epsilon}_j\). Using the reduced \(\chi_\nu^2\) statistic, the standard error can be calculated. Here, the standard error of a parameter is defined as the change in that parameter that increases the \(\chi^2\) by one. That is,
\[\chi^2(\beta_j\pm\hat{\epsilon}_j)=\chi^2(\beta_j)+1,\]
where \(\beta_j\) is the optimized parameter. This can be calculated according to
\[\hat{\epsilon}_j=\sqrt{\chi_\nu^2\,C_{jj}},\]
where the \(\chi^2\) per degree of freedom is
\[\chi_\nu^2=\frac{\chi^2}{m-n},\]
and \(m\) is the number of observations and \(n\) is the number of parameters.
chi2dof = np.sum(sol.fun**2)/(sol.fun.size-sol.x.size)
cov *= chi2dof
stderr = np.sqrt(np.diag(cov))
print('Fitted uncertainties as 1-stderr using least_squares')
print('A = {:8.3f} ± {:5.3f}'.format(vals[0],stderr[0]))
print('mu = {:8.3f} ± {:5.3f}'.format(vals[1],stderr[1]))
print('sigma = {:8.3f} ± {:5.3f}'.format(vals[2],stderr[2]))
print('B = {:8.3f} ± {:5.3f}'.format(vals[3],stderr[3]))
print('c = {:8.3f} ± {:5.3f}'.format(vals[4],stderr[4]))
Using constrained optimization with lmfit
A more general library for fitting and optimization with a consistent syntax is lmfit, which supports constraints and bounds. A parameter dictionary object is created with initial values and bounds. The bounds are enforced by a change of variables in the Jacobian that prevents a parameter from moving beyond its bounds. For the peak model, some reasonable bounds are:
Positive amplitude \(0 \le A \le \infty\)
Positive standard deviation \(0 \le \sigma \le \infty\)
Positive background \(0 \le B \le \infty\)
The library supports automatic calculation of standard errors.
def weighted_residual(params, d, counts, errors):
    A = params['A']
    mu = params['mu']
    sigma = params['sigma']
    B = params['B']
    c = params['c']
    return residual(d, counts, A, mu, sigma, B, c)/errors
params = lmfit.Parameters()
params.add('A', value=A, min=0, max=np.inf)
params.add('mu', value=mu, min=-np.inf, max=np.inf)
params.add('sigma', value=sigma, min=0, max=np.inf)
params.add('B', value=B, min=0, max=np.inf)
params.add('c', value=c, min=-np.inf, max=np.inf)
result = lmfit.minimize(weighted_residual, params,
                        args=(d_spacing, counts, errors))
print('Fitted uncertainties as 1-stderr using lmfit')
for key in params.keys():
    value, stderr = result.params[key].value, result.params[key].stderr
    print('{:7} = {:8.3f} ± {:5.3f}'.format(key,value,stderr))
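lmfit can also summarize a fit, including the correlations between parameters, with its built-in report function:
print(lmfit.fit_report(result))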