# Python Workshop II: Fancy background and asymmetric peak fitting

Now we move on to the next improvement: improving the quality of our fitting. In the previous workshop, we fit the silicon Raman peak with a single symmetric function. Closer inspection of the fit would have revealed some discrepancies in the background as well as the peak fit. The peak is decidedly asymmetric but for the sake of learning, we went ahead and assumed it was symmetric. Now that we are a little more advanced and our fingers are getting nimbler with typing out some fine code, we can get fancy! We will use the the interpolate function available in Scipy to do a better background removal and, we can tweak the lorentzian function into a split lorentzian, i.e. each half of the peak is fit with lorentzians of different widths. As before, the script for the entire analysis is given below. Copy the whole thing into an editor and save it as ‘name_of_file.py’. The data is the same as the previous workshop and is available here. Please download it and put it into the same directory as the script. The code is pretty self-explanatory but here is an outline of the analysis routine:

**1. Import data**: Note that the Savitzky-Golay smoothing algorithm and explanation is available here. Also, if you don’t want any smoothing done, just put a ‘#’ at the beginning of that to comment that action.

**2. Background removal.** In this section, the background region is passed into the interpolate function which fits an arbitrary function (*f*) to the dataset. This function is then used to ‘calculate’ the background for the entire dataset and subtracted from the data. Note, a little bit of tweaking was in order here to get the limits of the background dataset right. I just played with the lower and upper limits until I got a visually good fit to the background.

**3. Fitting. **The lorentzian function from the previous function has been hacked such that the whole peak is fit to two coupled lorentzians. These two functions have the same peak center and intensity but different linewidths. The optimised parameters are then put into the variable *‘best_parameters’*. The fit is evaluated by substituting *‘best_parameters’ *into the lorentzian function.

**4. Plotting data and fit**

[sourcecode language=”python”]

##########################################################################

##################### IMPORTING REQUIRED MODULES ####################

import numpy

import pylab

from scipy.optimize import leastsq # Levenberg-Marquadt Algorithm #

from savitzky_golay import savitzky_golay ## Smoothing algorithm ##

from scipy.interpolate import interp1d ##Interpolation ##

#########################################################################

########################## LOADING DATA ###############################

a = numpy.loadtxt(‘silicon_Raman.txt’)

x = a[:,0]

y = a[:,1]

y = savitzky_golay(y,11,3)

## SORTING ASCENDING ORDER ##

y_temp = []

x_temp = []

ind_sort = numpy.lexsort((y,x))

for j in ind_sort:

y_temp.append(y[j])

x_temp.append(x[j])

x = numpy.array((x_temp))

y = numpy.array((y_temp))

########################################################################

######################### DEFINING FUNCTIONS ########################

def lorentzian(x,p):

min_step = x[1] – x[0]

ind1 = (x< p[1]) x1 = x[ind1] ind2 = (x > p[1])

x2 = x[ind2]

numerator_left = (p[0]**2 )

denominator_left = ( x1 – (p[1]) )**2 + p[0]**2

numerator_right = (p[2]**2 )

denominator_right = ( x2 – (p[1]) )**2 + p[2]**2

y_left = p[3]*(numerator_left/denominator_left)

y_right = p[3]*(numerator_right/denominator_right)

lin_comb = numpy.hstack((y_left,y_right))

return lin_comb

def residuals(p,y,x):

err = y – lorentzian(x,p)

return err

###########################################################################

###################### BACKGROUND SUBTRACTION #######################

# defining the ‘background’ part of the spectrum #

ind_bg_low = (x > 0.0) & (x < 470.0) ind_bg_high = (x > 600.0) & (x < 1000.0)

x_bg = numpy.concatenate((x[ind_bg_low],x[ind_bg_high]))

y_bg = numpy.concatenate((y[ind_bg_low],y[ind_bg_high]))

#pylab.plot(x_bg,y_bg)

# interpolating the background #

f = interp1d(x_bg,y_bg)

background = f(x)

# removing fitted background #

y_bg_corr = y – background

########################################################################

########################## FITTING DATA ## ############################

# initial values #

p = [5.0,520.0,5.0,12e3] # [hwhm1, peak center, hwhm2, intensity] #

# optimization #

pbest = leastsq(residuals,p,args=(y_bg_corr,x),full_output=1)

best_parameters = pbest[0]

# fit to data #

fit = lorentzian(x,best_parameters)

#######################################################################

############################## PLOTTING ##############################

pylab.plot(x,y_bg_corr,’wo’)

pylab.plot(x,fit,’r-‘,lw=2)

pylab.xlabel(r’$\omega$ (cm$^{-1}$)’, fontsize=18)

pylab.ylabel(‘Intensity (a.u.)’, fontsize=18)

pylab.show()

[/sourcecode]

*Post contributed by Shrividya Ravi, VUW*

Hey,

Question from a coding-challenged person 🙂

what’s going on at ” ind1 = (x< p[1]) "? Is this saying that ind1 takes the value x whenever x is less than p[1]? Are the brackets working the magic here?

Cheers,

Hey Ben,

Basically, the function that is fitted dynamically changes the x value at the peak maximum with every fitting iteration until that value is optimised. The bit of code that you are referring actually returns a boolean array according to the condition given. I just tried out the single boolean condition and it doesn’t need breackets. So, ‘ind1 = x < p[1]' would also work. But, if you wanted to do a dual condition like: 'ind1 = (x > 300.0) & (x < 500.0)', the brackets are necessary. Hope this make sense!