Retention Indices
Example 1: Calibration Points from a Delimited File
Let's assume we have a set of calibration points for calculating the Kovats retention index, stored in a delimited file. The first step is to read and format the data so it can be passed to the RiMapper
constructor. In this example, we are using a .CAL file generated by the AMDIS software. The contents of the file example1.CAL
are as follows:
4.154 900.0 98 1478 Nonane
5.635 1000.0 100 1215 Decane
7.145 1100.0 95 1606 Undecane
8.628 1200.0 100 1762 Dodecane
10.043 1300.0 100 1782 Tridecane
⋮
25.435 3400.0 92 920 Tetratriacontane
27.204 3500.0 90 819 Pentatriacontane
29.366 3600.0 91 723 Hexatriacontane
32.026 3700.0 88 602 Heptatriacontane
35.273 3800.0 85 493 Octatriacontane
As we can see, the file contains whitespace-separated columns, but only the first two are relevant for our purposes. The first column lists the retention times (in minutes), while the second column provides the corresponding retention indices.
We will use a function from the DelimitedFiles.jl package, which comes with Julia, to read the file contents. Additionally, we need to assign the minute time unit to the time values. Since JuChrom.jl re-exports names from the Unitful.jl package, we will also load JuChrom.jl at this stage. To plot the inferred mapping function, we will further load the CairoMakie.jl package from the Makie visualization ecosystem. If you haven't installed them yet, you may need to do so. Additionally, we explicitly activate the CairoMakie backend to ensure it is used, especially if another backend was previously active in the same session.
using CairoMakie, DelimitedFiles, JuChrom
CairoMakie.activate!()
filename = "example1.CAL"
file = joinpath(JuChrom.calibration, filename)
data_cells = readdlm(file; header=false) # set header=true if the file contains a header
30×5 Matrix{Any}:
4.154 900.0 98 1478 "Nonane"
5.635 1000.0 100 1215 "Decane"
7.145 1100.0 95 1606 "Undecane"
8.628 1200.0 100 1762 "Dodecane"
10.043 1300.0 100 1782 "Tridecane"
11.39 1400.0 91 1909 "Tetradecane"
12.667 1500.0 96 1956 "Pentadecane"
13.877 1600.0 100 1925 "Hexadecane"
14.876 1700.0 99 2029 "Heptadecane"
15.608 1800.0 95 2021 "Octadecane"
⋮
21.026 3000.0 86 1742 "Triacontane"
21.83 3100.0 93 1134 "Hentriacontane"
22.807 3200.0 92 1105 "Dotriacontane"
23.991 3300.0 94 1020 "Tritriacontane"
25.435 3400.0 92 920 "Tetratriacontane"
27.204 3500.0 90 819 "Pentatriacontane"
29.366 3600.0 91 723 "Hexatriacontane"
32.026 3700.0 88 602 "Heptatriacontane"
35.273 3800.0 85 493 "Octatriacontane"
The output above displays the contents of the matrix referred to by the variable data_cells
at this point. We will use slice notation to extract the values from the first and second columns. Additionally, we will convert these values to Float64
and append the minute time unit to the retention time values. The following two lines of code will achieve this:
rts = convert(Vector{Float64}, data_cells[:, 1]) * u"minute"
30-element Vector{Quantity{Float64, 𝐓, Unitful.FreeUnits{(minute,), 𝐓, nothing}}}:
4.154 minute
5.635 minute
7.145 minute
8.628 minute
10.043 minute
11.39 minute
12.667 minute
13.877 minute
14.876 minute
15.608 minute
⋮
21.026 minute
21.83 minute
22.807 minute
23.991 minute
25.435 minute
27.204 minute
29.366 minute
32.026 minute
35.273 minute
ris = convert(Vector{Float64}, data_cells[:, 2])
30-element Vector{Float64}:
900.0
1000.0
1100.0
1200.0
1300.0
1400.0
1500.0
1600.0
1700.0
1800.0
⋮
3000.0
3100.0
3200.0
3300.0
3400.0
3500.0
3600.0
3700.0
3800.0
We can now create a RiMapper
object by calling its constructor with the required arguments, along with any optional ones if needed. The mandatory arguments include the name of the retention index (in our case, Kovats), the retention time, and the corresponding retention indices. In this example, we also store the name of the calibration file as metadata
.
ld = RiMapper("Kovats", rts, ris, metadata=Dict(:filename => filename))
RiMapper {index name: Kovats, calibration points: 30}
retention time range: 4.154 minute - 35.273 minute
retention index range: 900.0 - 3800.0
interpolation method: NaturalCubicBSpline(false)
extrapolation method: nothing
metadata: 1 entry
Since we did not explicitly specify an interpolation method, the constructor defaulted to NaturalCubicBSpline
for interpolation.
We now have a RiMapper
object that allows us to calculate the retentionindex
for a given retention time of interest. We'll use this to plot the mapping function. Since we will be plotting additional mapping functions, we'll encapsulate the code for plotting the mapping function into a reusable function that can be called multiple times.
function plotmappingfunction(ld::RiMapper, outputfile::AbstractString)
# Create figure
f = Figure(; size=(1200, 600))
# Create axis in figure, including informative title and axis labels
title = get(metadata(ld), :filename, "")
ri_name = retentionindexname(ld)
timeunit = unit(eltype(retentiontimes(ld)))
ax = Axis(f[1,1], title=title, xlabel="Scan time [$timeunit]",
ylabel="$ri_name retention index")
# Plot calibration points
cal = scatter!(ax, retentiontimes(ld, ustripped=true), retentionindices(ld), color=:red)
# Plot interpolated values
xs = LinRange(minretentiontime(ld), maxretentiontime(ld), 1000)
itp = lines!(ax, ustrip(xs), retentionindex.(ld, xs), color=:blue)
# Add an informative legend
axislegend(ax, [cal, itp], ["calibration points", "interpolation"], position = :lt,
orientation = :horizontal)
# Save figure in svg file format
save(outputfile, f)
end
plotmappingfunction(ld, "rt2ri.svg")
This will produce the following Scalable Vector Graphics (SVG) file:
Example 2: Calibration Points from an Excel File
In this example, we assume that a set of calibration points for calculating the Kovats retention index has been stored in an Excel file, possibly as a result of a manual entry. The relevant data can be found in the first sheet of the file example1.xlsx
, named Table1
.
We will use the XLSX.jl package to read the contents of the Excel file. As in the previous example, we need to assign the time values a unit of minutes. To accomplish this, we will import JuChrom.jl, which re-exports functionality from the Unitful.jl package.
using JuChrom
import XLSX
filename = "example1.xlsx"
file = joinpath(JuChrom.calibration, filename)
sheetname = "Table1"
excel_cells = "A1:B30"
data_cells = XLSX.readdata(file, sheetname, excel_cells)
30×2 Matrix{Any}:
4.154 900
5.635 1000
7.145 1100
8.628 1200
10.043 1300
11.39 1400
12.667 1500
13.877 1600
14.876 1700
15.608 1800
⋮
21.026 3000
21.83 3100
22.807 3200
23.991 3300
25.435 3400
27.204 3500
29.366 3600
32.026 3700
35.273 3800
The output above shows the contents of the matrix stored in the variable data_cells
at this point. As in the previous example, we will use slice notation to extract the values from the first and second columns, convert them to Float64
, and append the minute time unit to the retention time values.
timeunit = u"minute"
rts = convert(Vector{Float64}, data_cells[:, 1]) * timeunit
ris = convert(Vector{Float64}, data_cells[:, 2])
ld = RiMapper("Kovats", rts, ris, metadata=Dict(:filename => filename))
RiMapper {index name: Kovats, calibration points: 30}
retention time range: 4.154 minute - 35.273 minute
retention index range: 900.0 - 3800.0
interpolation method: NaturalCubicBSpline(false)
extrapolation method: nothing
metadata: 1 entry
We could reuse the plotting function from the first example to visualize the data. However, we will refrain from doing so here, as the values in the Excel file are identical to those in the delimited file. Instead, we will discuss an example where the imported calibration points result in an error when inferring a NaturalCubicBSpline
interpolant.
Example 3: Error in Inferring NaturalCubicBSpline Interpolant
In the two previous examples, the computation of a natural cubic B-spline for calculating a continuously increasing retention index worked as expected. However, this is not always the case. If the computed B-spline contains critical points that would cause the retention index to decrease as retention time increases, the creation of the RiMapper
would result in an error. Let's examine an example. The contents of the file example2.CAL
are as follows:
3.394 600.0 100 742 Hexane
4.154 900.0 98 1478 Nonane
5.635 1000.0 100 1215 Decane
7.145 1100.0 95 1606 Undecane
8.628 1200.0 100 1762 Dodecane
⋮
25.435 3400.0 92 920 Tetratriacontane
27.204 3500.0 90 819 Pentatriacontane
29.366 3600.0 91 723 Hexatriacontane
32.026 3700.0 88 602 Heptatriacontane
35.273 3800.0 85 493 Octatriacontane
The data is similar to that in example1.CAL
, with the only difference being that example2.CAL
includes the retention time and retention index of Hexane as an additional calibration point. We'll run the same commands as in the previous example, but this time, we'll specify a different input file and wrap the RiMapper
call in a try
/ catch
block to handle any exceptions.
using CairoMakie, DelimitedFiles, JuChrom
CairoMakie.activate!()
filename = "example2.CAL"
file = joinpath(JuChrom.calibration, filename)
data_cells = readdlm(file; header=false)
rts = convert(Vector{Float64}, data_cells[:, 1]) * u"minute"
ris = convert(Vector{Float64}, data_cells[:, 2])
try
global ld = RiMapper("Kovats", rts, ris, metadata=Dict(:filename => filename))
catch e
println(e)
end
ArgumentError("computed Bspline contains critical point(s): [5.128030033274248, 5.526092831780077] * minute")
As shown in the output, the creation of a natural cubic B-spline that predicts continuously increasing retention indices with increasing retention time has failed. However, we can still force the B-spline interpolator to be returned, even if it contains critical points. This is achieved by explicitly specifying NaturalCubicBSpline
as interpolator method and setting the force
option in its keyword argument to true
. This approach can be useful for identifying problematic or erroneous calibration points. Let's give it a try.
ld = RiMapper("Kovats", rts, ris, metadata=Dict(:filename => filename),
interpolationmethod=NaturalCubicBSpline(force=true))
RiMapper {index name: Kovats, calibration points: 31}
retention time range: 3.394 minute - 35.273 minute
retention index range: 600.0 - 3800.0
interpolation method: NaturalCubicBSpline(true)
extrapolation method: nothing
metadata: 1 entry
Let's use the returned RiMapper
object to plot the compromised mapping function using the code from the previous example.
# <- Insert plotmappingfunction code here
plotmappingfunction(ld, "rt2ri_2.svg")
This will produce the following Scalable Vector Graphics (SVG) file:
As observed, the sharp increase in the retention index between 3.394 and 4.154 minutes prompted the creation of a B-spline featuring two critical points: a local maximum at 5.128 minutes and a local minimum at 5.526 minutes. This outcome is not due to Hexane being incorrectly identified or associated with an incorrect retention time. Instead, the pronounced disparity in the retention time–retention index relationship in the first segment of the B-spline, compared to the following segments, causes the B-spline to oscillate. This oscillation suggests that the available set of calibration points at the start of the run is insufficient for reliable prediction of retention indices in this region using the chosen interpolation method. It is entirely possible that a denser sampling of calibration points at the beginning of the run could resolve this issue, particularly given the large RI interval between Hexane and Nonane compared to subsequent calibration points.
If the RiMapper
is intended to interpolate values between, for example, minutes 10 and 30, the simplest solution might be to omit this calibration point. Alternatively, one could choose the PiecewiseLinear
interpolation method, which avoids oscillations. However, it introduces discontinuities in its derivative, making it a less desirable interpolation method.
Example 4: Plotting TIC against RI
It is often preferred to plot chromatographic intensity values against retention index values rather than scan time. This requires chromatographic run data and corresponding calibration data, which allow the calculation of an RiMapper for converting retention times into retention indices.
Let's assume we wish to plot total ion chromatogram (TIC) intensities from a gas chromatography/mass spectrometry (GCMS) run against Kovats retention index values. In this example, the run file is stored in the ChemStationMS
data format in the folder ON16150_April17_2024_I.D
. The corresponding calibration file, 2024-04-11_C7-C40_11_APR_24_V09.CAL
, contains retention time and retention index value pairs in the AMDIS .CAL file format (see Examples 1 and 3 above).
To load the calibration file data, we will again use the DelimitedFiles.jl package, which comes with Julia. We will need the JuChrom.jl package to read the ChemStation MS run data and to process and apply the calibration file data. Finally, we will use the CairoMakie.jl package to generate the desired figure. Note that you may need to install these packages if you haven't done so already.
We begin by reading and processing the calibration file data to create an RiMapper
object, following the general procedure outlined in Example 1.
using CairoMakie, DelimitedFiles, JuChrom
CairoMakie.activate!()
calpath = joinpath(JuChrom.calibration, "empirical_data", "calfiles")
calfile = joinpath(calpath, "2024-04-11_C7-C40_11_APR_24_V09.CAL")
cells = readdlm(calfile; header=false)
rts = convert(Vector{Float64}, cells[:, 1]) * u"minute"
ris = convert(Vector{Float64}, cells[:, 2])
ld = RiMapper("Kovats", rts, ris)
RiMapper {index name: Kovats, calibration points: 30}
retention time range: 4.154 minute - 35.273 minute
retention index range: 900.0 - 3800.0
interpolation method: NaturalCubicBSpline(false)
extrapolation method: nothing
metadata: 0 entries
As indicated by the output, the computation of the RiMapper
was successful (refer to Example 3 for handling errors). We can now proceed with reading and processing the GCMS run data.
runpath = joinpath(JuChrom.calibration, "empirical_data", "runs")
runfolder = joinpath(runpath, "ON16150_April17_2024_I.D")
chrom = importdata(runfolder, ChemStationMS())
rimapper!(chrom, ld) # stores a reference to the RiMapper in ChromMS object
tic = totalionchromatogram(chrom)
Chrom {scan times: Float32, intensities: Int64}
2405 scans; scan time range: 191944.0f0 ms - 1.89905f6 ms
intensity range: 35402 - 16107576
metadata: 10 entries
retention index mapper: Kovats
We now have everything set to generate the desired figure. In this specific example, we only map intensities at retention times that fall within the range of the calibration points, avoiding the extrapolation of retention indices. Additionally, we allow further restriction of the retention index range to be plotted in the figure.
ri_min, ri_max = 900, 1800
f = Figure(size=(1200, 600))
ax = Axis(f[1,1], xlabel="Kovats retention index", ylabel="Abundance")
ris, ints = [], []
for i in 1:scancount(tic)
rt = scantime(tic, i)
if minretentiontime(ld) ≤ rt ≤ maxretentiontime(ld)
ri = retentionindex(rimapper(tic), rt)
ri_min ≤ ri ≤ ri_max || continue
push!(ris, ri)
push!(ints, intensity(tic, i))
end
end
length(ris) ≥ 2 && lines!(ax, ris, ints, color=:blue)
save("tic_against_ri.svg", f)
This will produce the following SVG file:
Example 5: Plotting multiple TICs against RI
In some experimental designs, it is useful to plot the intensity values of different runs against retention index values in a single figure. In this example, we assume two treatment groups, labeled I
and R
, with two replicates for each group. The experimental data were analyzed using gas chromatography/mass spectrometry (GCMS).
We aim to plot the total ion chromatogram (TIC) intensities from the corresponding GCMS runs against Kovats retention indices. In this example, the run files are again stored in the ChemStationMS data format. The corresponding calibration files, which contain retention time and retention index value pairs, are in the AMDIS .CAL file format (see Examples 1, 3, and 4 above). To automatically process this data, we need a table indicating which calibration file is associated with each run. This information is stored in the run_calfilename_relation.xlsx
file in Excel format.
As in Example 4, we require functionality from the CairoMakie.jl, DelimitedFiles.jl, and JuChrom.jl packages. Additionally, we need the XLSX.jl package to read the Excel file. Note that you need to install these packages if they are not already installed.
The relevant data in the Excel file is located in the sheet labeled Table1
. Column A
contains the names of the run folders, and column B
contains the names of the corresponding .CAL files. Since we have four runs and no headers, the area of interest is A1:B4. We'll read these cells and store the data in a dictionary, using the run names as keys to retrieve the associated .CAL file names.
using CairoMakie, DelimitedFiles, JuChrom
CairoMakie.activate!()
import XLSX
file = joinpath(JuChrom.calibration, "empirical_data", "run_calfilename_relation.xlsx")
data_cells = XLSX.readdata(file, "Table1", "A1:B4")
cal4run = Dict(data_cells[row, 1] => data_cells[row, 2] for row in axes(data_cells, 1))
Dict{String, String} with 4 entries:
"ON16154_May28_2024_R.D" => "2024-05-28_C7-C40_28_MAY_24_V04.CAL"
"ON16150_April17_2024_I.D" => "2024-04-11_C7-C40_11_APR_24_V09.CAL"
"ON16150_April17_2024_R.D" => "2024-04-11_C7-C40_11_APR_24_V09.CAL"
"ON16154_May28_2024_I.D" => "2024-05-28_C7-C40_28_MAY_24_V04.CAL"
Since two runs are associated with each .CAL file, it makes sense to first process the .CAL files and create a dictionary from which we can retrieve the RiMapper
generated from a given .CAL file.
mpr4cal = Dict()
for calfilename in unique(values(cal4run))
calfile = joinpath(JuChrom.calibration, "empirical_data", "calfiles", calfilename)
cells = readdlm(calfile; header=false)
rts = convert(Vector{Float64}, cells[:, 1]) * u"minute"
ris = convert(Vector{Float64}, cells[:, 2])
mpr4cal[calfilename] = RiMapper("Kovats", rts, ris)
end
Dict{Any, Any} with 2 entries:
"2024-04-11_C7-C40_11_APR_24_V09.CAL" => RiMapper {index name: Kovats, calibr…
"2024-05-28_C7-C40_28_MAY_24_V04.CAL" => RiMapper {index name: Kovats, calibr…
Next, we read the GCMS data and store a reference to the corresponding RiMapper object in the resulting ChromMS object. The ChromMS objects are then collected in a list for further processing.
chroms = []
for run in keys(cal4run)
runfolder = joinpath(JuChrom.calibration, "empirical_data", "runs", run)
chrom = importdata(runfolder, ChemStationMS())
rimapper!(chrom, mpr4cal[cal4run[run]])
push!(chroms, chrom)
end
We now have all the data needed to generate a figure that plots the TIC from each run against the Kovats retention index. Since the plotted lines will overlap, we will make them transparent by setting the alpha channel to a value less than one. The treatment group is indicated by the letters I
and R
at the end of the run folder name, which is stored in the metadata
. We will use this information to plot the TICs of the two groups in different colors.
ri_min, ri_max = 900, 1800
alpha = 0.5
f = Figure(size=(1200, 600))
ax = Axis(f[1,1], xlabel="Kovats retention index", ylabel="Abundance")
for chrom in chroms
tic = totalionchromatogram(chrom)
ld = rimapper(tic)
ris, ints = [], []
for i in 1:scancount(tic)
rt = scantime(tic, i)
if minretentiontime(ld) ≤ rt ≤ maxretentiontime(ld)
ri = retentionindex(rimapper(tic), rt)
ri_min ≤ ri ≤ ri_max || continue
push!(ris, ri)
push!(ints, intensity(tic, i))
end
end
if length(ris) ≥ 2
c = endswith(metadata(tic)[:sample], "R") ? (:blue, alpha) : (:red, alpha)
lines!(ax, ris, ints, color=c)
end
end
save("tics_in_one_axis.svg", f)
This will produce the following SVG file:
Although the TICs for the two treatment groups are plotted in different colors, it can still be difficult to visually discern systematic differences between them. Furthermore, if each treatment group contains more runs, the plot can become cluttered and overloaded. Therefore, it may be helpful to plot the TICs for the two treatment groups on separate, mirrored axes along the x-axis. The following lines of code generate such a figure.
ri_min, ri_max = 900, 1800
alpha = 0.5
f2 = Figure(size=(1200, 600))
ax1 = Axis(f2[1,1], limits = (nothing, nothing, 0, nothing), ylabel="Abundance")
ax2 = Axis(f2[2,1], limits = (nothing, nothing, 0, nothing),
xlabel="Kovats retention index", ylabel="Abundance", yreversed=true)
linkxaxes!(ax1, ax2)
rowgap!(f2.layout, 0)
hidexdecorations!(ax1, grid=false)
for chrom in chroms
tic = totalionchromatogram(chrom)
ris, ints = [], []
for i in 1:scancount(tic)
rt = scantime(tic, i)
if minretentiontime(rimapper(tic)) ≤ rt ≤ maxretentiontime(rimapper(tic))
ri = retentionindex(rimapper(tic), rt)
ri_min ≤ ri ≤ ri_max || continue
push!(ris, ri)
push!(ints, intensity(tic, i))
end
end
if length(ris) ≥ 2
if endswith(metadata(chrom)[:sample], "R")
lines!(ax1, ris, ints, color=(:blue, alpha))
else
lines!(ax2, ris, ints, color=(:red, alpha))
end
end
end
save("tics_in_two_axes.svg", f2)
This will produce the following SVG file: