Retention Indices

Example 1: Calibration Points from a Delimited File

Let's assume we have a set of calibration points for calculating the Kovats retention index, stored in a delimited file. The first step is to read and format the data so it can be passed to the RiMapper constructor. In this example, we are using a .CAL file generated by the AMDIS software. The contents of the file example1.CAL are as follows:

4.154 900.0 98 1478 Nonane
5.635 1000.0 100 1215 Decane
7.145 1100.0 95 1606 Undecane
8.628 1200.0 100 1762 Dodecane
10.043 1300.0 100 1782 Tridecane
  ⋮
25.435 3400.0 92 920 Tetratriacontane
27.204 3500.0 90 819 Pentatriacontane
29.366 3600.0 91 723 Hexatriacontane
32.026 3700.0 88 602 Heptatriacontane
35.273 3800.0 85 493 Octatriacontane

As we can see, the file contains whitespace-separated columns, but only the first two are relevant for our purposes. The first column lists the retention times (in minutes), while the second column provides the corresponding retention indices.

We will use a function from the DelimitedFiles.jl package, which comes with Julia, to read the file contents. Additionally, we need to assign the minute time unit to the time values. Since JuChrom.jl re-exports names from the Unitful.jl package, we will also load JuChrom.jl at this stage. To plot the inferred mapping function, we will further load the CairoMakie.jl package from the Makie visualization ecosystem. If you haven't installed them yet, you may need to do so. Additionally, we explicitly activate the CairoMakie backend to ensure it is used, especially if another backend was previously active in the same session.

using CairoMakie, DelimitedFiles, JuChrom
CairoMakie.activate!()

filename = "example1.CAL"
file = joinpath(JuChrom.calibration, filename)
data_cells = readdlm(file; header=false)  # set header=true if the file contains a header
30×5 Matrix{Any}:
  4.154   900.0   98  1478  "Nonane"
  5.635  1000.0  100  1215  "Decane"
  7.145  1100.0   95  1606  "Undecane"
  8.628  1200.0  100  1762  "Dodecane"
 10.043  1300.0  100  1782  "Tridecane"
 11.39   1400.0   91  1909  "Tetradecane"
 12.667  1500.0   96  1956  "Pentadecane"
 13.877  1600.0  100  1925  "Hexadecane"
 14.876  1700.0   99  2029  "Heptadecane"
 15.608  1800.0   95  2021  "Octadecane"
  ⋮                         
 21.026  3000.0   86  1742  "Triacontane"
 21.83   3100.0   93  1134  "Hentriacontane"
 22.807  3200.0   92  1105  "Dotriacontane"
 23.991  3300.0   94  1020  "Tritriacontane"
 25.435  3400.0   92   920  "Tetratriacontane"
 27.204  3500.0   90   819  "Pentatriacontane"
 29.366  3600.0   91   723  "Hexatriacontane"
 32.026  3700.0   88   602  "Heptatriacontane"
 35.273  3800.0   85   493  "Octatriacontane"

The output above displays the contents of the matrix referred to by the variable data_cells at this point. We will use slice notation to extract the values from the first and second columns. Additionally, we will convert these values to Float64 and append the minute time unit to the retention time values. The following two lines of code will achieve this:

rts = convert(Vector{Float64}, data_cells[:, 1]) * u"minute"
30-element Vector{Quantity{Float64, 𝐓, Unitful.FreeUnits{(minute,), 𝐓, nothing}}}:
  4.154 minute
  5.635 minute
  7.145 minute
  8.628 minute
 10.043 minute
 11.39 minute
 12.667 minute
 13.877 minute
 14.876 minute
 15.608 minute
  ⋮
 21.026 minute
 21.83 minute
 22.807 minute
 23.991 minute
 25.435 minute
 27.204 minute
 29.366 minute
 32.026 minute
 35.273 minute
ris = convert(Vector{Float64}, data_cells[:, 2])
30-element Vector{Float64}:
  900.0
 1000.0
 1100.0
 1200.0
 1300.0
 1400.0
 1500.0
 1600.0
 1700.0
 1800.0
    ⋮
 3000.0
 3100.0
 3200.0
 3300.0
 3400.0
 3500.0
 3600.0
 3700.0
 3800.0

We can now create a RiMapper object by calling its constructor with the required arguments, along with any optional ones if needed. The mandatory arguments include the name of the retention index (in our case, Kovats), the retention time, and the corresponding retention indices. In this example, we also store the name of the calibration file as metadata.

ld = RiMapper("Kovats", rts, ris, metadata=Dict(:filename => filename))
RiMapper {index name: Kovats, calibration points: 30}
retention time range: 4.154 minute - 35.273 minute
retention index range: 900.0 - 3800.0
interpolation method: NaturalCubicBSpline(false)
extrapolation method: nothing
metadata: 1 entry

Since we did not explicitly specify an interpolation method, the constructor defaulted to NaturalCubicBSpline for interpolation.

We now have a RiMapper object that allows us to calculate the retentionindex for a given retention time of interest. We'll use this to plot the mapping function. Since we will be plotting additional mapping functions, we'll encapsulate the code for plotting the mapping function into a reusable function that can be called multiple times.

function plotmappingfunction(ld::RiMapper, outputfile::AbstractString)
  # Create figure
  f = Figure(; size=(1200, 600))

  # Create axis in figure, including informative title and axis labels
  title = get(metadata(ld), :filename, "")
  ri_name = retentionindexname(ld)
  timeunit = unit(eltype(retentiontimes(ld)))
  ax = Axis(f[1,1], title=title, xlabel="Scan time [$timeunit]",
    ylabel="$ri_name retention index")

  # Plot calibration points
  cal = scatter!(ax, retentiontimes(ld, ustripped=true), retentionindices(ld), color=:red)

  # Plot interpolated values
  xs = LinRange(minretentiontime(ld), maxretentiontime(ld), 1000)
  itp = lines!(ax, ustrip(xs), retentionindex.(ld, xs), color=:blue)

  # Add an informative legend
  axislegend(ax, [cal, itp], ["calibration points", "interpolation"], position = :lt,
    orientation = :horizontal)

  # Save figure in svg file format
  save(outputfile, f)
end

plotmappingfunction(ld, "rt2ri.svg")

This will produce the following Scalable Vector Graphics (SVG) file:

Example 2: Calibration Points from an Excel File

In this example, we assume that a set of calibration points for calculating the Kovats retention index has been stored in an Excel file, possibly as a result of a manual entry. The relevant data can be found in the first sheet of the file example1.xlsx, named Table1.

We will use the XLSX.jl package to read the contents of the Excel file. As in the previous example, we need to assign the time values a unit of minutes. To accomplish this, we will import JuChrom.jl, which re-exports functionality from the Unitful.jl package.

using JuChrom
import XLSX

filename = "example1.xlsx"
file = joinpath(JuChrom.calibration, filename)
sheetname = "Table1"
excel_cells = "A1:B30"

data_cells = XLSX.readdata(file, sheetname, excel_cells)
30×2 Matrix{Any}:
  4.154   900
  5.635  1000
  7.145  1100
  8.628  1200
 10.043  1300
 11.39   1400
 12.667  1500
 13.877  1600
 14.876  1700
 15.608  1800
  ⋮      
 21.026  3000
 21.83   3100
 22.807  3200
 23.991  3300
 25.435  3400
 27.204  3500
 29.366  3600
 32.026  3700
 35.273  3800

The output above shows the contents of the matrix stored in the variable data_cells at this point. As in the previous example, we will use slice notation to extract the values from the first and second columns, convert them to Float64, and append the minute time unit to the retention time values.

timeunit = u"minute"
rts = convert(Vector{Float64}, data_cells[:, 1]) * timeunit
ris = convert(Vector{Float64}, data_cells[:, 2])
ld = RiMapper("Kovats", rts, ris, metadata=Dict(:filename => filename))
RiMapper {index name: Kovats, calibration points: 30}
retention time range: 4.154 minute - 35.273 minute
retention index range: 900.0 - 3800.0
interpolation method: NaturalCubicBSpline(false)
extrapolation method: nothing
metadata: 1 entry

We could reuse the plotting function from the first example to visualize the data. However, we will refrain from doing so here, as the values in the Excel file are identical to those in the delimited file. Instead, we will discuss an example where the imported calibration points result in an error when inferring a NaturalCubicBSpline interpolant.

Example 3: Error in Inferring NaturalCubicBSpline Interpolant

In the two previous examples, the computation of a natural cubic B-spline for calculating a continuously increasing retention index worked as expected. However, this is not always the case. If the computed B-spline contains critical points that would cause the retention index to decrease as retention time increases, the creation of the RiMapper would result in an error. Let's examine an example. The contents of the file example2.CAL are as follows:

3.394 600.0 100 742 Hexane
4.154 900.0 98 1478 Nonane
5.635 1000.0 100 1215 Decane
7.145 1100.0 95 1606 Undecane
8.628 1200.0 100 1762 Dodecane
  ⋮
25.435 3400.0 92 920 Tetratriacontane
27.204 3500.0 90 819 Pentatriacontane
29.366 3600.0 91 723 Hexatriacontane
32.026 3700.0 88 602 Heptatriacontane
35.273 3800.0 85 493 Octatriacontane

The data is similar to that in example1.CAL, with the only difference being that example2.CAL includes the retention time and retention index of Hexane as an additional calibration point. We'll run the same commands as in the previous example, but this time, we'll specify a different input file and wrap the RiMapper call in a try/ catch block to handle any exceptions.

using CairoMakie, DelimitedFiles, JuChrom
CairoMakie.activate!()

filename = "example2.CAL"
file = joinpath(JuChrom.calibration, filename)
data_cells = readdlm(file; header=false)

rts = convert(Vector{Float64}, data_cells[:, 1]) * u"minute"
ris = convert(Vector{Float64}, data_cells[:, 2])

try
  global ld = RiMapper("Kovats", rts, ris, metadata=Dict(:filename => filename))
catch e
  println(e)
end
ArgumentError("computed Bspline contains critical point(s): [5.128030033274248, 5.526092831780077] * minute")

As shown in the output, the creation of a natural cubic B-spline that predicts continuously increasing retention indices with increasing retention time has failed. However, we can still force the B-spline interpolator to be returned, even if it contains critical points. This is achieved by explicitly specifying NaturalCubicBSpline as interpolator method and setting the force option in its keyword argument to true. This approach can be useful for identifying problematic or erroneous calibration points. Let's give it a try.

ld = RiMapper("Kovats", rts, ris, metadata=Dict(:filename => filename),
  interpolationmethod=NaturalCubicBSpline(force=true))
RiMapper {index name: Kovats, calibration points: 31}
retention time range: 3.394 minute - 35.273 minute
retention index range: 600.0 - 3800.0
interpolation method: NaturalCubicBSpline(true)
extrapolation method: nothing
metadata: 1 entry

Let's use the returned RiMapper object to plot the compromised mapping function using the code from the previous example.

# <- Insert plotmappingfunction code here
plotmappingfunction(ld, "rt2ri_2.svg")

This will produce the following Scalable Vector Graphics (SVG) file:

As observed, the sharp increase in the retention index between 3.394 and 4.154 minutes prompted the creation of a B-spline featuring two critical points: a local maximum at 5.128 minutes and a local minimum at 5.526 minutes. This outcome is not due to Hexane being incorrectly identified or associated with an incorrect retention time. Instead, the pronounced disparity in the retention time–retention index relationship in the first segment of the B-spline, compared to the following segments, causes the B-spline to oscillate. This oscillation suggests that the available set of calibration points at the start of the run is insufficient for reliable prediction of retention indices in this region using the chosen interpolation method. It is entirely possible that a denser sampling of calibration points at the beginning of the run could resolve this issue, particularly given the large RI interval between Hexane and Nonane compared to subsequent calibration points.

If the RiMapper is intended to interpolate values between, for example, minutes 10 and 30, the simplest solution might be to omit this calibration point. Alternatively, one could choose the PiecewiseLinear interpolation method, which avoids oscillations. However, it introduces discontinuities in its derivative, making it a less desirable interpolation method.

Example 4: Plotting TIC against RI

It is often preferred to plot chromatographic intensity values against retention index values rather than scan time. This requires chromatographic run data and corresponding calibration data, which allow the calculation of an RiMapper for converting retention times into retention indices.

Let's assume we wish to plot total ion chromatogram (TIC) intensities from a gas chromatography/mass spectrometry (GCMS) run against Kovats retention index values. In this example, the run file is stored in the ChemStationMS data format in the folder ON16150_April17_2024_I.D. The corresponding calibration file, 2024-04-11_C7-C40_11_APR_24_V09.CAL, contains retention time and retention index value pairs in the AMDIS .CAL file format (see Examples 1 and 3 above).

To load the calibration file data, we will again use the DelimitedFiles.jl package, which comes with Julia. We will need the JuChrom.jl package to read the ChemStation MS run data and to process and apply the calibration file data. Finally, we will use the CairoMakie.jl package to generate the desired figure. Note that you may need to install these packages if you haven't done so already.

We begin by reading and processing the calibration file data to create an RiMapper object, following the general procedure outlined in Example 1.

using CairoMakie, DelimitedFiles, JuChrom
CairoMakie.activate!()

calpath = joinpath(JuChrom.calibration, "empirical_data", "calfiles")
calfile = joinpath(calpath, "2024-04-11_C7-C40_11_APR_24_V09.CAL")

cells = readdlm(calfile; header=false)
rts = convert(Vector{Float64}, cells[:, 1]) * u"minute"
ris = convert(Vector{Float64}, cells[:, 2])
ld = RiMapper("Kovats", rts, ris)
RiMapper {index name: Kovats, calibration points: 30}
retention time range: 4.154 minute - 35.273 minute
retention index range: 900.0 - 3800.0
interpolation method: NaturalCubicBSpline(false)
extrapolation method: nothing
metadata: 0 entries

As indicated by the output, the computation of the RiMapper was successful (refer to Example 3 for handling errors). We can now proceed with reading and processing the GCMS run data.

runpath = joinpath(JuChrom.calibration, "empirical_data", "runs")
runfolder = joinpath(runpath, "ON16150_April17_2024_I.D")
chrom = importdata(runfolder, ChemStationMS())
rimapper!(chrom, ld)  # stores a reference to the RiMapper in ChromMS object
tic = totalionchromatogram(chrom)
Chrom {scan times: Float32, intensities: Int64}
2405 scans; scan time range: 191944.0f0 ms - 1.89905f6 ms
intensity range: 35402 - 16107576
metadata: 10 entries
retention index mapper: Kovats

We now have everything set to generate the desired figure. In this specific example, we only map intensities at retention times that fall within the range of the calibration points, avoiding the extrapolation of retention indices. Additionally, we allow further restriction of the retention index range to be plotted in the figure.

ri_min, ri_max = 900, 1800

f = Figure(size=(1200, 600))
ax = Axis(f[1,1], xlabel="Kovats retention index", ylabel="Abundance")
ris, ints = [], []
for i in 1:scancount(tic)
  rt = scantime(tic, i)
  if minretentiontime(ld) ≤ rt ≤ maxretentiontime(ld)
    ri = retentionindex(rimapper(tic), rt)
    ri_min ≤ ri ≤ ri_max || continue
    push!(ris, ri)
    push!(ints, intensity(tic, i))
  end
end
length(ris) ≥ 2 && lines!(ax, ris, ints, color=:blue)

save("tic_against_ri.svg", f)

This will produce the following SVG file:

Example 5: Plotting multiple TICs against RI

In some experimental designs, it is useful to plot the intensity values of different runs against retention index values in a single figure. In this example, we assume two treatment groups, labeled I and R, with two replicates for each group. The experimental data were analyzed using gas chromatography/mass spectrometry (GCMS).

We aim to plot the total ion chromatogram (TIC) intensities from the corresponding GCMS runs against Kovats retention indices. In this example, the run files are again stored in the ChemStationMS data format. The corresponding calibration files, which contain retention time and retention index value pairs, are in the AMDIS .CAL file format (see Examples 1, 3, and 4 above). To automatically process this data, we need a table indicating which calibration file is associated with each run. This information is stored in the run_calfilename_relation.xlsx file in Excel format.

As in Example 4, we require functionality from the CairoMakie.jl, DelimitedFiles.jl, and JuChrom.jl packages. Additionally, we need the XLSX.jl package to read the Excel file. Note that you need to install these packages if they are not already installed.

The relevant data in the Excel file is located in the sheet labeled Table1. Column A contains the names of the run folders, and column B contains the names of the corresponding .CAL files. Since we have four runs and no headers, the area of interest is A1:B4. We'll read these cells and store the data in a dictionary, using the run names as keys to retrieve the associated .CAL file names.

using CairoMakie, DelimitedFiles, JuChrom
CairoMakie.activate!()
import XLSX

file = joinpath(JuChrom.calibration, "empirical_data", "run_calfilename_relation.xlsx")
data_cells = XLSX.readdata(file, "Table1", "A1:B4")
cal4run = Dict(data_cells[row, 1] => data_cells[row, 2] for row in axes(data_cells, 1))
Dict{String, String} with 4 entries:
  "ON16154_May28_2024_R.D"   => "2024-05-28_C7-C40_28_MAY_24_V04.CAL"
  "ON16150_April17_2024_I.D" => "2024-04-11_C7-C40_11_APR_24_V09.CAL"
  "ON16150_April17_2024_R.D" => "2024-04-11_C7-C40_11_APR_24_V09.CAL"
  "ON16154_May28_2024_I.D"   => "2024-05-28_C7-C40_28_MAY_24_V04.CAL"

Since two runs are associated with each .CAL file, it makes sense to first process the .CAL files and create a dictionary from which we can retrieve the RiMapper generated from a given .CAL file.

mpr4cal = Dict()
for calfilename in unique(values(cal4run))
  calfile = joinpath(JuChrom.calibration, "empirical_data", "calfiles", calfilename)
  cells = readdlm(calfile; header=false)
  rts = convert(Vector{Float64}, cells[:, 1]) * u"minute"
  ris = convert(Vector{Float64}, cells[:, 2])
  mpr4cal[calfilename] = RiMapper("Kovats", rts, ris)
end
Dict{Any, Any} with 2 entries:
  "2024-04-11_C7-C40_11_APR_24_V09.CAL" => RiMapper {index name: Kovats, calibr…
  "2024-05-28_C7-C40_28_MAY_24_V04.CAL" => RiMapper {index name: Kovats, calibr…

Next, we read the GCMS data and store a reference to the corresponding RiMapper object in the resulting ChromMS object. The ChromMS objects are then collected in a list for further processing.

chroms = []
for run in keys(cal4run)
  runfolder = joinpath(JuChrom.calibration, "empirical_data", "runs", run)
  chrom = importdata(runfolder, ChemStationMS())
  rimapper!(chrom, mpr4cal[cal4run[run]])
  push!(chroms, chrom)
end

We now have all the data needed to generate a figure that plots the TIC from each run against the Kovats retention index. Since the plotted lines will overlap, we will make them transparent by setting the alpha channel to a value less than one. The treatment group is indicated by the letters I and R at the end of the run folder name, which is stored in the metadata. We will use this information to plot the TICs of the two groups in different colors.

ri_min, ri_max = 900, 1800
alpha = 0.5

f = Figure(size=(1200, 600))
ax = Axis(f[1,1], xlabel="Kovats retention index", ylabel="Abundance")
for chrom in chroms
  tic = totalionchromatogram(chrom)
  ld = rimapper(tic)
  ris, ints = [], []
  for i in 1:scancount(tic)
    rt = scantime(tic, i)
    if minretentiontime(ld) ≤ rt ≤ maxretentiontime(ld)
      ri = retentionindex(rimapper(tic), rt)
      ri_min ≤ ri ≤ ri_max || continue
      push!(ris, ri)
      push!(ints, intensity(tic, i))
    end
  end
  if length(ris) ≥ 2
    c = endswith(metadata(tic)[:sample], "R") ? (:blue, alpha) : (:red, alpha)
    lines!(ax, ris, ints, color=c)
  end
end

save("tics_in_one_axis.svg", f)

This will produce the following SVG file:

Although the TICs for the two treatment groups are plotted in different colors, it can still be difficult to visually discern systematic differences between them. Furthermore, if each treatment group contains more runs, the plot can become cluttered and overloaded. Therefore, it may be helpful to plot the TICs for the two treatment groups on separate, mirrored axes along the x-axis. The following lines of code generate such a figure.

ri_min, ri_max = 900, 1800
alpha = 0.5

f2 = Figure(size=(1200, 600))
ax1 = Axis(f2[1,1], limits = (nothing, nothing, 0, nothing), ylabel="Abundance")
ax2 = Axis(f2[2,1], limits = (nothing, nothing, 0, nothing),
  xlabel="Kovats retention index", ylabel="Abundance", yreversed=true)

linkxaxes!(ax1, ax2)
rowgap!(f2.layout, 0)
hidexdecorations!(ax1, grid=false)

for chrom in chroms
  tic = totalionchromatogram(chrom)
  ris, ints = [], []
  for i in 1:scancount(tic)
    rt = scantime(tic, i)
    if minretentiontime(rimapper(tic)) ≤ rt ≤ maxretentiontime(rimapper(tic))
      ri = retentionindex(rimapper(tic), rt)
      ri_min ≤ ri ≤ ri_max || continue
      push!(ris, ri)
      push!(ints, intensity(tic, i))
    end
  end
  if length(ris) ≥ 2
    if endswith(metadata(chrom)[:sample], "R")
      lines!(ax1, ris, ints, color=(:blue, alpha))
    else
      lines!(ax2, ris, ints, color=(:red, alpha))
    end
  end
end

save("tics_in_two_axes.svg", f2)

This will produce the following SVG file: