SENAITE LIMS has now been installed for some time at the hospital laboratory I am collaborating with, but unexpected disruption and bad luck have slowed momentum. Disruption has happened at both ends: at the lab there has been loss of funding, loss of staff, renovation work, and even a short border war, while at my end there has been too much competing work, including endless meetings and a lengthy interruption while I was working on the UK hantavirus response.
The main outstanding work relates to Analysis Specifications, which is the SENAITE term for the normal reference ranges for quantitative test results. SENAITE uses these ranges to flag abnormal results both in the web interface and in printed reports.
How do we know if a blood test is normal?
If you have a blood test that measures something, clinicians need to know the limits of normality for that measurement in order to interpret whether it is normal or abnormal. The pattern of abnormality across one or more tests is useful both for diagnosis and for monitoring treatment. Most commonly a minimum and a maximum are set for each test value; these can vary slightly across laboratory machines and testing environments. For example, if your lab’s normal range for sodium concentration in the blood is 135 to 150 mmol/L, then a measurement below 135 is abnormally low and a measurement above 150 is abnormally high. In some cases only the minimum or the maximum matters. For those tests clinicians are only concerned if the measurement is above or below a particular value - anything else is fine.
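The logic described above can be sketched in a few lines of R. The function name flag_result and the convention that NA means "no limit on that side" are assumptions made for illustration; they are not anything SENAITE defines.

```r
# Classify a numeric result against a reference range.
# Assumption: NA for min or max means "no limit on that side".
flag_result <- function(value, min = NA, max = NA) {
  if (!is.na(min) && value < min) return("low")
  if (!is.na(max) && value > max) return("high")
  "normal"
}

flag_result(132, min = 135, max = 150) # "low"
flag_result(140, min = 135, max = 150) # "normal"
flag_result(7.2, max = 5)              # "high" (only a maximum is set)
```

Values exactly on a limit are treated as normal here, which is a design choice rather than a rule.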
Interpretations other than normality/abnormality
There are use cases where ranges are defined to distinguish between things other than normality and abnormality. For example, the concentration of protein in a collection of fluid around the lung can help narrow down the list of causes. This does not fit well with Analysis Specifications, which are only designed to distinguish between two scenarios (normal and abnormal), but there are other ways of managing this in SENAITE.
Initially I assumed I would need to configure reference ranges for each panel, but I eventually realised this would be tedious both for me and for lab users. One issue with the way SENAITE handles reference ranges is that the user has to specify them for every sample that is registered. You cannot set a default, and you cannot associate a range with a panel (an Analysis Template) in a way that lets it be autocompleted.[1]
Some colleagues then made a fortuitous discovery: you can set up universal reference ranges that include all quantitative tests (for a particular sample type), and these work even when only some of the tests are being done. Since then my focus has been on setting up sample type-specific universal reference ranges for population groups, such as adult males, adult females, neonates, and so on. This should simplify things for users registering samples, as there will be far fewer options to select from, and in most cases they will just pick one of the top few.
To ease the consensus-building process (remember that any information system work may need some preceding formalisation of processes), I have been collating draft Analysis Specifications in a shared Google Drive spreadsheet. In earlier phases of setting up this system I had to contend with conflicting versions of metadata shared by various collaborators, changing some things and then being asked to change them back, sometimes more than once, so working to a single source of truth can be very helpful.
My first version of the spreadsheet used the following categories (per sample type), based on my (nearly 30-year-old) clinical knowledge of how normal ranges vary by age and sex, plus some reading of lab documentation:
- Male 18+
- Female 18+
- Pregnant
- Female 13 to 17y
- Male 13 to 17y
- Child 1 to 12y
- Infant 1 to 12m
- Neonate 0 to 28d
This will probably be simplified further in the final version. For each category I have entered minimum and maximum values for every test, using 0 for the minimum where only a maximum is really meaningful. For many of the categories the normal ranges are the same, so it was helpful to use formulae in the spreadsheet rather than re-entering identical values across each column.
Analysis Specifications are easy to set up in the main web interface, but it is a lot of repetitive work when you have many to add manually, so I have been working on a more automated approach. You can send a carefully crafted JSON payload to the right endpoint of the SENAITE API to create any metadata you need (if you can work out how to do it…).
I started out with a collection of scripts I had developed myself, but with the help of GitHub Copilot I compiled these into an R package called sen8r (prcleary/sen8r on GitHub: "Convenient R interface for interacting with the SENAITE LIMS API").
This is a work in progress and I have plenty of ideas for developing it further, but it can already do some useful things. The README covers installation and initial setup. For now it assumes you are only working with a single instance of SENAITE, though I plan to extend that.
So let’s add an Analysis Specification. We can easily delete it later. You should do the following in RStudio.
First install and load the remotes package if you don’t have it already.
install.packages('remotes') # not needed if you have already installed this package
library(remotes)
Then install and load sen8r.
remotes::install_github("prcleary/sen8r")
library(sen8r)
Then set up sen8r.
senaite_setup()
This will ask you a series of questions: user name, password and URL of the SENAITE instance, so have these ready. Afterwards you can check it has worked by running one of the query examples on the README page.
How do we know what to upload to create an Analysis Specification in SENAITE? It is not fully documented in the API documentation (the CRUD page of the senaite.jsonapi 2.6.0 documentation), but with some detective work and experimentation you can work it out.
Let’s look at the fields that an Analysis Specification has in SENAITE:
senaite_lookup('analysisspec')
This returns:
ℹ Downloading: https://senaite.nih.org.pk/aims-ajk/@@API/senaite/v1/analysisspec?children=TRUE&complete=TRUE&limit=100
ℹ Lookup fields: uid, children_count, creation_date, id, ResultsRange.hidemin, ResultsRange.uid, ResultsRange.rangecomment, ResultsRange.min_operator, ResultsRange.max, ResultsRange.min, ResultsRange.hidemax, ResultsRange.max_operator, ResultsRange.error, ResultsRange.keyword, ResultsRange.warn_max, ResultsRange.warn_min, api_url, title, parent_path, parent_id, parent_url, review_state, description, portal_type, SampleType.url, SampleType.uid, SampleType.api_url, language, allowedRolesAndUsers, path, parent_uid, getClientUID, creators, modification_date, effective, created, url, author, modified, sortable_title
This is “flattened”, but we can discern the structure of the JSON.
- uid
- children_count
- creation_date
- id
- ResultsRange
  - hidemin
  - uid
  - rangecomment
  - min_operator
  - max
  - min
  - hidemax
  - max_operator
  - error
  - keyword
  - warn_max
  - warn_min
- api_url
- title
- parent_path
- parent_id
- parent_url
- review_state
- description
- portal_type
- SampleType
  - url
  - uid
  - api_url
- language
- allowedRolesAndUsers
- path
- parent_uid
- getClientUID
- creators
- modification_date
- effective
- created
- url
- author
- modified
- sortable_title
Now look at the page for creating an Analysis Specification manually in the Web interface. This specifies the following mandatory fields:
- Sample Type
- Title
plus a lot of other fields that are important but not mandatory. Experimentation shows that you can manually create a (completely useless) Analysis Specification with only a sample type and a title. So these seem to be the minimal fields to specify.
According to the API documentation, you will generally also need to add a couple of other fields: a portal_type and a parent_path. As I understand it, from reading about Plone CMS, this is to tell SENAITE what type of content you want to create and where you want to put it. Let’s look at some existing Analysis Specifications to see what to put there:
as <- get_senaite_data('analysisspec', params = list(complete = TRUE, children = TRUE))
View(as)
From that I can see that all existing Analysis Specifications have a portal type of “AnalysisSpec” and parent path “/aims-ajk/bika_setup/bika_analysisspecs”. So let’s see if we can now create an Analysis Specification via the API.
sen8r expects you to create the payload for the endpoint as a (possibly nested) R list. I will create more convenience functions once I understand the SENAITE content model a bit better.
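To make the correspondence between a nested R list and JSON concrete, here is a small sketch. It uses the jsonlite package (if installed) purely for illustration; how sen8r serialises the list internally is not shown here and may differ. The title and the ResultsRange entries are placeholders, with field names taken from the lookup fields listed earlier.

```r
# A nested R list with placeholder values
body <- list(
  portal_type = "AnalysisSpec",
  title = "Example specification",
  ResultsRange = list(
    list(keyword = "test_a", min = 1, max = 2),
    list(keyword = "test_b", min = 3, max = 4)
  )
)

# Inspect the nesting without any extra packages
str(body, max.level = 2)

# If jsonlite is available, show one possible JSON rendering:
# named lists become JSON objects, unnamed lists become JSON arrays
if (requireNamespace("jsonlite", quietly = TRUE)) {
  cat(jsonlite::toJSON(body, auto_unbox = TRUE, pretty = TRUE), "\n")
}
```

The auto_unbox = TRUE argument makes length-one vectors appear as plain JSON values rather than one-element arrays.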
Let’s see if this works:
body <- list(
  portal_type = "AnalysisSpec",
  parent_path = "/aims-ajk/bika_setup/bika_analysisspecs",
  title = "New test analysis specification",
  SampleType = 'Serum'
)
post_senaite_data(body)
This seems to work…
ℹ Upload successful
… and it has created an Analysis Specification if you look at the Web interface, but instead of the title you see a SENAITE URL, so something has gone wrong.
Then I looked again at the fields in existing Analysis Specifications and noticed that the sample type title is not there. SampleType instead holds url, uid and api_url, and the API documentation includes an example where a sample type uid is used.
It would be a pain to have to look up unique identifiers (UIDs), so I created a senaite_lookup function that I think is quite nifty. If you give it an API endpoint it returns a lookup function (a “closure” strictly speaking - a function that encapsulates some data) for that endpoint, which you can use to look up unique identifiers and other things. It will only query the API endpoint once at the time of creation.
The following code queries the sampletype endpoint and creates a sample type lookup function. To look up the uid for the sample type titled “Serum” we would use:
sampletype_lookup <- senaite_lookup('sampletype')
serum_uid <- sampletype_lookup("Serum", "title", "uid")
This reads as: using information from the sampletype endpoint, tell me what the uid is for a record with the title “Serum”.
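For readers unfamiliar with closures, the idea can be sketched with a toy factory function. This is not the sen8r source: make_lookup, toy_sampletypes and the UIDs are all hypothetical, and the data frame stands in for data fetched once from an endpoint.

```r
# Hypothetical lookup factory (illustrative only, not the sen8r source).
# `records` stands in for data fetched once from an API endpoint.
make_lookup <- function(records) {
  # The returned function "closes over" `records`, so the data is
  # reused on every call without being fetched again.
  function(value, from, to) {
    hit <- records[[from]] == value
    records[[to]][hit]
  }
}

toy_sampletypes <- data.frame(
  title = c("Serum", "Urine"),
  uid = c("uid-001", "uid-002"),
  stringsAsFactors = FALSE
)
toy_lookup <- make_lookup(toy_sampletypes)
toy_lookup("Serum", "title", "uid") # "uid-001"
```

A value that is not present returns a zero-length vector in this sketch, so a real implementation would probably want to warn or return NA instead.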
Let’s see if this works:
# Use the sample type UID in the JSON payload
body <- list(
  portal_type = "AnalysisSpec",
  parent_path = "/aims-ajk/bika_setup/bika_analysisspecs",
  title = "New test analysis specification",
  SampleType = serum_uid
)
post_senaite_data(body)
This seems to work…
ℹ Upload successful
… and there is now a completely useless Analysis Specification called “New test analysis specification” visible in the SENAITE Web interface! We are getting somewhere. To be honest it wasn’t quite the neat linear experimental path I have presented above (there was much more trial and error) but I got there in the end.
I next worked out how to upload a more useful Analysis Specification that did actually contain reference ranges - this is the example from the README page:
body <- list(
  portal_type = "AnalysisSpec",
  parent_path = "/aims-ajk/bika_setup/bika_analysisspecs",
  title = "New test analysis specification",
  SampleType = serum_uid,
  ResultsRange = list(
    list(keyword = "serum_sodium", min = 10, max = 20),
    list(keyword = "serum_chloride", min = 20, max = 30)
  )
)
post_senaite_data(body)
It worked. I had worked out that I needed to use keywords (these are something you create when you set up tests in SENAITE - they are basically the real name of the test) to identify the associated tests, again by looking at existing Analysis Specifications. If I had not used fairly obvious keywords I could have used a lookup function to get them, e.g.:
analysisservice_lookup <- senaite_lookup('analysisservice')
analysisservice_lookup('Serum Sodium', 'title', 'Keyword')
[1] "serum_sodium"
Now I needed a list of all the tests associated with each sample type. If you look at my SENAITE data diagram you may notice that tests are not uniquely associated with specific sample types. They are indirectly and non-uniquely associated with sample types via Analysis Templates (panels of tests). Let’s get all our Analysis Templates with their tests and associated sample type:
at <- get_senaite_data(endpoint = 'artemplate',
                       params = list(children = TRUE, complete = TRUE))
Info
There is a bit of confusing terminology for panels of tests in SENAITE: they are called either “Analysis Templates” or “Sample Templates” in the Web interface, and artemplate (Analysis Request Template) in the API. But they all represent the same thing.
I now have a nested R list of all the Analysis Templates information. I can extract the titles easily enough:
(title <- sapply(at, '[[', 'title'))
[1] "HIV"
[2] "Serum Albumin and Total Protein"
[3] "CRP"
[4] "Bone Marrow Aspiration"
[5] "Pleural Fluid Analysis"
[6] "CBC and Differential Leucocyte Count"
[7] "Onco 1"
[8] "Chemo 1"
[9] "Dengue Serology"
[10] "Hepatitis Screening"
[11] "Anti Nuclear Antibody (ANA)"
[12] "Brucella Serology"
[13] "CSF Analysis"
... etc
And an even more arcane incantation in R will give me the sample type UIDs:
sampletypeuid <- sapply(lapply(at, '[[', 'SampleType'), `[[`, 'uid')
sampletypeuid[sapply(sampletypeuid, is.null)] <- NA
(sampletypeuid <- unlist(sampletypeuid))
[1] "92ae6a816a0a4941b00f0598f6cdc4f9" "92ae6a816a0a4941b00f0598f6cdc4f9"
[3] "92ae6a816a0a4941b00f0598f6cdc4f9" "5b0afa5334db404bbd0b29ef19f02710"
[5] "0ca088e4ff2f4ae98a16bdafc31f141f" "192178b41f464161a29932eb55376cd5"
[7] "92ae6a816a0a4941b00f0598f6cdc4f9" "92ae6a816a0a4941b00f0598f6cdc4f9"
[9] "92ae6a816a0a4941b00f0598f6cdc4f9" "192178b41f464161a29932eb55376cd5"
... etc
Note that some sample type unique identifiers are missing (NULL) and need to be replaced by NA before the list is converted to a vector with unlist, as otherwise those values are dropped.
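The effect is easy to reproduce on a small made-up list (the identifiers below are invented):

```r
uids <- list("a1", NULL, "c3")   # the second record has no sample type
length(unlist(uids))             # 2: the NULL has silently disappeared

# Replace NULL entries with NA before flattening
uids[sapply(uids, is.null)] <- NA
fixed <- unlist(uids)
length(fixed)                    # 3: positions still line up with the records
is.na(fixed)                     # FALSE TRUE FALSE
```

Keeping the length intact matters because the vector is later combined with the titles and review states, which must stay aligned.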
Some Analysis Templates have been inactivated (switched off in the SENAITE Web interface) so I need the review_state:
(review_state <- sapply(at, '[[', 'review_state'))
[1] "active" "active" "active" "active" "active" "active" "active"
[8] "active" "active" "active" "active" "active" "active" "active"
[15] "inactive" "active" "inactive" "active" "active" "active" "active"
... etc
Finally I need the individual IDs for each test in each panel (returns a nested list with one list of Analysis Service (test) UIDs for each Analysis Template):
(service_uid_list <- lapply(lapply(at, `[[`, 'Analyses'), \(.) lapply(., \(.) .$service_uid)))
[[1]]
[[1]][[1]]
[1] "7dd8cc5bda1e4d5fa31ad0fbea8180ff"
[[2]]
[[2]][[1]]
[1] "12edc460ce8b469d9d32ab01560e41d6"
[[2]][[2]]
[1] "60d5f65ccc7343cbad7e9ed3b6f84ccb"
[[3]]
[[3]][[1]]
[1] "5f78623830c141b590a7157069db70e5"
[[4]]
[[4]][[1]]
[1] "152da4c97a9d4040a308737129d2bd34"
[[4]][[2]]
[1] "19f69b3150044afaac288a67bea19615"
[[4]][[3]]
[1] "c5b2306f897745e8a9b83ff75ad6289c"
... etc
As you can see, I still handle lists in a pre-tidyverse way, nesting various tricks to get the bits I want. When what you want is specific elements of a list within another list within another list, you can use a combination of loops with lapply (and/or sapply, which like lapply applies a function to each element of a list, but unlike lapply tries to simplify what it returns). Here is the code again:
service_uid_list <-
  lapply( # loop over Analysis Templates
    lapply(at, `[[`, 'Analyses'), # extract the list of Analysis Services from each template
    \(.) lapply(., \(.) .$service_uid) # loop over each Analysis Service to get the UID
  )
The \(.) syntax, followed by an expression that does something to ., is the shorthand R way (available from R 4.1.0) of creating an anonymous function, which is a function that doesn’t need a name because it is embedded within some other code.
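The same pattern can be tried on a small made-up list to see what each layer returns. The panel titles and UIDs here are invented, and the \(.) shorthand needs R 4.1.0 or later.

```r
toy_templates <- list(
  list(title = "Panel A",
       Analyses = list(list(service_uid = "u1"), list(service_uid = "u2"))),
  list(title = "Panel B",
       Analyses = list(list(service_uid = "u3")))
)

# Inner step: pull the 'Analyses' element out of each template
analyses <- lapply(toy_templates, `[[`, "Analyses")

# Outer step: for each template, pull the service_uid out of each analysis
toy_uids <- lapply(analyses, \(.) lapply(., \(.) .$service_uid))

lengths(toy_uids)  # 2 1 (number of tests in each panel)
unlist(toy_uids)   # "u1" "u2" "u3"
```

Splitting the nested call into two named steps like this can make it easier to debug before collapsing it back into one line.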
The other trick of using [[ as a function might make more sense if you look at the example below:
x <- list(a = 1, b = 2, c = 3)
# The next two lines are equivalent
x[['b']] # The more familiar form - returns 2
`[[`(x, 'b') # But this is actually what R does
# [[ is a function, but you have to wrap it in backticks to avoid a syntax error, because it breaks R's rules for naming things
I now have three vectors and one nested list whose elements vary in length. I can combine these using:
library(data.table)
x <- lengths(service_uid_list)
at_dt <- data.table(
  title = rep(title, x),
  sampletypeuid = rep(sampletypeuid, x),
  review_state = rep(review_state, x),
  service_uid = unlist(service_uid_list, use.names = FALSE)
)
This gives me a neat table. I can add fields for titles and keywords using lookup functions:
sampletype_lookup <- senaite_lookup('sampletype')
analysisservice_lookup <- senaite_lookup('analysisservice')
at_dt[, sampletype := sapply(sampletypeuid, \(.) sampletype_lookup(., 'uid', 'title'))]
at_dt[, analysisservice := sapply(service_uid, \(.) analysisservice_lookup(., 'uid', 'title'))]
at_dt[, keyword := sapply(service_uid, \(.) analysisservice_lookup(., 'uid', 'Keyword'))]
Because my lookup functions are not “vectorised” (i.e. they expect to be given one value only), I have to use sapply here (which also works on vectors).
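If the per-value sapply calls ever became slow, one alternative would be a vectorised lookup with base R's match(). This is only a sketch on made-up data: toy_services stands in for whatever an endpoint returns and is not part of sen8r.

```r
toy_services <- data.frame(
  uid = c("u1", "u2", "u3"),
  Keyword = c("serum_sodium", "serum_chloride", "serum_albumin"),
  stringsAsFactors = FALSE
)
wanted <- c("u3", "u1", "u1", "missing")

# match() returns the row position of each wanted UID (NA if absent),
# so the whole vector is looked up in one step
keywords <- toy_services$Keyword[match(wanted, toy_services$uid)]
keywords # "serum_albumin" "serum_sodium" "serum_sodium" NA
```

A UID that is not found comes back as NA, which keeps the result the same length as the input.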
I now have a table of all my panels with their associated tests and sample types. Now I can drop any inactive panels and just keep unique combinations of sample type and test:
at_dt <- at_dt[review_state %in% 'active']
as_to_create <- unique(at_dt[, .(sampletype, sampletypeuid, analysisservice, keyword)])[order(sampletype, keyword)]
as_to_create[, unique(sampletype)]
From this I can see that our SENAITE instance has 10 unique sample types.
I now want to read in the values collated in the Google spreadsheet. I download that as Excel and read it into R with:
library(readxl)
values <- read_excel(
'SENAITE Analysis Specifications.xlsx',
skip = 2,
col_names = c(
"analysiscategory",
"analysisservice",
"male18plus_min",
"male18plus_max",
"female18plusnotpregnant_min",
"female18plusnotpregnant_max",
"pregnant_min",
"pregnant_max",
"female13to17y_min",
"female13to17y_max",
"male13to17y_min",
"male13to17y_max",
"child1to12y_min",
"child1to12y_max",
"infant1to12m_min",
"infant1to12m_max",
"neonate0to28d_min",
"neonate0to28d_max"
)
)
setDT(values)
I only want to keep the rows of this that relate to an Analysis Service, and then I want to join that to my table of panels:
values <- values[!is.na(analysisservice)]
as_data <- merge(as_to_create,
                 values,
                 by = "analysisservice",
                 all.x = TRUE,
                 sort = FALSE)
Nota bene
I should have joined on Analysis Service UID here, as Analysis Services can share titles (they are actually differentiated in the system by keywords), but mine are fortunately unique. Next time I create a similar online spreadsheet I will include UIDs, and probably units too.
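One way to guard against this is to check that the join key is unique before merging, and to list any tests left without values afterwards. The sketch below uses base R data frames and made-up numbers rather than the real tables, so the object names are purely illustrative.

```r
toy_panels <- data.frame(
  analysisservice = c("Sodium", "Chloride", "Albumin"),
  keyword = c("serum_sodium", "serum_chloride", "serum_albumin"),
  stringsAsFactors = FALSE
)
toy_values <- data.frame(
  analysisservice = c("Sodium", "Chloride"),
  male18plus_min = c(135, 98),   # made-up example limits
  male18plus_max = c(150, 107),
  stringsAsFactors = FALSE
)

# A join key must be unique on the lookup side, or rows get multiplied
stopifnot(anyDuplicated(toy_values$analysisservice) == 0)

joined <- merge(toy_panels, toy_values, by = "analysisservice", all.x = TRUE)

# Tests with no reference values after the join (worth reviewing by hand)
no_values <- is.na(joined$male18plus_min) & is.na(joined$male18plus_max)
unmatched <- joined$analysisservice[no_values]
unmatched # "Albumin"
```

The same two checks would apply unchanged if the key were a UID column instead of a title.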
So now I have one table with all the information needed to create Analysis Specifications via the API. Let’s create a function to structure the JSON payload for one Analysis Specification. This is what I came up with:
as_upload <- function(as_data,
                      as_title,
                      as_sampletype,
                      as_keywordcols) {
  if (!is.data.table(as_data))
    stop('as_data should be a data.table')
  # Find the UID for the requested sample type
  sampletypeuid <- as_data[sampletype %in% as_sampletype, unique(sampletypeuid)]
  if (length(sampletypeuid) > 1)
    stop('Only one sample type UID is required')
  # Keep the keyword plus the chosen min and max columns
  rawresults <- as_data[sampletype %in% as_sampletype, c('keyword', as_keywordcols), with = FALSE]
  setnames(rawresults, new = c('keyword', 'min', 'max'))
  # Drop tests that have neither a minimum nor a maximum
  rawresults <- rawresults[!is.na(min) | !is.na(max)]
  # Build one list(keyword, min, max) per test
  ResultsRange <- lapply(seq_len(nrow(rawresults)), function(i) {
    list(
      keyword = rawresults$keyword[i],
      min = rawresults$min[i],
      max = rawresults$max[i]
    )
  })
  body <- list(
    portal_type = "AnalysisSpec",
    parent_path = "/aims-ajk/bika_setup/bika_analysisspecs",
    title = as_title,
    SampleType = sampletypeuid,
    ResultsRange = ResultsRange
  )
  post_senaite_data(body)
}
With Analysis Specification data in the required format, this function will create the data structure needed by SENAITE and send it to the SENAITE API. I can now create a new Analysis Specification with e.g.:
as_upload(as_data,
          'Serum Male 18+',
          'Serum',
          c('male18plus_min', 'male18plus_max'))
This works perfectly. If values are changed, or new values are added to the spreadsheet, I can quickly replace Analysis Specifications in SENAITE by deleting the old ones (I still have to do this manually, but it is quick) and adding new ones with this code.
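Rather than writing one call per combination, the calls could be driven from a small table. The sketch below only builds that table; the final loop is left commented out because it would post to a live SENAITE instance. The object names (sample_types, groups, jobs) are illustrative, and the column prefixes are assumed to match those used when reading the spreadsheet.

```r
sample_types <- c("Serum", "CSF")
# Names are the human-readable group labels; values are column prefixes
groups <- c("Male 18+" = "male18plus",
            "Female 18+ not pregnant" = "female18plusnotpregnant")

# One row per combination of sample type and population group
jobs <- expand.grid(sampletype = sample_types,
                    group = names(groups),
                    stringsAsFactors = FALSE)
jobs$title <- paste(jobs$sampletype, jobs$group)
jobs$min_col <- paste0(groups[jobs$group], "_min")
jobs$max_col <- paste0(groups[jobs$group], "_max")
jobs[, c("title", "min_col", "max_col")]

# Dry run only: uncomment to upload each specification in turn
# for (i in seq_len(nrow(jobs))) {
#   as_upload(as_data, jobs$title[i], jobs$sampletype[i],
#             c(jobs$min_col[i], jobs$max_col[i]))
# }
```

Reviewing the printed table before uncommenting the loop is a cheap way to catch a mistyped title or column name.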
The code is now sufficiently complex that I feel like I am reaching the limits of my cognitive capacity on this very hot day, and I recognise that there are a number of built-in data structure assumptions here, so this isn’t necessarily the last word on this, and I will definitely need to do more testing, but it is something that will save me a lot of time in future. Without the convenient functions of the sen8r package this would have been even more complicated.
Here is the whole script:
library(data.table)
library(readxl)
library(sen8r)
at <- get_senaite_data(endpoint = 'artemplate',
params = list(children = TRUE, complete = TRUE))
(title <- sapply(at, '[[', 'title'))
sampletypeuid <- sapply(lapply(at, '[[', 'SampleType'), `[[`, 'uid')
sampletypeuid[sapply(sampletypeuid, is.null)] <- NA
(sampletypeuid <- unlist(sampletypeuid))
(review_state <- sapply(at, '[[', 'review_state'))
(service_uid_list <- lapply(lapply(at, `[[`, 'Analyses'), \(.) lapply(., \(.) .$service_uid)))
x <- lengths(service_uid_list)
at_dt <- data.table(
title = rep(title, x),
sampletypeuid = rep(sampletypeuid, x),
review_state = rep(review_state, x),
service_uid = unlist(service_uid_list, use.names = FALSE)
)
sampletype_lookup <- senaite_lookup('sampletype')
analysisservice_lookup <- senaite_lookup('analysisservice')
at_dt[, sampletype := sapply(sampletypeuid, \(.) sampletype_lookup(., 'uid', 'title'))]
at_dt[, analysisservice := sapply(service_uid, \(.) analysisservice_lookup(., 'uid', 'title'))]
at_dt[, keyword := sapply(service_uid, \(.) analysisservice_lookup(., 'uid', 'Keyword'))]
at_dt <- at_dt[review_state %in% 'active']
as_to_create <- unique(at_dt[, .(sampletype, sampletypeuid, analysisservice, keyword)])[order(sampletype, keyword)]
as_to_create[, unique(sampletype)]
values <- read_excel(
'SENAITE Analysis Specifications.xlsx',
skip = 2,
col_names = c(
"analysiscategory",
"analysisservice",
"male18plus_min",
"male18plus_max",
"female18plusnotpregnant_min",
"female18plusnotpregnant_max",
"pregnant_min",
"pregnant_max",
"female13to17y_min",
"female13to17y_max",
"male13to17y_min",
"male13to17y_max",
"child1to12y_min",
"child1to12y_max",
"infant1to12m_min",
"infant1to12m_max",
"neonate0to28d_min",
"neonate0to28d_max"
)
)
setDT(values)
values <- values[!is.na(analysisservice)]
as_data <- merge(as_to_create,
values,
by = "analysisservice",
all.x = TRUE,
sort = FALSE)
as_upload <- function(as_data,
as_title,
as_sampletype,
as_keywordcols) {
if (!is.data.table(as_data))
stop('as_data should be a data.table')
sampletypeuid <- as_data[sampletype %in% as_sampletype, unique(sampletypeuid)]
if (length(sampletypeuid) > 1)
stop('Only one sample type UID is required')
rawresults <- as_data[sampletype %in% as_sampletype, c('keyword', as_keywordcols), with = FALSE]
setnames(rawresults, new = c('keyword', 'min', 'max'))
rawresults <- rawresults[!is.na(min) | !is.na(max)]
ResultsRange <- lapply(seq_len(nrow(rawresults)), function(i) {
list(
keyword = rawresults$keyword[i],
min = rawresults$min[i],
max = rawresults$max[i]
)
})
body <- list(
portal_type = "AnalysisSpec",
parent_path = "/aims-ajk/bika_setup/bika_analysisspecs",
title = as_title,
SampleType = sampletypeuid,
ResultsRange = ResultsRange
)
post_senaite_data(body)
}
# as_upload(as_data,
# 'Serum Male 18+',
# 'Serum',
# c('male18plus_min', 'male18plus_max'))
# as_upload(as_data,
# 'Serum Female 18+ not pregnant',
# 'Serum',
# c('female18plusnotpregnant_min', 'female18plusnotpregnant_max'))
# as_upload(as_data,
# 'CSF Male 18+',
# 'CSF',
# c('male18plus_min', 'male18plus_max'))
#
# as_upload(
# as_data,
# 'CSF Female 18+ not pregnant',
# 'CSF',
# c(
# 'female18plusnotpregnant_min',
# 'female18plusnotpregnant_max'
# )
# )
# as_upload(as_data,
# 'Multiple Male 18+',
# 'Multiple',
# c('male18plus_min', 'male18plus_max'))
#
# as_upload(
# as_data,
# 'Multiple Female 18+ not pregnant',
# 'Multiple',
# c(
# 'female18plusnotpregnant_min',
# 'female18plusnotpregnant_max'
# )
# )
# as_upload(as_data,
# 'Pleural fluid Male 18+',
# 'Pleural fluid',
# c('male18plus_min', 'male18plus_max'))
#
# as_upload(
# as_data,
# 'Pleural fluid Female 18+ not pregnant',
# 'Pleural fluid',
# c(
# 'female18plusnotpregnant_min',
# 'female18plusnotpregnant_max'
# )
# )
# as_upload(as_data,
# 'Stool Male 18+',
# 'Stool',
# c('male18plus_min', 'male18plus_max'))
#
# as_upload(
# as_data,
# 'Stool Female 18+ not pregnant',
# 'Stool',
# c(
# 'female18plusnotpregnant_min',
# 'female18plusnotpregnant_max'
# )
# )
#
# as_upload(as_data,
# 'Urine Male 18+',
# 'Urine',
# c('male18plus_min', 'male18plus_max'))
#
# as_upload(
# as_data,
# 'Urine Female 18+ not pregnant',
# 'Urine',
# c(
# 'female18plusnotpregnant_min',
# 'female18plusnotpregnant_max'
# )
# )
#
# as_upload(as_data,
# 'Whole blood Male 18+',
# 'Whole blood',
# c('male18plus_min', 'male18plus_max'))
#
# as_upload(
# as_data,
# 'Whole blood Female 18+ not pregnant',
# 'Whole blood',
# c(
# 'female18plusnotpregnant_min',
# 'female18plusnotpregnant_max'
# )
# )
Footnotes
1. I have a longer wishlist for SENAITE that I will add here at some point.