How to Build a Staff Scheduler Application

Background & Context

From open source algorithms such as glpk to commercial ones such as Gurobi, there are many options available to solve optimization problems. At the time of publishing this article, the author found very few sources that provided code examples and corresponding reasoning behind it. One such source was R package ompr authored by Dirk Schumacher. The primary motivation behind this article is to apply Dirk Schumacher’s example to solving staff scheduling problem with focus on the reasoning behind the code.

Business problem

While the real world problems have many complications, the author selects the following business problem that is difficult enough to warrant some thought and not so difficult that one needs to spend a very long time to think through.

Build a staff schedule given a set of staff, skills, shift preferences and a few business rules.

Data

The data used for this application can be created with the following code in R

#generate shift preference data 
shiftPrefRaw <- data.frame(
"StaffId" = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6),
"Shift" = c(1, 2, 3, 1, 2, 3,
1, 2, 3, 1, 2, 3,
1, 2, 3, 1, 2, 3),
"Monday" = c(1,1,0,1,1,0,1,0,1,1,0,0,1,1,1,1,1,1),
"Tuesday" = c(1,1,0,1,1,0,1,0,1,1,0,0,1,1,1,1,1,1),
"Wednesday" = c(1,1,0,1,1,0,1,0,1,1,0,0,1,1,1,1,1,1),
"Thursday" = c(1,1,0,1,1,0,1,0,1,1,1,1,1,1,1,1,1,1),
"Friday" = c(1,1,0,1,0,0,1,0,1,1,1,1,1,1,1,1,1,1),
"Saturday" = c(1,1,0,1,0,0,1,0,1,1,1,1,1,1,1,1,1,1),
"Sunday" = c(1,1,0,1,1,1,1,0,1,1,1,1,1,1,1,1,1,1)
)
#generate skills data
staffSkillRaw <- data.frame(
"staffId" = c(1, 1, 1,
2, 2, 2,
3, 3, 3,
4, 4, 4,
5, 5, 5,
6, 6, 6
),
"skill" = c("skill1", "skill2", "skill3",
"skill1", "skill2", "skill3",
"skill1", "skill2", "skill3",
"skill1", "skill2", "skill3",
"skill1", "skill2", "skill3",
"skill1", "skill2", "skill3"
)
)

Shift preference data generated and its interpretation is as follows.

#shiftPrefRaw 
StaffId Shift Monday Tuesday Wednesday Thursday Friday Saturday Sunday
1 1 1 1 1 1 1 1 1
1 2 1 1 1 1 1 1 1
1 3 0 0 0 0 0 0 0
2 1 1 1 1 1 1 1 1
2 2 1 1 1 1 0 0 1
2 3 0 0 0 0 0 0 1
3 1 1 1 1 1 1 1 1
3 2 0 0 0 0 0 0 0
3 3 1 1 1 1 1 1 1
4 1 1 1 1 1 1 1 1
4 2 0 0 0 1 1 1 1
4 3 0 0 0 1 1 1 1
5 1 1 1 1 1 1 1 1
5 2 1 1 1 1 1 1 1
5 3 1 1 1 1 1 1 1
6 1 1 1 1 1 1 1 1
6 2 1 1 1 1 1 1 1
6 3 1 1 1 1 1 1 1
Interpretation

1. StaffId represents the unique identifier of a staff. For example, the above data has 6 staff represented by the numbers 1 to 6. If the reader wishes to have names represented in the final schedule, the author recommends a separate lookup file maintained for that purpose.
2. Shift represents the various shifts. Each shift is mapped to a number. For example, the above data has 3 shifts namely 1, 2 and 3. Similar to the staff ID, the reader could maintain a separate lookup file to give business friendly names such as Morning shift, Overnight shift etc.

3. Monday to Sunday represents the 7 days of the week. The corresponding one-hot encoded values represent the shift preference of the staff. To explain further, 1 means that the staff prefers that shift for the corresponding day.

For example, it is observed that the first three rows correspond to the shift preference of staffId 1. This staff prefers shift 1 or 2 for all the days in the week. However, this staff does not prefer the shift 3 any day.

Skill data generated and its interpretation is as follows.

#staffSkillRaw
staffId skill
1 skill1
1 skill2
1 skill3
2 skill1
2 skill2
2 skill3
3 skill1
3 skill2
3 skill3
4 skill1
4 skill2
4 skill3
5 skill1
5 skill2
5 skill3
6 skill1
6 skill2
6 skill3
Interpretation

1. StaffId represents the unique identifier of a staff and is identical to that given in the shift preference data.

2. skill represents the skill of the corresponding staffId. A skill in a banking industry call center staff could be customer service, collections etc, while that in a retail industry could be customer complaints, product returns etc. It must be noted that each staff could have more than one skill.

Components of building a staff scheduler

Broadly speaking, there are three components in building an optimization model or in this specific example, a staff scheduler. These components are objective, variable and constraint. These components in context of the business problem detailed at the beginning of this article and their corresponding explanation is as follows.

Objective can be considered the purpose of the model. For example, the purpose of this scheduler is to build a scheduler with just the optimal number of staff for each schedule.

Variable is the component that varies and needs to be modeled such that the objective is achieved. These variables could be thought of as various combinations of the staff, days and shifts.

Constraints could be considered as the rules that need to be followed while modeling the variable to achieve the objective.

Source code in R

The following code creates the Multiple Integer Programming model using ompr library in R. Each step of this code is explained in the sections below.

Step 1: Invoke the libraries

Invoke the libraries namely purrr, dplyr, ompr, ompr.roi and ROI.plugin.glpk

#invoke libraries
library(purrr)
library(dplyr)
library(ompr)
library(ompr.roi)
library(ROI.plugin.glpk)
Step 2: Build model

Initially, a blank canvas is set up. Then, one starts to populate it with the required components in a sequential order. The object model in the code refers to this blank canvas. The MIPModel function instantiates the multiple integer programming model from the package ompr.

#build integer programming model
model <- MIPModel() %>%

This process uses the piping constructor %>% to move from one step to another.

Step 3: Add variable

Recall that variables can be thought of as combinations of staff, days and shifts that are available. The add_variable function helps in adding the variable to the model. Although the model is not mentioned explicitly in the function below, it is implicit since a piping %>% constructor is used to move into this step from the model step.

#set the number of rows(staff) and columns(weekday) for the matrix
numOfDays <- 7
numOfShifts <- length(unique(staffShiftPref$Shift))
#roster(i.e. 1) iff staff is rostered for the given day and shift
add_variable(x[i,j,k], i = 1:numOfStaff, j = 1:numOfDays, k = 1:numOfShifts, type = "binary")
numOfStaff <- length(unique(staffShiftPref$StaffId))

The variables are created with x[i,j,k]. Here, i, j and k are numOfStaff, numOfDays and numOfShifts representing the number of staff, number of days and number of shifts available in the data respectively. If one assumes that there are 6 staff and 3 shifts, then there would be 126 [6(staff) * 3(shifts) * 7(days)] variables created. At this stage, if the model is inspected, it will show the following in the console. Recall that the variable values are 1 or 0 and therefore the variable type shows as Binary.

#model
Mixed integer linear optimization problem
Variables:
Continuous: 0
Integer: 0
Binary: 126
No objective function.
Constraints: 0
Step 4: set objective

Recall that objective is the purpose of the model. For this example, the objective is to build a schedule with just enough staff for each schedule. The set_objective function helps in building the objective.

#optimize the number of staff based on availability

set_objective(sum_expr(checkPref(staff = i, day = j, shift = k) * x[i, j, k], i = 1:numOfStaff, j = 1:numOfDays, k = 1:numOfShifts),sense = "max") %>%

If the objective is to optimize the variables x[i, j, k] so as to schedule the staff, the reader might wonder why one is multiplying that with checkPref(staff = i, day = j, shift = k) and what the checkPref() function does. The following explains the reasoning.

If the staff does not particularly express preference of one shift over another, one could simply multiply x[i, j, k] with 1. However, in the real world, the staff are often given an option to express their preferences and an attempt is made to stick to their preference as far as possible. Hence, staff preference needs to be part of the objective function.

The function checkPref() ensures that that if a given staff does not prefer a given shift for a given day, it will will penalize the objective function by multiplying it with -10000. This means that the model will prefer to schedule another staff who prefers the given shift and day. The function checkPref() is built as follows.

#function to check if staff is available for a given day and shift
checkPref <- function(staff, day, shift)
{
staffShiftSubset <- staffShiftPref[staffShiftPref$StaffId == staff & staffShiftPref$Shift == shift,]
staffShiftDaySubset <- staffShiftSubset[which(!names(staffShiftSubset) %in% c("StaffId", "Shift"))]
staffShiftDaySubset <- staffShiftDaySubset[,day]
isAvail <- staffShiftDaySubset
#to ensure that non-preferred shifts are given least preference, place a high penalty. If needed this step could be made a constraint.
isPref <- ifelse(isAvail == 0, -10000, isAvail)
return(isPref)
}

 

In summary, it takes staffShiftPref data and replaces 0 with -10000. The reader could choose any term for penalizing and need not be limited to the random low value -10000.

The optimization model could attempt to schedule (i.e. put 1) any of the staff for each day and shift. However, checkPref() makes the model to consider only those staff who prefer that given shift and day (i.e. where there is 1 in the shiftPrefData). This is because, if it selects someone who does not prefer the given day and shift (i.e. where there is 0 in the shiftPrefData) then the value of the objective function becomes -10000 and thus this works against maximizing the objective function.

Step 5: Add constraints

Recall that constraints could be thought of as rules that need to be followed while building the optmization model. The business rules that are used for the purpose of this article are as follows.
1. Each staff can only work one shift per day.
2. Every shift must have at least one person staffed.
3. Each staff works for a maximum of 5 days a week.
4. Each shift must have at least one person staffed from each of the three skills.

The R code to deploy these constraints uses the function add_constraint() to apply a constraint. Within this function is nested another function sum_expr() that helps in formulating the constraint mathematically. Although this article provides relatively simpler set of constraints, understanding the sum_expr() function will help one to add many more complex constraints.

At the time of publishing this article, the official documentation of the sum_expr() function is very minimal as given below.

#Usage
sum_expr(expr, ...)
Arguments
expr an expression that can be expanded to a sum
... bind variables in expr using dots. See examples.
Value
the expanded sum as an AST

The author offers the following perspective in an attempt to help readers understand the usage of this function. It must be noted that this perspective is more of an opinion and may not reflect the actual functioning of the underlying code.

The function in its nested form i.e. add_constraint(sum_expr()) could be understood as follows.

add_constraint(sum_expr(<set_of_variables>, <aggregate_variable(s)>)<equality/inequality><constant/variable>, <group_by_variables>)

The above code in terms of an SQL sense could be understood roughly as follows.

select sum(<aggregate_variable(s)>)
from <set_of_variables>
where <equality/inequality><constant/variable>
group by <group_by_variables>

This representation could be understood further with the help of an example. The first constraint i.e. each staff can only work one shift per day is represented as follows.

#each staff can work a maximum of 1 shift a day
add_constraint(sum_expr(x[i,j,k], k = 1:numOfShifts) <= 1, i = 1:numOfStaff, j = 1:numOfDays)

The above representation can be understood as follows

1. Consider the variables `x[i,j,k]`. Note that with 6 staff, 7 days and 3 shifts, these variables are 126 in total.

2. Of these variables, for every combination of `i = 1:numOfStaff, j = 1:numOfDays` i.e. for a given staff on a given day, the sum of `k = 1:numOfShifts` must be `<= 1`.

3. To summarize, the constraint states that for any given staff on any given day, the number of shifts that they can be staffed are at best one shift.

4. Additionally, it must be noted that the `<=1` need not be a constant but a variable or an expression. This means that one could have another `sum_expr()` function in place of the `1`.

The understanding so far is expected to help the readers understand the rest of the constraints relatively easily.
5. The next constraint each shift must have at least one person staffed is represented as follows. It means that for any given day and shift, there is 1 or more staff scheduled.

#each shift each day must have at least 1 staff
add_constraint(sum_expr(x[i,j,k], i = 1:numOfStaff) >= 1, j = 1:numOfDays, k = 1:numOfShifts) %>%

6. The next constraint each staff works for a maximum of 5 days a week is represented as follows. It means that for any given staff, the days and shifts they are scheduled is 5 or less.

#each staff can work a maximum of 5 days a week
add_constraint(sum_expr(x[i,j,k], j = 1:numOfDays, k = 1:numOfShifts) <= 5, i = 1:numOfStaff) %>%

The next 3 constraints namely each shift must have at least one person staffed from each of the three skills is represented as follows. The code is similar across the 3 skills and the author explains only one example for brevity.

#ensure that each day has at least 1 staff from skill "a"
add_constraint(sum_expr(checkSkill(staff = i, skill = "a") * x[i,j,k], i = 1:numOfStaff) >= 1, j = 1:numOfDays, k = 1:numOfShifts) %>%

7. The first step in understanding these three constraints is the function checkSkill() which is built as follows.

checkSkill <- function(staff, skill)
{
#if the staff has the skill then, return 1 else 0
skillPresence <- nrow(skillMix[skillMix$staffId == staff & skillMix$skill == skill,])
return(skillPresence)
}

This function takes the staffId and skill as inputs and returns 1 if the given staffId has the given skill otherwise, it returns zero. Since checkSkill(staff = i, skill = “a”) is multiplied with x[i,j,k] , the outcome will be at least 1 only if the staff has the given skill. Thus, for any given day and shift, there will be at least one staff with the given skill. These steps are repeated for each of the skill.

8. The final step is to run the optimization and source the results. This is accomplished with the following code.

#solve integer programming model
result <- solve_model(model, with_ROI(solver = "glpk", verbose = TRUE))

If the optimal solution is not found, the model will throw an error glp_intopt: optimal basis to initial LP relaxation not provided. In such cases, one could change the constraints or add more variables to arrive at an optimal solution. It must be noted that although the author used the glpk solver, there are other solvers available with the package.

In order to use the model results in an interpretable manner, one could use the following steps.

1. Use the function get_solution() to retrieve the staff(i), day(j), shift(k) and rostered(value) data. Note that if the given staff is rostered for the given day and shift, then the rostered data will have 1 otherwise, it will be coded 0.

roster <- result %>% 
get_solution(x[i,j,k])

2. Select relevant data and rename the columns to understandable names.

roster <- roster[,c("i", "j", "k","value")]
#set readable column names
colnames(roster) <- c("staff", "day" , "shift" , "rostered")

3. The roster for each staff can be visualized with the following code.

xtabs(rostered~ day + shift + staff, data = roster)
Concluding Remarks

This article speaks about staff scheduling problem. However, if the reader understands the reasoning behind the code, it is expected to help the reader in solving more complex staff scheduling problems or even other optimization problems.

If you liked this article, you can see a live demo at staff scheduler application.

Like to work with us?

Contact Us for an introductory call to understand how we can help you.

Check Out Our Case Studies