# Build a staff scheduler application

This article provides a perspective for building a simple staff scheduler app as demonstrated here.

Note: The author focuses on the reasoning behind the code and therefore detailed code explanation is out of scope. Readers with working knowledge of R language would benefit the most from this article.

**Table of contents:**

- Motivation
- Business problem
- Data
- Components of building a staff scheduler
- R code
- Concluding remarks

**1. Motivation**

From open source algorithms such as glpk to commercial ones such as Gurobi, there are many options available to solve optimization problems. At the time of publishing this article, the author found very few sources that provided code examples and corresponding reasoning behind it. One such source was R package ompr authored by Dirk Schumacher. The primary motivation behind this article is to apply Dirk Schumacher’s example to solving staff scheduling problem with focus on the reasoning behind the code.

**2. Business problem**

While the real world problems have many complications, the author selects the following business problem that is difficult enough to warrant some thought and not so difficult that one needs to spend a very long time to think through.

Build a staff schedule given a set of staff, skills, shift preferences and a few business rules.

**3. Data**

The data used for this application can be created with the following code in R

```
#generate shift preference data
shiftPrefRaw <- data.frame(
"StaffId" = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6),
"Shift" = c(1, 2, 3, 1, 2, 3,
1, 2, 3, 1, 2, 3,
1, 2, 3, 1, 2, 3),
"Monday" = c(1,1,0,1,1,0,1,0,1,1,0,0,1,1,1,1,1,1),
"Tuesday" = c(1,1,0,1,1,0,1,0,1,1,0,0,1,1,1,1,1,1),
"Wednesday" = c(1,1,0,1,1,0,1,0,1,1,0,0,1,1,1,1,1,1),
"Thursday" = c(1,1,0,1,1,0,1,0,1,1,1,1,1,1,1,1,1,1),
"Friday" = c(1,1,0,1,0,0,1,0,1,1,1,1,1,1,1,1,1,1),
"Saturday" = c(1,1,0,1,0,0,1,0,1,1,1,1,1,1,1,1,1,1),
"Sunday" = c(1,1,0,1,1,1,1,0,1,1,1,1,1,1,1,1,1,1)
)
#generate skills data
staffSkillRaw <- data.frame(
"staffId" = c(1, 1, 1,
2, 2, 2,
3, 3, 3,
4, 4, 4,
5, 5, 5,
6, 6, 6
),
"skill" = c("skill1", "skill2", "skill3",
"skill1", "skill2", "skill3",
"skill1", "skill2", "skill3",
"skill1", "skill2", "skill3",
"skill1", "skill2", "skill3",
"skill1", "skill2", "skill3"
)
)
```

Shift preference data generated and its interpretation is as follows.

```
#shiftPrefRaw
StaffId Shift Monday Tuesday Wednesday Thursday Friday Saturday Sunday
1 1 1 1 1 1 1 1 1
1 2 1 1 1 1 1 1 1
1 3 0 0 0 0 0 0 0
2 1 1 1 1 1 1 1 1
2 2 1 1 1 1 0 0 1
2 3 0 0 0 0 0 0 1
3 1 1 1 1 1 1 1 1
3 2 0 0 0 0 0 0 0
3 3 1 1 1 1 1 1 1
4 1 1 1 1 1 1 1 1
4 2 0 0 0 1 1 1 1
4 3 0 0 0 1 1 1 1
5 1 1 1 1 1 1 1 1
5 2 1 1 1 1 1 1 1
5 3 1 1 1 1 1 1 1
6 1 1 1 1 1 1 1 1
6 2 1 1 1 1 1 1 1
6 3 1 1 1 1 1 1 1
```

###### Interpretation:

*StaffId*represents the unique identifier of a staff. For example, the above data has 6 staff represented by the numbers 1 to 6. If the reader wishes to have names represented in the final schedule, the author recommends a separate lookup file maintained for that purpose.*Shift*represents the various shifts. Each shift is mapped to a number. For example, the above data has 3 shifts namely 1, 2 and 3. Similar to the staff ID, the reader could maintain a separate lookup file to give business friendly names such as*Morning shift*,*Overnight shift*etc.*Monday*to*Sunday*represents the 7 days of the week. The corresponding one-hot encoded values represent the shift preference of the staff. To explain further,*1*means that the staff prefers that shift for the corresponding day. For example, it is observed that the first three rows correspond to the shift preference of staffId*1*. This staff prefers shift*1*or*2*for all the days in the week. However, this staff does not prefer the shift*3*any day.

Skill data generated and its interpretation is as follows.

```
#staffSkillRaw
staffId skill
1 skill1
1 skill2
1 skill3
2 skill1
2 skill2
2 skill3
3 skill1
3 skill2
3 skill3
4 skill1
4 skill2
4 skill3
5 skill1
5 skill2
5 skill3
6 skill1
6 skill2
6 skill3
```

###### Interpretation:

*StaffId*represents the unique identifier of a staff and is identical to that given in the shift preference data.*skill*represents the skill of the corresponding staffId. A skill in a banking industry call center staff could be*customer service*,*collections*etc, while that in a retail industry could be*customer complaints*,*product returns*etc. It must be noted that each staff could have more than one skill.

**4. Components of building a staff scheduler**

Broadly speaking, there are three components in building an optimization model or in this specific example, a staff scheduler. These components are *objective*, *variable* and *constraint*. These components in context of the business problem and their corresponding explanation is as follows.

Objective | Variables | Constraints |
---|---|---|

Build a staff schedule | given a set of staff, skills, shift preferences | and a few business rules |

*objective*can be considered the purpose of the model. For example, the purpose of this scheduler is to build a schedule with just the optimal number of staff for each schedule.*variable*is the component that varies and needs to be modeled such that the objective is achieved. These variables could be thought of as various combinations of the staff, days and shifts.*constraints*could be considered as the rules that need to be followed while modeling the variable to achieve the objective.

**5. R code**

The following code creates the Multiple Integer Programming model using ompr library in R. Each step of this code is explained in the sections below.

```
#invoke libraries
library(purrr)
library(dplyr)
library(ompr)
library(ompr.roi)
library(ROI.plugin.glpk)
#function to compute the skill. This is used for the constraint
checkSkill <- function(staff, skill)
{
#if the staff has the skill then, return 1 else 0
skillPresence <- nrow(skillMix[skillMix$staffId == staff & skillMix$skill == skill,])
return(skillPresence)
}
#function to check if staff is available for a given day and shift
checkPref <- function(staff, day, shift)
{
staffShiftSubset <- staffShiftPref[staffShiftPref$StaffId == staff & staffShiftPref$Shift == shift,]
staffShiftDaySubset <- staffShiftSubset[which(!names(staffShiftSubset) %in% c("StaffId", "Shift"))]
staffShiftDaySubset <- staffShiftDaySubset[,day]
isAvail <- staffShiftDaySubset
#to ensure that non-preferred shifts are given least preference, place a high penalty. If needed this step could be made a constraint.
isPref <- ifelse(isAvail == 0, -10000, isAvail)
return(isPref)
}
#set the number of rows(staff) and columns(weekday) for the matrix
numOfStaff <- length(unique(staffShiftPref$StaffId))
numOfDays <- 7
numOfShifts <- length(unique(staffShiftPref$Shift))
#build integer programming model
model <- MIPModel() %>%
#roster(i.e. 1) iff staff is rostered for the given day and shift
add_variable(x[i,j,k], i = 1:numOfStaff, j = 1:numOfDays, k = 1:numOfShifts, type = "binary") %>%
#optimize the number of staff based on availability
set_objective(sum_expr(checkPref(staff = i, day = j, shift = k) * x[i, j, k], i = 1:numOfStaff, j = 1:numOfDays, k = 1:numOfShifts),sense = "max") %>%
#each staff can work a maximum of 1 shift a day
add_constraint(sum_expr(x[i,j,k], k = 1:numOfShifts) <= 1, i = 1:numOfStaff, j = 1:numOfDays) %>%
#each shift each day must have at least 1 staff
add_constraint(sum_expr(x[i,j,k], i = 1:numOfStaff) >= 1, j = 1:numOfDays, k = 1:numOfShifts) %>%
#each staff can work a maximum of 5 days a week
add_constraint(sum_expr(x[i,j,k], j = 1:numOfDays, k = 1:numOfShifts) <= 5, i = 1:numOfStaff) %>%
#ensure that each day has at least 1 staff from skill "a"
add_constraint(sum_expr(checkSkill(staff = i, skill = "a") * x[i,j,k], i = 1:numOfStaff) >= 1, j = 1:numOfDays, k = 1:numOfShifts) %>%
#ensure that each day has at least 1 staff from skill "b"
add_constraint(sum_expr(checkSkill(staff = i, skill = "b") * x[i,j,k], i = 1:numOfStaff) >= 1, j = 1:numOfDays, k = 1:numOfShifts) %>%
#ensure that each day has at least 1 staff from skill "c"
add_constraint(sum_expr(checkSkill(staff = i, skill = "c") * x[i,j,k], i = 1:numOfStaff) >= 1, j = 1:numOfDays, k = 1:numOfShifts)
#inspect model
model
#solve integer programming model
result <- solve_model(model, with_ROI(solver = "glpk", verbose = TRUE))
```

**Step 1: Invoke the libraries**

Invoke the libraries namely `purrr`

, `dplyr`

, `ompr`

, `ompr.roi`

and `ROI.plugin.glpk`

**Step 2: Build model**

Initially, a blank canvas is set up. Then, one starts to populate it with the required components in a sequential order. The object `model`

in the code refers to this blank canvas. The `MIPModel`

function instantiates the multiple integer programming model from the package `ompr`

.

```
#build integer programming model
model <- MIPModel() %>%
```

This process uses the piping constructor `%>%`

to move from one step to another.

**Step 2: add variable**

Recall that variables can be thought of as combinations of staff, days and shifts that are available. The `add_variable`

function helps in adding the variable to the model. Although the model is not mentioned explicitly in the function below, it is implicit since a piping `%>%`

constructor is used to move into this step from the model step.

```
#set the number of rows(staff) and columns(weekday) for the matrix
numOfStaff <- length(unique(staffShiftPref$StaffId))
numOfDays <- 7
numOfShifts <- length(unique(staffShiftPref$Shift))
#roster(i.e. 1) iff staff is rostered for the given day and shift
add_variable(x[i,j,k], i = 1:numOfStaff, j = 1:numOfDays, k = 1:numOfShifts, type = "binary")
```

The variables are created with `x[i,j,k]`

. Here, `i`

, `j`

and `k`

are `numOfStaff`

, `numOfDays`

and `numOfShifts`

representing the *number of staff*, *number of days* and *number of shifts* available in the data respectively. If one assumes that there are 6 staff and 3 shifts, then there would be 126 [6(staff) * 3(shifts) * 7(days)] variables created. At this stage, if the `model`

is inspected, it will show the following in the console. Recall that the variable values are `1`

or `0`

and therefore the variable type shows as `Binary`

.

```
#model
Mixed integer linear optimization problem
Variables:
Continuous: 0
Integer: 0
Binary: 126
No objective function.
Constraints: 0
```

**Step 3: set objective**

Recall that objective is the purpose of the model. For this example, the objective is to build a schedule with just enough staff for each schedule. The `set_objective`

function helps in building the objective.

```
#optimize the number of staff based on availability
set_objective(sum_expr(checkPref(staff = i, day = j, shift = k) * x[i, j, k], i = 1:numOfStaff, j = 1:numOfDays, k = 1:numOfShifts),sense = "max") %>%
```

If the objective is to optimize the variables `x[i, j, k]`

so as to schedule the staff, the reader might wonder why one is multiplying that with `checkPref(staff = i, day = j, shift = k)`

and what the `checkPref()`

function does. The following explains the reasoning.

- If the staff does not particularly express preference of one shift over another, one could simply multiply
`x[i, j, k]`

with`1`

. However, in the real world, the staff are often given an option to express their preferences and an attempt is made to stick to their preference as far as possible. Hence, staff preference needs to be part of the objective function. - The function
`checkPref()`

ensures that that if a given staff does not prefer a given shift for a given day, it will will penalize the objective function by multiplying it with`-10000`

. This means that the model will prefer to schedule another staff who prefers the given shift and day. The function`checkPref()`

is built as follows.`#function to check if staff is available for a given day and shift checkPref <- function(staff, day, shift) { staffShiftSubset <- staffShiftPref[staffShiftPref$StaffId == staff & staffShiftPref$Shift == shift,] staffShiftDaySubset <- staffShiftSubset[which(!names(staffShiftSubset) %in% c("StaffId", "Shift"))] staffShiftDaySubset <- staffShiftDaySubset[,day] isAvail <- staffShiftDaySubset #to ensure that non-preferred shifts are given least preference, place a high penalty. If needed this step could be made a constraint. isPref <- ifelse(isAvail == 0, -10000, isAvail) return(isPref) }`

In summary, it takes

`staffShiftPref`

data and replaces`0`

with`-10000`

. The reader could choose any term for penalizing and need not be limited to the random low value`-10000`

. - The optimization model could attempt to schedule (i.e. put
`1`

) any of the staff for each day and shift. However,`checkPref()`

makes the model to consider only those staff who prefer that given shift and day (i.e where there is`1`

in the`shiftPrefData`

). This is because, if it selects someone who does not prefer the given day and shift (i.e where there is`0`

in the`shiftPrefData`

) then the value of the objective function becomes`-10000`

and thus this works against maximizing the objective function.

**Step 4: add constraints**

Recall that constraints could be thought of as rules that need to be followed while building the optmization model. The business rules that are used for the purpose of this article are as follows.

- Each staff can only work one shift per day.
- Every shift must have at least one person staffed.
- Each staff works for a maximum of 5 days a week.
- Each shift must have at least one person staffed from each of the three skills.

The R code to deploy these constraints uses the function `add_constraint()`

to apply a constraint. Within this function is nested another function `sum_expr()`

that helps in formulating the constraint mathematically. Although this article provides relatively simpler set of constraints, understanding the `sum_expr()`

function will help one to add many more complex constraints.

At the time of publishing this article, the official documentation of the `sum_expr()`

function is very minimal as given below.

```
Usage
sum_expr(expr, ...)
Arguments
expr an expression that can be expanded to a sum
... bind variables in expr using dots. See examples.
Value
the expanded sum as an AST
```

The author offers the following perspective in an attempt to help readers understand the usage of this function. It must be noted that this perspective is more of an opinion and may not reflect the actual functioning of the underlying code.

The function in its nested form i.e

`add_constraint(sum_expr())`

could be understood as follows.`add_constraint(sum_expr(<set_of_variables>, <aggregate_variable(s)>)<equality/inequality><constant/variable>, <group_by_variables>)`

The above code in terms of an SQL sense could be understood

**roughly**as follows.`select sum(<aggregate_variable(s)>) from <set_of_variables> where <equality/inequality><constant/variable> group by <group_by_variables>`

This representation could be understood further with the help of an example. The first constraint i.e.

*each staff can only work one shift per day*is represented as follows.`#each staff can work a maximum of 1 shift a day add_constraint(sum_expr(x[i,j,k], k = 1:numOfShifts) <= 1, i = 1:numOfStaff, j = 1:numOfDays)`

The above representation can be understood as follows

- Consider the variables
`x[i,j,k]`

. Note that with 6 staff, 7 days and 3 shifts, these variables are 126 in total. - Of these variables, for every combination of
`i = 1:numOfStaff, j = 1:numOfDays`

i.e for a given staff on a given day, the sum of`k = 1:numOfShifts`

must be`<= 1`

. - To summarize, the constraint states that for any given staff on any given day, the number of shifts that they can be staffed are at best one shift.
- Additionally, it must be noted that the
`<=1`

need not be a constant but a variable or an expression. This means that one could have another`sum_expr()`

function in place of the`1`

. - The understanding so far is expected to help the readers understand the rest of the constraints relatively easily.

- Consider the variables
The next constraint

*each shift must have at least one person staffed*is represented as follows. It means that for any given day and shift, there is 1 or more staff scheduled.`#each shift each day must have at least 1 staff add_constraint(sum_expr(x[i,j,k], i = 1:numOfStaff) >= 1, j = 1:numOfDays, k = 1:numOfShifts) %>%`

The next constraint

*each staff works for a maximum of 5 days a week*is represented as follows. It means that for any given staff, the days and shifts they are scheduled is 5 or less.`#each staff can work a maximum of 5 days a week add_constraint(sum_expr(x[i,j,k], j = 1:numOfDays, k = 1:numOfShifts) <= 5, i = 1:numOfStaff) %>%`

The next 3 constraints namely

*each shift must have at least one person staffed from each of the three skills*is represented as follows. The code is similar across the 3 skills and the author explains only one example for brevity.`#ensure that each day has at least 1 staff from skill "a" add_constraint(sum_expr(checkSkill(staff = i, skill = "a") * x[i,j,k], i = 1:numOfStaff) >= 1, j = 1:numOfDays, k = 1:numOfShifts) %>%`

The first step in understanding these three constraints is the function `checkSkill()`

which is built as follows.

```
checkSkill <- function(staff, skill)
{
#if the staff has the skill then, return 1 else 0
skillPresence <- nrow(skillMix[skillMix$staffId == staff & skillMix$skill == skill,])
return(skillPresence)
}
```

This function takes the staffId and skill as inputs and returns `1`

if the given staffId has the given skill otherwise, it returns zero. Since `checkSkill(staff = i, skill = "a")`

is multiplied with `x[i,j,k]`

, the outcome will be at least 1 **only** if the staff has the given skill. Thus, for any given day and shift, there will be at least one staff with the given skill. These steps are repeated for each of the skill.

The final step is to run the optimization and source the results. This is accomplished with the following code.

```
#solve integer programming model
result <- solve_model(model, with_ROI(solver = "glpk", verbose = TRUE))
```

If the optimal solution is not found, the model will throw an error *glp_intopt: optimal basis to initial LP relaxation not provided*. In such cases, one could change the constraints or add more variables to arrive at an optimal solution. It must be noted that although the author used the `glpk`

solver, there are other solvers available with the package.

In order to use the model results in an interpretable manner, one could use the following steps.

Use the function

`get_solution()`

to retrieve the*staff(i)*,*day(j)*,*shift(k)*and*rostered(value)*data. Note that if the given staff is rostered for the given day and shift, then the rostered data will have`1`

otherwise, it will be coded`0`

.`roster <- result %>% get_solution(x[i,j,k])`

Select relevant data and rename the columns to understandable names.

`roster <- roster[,c("i", "j", "k","value")] #set readable column names colnames(roster) <- c("staff", "day" , "shift" , "rostered")`

The roster for each staff can be visualized with the following code.

`xtabs(rostered~ day + shift + staff, data = roster)`

**6.Concluding remarks**

This article speaks about staff scheduling problem. However, if the reader understands the reasoning behind the code is expected to help the reader in solving more complex staff scheduling problems or even other optimization problems.

##### Feedback

Should the reader have any feedback about the concepts/code presented in this article, please leave the details in the comments section below and the author will investigate it.

comments powered by Disqus