Libby Daniells | STOR-i Student Blog

Ambulance Location Models
Tue, 28 Apr 2020

As part of the STOR604 course at STOR-i, lecturers from around the world come to Lancaster to give a Masterclass in their area of research. In this blog post I’ll be discussing the use of Facility Location Models in the public sector, which was introduced to me through a Masterclass given by Laura Albert from the University of Wisconsin-Madison. In particular, I’ll be focusing on models that look to optimize the distribution of ambulances in order to maximize patient survival while balancing cost and the number of paramedics required.

Generally, the performance measure used for patient survival is the response time threshold (RTT). In most countries, including the US, an RTT of 8 minutes and 59 seconds is used. This time is based on research on cardiac arrest patients suggesting that a wait of longer than 9 minutes severely diminishes their chance of survival. Therefore, our problem is where to locate \(p\) ambulances in order to cover the most calls in under 9 minutes.

In this blog we’ll be focusing on discrete cases in which the ambulances can be located at pre-defined points on a city map, which form a set of vertices \(W\). Calls will then come in at points on a separate set of vertices \(V\). We’ll discuss two of the more basic models for locating ambulances. We will not discuss models for dispatching ambulances, but I’ll include references for further reading on this topic at the end of this post.

Maximum Coverage Location Problem

The Maximum Coverage Location Problem (MCLP) is one of the most basic models for locating ambulances on a set of vertices. We define the binary variable \(x_j\) to be equal to 1 if and only if an ambulance is located at vertex \(j\in W\). Similarly, the variable \(y_i\) is equal to 1 if and only if vertex \(i\in V\) is covered by at least one ambulance. Let \(W_i\) be the set of location sites covering demand point \(i\), and finally let \(d_i\) be the demand at vertex \(i\). The model is then given as follows:

\begin{array}{lll}
\text{Maximize} & \sum_{i\in V} d_iy_i &\\
\text{s.t} & \sum_{j\in W_i} x_j\; \geq\; y_i & (i\in V),\\
& \sum_{j\in W}x_j\;=\;p,&\\
& x_j\;\in\;\{0,1\} &(j\in W),\\
&y_i\;\in\;\{0,1\} & (i\in V).
\end{array}

This may look rather complex for those of you who have not come across a linear programming model before, so I will quickly run through what it means line by line. The first line is the objective function, which states that we wish to maximize the total demand covered. The first constraint states that call \(i\) can only be counted as covered if at least one of the potential ambulance location sites covering \(i\) is selected. The next constraint says that we can use only \(p\) ambulances (this is included to limit the fixed costs associated with locating more and more ambulances). The final two simply mean that \(x_j\) and \(y_i\) are binary.
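As a toy illustration of the MCLP (the instance, function name and brute-force approach here are my own; realistically sized problems would be handed to an integer-programming solver), here is a minimal Python sketch:

```python
from itertools import combinations

def solve_mclp(W, V, demand, covers, p):
    """Brute-force MCLP: try every p-subset of the candidate sites W
    and keep the one maximising the total demand covered.
    covers[i] is the set W_i of sites that can cover demand point i."""
    best_sites, best_value = None, -1
    for sites in combinations(W, p):
        chosen = set(sites)
        # y_i = 1 iff at least one selected site covers demand point i
        value = sum(demand[i] for i in V if covers[i] & chosen)
        if value > best_value:
            best_sites, best_value = chosen, value
    return best_sites, best_value

# Tiny made-up instance: two candidate sites, three demand points, p = 1.
sites, value = solve_mclp(
    W=['a', 'b'], V=[1, 2, 3],
    demand={1: 5, 2: 3, 3: 4},
    covers={1: {'a'}, 2: {'a', 'b'}, 3: {'b'}},
    p=1)
```

Here site ‘a’ covers demand 5 + 3 = 8 while ‘b’ covers only 3 + 4 = 7, so the single ambulance goes to ‘a’.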

This model has several associated issues that make it unrealistic. For one, it doesn’t take into account that ambulance response times contain a large amount of uncertainty. This uncertainty can come from traffic delays, the weather or just the time of day (Erkut et al. (2009)). It also assumes that the nearest ambulance is always available, which may not be the case; instead, we need some sort of back-up coverage. This problem stems from the fact that the MCLP only allows a single ambulance to be located at each vertex. Because of these issues, several more advanced models have since been proposed that provide a more realistic solution.

The Maximum Expected Coverage Location Problem

The maximum expected coverage location problem (MEXCLP) is a direct extension of the MCLP that addresses the issue of back-up coverage. This model allows more than one ambulance at each vertex, so we need to adjust the definition of \(y_i\). We now define \(y_{ik}\) to be equal to 1 if and only if vertex \(i\in V\) is covered by at least \(k\) ambulances. In this model, each ambulance has an equal probability \(q\) of being busy and already out on a call. The model is as follows:

\begin{array}{lll}
\text{Maximize} & \sum_{i\in V}\sum_{k=1}^p d_i(1-q)q^{k-1}y_{ik} &\\
\text{s.t} & \sum_{j\in W_i} x_j\; \geq\; \sum^p_{k=1}y_{ik} & (i\in V),\\
& \sum_{j\in W}x_j\;=\;p,&\\
& x_j\;\in\;\{0,1,\ldots,p\} &(j\in W),\\
&y_{ik}\;\in\;\{0,1\} & (i\in V,\;k=1,\ldots,p)
\end{array}

In this model the objective function maximizes the expected covered demand: a vertex covered by \(m\) ambulances contributes \(d_i(1-q^m)=\sum_{k=1}^m d_i(1-q)q^{k-1}\). The remaining constraints are similar to those of the MCLP.
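The expected-coverage objective can again be illustrated with a small brute-force Python sketch (the instance, names and search are mine, purely to show the arithmetic; real MEXCLP instances are solved with integer programming):

```python
from collections import Counter
from itertools import combinations_with_replacement

def expected_coverage(placement, V, demand, covers, q):
    """MEXCLP objective for a given multiset of ambulance sites: a demand
    point covered by m ambulances, each independently busy with probability
    q, contributes d_i * (1 - q**m)."""
    count = Counter(placement)
    return sum(demand[i] * (1 - q ** sum(count[j] for j in covers[i]))
               for i in V)

def solve_mexclp(W, V, demand, covers, p, q):
    """Brute force over every way of placing p ambulances on sites in W
    (repeats allowed, since MEXCLP permits several per vertex)."""
    return max(combinations_with_replacement(W, p),
               key=lambda pl: expected_coverage(pl, V, demand, covers, q))

# A tiny made-up instance with p = 2 ambulances and busy probability q = 0.3.
demand = {1: 5, 2: 3, 3: 4}
covers = {1: {'a'}, 2: {'a', 'b'}, 3: {'b'}}
best = solve_mexclp(['a', 'b'], [1, 2, 3], demand, covers, p=2, q=0.3)
```

Spreading the two ambulances over both sites beats doubling up on either one, because the second ambulance at the same site only adds the smaller \(d_i(1-q)q\) term.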

Extensions

Neither the MCLP nor the MEXCLP is a perfect model of the real-life scenario of ambulance location; however, both have been used to good effect to improve patient survival. There are now more advanced models that represent the situation more realistically by addressing the following problems:

  • More than one vehicle type may attend a call, e.g. advanced life support (ALS) vehicles, basic life support (BLS) vehicles, and quick response vehicles (QRVs), which can provide immediate patient care but cannot transport patients to hospital. See McLay (2009).
  • More than one ambulance may be dispatched on a call.
  • Travel times are non-deterministic i.e. response times contain some level of uncertainty due to traffic, the weather or time of day.
  • Each ambulance should respond to roughly the same number of calls to spread the workload.
  • We may also want to consider models that look at both dispatching and locating ambulances simultaneously. See Ansari et al. (2015).

Thank you very much Laura Albert for introducing me to Public Sector OR, I thoroughly enjoyed your Masterclass!

References & Further Reading

I would highly recommend reading the papers by Brotcorne et al. (2003) and Erkut et al. (2009), as these give a thorough overview of the models I’ve discussed in this post, as well as further extensions.

  • Ansari, S., McLay, L.A., Mayorga, M.E., (2015). A Maximum Expected Covering Problem for District Design. Transportation Science 51(1), 376-390.
  • Brotcorne, L., Laporte, G., Semet, F. (2003). Ambulance Location and Relocation Models. European Journal of Operational Research, 147(3), 451-463.
  • Erkut, E., Ingolfsson, A., Sim, T., Erdogan, G. (2009). Computational Comparison of Five Maximal Covering Models for Locating Ambulances. Geographical Analysis, 41(1), 43-65.
  • McLay, L.A., (2009). A Maximum Expected Covering Location Model with Two Types of Servers. IIE Transactions 41(8), 730-741.
Working From Home During The Coronavirus Pandemic
Fri, 24 Apr 2020

A few blog posts ago, I wrote about the current statistics and research into the coronavirus pandemic. Lancaster University closed down its academic buildings in mid-March and, as students, we were advised to work from home. All of this was done for a good reason: to stop the spread of this deadly virus.

It took me a while to get into a routine while working from home and I really struggled for the first few weeks in terms of productivity. There would be some days where I would sit with a paper open on my laptop and not be able to read a single sentence. I was having difficulty focusing when there was so much uncertainty surrounding our current predicament.

I ultimately made the decision to return to my family home in my hometown, Newport Pagnell. This was a game changer for me. Being surrounded by our loved ones is what we need right now. Since returning home I’ve really gotten into a routine. If you’re reading this and struggling with working from home, my top tip would be to stick to a working day. I usually work from 9 to 5 with a one-hour lunch break and several small breaks throughout the day to keep my brain going.

I’m quite lucky to live in a town surrounded by open fields where I can take long walks and not come into contact with a single person. I also have two Tibetan Terrier dogs who are loving us all being home and are getting a lovely walk each day. I definitely think it’s important to take advantage of your daily exercise allowance.

I also think it’s very important to not be too hard on yourself if you are not working at 100% capacity. I read a really nice quote recently on twitter that I strongly agree with:

You are not working from home; you are at your home during a crisis trying to work.

No one expects you to be as productive as you were in the office, and it’s alright if you have a day when you can’t focus. I tend to have one really productive day followed by a day where I only work for an hour or two, and I yo-yo back and forth between the two.

As a programme, STOR-i have been encouraging us to have regular contact with our cohort. Twice a week, the MRes students have a conference call on Microsoft Teams where we just catch up with one another. It is a very strange situation not working in an office together, and I’m missing the social aspect of STOR-i a lot. These video calls are a great way to stay in touch.

STOR-i have been adapting to the online setting. As of last week we’ve resumed our weekly forums in which PhD students give a short presentation on their current research. These are obviously now online and not followed by the customary tea and biscuits, but it’s nice to return to some sort of schedule and normality.

Over the last few weeks I’ve been in contact with several potential PhD supervisors and am due to submit my preferences on Monday, so it is a rather exciting yet nerve-wracking time for me. I’m also working on computer assignments, these blog posts and a poster presentation. So lots to keep me busy!

I hope everyone is staying safe and at home. If you are struggling with working from home, please be kind to yourself and this will all be over soon enough.

The Travelling Salesman Problem
Tue, 21 Apr 2020

The Travelling Salesman Problem (TSP) is a classic optimization problem within the field of operations research. It was first studied during the 1930s by several applied mathematicians and is one of the most intensively studied problems in OR.

The TSP describes a scenario where a salesman is required to travel between \(n\) cities. He wishes to visit every location exactly once and must finish at his starting point. The order in which the cities are visited is not important, but he wishes to minimize the distance travelled. This problem can be described as a network, where the cities are represented by nodes connected by edges that carry a weight describing the time or distance it takes to travel between cities.

This problem may sound simple, and using a brute force method in theory it is: calculate the length of every possible route and select the shortest. However, this is extremely time consuming, and as the number of cities grows, brute force quickly becomes infeasible. A TSP with just 10 cities has 9!, or 362,880, possible routes, and this count grows factorially: at 20 cities there are 19! \(\approx 1.2\times 10^{17}\) routes, far too many for any computer to check in a reasonable time. The TSP is an NP-hard problem, so no polynomial-time algorithm is known that efficiently solves every travelling salesman problem.
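The brute-force idea can be sketched in a few lines of Python (a minimal illustration with my own function name; the distances are the symmetric Figure 1 edge weights given later in this post):

```python
from itertools import permutations

def tsp_brute_force(cities, dist):
    """Enumerate every tour starting and ending at the first city and
    return the shortest; only feasible for very small n, since there
    are (n-1)! orderings to check."""
    start, rest = cities[0], cities[1:]
    best_tour, best_len = None, float('inf')
    for order in permutations(rest):
        tour = (start,) + order + (start,)
        length = sum(dist[a][b] for a, b in zip(tour, tour[1:]))
        if length < best_len:
            best_tour, best_len = tour, length
    return best_tour, best_len

# Symmetric 4-city example (the Figure 1 edge weights used later in
# this post: B-D 2, A-B 4, C-D 5, A-D 6, A-C 7, C-B 8).
w = {('A', 'B'): 4, ('A', 'C'): 7, ('A', 'D'): 6,
     ('B', 'C'): 8, ('B', 'D'): 2, ('C', 'D'): 5}
dist = {c: {} for c in 'ABCD'}
for (u, v), d in w.items():
    dist[u][v] = dist[v][u] = d

tour, length = tsp_brute_force(['A', 'B', 'C', 'D'], dist)
```

With only four cities this checks 3! = 6 tours and finds the optimal length of 18.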

Because of how difficult the problem is to solve optimally we often look to heuristics or approximate methods for the TSP to improve speed in finding the solution and closeness to the optimal solution.

The TSP can be divided into two types: the asymmetric travelling salesman problem (ATSP), where the distance from A to B differs from that from B to A, and the symmetric travelling salesman problem (STSP), where the distance from A to B is the same as from B to A. For example, the ATSP may arise in cities such as Lancaster, where one-way roads make the travel time depend on the direction of travel. In this blog I will be focusing on the STSP and outline two of the most basic heuristic algorithms for solving it.

Nearest Neighbor Algorithm

One of the simplest algorithms for approximately solving the STSP is the nearest neighbor method, where the salesman always visits the nearest unvisited city. The process is as follows:

  1. Select a starting city.
  2. Find the nearest city to your current one and go there.
  3. If there are still cities not yet visited, repeat step 2. Else, return to the starting city.
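The steps above can be sketched in Python (a minimal illustration; the function name is mine, and the distances are the Figure 1 edge weights listed later in the post: B-D 2, A-B 4, C-D 5, A-D 6, A-C 7, C-B 8):

```python
def nearest_neighbour_tour(start, cities, dist):
    """Steps 1-3 above: from the current city, always travel to the
    nearest city not yet visited, then return to the start."""
    tour, unvisited = [start], set(cities) - {start}
    while unvisited:
        nxt = min(unvisited, key=lambda c: dist[tour[-1]][c])
        tour.append(nxt)
        unvisited.remove(nxt)
    tour.append(start)
    length = sum(dist[a][b] for a, b in zip(tour, tour[1:]))
    return tour, length

dist = {'A': {'B': 4, 'C': 7, 'D': 6},
        'B': {'A': 4, 'C': 8, 'D': 2},
        'C': {'A': 7, 'B': 8, 'D': 5},
        'D': {'A': 6, 'B': 2, 'C': 5}}
tour, length = nearest_neighbour_tour('A', 'ABCD', dist)
```

Starting from A this produces the tour A -> B -> D -> C -> A of length 18, matching the worked example below.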

Using the nearest neighbor algorithm on the below symmetric travelling salesman problem starting at city A, we would then travel to city B followed by D and C, returning back to A. This gives a total length of 18, which in this case is indeed optimal. However, the nearest neighbor algorithm does not always achieve optimality.

Figure 1

For example if we change the weight slightly:

Figure 2

The solution using the nearest neighbor algorithm, again starting at A, is the route A -> C -> B -> D -> A, with a total weight of 15. But this is not optimal: if we instead took the route A -> B -> D -> C -> A, the weight would be 14, a slight improvement on that obtained by the algorithm. Therefore the algorithm achieved a sub-optimal result.

This algorithm’s worst-case performance is \(\mathcal{O}(n^2)\), much better than the brute force method (which is \(\mathcal{O}(n!)\)). It is easy to implement, but the greediness of the algorithm means it runs quite a high risk of not obtaining the optimal route.

Greedy Approach Algorithm

Before we delve into the next algorithm to tackle the TSP, we need the definition of a cycle. A cycle in a network is a closed path between cities in which no city is visited more than once, apart from the start and end city. The degree (or order) of a node, or city, is the number of edges coming in or out of it.

The greedy algorithm goes as follows:

  1. Sort all of the edges in the network.
  2. Select the shortest remaining edge and add it to our tour, provided it does not create a cycle with fewer than \(n\) edges and does not increase the degree of any node (city) to more than 2.
  3. If we have \(n\) edges in our tour, stop; otherwise, repeat step 2.
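The steps above can be sketched in Python (a minimal illustration with my own names; the cycle check uses a union-find structure, which is one standard way to implement it but is not prescribed by the algorithm itself):

```python
def greedy_edge_tour(cities, edges):
    """Steps 1-3 above: take edges in increasing weight, skipping any
    that would raise a city's degree above 2 or close a cycle on fewer
    than n edges. edges is a list of (weight, u, v) triples."""
    n = len(cities)
    degree = dict.fromkeys(cities, 0)
    parent = {c: c for c in cities}          # union-find, for cycle checks

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]    # path halving
            c = parent[c]
        return c

    tour, total = [], 0
    for w, u, v in sorted(edges):
        if degree[u] == 2 or degree[v] == 2:
            continue                          # would give a degree-3 city
        if find(u) == find(v) and len(tour) < n - 1:
            continue                          # would close a short cycle
        parent[find(u)] = find(v)
        degree[u] += 1
        degree[v] += 1
        tour.append((u, v))
        total += w
        if len(tour) == n:
            break
    return tour, total

# Figure 1 edges: B-D 2, A-B 4, C-D 5, A-D 6, A-C 7, C-B 8.
edges = [(2, 'B', 'D'), (4, 'A', 'B'), (5, 'C', 'D'),
         (6, 'A', 'D'), (7, 'A', 'C'), (8, 'C', 'B')]
tour, total = greedy_edge_tour('ABCD', edges)
```

On the Figure 1 instance this selects B-D, A-B, C-D and finally A-C, giving the length-18 tour described in the worked example below.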

Applying this algorithm to the STSP in Figure 1, we begin by sorting the edge lengths:

B <-> D = 2, A <-> B = 4, C <-> D = 5, A <-> D = 6, A <-> C = 7, C <-> B = 8

We then add the edges B <-> D, A <-> B and C <-> D to our tour without problem. We cannot add A <-> D to our tour, as it would create a cycle between the nodes A, B and D and increase the degree of node D to 3. We therefore skip this edge and ultimately add edge A <-> C to the tour. This results in the same solution as obtained by the nearest neighbor algorithm.

If we then apply the method to the STSP given in Figure 2 we obtain the optimal route: A -> B -> C -> D -> A. This is an improvement on what was achieved by the nearest neighbor algorithm.

This algorithm is \(\mathcal{O}(n^2\log_2(n))\), higher than that of the nearest neighbor algorithm with only a small improvement in optimality.

References and Further Reading

As said above, these are only two of the most basic algorithms used to obtain an approximate solution to the travelling salesman problem and there are many more sophisticated methods. If you wish to read more about these, I would suggest reading the following two papers:

  • Nilsson, C., (2003). Heuristics for the Traveling Salesman Problem. Linköping University.
  • Abdulkarim, H., Alshammari, I., (2015). Comparison of Algorithms for Solving Traveling Salesman Problem. International Journal of Engineering and Advanced Technology 4(6).
Coronavirus: A summary of some of the findings so far
Wed, 15 Apr 2020

At the time of writing (14/4/2020), a total of 11,329 people in the UK and 120,863 worldwide are known to have died from the novel Covid-19 virus. This is an unprecedented global pandemic that has reached almost all corners of the globe. Covid-19 is a new virus, having emerged in Wuhan, China in late 2019, that attacks the lungs, causing pneumonia-like symptoms. Many epidemiologists are attempting to model the outbreak and the impact of various intervention plans. Every day on Twitter I see new papers and statistics on the pandemic emerge; the aim of this blog post is to summarize some of the papers I’ve read and analyse some of the statistics that have been released.

All statistical models rely on assumptions, and because of this they will never be 100% accurate. They make assumptions on how many people will become infected, the number of cases that will require hospitalization and whether this will exceed NHS intensive care capacity. Weiss (2020) suggests the most basic model to use is the SIR model – where the population is split into three subgroups: susceptible people, i.e. those vulnerable to getting the disease; infected people; and removed people (those who have recovered and gained immunity or those who have died) – but with an added category for carriers. This extra category is required as a person can carry the disease for up to 14 days without showing any physical symptoms. This means people can pass the disease on to the vulnerable without knowing they were even sick. This is why social distancing measures are deemed so vital in preventing the spread of this virus. In reality much more complex models are required to fully describe the viral outbreak; however, regardless of the complexity, a model will never fully describe what will happen in real life. Models are used simply as an “informed prediction” based on the data and the population; they are used to make decisions regarding the interventions to be put in place and when they can be lifted.
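A minimal discrete-time sketch of the basic SIR model mentioned above (without the extra carrier compartment; the function name and the parameter values are my own and purely illustrative, not taken from any of the cited papers):

```python
def sir_simulation(N, I0, beta, gamma, days):
    """Basic SIR dynamics on a population of size N: each day,
    beta*S*I/N people move from susceptible (S) to infected (I),
    and a fraction gamma of the infected move to removed (R)."""
    S, I, R = float(N - I0), float(I0), 0.0
    history = [(S, I, R)]
    for _ in range(days):
        new_infections = beta * S * I / N
        new_removals = gamma * I
        S -= new_infections
        I += new_infections - new_removals
        R += new_removals
        history.append((S, I, R))
    return history

# Illustrative run: 1 initial case in a population of 1,000, with an
# infection rate of 0.3 and a recovery rate of 0.1 per day.
history = sir_simulation(N=1000, I0=1, beta=0.3, gamma=0.1, days=200)
final_S, final_I, final_R = history[-1]
```

Even this toy version shows the characteristic behaviour: infections rise, peak, and die out as the susceptible pool is depleted, with the three compartments always summing to the population size.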

As this is a new virus, modelling its spread is challenging: assumptions need to be made, but there is only a small amount of data available. These assumptions are used for key model parameters (Enserink & Kupferschmidt (2020)). One such parameter is the number of new infections caused by one infected person when no intervention measures are in place, as well as the time frame in which these infections occur. It takes some time and a large quantity of data for these parameters to be accurately estimated.

Coronavirus and the UK: The Numbers

Modelling by epidemiologists at Imperial College London was used to determine the intervention approach the UK government implemented. It was believed that if a hard lockdown was implemented, as in China, Italy and Spain, then the infection rate would spike once the intervention was lifted (Enserink & Kupferschmidt (2020)). Because of this, less severe social distancing restrictions were implemented at first to ensure the peak of infections was flattened and the demand for intensive care beds within hospitals did not exceed capacity. However, taking into account new data, a revised model was created and it was decided that stricter lockdown measures were required to save the NHS from being overwhelmed. These new measures were announced on 23rd March 2020. Due to the nature of the disease, it could take up to a month to determine whether these measures are sufficient.

Some of the data used to model the virus outbreak was obtained from the “BBC Pandemic project” (Klepac et al. (2020)), which collected data from over 36,000 UK citizens in order to create age-specific population contact matrices. This data was used to model how reducing the amount of social contact would reduce the spread of the virus. The details of this paper are listed at the bottom of this page and I highly recommend reading it. The study worked through an app downloaded by participants. The app recorded their approximate location hourly for a day, at the end of which the users recorded all social contacts, providing information on each. This study does come with the flaw of self-reporting and the misinformation that comes alongside that; however, it does allow for a massive sample size to more accurately portray the UK population.

Below are three graphs that depict: 1. the cumulative number of cases, 2. the cumulative number of deaths and 3. the daily number of deaths, all of which relate to the UK only. According to Roser et al. (2020), under current death rates it would take 7 days for the total number of confirmed deaths to double. This is one of the worst growth rates in the world, behind only the US and Belgium (both of which have a 6-day doubling rate). The first two graphs imply exponential growth; however, we do expect this to level off and reach some sort of peak in the coming weeks.

Cumulative number of cases of Covid-19 in the UK.
Cumulative number of deaths from Covid-19 in the UK.
Total number of deaths from Covid-19 in the UK per day.
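To make the quoted doubling times concrete: a constant doubling time of \(t\) days corresponds to a constant daily growth rate \(r\) with \((1+r)^t = 2\), i.e. \(t = \ln 2/\ln(1+r)\). A small sketch of this arithmetic (my own calculation, not from the cited source):

```python
import math

def implied_daily_growth(doubling_days):
    """Constant daily growth rate implied by a given doubling time."""
    return 2 ** (1 / doubling_days) - 1

def doubling_time(daily_growth_rate):
    """Days for a quantity growing at a constant daily rate to double:
    t = ln(2) / ln(1 + r)."""
    return math.log(2) / math.log(1 + daily_growth_rate)

uk_rate = implied_daily_growth(7)   # roughly 0.104, i.e. ~10.4% per day
```

So the UK’s 7-day doubling time corresponds to deaths growing at roughly 10.4% per day, against roughly 12.2% per day for the 6-day doubling countries.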

This data has to be taken with a pinch of salt, as it may not all be up to date; for example, there is a lag between testing and the results being obtained. To add to this, according to Richardson and Spiegelhalter (2020), just over 317,000 people have been tested in the UK to date, compared to the 1.3 million tests carried out in Germany. This may mean that a greater number of people have had the virus but have not had it confirmed. There are also the aforementioned carriers, who will not yet know they were infected as they do not present with symptoms. So is the increase in growth caused by an increase in cases or an increase in testing? The answer is likely both.

It is also thought that the number of deaths is much higher than what is being reported. This is because the numbers released only include those who have died in hospital and tested positive for Coronavirus, and there is often a delay of a few days or more before a death is recorded as being caused by Covid-19.

Singapore Case-Control Study

In this section of the blog, I’m going to summarize one of many studies currently being carried out around the globe into the Coronavirus pandemic. The study I will focus on was conducted by Sun et al. (2020). It investigated risk factors for the virus using a case-control study in Singapore between 26th January and 16th February 2020. 54 cases of Coronavirus were compared to 734 controls. The data collected included demographics, co-morbidity factors, exposure risk, symptoms and vitals (including blood pressure, pulse and temperature). Predictors of the virus were split into four categories:

  • Exposure Risk
  • Demographic Variables
  • Clinical Findings
  • Clinical Test Results (some patients presented all clinical tests, others just radiological tests)

From this they created four prediction models: logistic regression models whose variables were selected using stepwise AIC:

  • Model 1: all covariates from all 4 categories,
  • Model 2: demographic variables, clinical findings and all clinical test results,
  • Model 3: demographic variables, clinical findings and clinical test results excluding radiology,
  • Model 4: only demographic variables and clinical findings.
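Stepwise AIC selection ranks candidate models by the Akaike information criterion, AIC = 2k − 2 ln L, where k is the number of fitted parameters and L the maximised likelihood; lower is better, so extra covariates must earn their keep. A minimal sketch of the comparison step (the log-likelihoods and parameter counts below are entirely made up for illustration, not values from the Sun et al. paper):

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: AIC = 2k - 2 ln L.
    Lower is better; stepwise selection adds or drops covariates
    whenever doing so reduces this value."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical fitted log-likelihoods and parameter counts, purely to
# show how four candidate models would be ranked once fitted:
fits = {'Model 1': (-40.0, 9), 'Model 2': (-45.0, 7),
        'Model 3': (-48.0, 6), 'Model 4': (-55.0, 4)}
ranking = sorted(fits, key=lambda m: aic(*fits[m]))
```

The penalty term 2k is what stops the criterion from always preferring the model with the most covariates.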

From this study, they found that positive cases of Covid-19 were likely to be older than the controls (with a p-value less than 0.0001), but they were not more likely to have any of the co-morbidity factors than the controls (this is an unusual finding, as the UK government listed a set of conditions, such as diabetes, asthma and heart disease, that make a person more vulnerable to the disease; I would have thought this would have shown up in the co-morbidity results). However, the exposure factor was deemed significant, with 59.3% of cases having had contact with someone with the virus or having recently travelled to Wuhan, compared to only 17.2% of the controls. Cases were also deemed more likely to have a fever (p-value of 0.003) and signs of pneumonia in radiology results (present in 42.6% of cases compared to 11.1% of controls).

From Model 1 it was deemed that exposure risk was the most significant predictor of a positive Covid-19 result. In the other three models, which exclude exposure, a high temperature was deemed the most relevant clinical finding in predicting a positive result, apart from in Model 2, where gastrointestinal symptoms were deemed marginally more significant.

It was concluded that Model 1, which takes into account all risk factors, performs exceptionally well in predicting a positive Coronavirus status, but even in the absence of exposure status, Models 2 and 3 performed sufficiently well. The evidence did, however, show a reduction in performance for Model 4, where basic clinical tests such as bloods were not used. For more information on this study I would recommend reading the paper by Sun et al. (2020) listed in the references below.

Concluding Remarks

Although modelling is very useful for analyzing the spread of Coronavirus and for decision making around intervention practices, there is a lot that these models will not show us, such as the degree to which the public comply with social distancing measures, the introduction of a vaccine, and the toss-up between saving the economy and reducing the death rate. As all models contain some degree of uncertainty, they must be analysed for pitfalls, and decisions should not be based solely on their findings.

References & Further Reading

  • Klepac, P., Kucharski, A., et al. (2020). Contacts in context: large-scale setting-specific social mixing matrices from the BBC Pandemic project. medRxiv
  • Enserink, M., Kupferschmidt, K., (2020). With COVID-19, modelling takes on life and death importance. Science (New York, N.Y.) 367(6485)
  • Richardson, S., Spiegelhalter, D., (2020). Coronavirus statistics: what can we trust and what should we ignore? The Observer
  • Roser, M., Ritchie, H., Ortiz-Ospina, E., (2020). Coronavirus Disease (COVID-2019) – Statistics and Research, Our World In Data
  • Sun, Y., Vanessa, K., et al. (2020). Epidemiological and Clinical Predictors of COVID-19. Clinical Infectious Diseases: an official publication of the Infectious Diseases Society of America
  • Weiss, S. (2020). Why modelling can’t tell us when the UK’s lockdown will end, Wired
STOR-i Conference 2020: Alexandre Jacquillat on Airline Operations, Scheduling and Pricing
Tue, 31 Mar 2020

For this week’s blog post I wanted to branch out of the statistics field and into operational research. To do so, I am going to focus on the talk given by Alexandre Jacquillat, who opened the 2020 STOR-i Annual Conference back in early January. For more details on the STOR-i Conference and an in-depth look at Tom Flowerdew’s presentation on fraud detection, please see my previous blog post.

Alexandre Jacquillat is an assistant professor of OR and Statistics at the MIT Sloan School of Management. His research focuses on applications in transportation systems to promote more efficient scheduling, operations and pricing using predictive and prescriptive analytics. This was the discussion point of his talk given at the 2020 STOR-i Conference, with a particular focus on the airline sector.

Alexandre Jacquillat at the STOR-i Annual Conference 2020

The work Jacquillat is doing is particularly vital as the transportation sector is transforming, with new technologies such as electric cars and ride sharing emerging, as well as an increase in demand that is limited by capacity. His work aims to meet this rise in demand whilst offsetting the costs of congestion.

In particular, the airline sector is a rapidly growing industry with limited infrastructure. Most airlines are currently running at or above capacity, which causes delays in departures and landings, incurs costs to the airlines and wastes valuable resources. The challenges surrounding airline operations, scheduling and pricing are a very current topic in the OR field, with the ultimate goal of improving efficiency and profitability in the industry.

I’ll be breaking this blog post down into the three sections discussed in Jacquillat’s talk: operations, scheduling and pricing within the airline sector.

Operations

Operations within airports are limited in capacity; as stated above, most airlines are operating at or above capacity, which can cause severe delays. In order to balance capacity and demand and to reduce delays and holding, Air Traffic Flow and Capacity Management (ATFCM) initiatives are implemented. One such initiative is the ground delay program, in which delayed planes are held at their departure airports, reducing cost and environmental impact compared to waiting in the air to land.

Jacquillat proposes modelling this as an optimization problem in which the objective is to minimize the cost of aircraft delays plus passenger delays, subject to flight operating constraints, airport capacity constraints and passenger accommodation constraints. This aims to balance flight-centric and passenger-centric delays. It is important to consider both delay types, as there is not a direct correspondence between flight delays and passenger delays: many passengers are on multi-flight itineraries, where one small flight delay could cause them to miss multiple further flights and result in a large passenger delay.

Scheduling

Airlines are allocated time slots for departures and arrivals through a request process. Some time slots are more in demand than others, and this demand is often far greater than the capacity. Because of this, some airlines are not allocated their requested slot, and a process needs to be implemented to ensure fairness in which alternative slot they receive. There is then a slot conference at which allocated slots can be traded and changed.

This forms a complex slot allocation problem in which connection times and the regularity of slots need to be considered, as well as decisions regarding priorities in terms of historic slots (the same airline allocated the same slot), new entrants and changes to historic slots.

Jacquillat again proposed an optimization problem to allocate slots. To do so, he suggested an objective function that minimizes the displacement of allocated slots from the airlines’ original requests. This objective is subject to the following constraints:

  • Slot displacement: The difference between the slot time requested and those allocated.
  • Flight connections: The time to make a connecting flight is not over or under the allowable threshold i.e. there is enough time to make the connection or not too long to wait between connections.
  • Runway capacity: the allocation does not exceed the total number of flights that can depart or arrive at an airport during the time slot.
  • Terminal capacity: the number of people within the terminals waiting to depart or arriving does not exceed the allowed safety limits.
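The displacement objective can be illustrated with a toy assignment problem. This is only a sketch of the idea (the function name, data and brute-force search are mine, not Jacquillat’s actual formulation, which also handles the connection, runway and terminal constraints listed above):

```python
from itertools import permutations

def min_displacement_assignment(requested, available):
    """Assign each airline one of the available slots so that the total
    displacement |allocated - requested| is minimised. Brute force over
    all assignments; fine for a toy instance, hopeless at real scale."""
    airlines = list(requested)
    best, best_cost = None, float('inf')
    for slots in permutations(available, len(airlines)):
        cost = sum(abs(s - requested[a]) for a, s in zip(airlines, slots))
        if cost < best_cost:
            best, best_cost = dict(zip(airlines, slots)), cost
    return best, best_cost

# Made-up example: three airlines request hours 9, 9 and 11, but only
# one departure slot exists in each of hours 9, 10 and 11.
allocation, displacement = min_displacement_assignment(
    {'X': 9, 'Y': 9, 'Z': 11}, [9, 10, 11])
```

One of the two airlines requesting hour 9 must be displaced by an hour, so the minimum total displacement is 1; the combinatorial explosion of this brute-force search is exactly why large-scale heuristics such as the destroy-and-repair approach are needed in practice.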

In order to solve this problem, Jacquillat suggested breaking the requests into subsets using an “Adaptive Destroy and Repair” approach which provides a relatively fast and high-quality solution. For more information about this method I would suggest reading the paper: “A Large-Scale Neighborhood Search Approach to Airport Slot Allocation” whose details I’ll leave in the references below.

During the conference, Jacquillat also presented an integrated approach to scheduling and traffic flow management that took into account the slot requests and airport capacity, as well as passenger and aircraft itineraries. To solve this problem he proposed a two-stage stochastic integer programming model.

Pricing

The next thing to discuss was how to price airline tickets when compared to competitors. In general, the price of a ticket increases closer to the time of the flight, however a lead-in fare (the starting price) is made public. Good flights will move quickly up the price ladder, whereas less popular flights will stay at the lead-in fare for a longer period of time.

Jacquillat suggested a multi-control-group experimental design conducted on a global airline to test a new practice for airline pricing under competition. This is his current work that is in the process of being published.

I thoroughly enjoyed Alexandre Jacquillat’s presentation, which gave an insight into several solutions within a highly relevant application. I look forward to getting more exposure to OR topics and their applications in the future and in particular at the 2021 STOR-i conference. I’ve listed below some references and further reading into the topic. Thanks again Alexandre!

References & Further Reading

  • Ribeiro, N., Jacquillat, A., Antunes, A., (2019). A Large-Scale Neighborhood Search Approach to Airport Slot Allocation. Transportation Science 53(6).
  • Ribeiro, N., Jacquillat, A., Antunes, A., Odoni, A., Pita, J., (2018). An optimization approach for airport slot allocation under IATA guidelines. Transportation Research Part B: Methodological.
Directed Acyclic Graphs (DAGs) /stor-i-student-sites/libby-daniells/2020/03/24/directed-acyclic-graphs-dags/?utm_source=rss&utm_medium=rss&utm_campaign=directed-acyclic-graphs-dags Tue, 24 Mar 2020 13:55:06 +0000 http://www.lancaster.ac.uk/stor-i-student-sites/libby-daniells/?p=327 In my last two blog posts I focused on how to analyse the results of clinical trials through both Meta Analysis and Simultaneous Inference. Here we’re going to take a step back and look at how we choose a suitable model with relevant variables considered.

Directed Acyclic Graphs (DAGs) are used as a visual representation of associations between variables or factors in models. I first came across them in an epidemiological context during the MATH464 course on Principles of Epidemiology given here at Lancaster University, and thought I’d share the basic concepts with you all. Although I’ll discuss them in an epidemiology setting, DAGs can be used in a variety of applications to demonstrate associations and causal effects.

We’ll start with a simple definition of what DAGs are:

  • Directed – all variables in the graph are connected by arrows.
  • Acyclic – if we start at a variable X, following the path of the arrows we shouldn’t be able to get back to X.
  • Graph – we have nodes which represent factors/variables and arrows that represent causal effects of one factor on another.

Another useful definition is that of a path: a path is any consecutive sequence of arrows regardless of their direction. A backdoor path is where we start a path by moving in the wrong direction down an arrow.

The idea of a DAG is best illustrated through an example. The following example was outlined by Williams et al. (2018), in which the factors affecting obesity in children were considered:

This DAG suggests that a low parental education may increase the amount of screen time a child is engaging in, hence reducing their level of physical exercise. This in turn will increase their risk of obesity. Parental education is also a cause of obesity, hence, parental education is a common cause of both increased screen time and obesity. This is what we call a confounder variable which we’ll return to later.

We say that any two variables are d-connected if there is an unblocked path between them; this usually implies they are dependent on one another.

Within DAGs we have several types of variables, all of which need to be handled in different ways when considering how to analyse a model:

  • Collider – a node where two arrows meet.
  • Confounder – Pearl’s (2009) definition of confounding is the existence of an open backdoor path between two variables X and Y.
  • Mediator – an intermediate variable that lies on the causal pathway between two variables.

If we extend the previous example to include self-esteem in the model:

In this example, self-esteem is a collider as both obesity and increased screen time reduce self-esteem. Physical exercise is a mediator between screen time and obesity as it lies on the causal pathway. Finally, parental education is a confounder as it both increases screen time and obesity and hence creates a backdoor path between the two.

Now we have constructed a DAG, how do we use this to create a statistical model? We use the following rules to decide which variables to control for. We can control for a variable in several ways including conditioning on a variable by using the variable as a covariate in the regression model, stratifying by the variable or using matching techniques in trial recruitment.

D-separation Rules (Palmer, 2018):

  • If no variables are conditioned on, a path is blocked if and only if there is a collider located somewhere on the pathway between exposure and outcome.
  • Conditioning on a confounder blocks the path.
  • If we condition on a collider it doesn’t block the path; in fact, it creates a path between exposure and outcome. This may mask the true relationship between two variables or indicate a relationship when none in fact exists. This is known as collider bias.
  • Also, a collider that has a descendant that has been conditioned on doesn’t block the path.
(Figure: A is a collider. Conditioning on a collider opens a pathway between A and C, and conditioning on a descendant of a collider also opens a pathway between A and C.)

If we control for a confounder we reduce bias, but if we adjust for a collider we increase bias. Collider bias is responsible for many cases of bias in modelling and is often not dealt with properly (Barrett, 2020). This is what makes DAGs such a useful tool in modelling: they give a visual representation of how things are associated with one another and can indicate where bias is being induced in models.
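Collider bias is easy to demonstrate numerically. The sketch below is a toy simulation (not taken from any of the cited papers): two independent causes A and C feed into a collider S. Marginally A and C are uncorrelated, but restricting attention to high values of S — i.e. conditioning on the collider — induces a clear negative association between them.

```python
import random

random.seed(1)
n = 20000
A = [random.gauss(0, 1) for _ in range(n)]              # cause 1
C = [random.gauss(0, 1) for _ in range(n)]              # cause 2, independent of A
S = [a + c + random.gauss(0, 1) for a, c in zip(A, C)]  # collider: A -> S <- C

def corr(x, y):
    """Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

r_marginal = corr(A, C)                  # close to zero: A and C are independent
sel = [i for i in range(n) if S[i] > 1]  # condition on the collider: keep high S only
r_conditional = corr([A[i] for i in sel], [C[i] for i in sel])  # clearly negative
```

Intuitively, among units with a high S, a large A makes a large C less necessary to explain S, so a spurious negative association appears — a relationship "when none in fact exists".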

Modelling through DAGs may be easy for simple situations with only a few variables but it gets very complicated very quickly when the number of variables and associations increases.

For further reading, I would recommend the paper by Evandt et al. (2016), in which they use DAGs to model the association between road traffic noise and sleep disturbances by considering variables such as socioeconomic status and lifestyle. However, to see how DAGs are applied outside of an epidemiological setting I would recommend the paper by Al-Hawri et al. (2020), where they use DAGs to model wireless sensor networks.

I hope you enjoyed this blog post on DAGs!

References

  • Al-Hawri, E., Correia, N., Barradas, A., (2020). DAG-Coder: Directed Acyclic Graph-Based Network Coding for Reliable Wireless Sensor Networks. IEEE Access 8.
  • Barrett, M., (2020). An Introduction to Directed Acyclic Graphs. Cran R Project: https://cran.r-project.org/web/packages/ggdag/vignettes/intro-to-dags.html
  • Evandt, J., Oftedal, B., Hjertager Krog, N., Nafstad, P., Schwarze, P., Marit Aasvang, G., (2016). A population-based study on nighttime road traffic noise and insomnia. SLEEP, 40(2).
  • Palmer, T., (2018). Principles of Epidemiology MATH464 Lecture Notes. Lancaster University.
  • Pearl, J., (2009). Causality: Models, Reasoning and Inference. Cambridge University Press 2nd Edition.
  • Suttorp, M., Siegerink, B., Jager, K., Zoccali, C., Dekker, F., (2015). Graphical Presentation of Confounding in Directed Acyclic Graphs. Nephrology Dialysis Transplantation, 30(9).
  • Williams, T., Bach, C., Mattiesen, N., Henriksen, T., Gagliardi, L., (2018). Directed Acyclic Graphs: A Tool for Causal Studies in Pediatrics. Pediatric Research, 84(4).
What is a Meta-Analysis? The benefits and challenges /stor-i-student-sites/libby-daniells/2020/03/09/what-is-meta-analysis/?utm_source=rss&utm_medium=rss&utm_campaign=what-is-meta-analysis Mon, 09 Mar 2020 19:59:19 +0000 http://www.lancaster.ac.uk/stor-i-student-sites/libby-daniells/?p=313 My last blog post focused on how to analyse a single clinical trial with multiple treatment arms. But what if we want to consider results for multiple trials studying a similar treatment effect?

During my research for my MSci dissertation (see the last blog post to find out the basics of my research) I came across the concept of meta-analysis. The overall motivation for conducting meta-analyses is to draw more reliable and precise conclusions on the effect of a treatment. In this blog post I will outline for you both the benefits and costs of conducting a meta-analysis.

Meta-analysis is a statistical method used to merge findings of single, independent studies that investigate the same or similar clinical problem [Shorten (2013)]. Each of these studies should have been carried out using similar procedures, methods and conditions. Data from the individual trials are collected and we calculate a pooled estimate (although data is not usually pooled!) of the treatment effect to determine efficacy.

Effectiveness vs. Efficacy: ‘Efficacy can be defined as the performance of an intervention under ideal and controlled circumstances, whereas effectiveness refers to its performance under real-world conditions.’ [Singal, A. et al. (2014)].

If conducted correctly, efficacy conclusions from a meta-analysis should be more powerful due to the larger sample size created by considering several studies. Often this sample size is far greater than what we could feasibly achieve in a single clinical trial, which is constrained by funds and resources, including the availability of patients. This increased sample size also improves the precision of our estimate in terms of how closely the trial results relate to effectiveness in the whole population [Moore (2012)].
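The standard way to compute such a pooled estimate without pooling raw data is inverse-variance weighting, as in a fixed-effect meta-analysis. Here is a minimal sketch; the study numbers are entirely made up for illustration.

```python
def fixed_effect_pool(est, se):
    """Fixed-effect (inverse-variance) pooled estimate and its standard error."""
    w = [1 / s**2 for s in se]   # more precise studies get larger weights
    pooled = sum(wi * yi for wi, yi in zip(w, est)) / sum(w)
    pooled_se = (1 / sum(w)) ** 0.5
    return pooled, pooled_se

# Three hypothetical studies estimating the same treatment effect.
estimates = [0.30, 0.10, 0.25]
std_errors = [0.15, 0.10, 0.20]
pooled, pooled_se = fixed_effect_pool(estimates, std_errors)
# pooled_se comes out smaller than any single study's SE -- the precision
# gain from combining studies described above.
```

The pooled standard error \(\sqrt{1/\sum w_i}\) is always below the smallest individual standard error, which is exactly the power benefit the text describes.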

Although meta-analysis can be a useful tool to increase sample size and hence statistical power, it does have significant associated methodological issues. The first of these is publication bias. This may be introduced because trials which show significant results in favor of a new treatment are more likely to be published than those which are inconclusive or favor the standard treatment. Another form of publication bias arises when researchers exclude studies that are not published in English [Walker et al. (2008)]. This exclusion of studies may lead to an over/under-estimate of the true treatment effect.

The issue of publication bias in a meta-analysis exploring the effects of breastfeeding on children’s IQ scores was discussed by Ritchie (2017). A funnel plot of the original data set showed a tendency for larger studies to show a smaller treatment effect, indicating publication bias. The original study found that breastfed children had IQ scores that were on average, 3.44 points higher than non-breastfed children. However, after adjusting for publication bias a much lower estimate of 1.38 points higher IQ was given. Although this is still a significant result, this example highlights the issue of overestimation resulting from publication bias.

Funnel Plots: A funnel plot is a method used to assess the role of publication bias. It is a plot of sample size versus treatment effect. As the sample size increases, the effect size is likely to converge to the true effect size [Lee, (2012)]. We will naturally have a scatter of points surrounding this true effect size; however, if there is publication bias, the meta-analysis may be missing small studies with small effect sizes, leading to a skewed funnel plot.
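The overestimation mechanism behind a skewed funnel plot can be illustrated with a small simulation (hypothetical numbers, not from any study cited here): simulate many studies around a true effect of 0.2, then "publish" only the statistically significant ones, as publication bias would.

```python
import random

random.seed(2)
true_effect = 0.2
studies = []
for _ in range(500):
    n = random.randint(20, 400)          # study sample size
    se = 1 / n ** 0.5                    # standard error shrinks as n grows
    est = random.gauss(true_effect, se)  # observed effect for this study
    studies.append((est, se))

# Publication bias: only studies with a "significant" positive effect appear.
published = [(est, se) for est, se in studies if est > 1.96 * se]

mean_all = sum(est for est, _ in studies) / len(studies)
mean_pub = sum(est for est, _ in published) / len(published)
# mean_all sits near the true effect of 0.2, while mean_pub overstates it:
# small studies only get published when they happen to overshoot.
```

This mirrors the breastfeeding/IQ example: the published-only average exceeds the all-studies average because small studies appear only when their estimates are inflated.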

Another key issue with meta-analysis is heterogeneity. This is defined as the variation in results between studies included in the analysis. Investigators must consider the source of this inconsistency; sources may include differences in trial design, study population or inclusion/exclusion criteria between trials, as well as differences due to chance. High levels of heterogeneity compromise the justification for meta-analysis, as grouping together studies whose results vary greatly will give a questionable pooled treatment effect, thus reducing our confidence in making recommendations about treatments. There are methods to handle heterogeneity, one of which is to fit a random effects model.

A meta-analysis considering strategies to prevent falls and fractures in hospitals and care homes [Oliver et al. (2007)] obtained strong evidence to suggest heterogeneity between studies. This variation was highlighted in forest plots which showed a very wide spread of results. As the investigators believed all trials were similar enough in design and all aimed to trial the same treatment, they felt it was justified to calculate a pooled treatment effect. However, this high variability brings into question the reliability of the estimate, complicating decisions regarding recommendation of treatments.

In summary, meta-analysis is a very useful tool for combining results of studies in order to boost the precision of our conclusions. We do, however, need to proceed with caution, assessing heterogeneity and checking for publication bias.

References & Further Reading

  • Lee, W., Hotopf, M., (2012). Critical appraisal: Reviewing scientific evidence and reading academic papers. Core Psychiatry, 131-142.
  • Moore, Z. (2012). Meta-analysis in context. Journal of Clinical Nursing, 21(19):2798-2807.
  • Oliver, D., Connelly, J. B., Victor, C. R., Shaw, F. E., Whitehead, A., Genc, Y., Vanoli, A., Martin, F. C., and Gonsey, M. A. (2007). Strategies to prevent falls and fractures in hospitals and care homes and effect of cognitive impairment: systematic review and meta-analyses. BMJ, 334(7584).
  • Ritchie, S. J. (2017). Publication bias in a recent meta-analysis on breastfeeding and IQ. Acta Paediatrica, 106(2):345-345.
  • Singal, A., Higgins, P., Waljee, A. (2014). A primer on effectiveness and efficacy trials. Clinical and Translational Gastroenterology, 5(1).
  • Shorten, A. (2013). What is meta-analysis? Evidence Based Nursing, 16(1).
  • Walker, E., Hernandez, A., and Kattan, M. (2008). Meta-analysis: Its strengths and limitations. Cleveland Clinic Journal Of Medicine, 75(6):431-439.
Simultaneous Inference in Clinical Trials /stor-i-student-sites/libby-daniells/2020/02/20/simultaneous-inference-in-clinical-trials/?utm_source=rss&utm_medium=rss&utm_campaign=simultaneous-inference-in-clinical-trials /stor-i-student-sites/libby-daniells/2020/02/20/simultaneous-inference-in-clinical-trials/#comments Thu, 20 Feb 2020 09:00:00 +0000 http://www.lancaster.ac.uk/stor-i-student-sites/libby-daniells/?p=228 As part of my undergraduate study I completed a dissertation titled “Simultaneous Inference in Clinical Trials” supervised by Dr Fang Wan. In particular I focused on the construction of simultaneous confidence intervals for treatment effects. In this post I wanted to share with you a brief overview of some of my findings.

Within clinical trials it is common to test multiple hypotheses simultaneously. This is referred to as multiple comparisons. An example of where this arises is in biomarker trials. Biomarkers are measurable indicators of biological conditions that can be used to identify target populations within trials.

Example – Biomarker trials have the following set up: as patients are enrolled onto the trial their biomarker status is identified; these results are used to stratify individuals into two groups: biomarker positive or biomarker negative. From here, patients are allocated treatments using randomization. In this case we consider a two-treatment scenario in which we are testing a new treatment (T) against a control (C). This control is usually either an existing treatment which we wish to show is inferior or equivalent to the new treatment, or it is a placebo, which is a drug identical to T but lacks the active agent. The stratification process is illustrated in the figure below. This design creates four subgroups within which we simultaneously estimate the size of the treatment effect (i.e. the result of a specific treatment within a subgroup).

We will be focusing on trials with binary endpoints i.e. the treatments were either deemed a success or failure. Therefore, our parameter of interest is the proportion of times the treatment was successful.

Before we delve into the issues associated with multiple comparisons we need definitions of the error rates that will play a significant role in simultaneous testing.

Type I Error: Occurs when we reject a true null hypothesis. The type I error rate is the probability of making a Type I Error. This is equivalent to the significance level which we often set to be \(\alpha\)=0.05.

Type II Error: Occurs when we accept a false null hypothesis. The type II error rate is the probability of making a Type II error.

Confidence Interval: A confidence interval (CI) is constructed to quantify the uncertainty surrounding a parameter estimate. An interval has confidence level \(1-\alpha\) if, when a large number of such intervals are constructed, \(100(1-\alpha)\)% of them contain the true parameter value.

Family-Wise Error Rate (FWER): The probability of rejecting at least one true null hypothesis.

When conducting simultaneous hypothesis tests, we are often interested in ensuring that the significance level for a whole family of hypotheses is \(\alpha\), rather than just at the individual test level. To do so we need to control the FWER to be approximately \(\alpha\).

If we are testing \(k\) independent hypotheses simultaneously, the overall FWER is given by $$\text{FWER}\;=\;1-\mathbb{P}(\text{No true null hypothesis rejected})\;=\;1-(1-\alpha)^k.$$ Therefore, as the number of hypotheses increases, the error rate rapidly tends to 1, meaning the probability of making at least one type I error rises to an unacceptable level, as demonstrated by the following figure.
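Plugging numbers into this formula shows how quickly the problem bites; a quick check with \(\alpha=0.05\):

```python
def fwer(k, alpha=0.05):
    """Family-wise error rate for k independent tests, each at level alpha."""
    return 1 - (1 - alpha) ** k

# FWER for increasing numbers of simultaneous tests.
for k in (1, 5, 10, 50):
    print(k, round(fwer(k), 3))
```

With 10 independent tests the chance of at least one type I error already exceeds 40%, and with 50 tests it is above 92%.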

I will now discuss two ways in which to correct for multiple testing.

Bonferroni Correction

When using Bonferroni correction, we set the significance level for an individual hypothesis test to \(\alpha/k\) where \(k\) is the total number of hypotheses being tested simultaneously. We reject the \(i^{th}\) hypothesis when the p-value is less than \(\alpha/k\).

With Bonferroni applied, the family-wise error rate is kept equal to, or below \(\alpha\). As the FWER can be below the desired significance level, we call it conservative. It becomes increasingly conservative as the number of hypotheses increases (as seen in the figure below).

If the independence assumption for the FWER is not met, Bonferroni could become extremely conservative. A family-wise error rate below 0.05 will lead to a greater number of null hypotheses being accepted despite being false. Thus, we’ve improved the type I error rate at the expense of the type II error rate.

Sidak Correction

Like Bonferroni, Sidak correction involves adjusting the significance level of an individual test in order to control the FWER. This time we set the significance level for an individual hypothesis test to \(1-(1-\alpha)^{1/k}\).

When testing several independent hypotheses simultaneously the use of Sidak correction ensures that the FWER is exactly \(\alpha\). This is a more powerful method than Bonferroni as it is always less conservative. However, this is reliant on the fact that the hypotheses are independent. If any dependencies do arise Sidak can be overly liberal and produce a FWER greater than \(\alpha\), giving an unacceptably high probability of making a type I error.

Because of this reliance on the independence assumption, we tend to prefer the Bonferroni method. To add to this, despite Sidak having greater statistical power, the improvement over Bonferroni is minimal.
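The trade-off between the two corrections can be seen directly by computing the adjusted per-test levels and the FWER each implies under independence (a straightforward numerical check, not code from the dissertation):

```python
k, alpha = 10, 0.05

bonferroni_level = alpha / k               # per-test level under Bonferroni: 0.005
sidak_level = 1 - (1 - alpha) ** (1 / k)   # per-test level under Sidak: slightly larger

# Resulting family-wise error rate across k independent tests:
fwer_bonferroni = 1 - (1 - bonferroni_level) ** k  # below alpha (conservative)
fwer_sidak = 1 - (1 - sidak_level) ** k            # exactly alpha
```

Sidak's per-test level is only marginally larger than Bonferroni's, which is precisely why its power advantage is minimal in practice.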

Within my dissertation I used these methods and others to construct several different types of simultaneous confidence intervals, determining which was optimal under varying conditions. Although I won’t go into the ins and outs of this in this blog post, I will share that Bonferroni correction with a Wilson score interval did give optimal coverage and length properties, regardless of the sample size, the number of hypotheses being tested and the value of the parameter estimate.
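As a sketch of what that combination looks like (the function below is the standard Wilson score interval, not code from the dissertation, and the subgroup numbers are made up), each subgroup's interval is simply computed at the Bonferroni-adjusted level \(\alpha/k\):

```python
from statistics import NormalDist

def wilson_ci(successes, n, alpha=0.05):
    """Wilson score interval for a binomial success proportion."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    phat = successes / n
    denom = 1 + z**2 / n
    centre = (phat + z**2 / (2 * n)) / denom
    half = (z / denom) * (phat * (1 - phat) / n + z**2 / (4 * n**2)) ** 0.5
    return centre - half, centre + half

# Four biomarker-by-treatment subgroups: use alpha/4 per interval so the
# family of intervals has simultaneous confidence level about 95%.
k = 4
lo, hi = wilson_ci(successes=30, n=100, alpha=0.05 / k)
lo_u, hi_u = wilson_ci(successes=30, n=100, alpha=0.05)
# The Bonferroni-adjusted interval (lo, hi) is wider than the unadjusted one,
# the price paid for simultaneous coverage.
```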

Multi-Armed Bandits: Thompson Sampling /stor-i-student-sites/libby-daniells/2020/01/27/multi-armed-bandits-thompson-sampling/?utm_source=rss&utm_medium=rss&utm_campaign=multi-armed-bandits-thompson-sampling Mon, 27 Jan 2020 09:00:00 +0000 http://www.lancaster.ac.uk/stor-i-student-sites/libby-daniells/?p=188 For the STOR608 course here at STOR-i we covered several research areas as part of the topic sprints, as discussed in my first blog post: “My First Term as a STOR-i MRes Student”. My favorite of these was the sprint led by David Leslie titled Statistical Learning for Decision. For this I focused my research on the use of Thompson Sampling to tackle the multi-armed bandit problem. This will be the topic of this blog post.

Multi-armed bandits provide a framework for solving sequential decision problems in which an agent learns information regarding some uncertain factor as time progresses to make more informed choices. A common application for this is in slot machines, where a player must select one of K arms to pull.

A K-armed bandit is set up as follows: the player has K arms to choose from (known as actions), each of which yields a reward payout defined by random variables. These rewards are independently and identically distributed (IID) with an unknown mean that the player learns through experimentation.

In order to experiment, the player requires a policy regarding which action to take next. Formally a policy is an algorithm that chooses the next arm to play using information on previous plays and the rewards they obtained (Auer et al., 2002).

When making the decision on which arm to pull next, the policy must weigh up the benefits of exploration vs. exploitation. Exploitation refers to choosing an arm that is known to yield a desirable reward in short-term play, whereas exploration is the search for higher payouts in different arms, which can be used to optimize long-term reward.

In order to quantify a policy’s success, a common measure known as regret is used. The regret of a policy over \(T\) rounds is defined as the difference between the expected reward if the optimal arm is played in all \(T\) rounds and the sum of the rewards observed over the \(T\) rounds.
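As a tiny hypothetical worked example of this definition (in expectation, for a Bernoulli bandit with made-up success probabilities):

```python
# True success probabilities for a 3-armed Bernoulli bandit (arm 2 is optimal).
theta = [0.3, 0.5, 0.7]
# Arms a policy happened to choose over T = 8 rounds.
plays = [0, 1, 2, 2, 1, 2, 2, 2]

best = max(theta)
expected_regret = sum(best - theta[a] for a in plays)
# One pull of arm 0 costs 0.4 and two pulls of arm 1 cost 0.2 each,
# so the expected regret is 0.8.
```

A policy with low regret is one whose play sequence concentrates on the optimal arm quickly, so the per-round gaps stop accumulating.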

Thompson sampling (TS) is one of the policies used for tackling multi-armed bandit problems. For simplicity we will consider a case where the rewards are binary i.e. they take the values 0 (for failure) or 1 (for success), however TS can be extended to rewards with any general distribution.

Consider a multi-armed bandit with \(k\in\{1,\ldots,K\}\) arms each with an unknown probability of a successful reward \(\theta_k\). Our goal is to maximize the number of successes over a set time period.

We begin by defining a prior belief on our success probabilities which we set to be Beta distributed. Each time we gain an observation we update this prior belief by updating the parameters of the beta distribution and use this updated distribution as a prior for the next arm pull.

The process for deciding which arm to pull at each time step using the Thompson Sampling algorithm is given in the following diagram:

Although here we have specified a Beta distribution, a distribution of any kind can be used. However, in this case it is a convenient choice as it’s conjugate to the Bernoulli distribution, and hence the posterior is also Beta distributed. This is why we only need to update the parameters at each time step. If the prior chosen was not conjugate, it would not be so simple: we would need to recompute the posterior distribution from scratch after every observation.
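The steps above are short enough to sketch in code. The following is a minimal Beta-Bernoulli Thompson Sampling loop; the success probabilities are made up for illustration.

```python
import random

random.seed(3)
theta = [0.35, 0.50, 0.65]   # unknown true success probabilities; arm 2 is best
K, T = len(theta), 3000
a = [1.0] * K                # Beta(1, 1) priors: uniform over [0, 1]
b = [1.0] * K

pulls = [0] * K
for _ in range(T):
    # Draw one success probability from each arm's posterior and pull
    # the arm whose draw is largest.
    draws = [random.betavariate(a[k], b[k]) for k in range(K)]
    arm = draws.index(max(draws))
    reward = 1 if random.random() < theta[arm] else 0
    # Conjugate update: Beta(a, b) -> Beta(a + reward, b + 1 - reward).
    a[arm] += reward
    b[arm] += 1 - reward
    pulls[arm] += 1
# As the posteriors sharpen, the pulls concentrate on the optimal arm.
```

Early on the posteriors are wide, so all arms get sampled (exploration); as evidence accumulates, draws from the best arm's posterior dominate and it is pulled almost exclusively (exploitation).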

The prior chosen plays a large role in the effectiveness of the Thompson Sampling algorithm, and thus care must be taken over its specification. Ideally it should be chosen to describe all plausible values of the reward success parameter.

In the slot machine case, the player often has no intuition on the values of this success parameter. This is usually described using an uninformative Beta(1,1) prior as it takes uniform value across the entire [0,1] range. This promotes exploration to learn which arm is optimal.

In the opposite case where we have some knowledge of the true values of the success parameter, we may center the priors about these values in order to reduce regret. Here, we require less exploration to identify the optimal arm and so can exploit this to maximize reward.

Although TS tends to be efficient in minimizing regret, there are occasionally outlier cases where the regret is much higher than expected. This may happen if we begin by pulling the sub-optimal arm and receive a successful reward. We are likely to exploit this further and continue to pull this arm, falsely leading us to believe this arm is actually optimal. Or alternatively, we may begin by selecting the optimal arm and observe a failure reward. We then may exploit the sub-optimal arm under the false belief that it will give us a higher reward.

To conclude, Thompson Sampling is a simple but efficient algorithm for solving stochastic multi-armed bandit problems, successfully balancing exploitation and exploration to maximize reward and minimize regret.

References & Further Reading

Auer, P., Cesa-Bianchi, N., and Fischer, P. (2002). Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2):235-256.

Russo, D., Van Roy, B., Kazerouni, A., Osband, I., and Wen, Z. (2017). A tutorial on Thompson Sampling.

STOR-i Annual Conference: Fraud Detection Using Machine Learning /stor-i-student-sites/libby-daniells/2020/01/20/stor-i-annual-conference-fraud-detection-using-machine-learning/?utm_source=rss&utm_medium=rss&utm_campaign=stor-i-annual-conference-fraud-detection-using-machine-learning /stor-i-student-sites/libby-daniells/2020/01/20/stor-i-annual-conference-fraud-detection-using-machine-learning/#comments Mon, 20 Jan 2020 09:00:00 +0000 http://www.lancaster.ac.uk/stor-i-student-sites/libby-daniells/?p=234 I recently attended my first STOR-i Conference, which took place on the 9-10th January 2020. During the conference we heard from several research leaders on cutting-edge statistics and operational research topics, including professors from MIT, University of Oslo, University of Edinburgh, Columbia University, University of Southampton, Brunel University, Copenhagen Business School and University College Dublin. We also heard from current STOR-i students Henry Moss and Georgia Souli, who discussed their research, as well as STOR-i alumni Tom Flowerdew, who currently works for Featurespace, and Ciara Pike-Burke, who works as a researcher at the Universitat Pompeu Fabra in Barcelona.

On the Thursday evening there was a poster session where 1st and 2nd year PhD students displayed their work. This was a valuable session as I was able to see the kind of projects they are working on before I choose my PhD topic later this term.

In this blog I will be discussing a talk I particularly enjoyed during the conference. This was STOR-i alum Tom Flowerdew’s presentation on Real-Time Fraud Detection. Tom was part of the second cohort of STOR-i students in 2011 where he completed a project funded by ATASS Sports, supervised by Prof Kevin Glazebrook, Dr Chris Kirkbride and Prof Jon Tawn. He now works at Featurespace, which is a world-leader in Adaptive Behavioural Analytics.

Tom Flowerdew’s presentation at the 9th annual STOR-i Conference.

The talk Tom gave during the conference was regarding how to score transactions on their risk of fraud.

Fraud detection needs to be deployed in real-time: we need to determine, as the card is used, whether a purchase is fraudulent or not. The speed of the scoring is therefore of high importance.

Before we go into the techniques in which transactions are scored, we first need a background in how payments actually occur. This is detailed in the diagram below. It begins with a cardholder using their card at a store. The store then passes the card information to an acquirer who then forward these details to card networks such as Visa or MasterCard. The card network then requests authorization from the bank who decide whether or not to approve based on the amount of money in the account.

Payment process. Diagram based on the presentation given by Tom Flowerdew.

If a cardholder believes they have been a victim of fraud, they can raise a chargeback complaint with their bank. The bank then passes information back through the acquirer and merchant, who decide whether to accept or dispute the fraud claim.

Featurespace try to spot fraud at all stages in the payment cycle using machine learning.

Machine Learning: Machine Learning is where machines can imitate the learning behavior of humans. That is, they can learn based on experiences, observations and analysis of data.

In this fraud detection application, we are interested in using machines to detect fraud and learn from the experience to further detect cases of fraudulent actions.

Tom and Featurespace are tackling this problem through the use of labels. They label cases based on whether or not they were fraudulent. They do so through cardholders going to their bank and claiming a chargeback. The bank then flags the transaction as risky.

These labels tend to be out of date as it may take some time for the cardholder to alert the bank of a case of fraud. For example, some people, particularly the elderly, do their banking through monthly statements rather than through online banking. This will obviously cause a delay in them noticing any odd transactions.

We could set a limit on how long you can wait to label a transaction, say set it to 35 days. But this means we are still 35 days out of date in terms of our data. In this time, fraudsters may change the way they commit their fraud and thus they may still go undetected. This will mean our fraud classifier will have an unpredictable performance. If we lower our limit to 20 days then we are lowering the number of correct labels (as any fraud detected after 20 days will be taken as genuine) and bias the fraud classifier towards detecting fraud for young people.

Another setback is that if a transaction is considered as fraud and then declined by the bank, the transaction won’t get a label and cannot be used in the data set. This will hamper the performance of the classifier as not all data is available.

Another interesting component of fraud detection that Tom mentioned was on the use of online banking. It is possible to compare mouse movement and clicks on an online banking site to the usual account user. If there are differences, it can be flagged up as fraud.

Tom’s talk on fraud detection was my first encounter with machine learning, and I have to say it’s a field I found very interesting.

I look forward to attending more conferences in the future. For more information regarding events held by STOR-i including the conference, please visit /maths/about-us/events/.

References

Pachanekar, R., Tahsildar, S. (2019). Quant Insti: Machine Learning Basics.
