This page is intended for members of the Department of Operations at the University of Lausanne, for the seminars in Statistics and Applied Probability, organised by Valérie Chavez-Demoulin, Fabien Baeriswyl (from June 2021 onwards) and Linda Mhalla (from June 2021 to July 2022).
In numerical weather prediction, probabilistic forecasts are now standard tools, and their verification is an important task. Verification remains particularly challenging when one focuses on extreme events. We will discuss the limits of the classical methodologies and propose a tool combining the well-known Continuous Ranked Probability Score (CRPS) and extreme value theory to address the verification of extreme events. This is joint work with Maxime Taillardat, Raphaël de Fondeville and Philippe Naveau.
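Since the proposed tool builds on the CRPS, here is a minimal sketch (Python/NumPy, not the authors' code) of how the empirical CRPS of an ensemble forecast is typically computed; the function name and the toy ensemble are purely illustrative.

```python
# Empirical CRPS of an ensemble forecast: CRPS(F, y) = E|X - y| - 0.5 * E|X - X'|,
# with expectations taken under the empirical distribution of the ensemble.
import numpy as np

def crps_ensemble(ensemble, obs):
    """Empirical CRPS for a one-dimensional ensemble and a scalar observation."""
    x = np.asarray(ensemble, dtype=float)
    term1 = np.mean(np.abs(x - obs))                         # E|X - y|
    term2 = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))   # 0.5 * E|X - X'|
    return term1 - term2

rng = np.random.default_rng(0)
forecast = rng.normal(loc=20.0, scale=2.0, size=50)   # toy 50-member ensemble
print(crps_ensemble(forecast, obs=23.5))
```

A score of this kind is usually averaged over all forecast cases, which says little about performance on the rare cases; this is one reason the verification of extreme events is delicate and motivates combining the CRPS with extreme value theory.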
We will meet on the 9th of September, from 9:30 to 14:15, to listen to three talks by Maximilian Aigner, Ilia Azizi and Fabien Baeriswyl. The programme is as follows:
Zoom link if needed: click here
Abstracts of the talks can be found below.
We consider the modelling and prediction of patient flows in a large emergency department. To handle various difficulties, including unknown triage policies, we propose a flexible point process modelling framework. Visualisations and recommendations for hospital policy are derived by simulation.
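As a hedged illustration of the kind of building block such a point process framework can rest on (not the speakers' model), the sketch below simulates patient arrivals from a non-homogeneous Poisson process by Lewis-Shedler thinning; the intensity function is purely illustrative.

```python
# Simulate arrival times on [0, horizon] by thinning a homogeneous Poisson
# process of rate rate_max: keep a candidate arrival at time t with
# probability rate(t) / rate_max.
import numpy as np

def arrival_times(rate, rate_max, horizon, rng):
    """Lewis-Shedler thinning for a non-homogeneous Poisson process."""
    times, t = [], 0.0
    while True:
        t += rng.exponential(1.0 / rate_max)       # candidate inter-arrival gap
        if t > horizon:
            return np.array(times)
        if rng.random() < rate(t) / rate_max:      # accept with prob. rate(t)/rate_max
            times.append(t)

rng = np.random.default_rng(5)
hourly_rate = lambda t: 4.0 + 3.0 * np.sin(2 * np.pi * t / 24.0)  # patients per hour
arrivals = arrival_times(hourly_rate, rate_max=7.0, horizon=24.0, rng=rng)
print(len(arrivals), "simulated arrivals over 24 hours")
```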
Features with missing instances can negatively impact the performance of machine learning models. Information Extraction (IE) can improve the availability of tabular data by identifying relevant information in unstructured textual descriptions. This project demonstrates the application of IE to descriptions of online real estate listings, where the required missing values are retrieved from the text. Inspired by question-answering tasks, the aim is to recover these values by asking a set of questions. We tested two ways to achieve this goal: the first uses a model specific to the language of the descriptions (French) to perform IE, while the second translates the descriptions into English before IE. The project compares the performance of both approaches and delivers insights into how the formulation of the questions can impact the effectiveness of Q&A models. Additional note: the talk discusses a small project that I did for my doctoral course at EPFL. A supplementary demo can be found here: https://huggingface.co/spaces/unco3892/real_estate_ie.
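A minimal sketch of the question-answering approach described above (not the project's code): ask a targeted question against a listing description and read off the extracted value. It uses the default English extractive QA pipeline from Hugging Face transformers; the model choice and the example text are illustrative only.

```python
from transformers import pipeline

# Default English extractive question-answering model; the project may instead
# use a French-specific model or translate the descriptions first.
qa = pipeline("question-answering")

description = (
    "Bright 4.5-room apartment on the 3rd floor, living area of 120 m2, "
    "with balcony and parking space."
)
result = qa(question="What is the living area?", context=description)
print(result["answer"], round(result["score"], 3))  # extracted span and confidence
```

The extracted span can then be parsed and written back into the corresponding tabular column.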
In this talk, we discuss how to use (and reconcile) the classical theory of large deviations (originating from the works of H. Cramér, F. Lundberg and S.R. Varadhan) to derive expressions usually referred to as precise large deviation results (originating from the works of S.V. Nagaev, A.V. Nagaev and T. Mikosch). As an example of this unified approach, we highlight how these results were used to establish tail asymptotics of functionals of certain cluster processes. This is joint work with Valérie Chavez-Demoulin and Olivier Wintenberger.
Note: slides and/or recordings of the talks are available to Department members here, with the password communicated by email.
We provide finite-sample results to assess the consistency of Generalized Pareto regression trees as tools to perform extreme value regression. The results are obtained from concentration inequalities and are valid for a finite sample size, taking into account a misspecification bias that arises from the use of a "Peaks over Threshold" approach. The properties that we derive also justify the pruning strategies (i.e. the model selection rules) used to select a proper tree that achieves a compromise between bias and variance. The methodology is illustrated through a simulation study and a real data application in insurance against natural disasters. Joint work with S. Farkas, A. Heranval and O. Lopez.
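For readers unfamiliar with the "Peaks over Threshold" step mentioned above, the following hedged sketch (illustration only, not the authors' regression-tree implementation) fits a Generalized Pareto distribution to exceedances over a high threshold; in a regression tree, such a fit would typically be carried out within each leaf.

```python
# Fit a Generalized Pareto distribution (GPD) to exceedances over a high
# threshold u, the basic "Peaks over Threshold" building block.
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(1)
losses = rng.pareto(a=2.5, size=5000) + 1.0     # toy heavy-tailed claim sizes
u = np.quantile(losses, 0.95)                   # high threshold (95% quantile)
exceedances = losses[losses > u] - u

# shape (xi) and scale (sigma) of the GPD fitted to the exceedances
xi, _, sigma = genpareto.fit(exceedances, floc=0.0)
print(f"threshold = {u:.2f}, xi = {xi:.3f}, sigma = {sigma:.3f}")
```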
We consider regular variation for marked point processes with independent heavy-tailed marks and prove a single large point heuristic: the limit measure is concentrated on the cone of point measures with one single point. We then investigate successive hidden regular variation removing the cone of point measures with at most k points, k ≥ 1, and prove a multiple large point phenomenon: the limit measure is concentrated on the cone of point measures with k + 1 points. Finally, we provide an application to risk theory in a reinsurance model where the k largest claims are covered and we study the asymptotic behavior of the residual risk.
In machine learning, the overfitting phenomenon - the imbalance between the training-time and test-time performance of a model - has always represented one of the biggest challenges to overcome. A strong theory on the generalization properties of ML models has therefore been built over the past decades. However, researchers have mostly focused on single-objective optimization problems, which is in contrast with recent trends, where not just a model's accuracy but also fairness, robustness, interpretability, sparsity, etc. are optimized for. As it turns out, the single-objective theory can be extended to its multi-objective counterpart in a straightforward way. Moreover, the generalizations can be used to form meaningful multi-objective-specific statements. Using simple tools, the theory provides insights into the behaviour of families of parametrized scalarizations of the loss vectors. The generalization statements hold with high probability globally for all parametrizations at once. As a consequence, strong statements about multi-dimensional empirical risk minimization can be deduced. As a bonus, the provided generalization bounds are proved to be almost tight in some settings. The theory is supported by experiments on the Adult dataset, demonstrating the behaviour of the whole Pareto curve as well as the empirical tightness of the theoretical bounds.
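As a rough illustration of the objects discussed above (an illustrative sketch, not the speaker's experiments), the snippet below evaluates the empirical risk of a family of linear scalarizations of a two-dimensional loss vector, the kind of parametrized family for which the generalization statements are meant to hold uniformly.

```python
import numpy as np

rng = np.random.default_rng(4)
# toy per-sample loss vector: (prediction loss, fairness penalty)
losses = np.column_stack([rng.random(1000), rng.random(1000) ** 2])

def scalarized_empirical_risk(losses, w):
    """Empirical risk of the linear scalarization sum_j w_j * loss_j."""
    return float(np.mean(losses @ w))

for lam in (0.0, 0.25, 0.5, 0.75, 1.0):          # sweep the scalarization weight
    w = np.array([1.0 - lam, lam])
    print(f"lambda = {lam:.2f}, empirical risk = {scalarized_empirical_risk(losses, w):.3f}")
```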
Deep learning might be considered just another machine learning model. However, its superpower of tackling unstructured data and recognizing complex patterns makes it stand out. Deep learning models allow Netflix to recommend a good movie to us, Uber to let us know when our pizza will be delivered, and our banks to communicate with us via chatbots. In this rather informal talk, Iegor will gently introduce the key concepts of deep learning and show what's hidden behind the curtain of this technique.
The advent of massive amounts of textual data has spurred the development of econometric methodologies to transform qualitative sentiment data into quantitative sentiment variables and use those variables in an econometric analysis of the relationships between sentiment and other variables. This seminar will present this new research field and illustrate possible applications in economics and finance.
Exogenous heterogeneity, for example in the form of instrumental variables, can help us learn a system's underlying causal structure and predict the outcome of unseen intervention experiments. In this talk, we discuss this idea in a setting in which the causal effect of the covariates on the response is sparse, and in a setting where the variables follow a time-dependence structure. If time allows, we also briefly discuss what can be done when identifiability conditions are not satisfied.
The framework of regularly varying time series allows us to describe the extremal dependence structure of multivariate (heavy-tailed) time series by means of the spectral tail process, which captures all information about the extremal dependence structure. Under the additional assumption of stationarity, it was shown in Janssen (2019) that the class of spectral tail processes is equal to the class of stochastic processes which are invariant under the so-called RS-transformation. Furthermore, this RS-transformation can be interpreted as a projection of the distribution of a stochastic process into the class of spectral tail processes. We apply this relationship in a statistical context and define a projection-based estimator for extremal quantities. This new estimator ensures that the estimated quantity is in fact derived from a spectral tail process of an underlying stationary time series. By applying and further developing the tail empirical process theory for sliding blocks estimators from Drees & Neblung (2021), we show uniform asymptotic normality of our estimators. Several simulation studies show that the new estimator has a more stable performance than previous ones. This talk is based on Drees et al. (2021).
What does an image tell us about the real estate price? Machine learning (ML) is at the heart of many applications today, where different data types (e.g., tabular and images) are combined to leverage maximum predictive power. However, the interpretability of these multi-view learning models is often reduced, leading to a trade-off between interpretability and accuracy. One possible application is real estate appraisal, where hard facts from tabular data and soft information from images are combined to train a holistic model. Nevertheless, explainability is unavoidable in such high-stakes financial decision scenarios. This talk will give an overview of different multi-view modeling strategies and show how to make black-box systems more transparent.
Joint work with Juliette Legrand (LSCE, Rennes University) and Marco Oesting (Siegen University). Machine learning classification methods usually assume that all possible classes are sufficiently present within the training set. Due to their inherent rarity, extreme events are always under-represented, and classifiers tailored for predicting extremes need to be carefully designed to handle this under-representation. In this talk, we address the question of how to assess and compare classifiers with respect to their capacity to capture extreme occurrences. This is also related to the topic of scoring rules used in the forecasting literature. In this context, we propose and study different risk functions adapted to extremal classifiers. The inferential properties of our empirical risk estimator are derived under the framework of multivariate regular variation and hidden regular variation. As an example, we study in detail the special class of linear classifiers and show that the optimisation of our risk function leads to a consistent solution. A simulation study compares different classifiers and indicates their performance with respect to our risk functions. To conclude, we apply our framework to the analysis of extreme river discharges in the Danube river basin. The application compares different predictive algorithms and tests their capacity at forecasting river discharges from other river stations. As a by-product, we identify the explanatory variables that contribute the most to extremal behaviour. If time allows, we will also discuss other climate datasets.
The generalised extreme value (GEV) distribution is a three-parameter family that describes the asymptotic behaviour of properly renormalised maxima of a sequence of independent and identically distributed random variables. If the shape parameter $\xi$ is zero, the GEV distribution has unbounded support, whereas if $\xi$ is positive, the limiting distribution is heavy-tailed with infinite upper endpoint but finite lower endpoint. In practical applications, we assume that the GEV family is a reasonable approximation for the distribution of maxima over blocks, and we fit it accordingly. This implies that GEV properties, such as the finite lower endpoint in the case $\xi>0$, are inherited by the finite-sample maxima, which might not have bounded support. This is particularly problematic when predicting extreme observations based on multiple and interacting covariates. To tackle this usually overlooked issue, we propose a blended GEV (bGEV) distribution, which smoothly combines the left tail of a Gumbel distribution (GEV with $\xi=0$) with the right tail of a Fréchet distribution (GEV with $\xi>0$) and, therefore, has unbounded support. Using a Bayesian framework, we reparametrise the GEV distribution to offer a more natural interpretation of the (possibly covariate-dependent) model parameters. Independent priors over the new location and spread parameters induce a joint prior distribution for the original location and scale parameters. We introduce the concept of property-preserving penalised complexity (P$^3$C) priors and apply it to the shape parameter to preserve first and second moments. We illustrate our methods with an application to $NO_2$ pollution levels in California, which reveals the robustness of the bGEV distribution, as well as the suitability of the new parametrisation and the P$^3$C prior framework.
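For reference (not part of the original abstract), the GEV distribution function with location $\mu$, scale $\sigma$ and shape $\xi$ is
$$G_{\mu,\sigma,\xi}(x) = \exp\left\{-\left[1+\xi\,\frac{x-\mu}{\sigma}\right]_+^{-1/\xi}\right\}, \qquad \xi \neq 0,$$
with the Gumbel case $G_{\mu,\sigma,0}(x) = \exp\{-\exp(-(x-\mu)/\sigma)\}$ recovered in the limit $\xi \to 0$. For $\xi > 0$ the support is bounded below by $\mu - \sigma/\xi$; this finite lower endpoint is exactly what the bGEV construction replaces with a Gumbel left tail.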
We investigate the problem of threshold selection in the context of the extreme value regression model pioneered by Davison and Smith (1990), which finds its use in the analysis of factors affecting the likelihood of extreme events. In this regression context, the threshold choice is a non-trivial task and can have important consequences on the final estimates, since it should also depend on the covariates. We propose an efficient solution to automatically estimate these thresholds with the help of conditional splicing distributions, in the spirit of the distributional regression machinery (Rigby and Stasinopoulos, 2005). We introduce a weighted likelihood estimator that is robust to a misspecification of the density of the body of the distribution, additionally accounts for the uncertainty stemming from the threshold choice, and respects the threshold stability property. The method is then used in two applications: the estimation of the downside risks of around 10,000 hedge funds characterized by short available historical data, and the characterization of extreme climate events at different weather stations.
In climate sciences or finance, it is common for an extreme event to trigger a sequence of high records in a short period. Furthermore, precipitation measures and stock records are heavy-tailed, so extreme value theory is typically used to plan for societal and economic risk. However, the assessment of time dependence is not systematic, even in the stationary framework. We consider stationary regularly varying time series. First, we review classical methods to address the time dependence of extremes based on the identification of short periods with consecutive exceedances of a high level; in this case, the extremal index gives a summary of the clustering effect. Second, we generalize this notion by considering short periods, or blocks, with lp-norm above a high threshold and derive large deviation principles for blocks. Our main goal is to promote the choice p < ∞, rather than the classical one p = ∞, where the bias is more difficult to control. We show that the theory developed can be used to improve inference of functionals acting on extreme blocks; the extremal index, for example, has an interpretation in this way. It can also be applied to compute accurate confidence intervals for extreme return levels.
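As a concrete anchor for the extremal index mentioned above, here is a minimal sketch (not the speaker's code) of the classical blocks estimator: the ratio between the number of blocks containing an exceedance of a high level u and the total number of exceedances; the AR(1) toy series is illustrative only.

```python
import numpy as np

def extremal_index_blocks(x, u, block_size):
    """Blocks estimator of the extremal index at threshold u."""
    x = np.asarray(x, dtype=float)
    n_blocks = len(x) // block_size
    blocks = x[: n_blocks * block_size].reshape(n_blocks, block_size)
    n_exceedances = np.sum(blocks > u)                   # total exceedances of u
    n_blocks_exceeding = np.sum(blocks.max(axis=1) > u)  # blocks with an exceedance
    return n_blocks_exceeding / n_exceedances if n_exceedances else np.nan

# toy AR(1) series with Cauchy noise: extremes occur in clusters
rng = np.random.default_rng(2)
eps = rng.standard_cauchy(100_000)
x = np.empty_like(eps)
x[0] = eps[0]
for t in range(1, len(eps)):
    x[t] = 0.7 * x[t - 1] + eps[t]

u = np.quantile(x, 0.995)
print(extremal_index_blocks(x, u, block_size=200))  # theoretical value for this toy model: 1 - 0.7 = 0.3
```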
We will present a framework for describing the asymptotic behavior of high-level exceedances for stationary time series and random fields whose finite-dimensional distributions are regularly varying and whose exceedances occur in clusters. The main tools are the theory of point processes and the notion of the so-called tail process. The latter allows one to fully describe the asymptotic distribution of the extremal clusters using the language of standard Palm theory. We will illustrate the general theory on a couple of time series and random field models.
Studying the tail dependence of multivariate extremes is a major challenge in extreme value analysis. Under a regular variation assumption, the dependence structure of the positive extremes is characterized by a measure, the spectral measure, defined on the positive orthant of the unit sphere. This measure gathers information on the localization of large events and often has a sparse support, since such events do not simultaneously occur in all directions. However, it is defined via weak convergence, which does not provide a natural way to capture this sparsity structure. In this talk, we introduce the notion of sparse regular variation, which allows one to better learn the tail structure of a random vector X. We use this concept in a statistical framework and provide a procedure which captures clusters of extremal coordinates of X. This approach also includes the identification of a threshold above which the values taken by X are considered as extreme. It leads to an efficient algorithm called MUSCLE, which we illustrate on numerical experiments. We end our presentation with an application to extreme variability for wind and financial data. This is joint work with N. Meyer.
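As a rough illustration of the sparsity phenomenon described above (not the MUSCLE algorithm itself), the sketch below performs the standard empirical angular decomposition behind spectral-measure estimation: keep the observations with the largest l1-norms, map them to the simplex, and look at which coordinates carry angular mass; the toy data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
# toy heavy-tailed sample in dimension 4: extremes occur mostly along the
# first two coordinates, so the spectral measure should be sparse
n, d = 10_000, 4
X = np.abs(rng.standard_cauchy((n, d)))
X[:, 2:] *= 0.05

norms = X.sum(axis=1)                   # l1-norm of each (non-negative) observation
k = 200                                 # number of extreme observations retained
idx = np.argsort(norms)[-k:]            # indices of the k largest norms
angles = X[idx] / norms[idx, None]      # self-normalised angular components on the simplex
print(angles.mean(axis=0))              # angular mass concentrates on coordinates 0 and 1
```

Roughly speaking, MUSCLE replaces this self-normalisation with a projection onto the simplex that can set small coordinates exactly to zero, which is what recovers clusters of extremal directions.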