Determining the Validity of Institutional Forecasts

Originally published on November 20, 2015

Economic forecasting has become increasingly publicized in recent years. You have probably seen the coverage on the evening news: the Federal Reserve “predicts” that the United States unemployment rate will continue to decrease to a natural rate of about 5% next year, the Survey of Professional Forecasters projects that GDP will grow at a rate of 2.8% (up from 2.7%) next quarter, and so on. These forecasts can be powerful, and they inform policy, business, and other decisions that individuals, corporations, and governments make. However, this newfound prominence has been accompanied by more intense scrutiny. For example, in June of 2014 the Federal Reserve forecasted that GDP growth would be somewhere between 3.0 and 3.2%, but by March of 2015 the projection had been lowered to between 2.3 and 2.7%. How accurate are the forecasts being made, and who is consistently making the best ones? New research out of The George Washington University is helping to answer these questions.

In a recent paper entitled “Evaluating Forecasts of a Vector of Variables: A German Forecasting Competition,” Sinclair, Stekler, and Müller-Dröge explore methodologies for ranking 25 different organizations based on their overall forecasting performance on eight key variables in the German economy: GDP, private consumption, exports, imports, government surplus, consumer prices, the unemployment rate, and gross fixed capital formation (the acquisition of fixed assets, which are not easily converted into cash). Each of these variables describes some facet of the economy, from its aggregate health to how prices for goods and services will change and in turn affect consumers. To determine an organization’s overall ranking, the authors use three different statistical measures of forecast error, which allow them to identify who made the best overall forecast on the basis of minimizing that error. Importantly, they collect each organization’s predictions into a vector of forecasts and compare it with a vector of outcomes, a method that has not been applied to forecast evaluation in this way before. The three measures are the standard trace of the mean-square error matrix (TMSE), the familiar Euclidean distance, and the less familiar Mahalanobis distance. The focus on a joint evaluation of a vector of variables is a fundamental shift from how forecasting competitions usually judge organizational performance, which is on a variable-by-variable basis. In using multivariate analysis, the authors demonstrate that allowing for cross-variable dependence (the effect that variables have on one another) via the Mahalanobis distance greatly changes the outcomes of forecasting competitions.
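The difference between the three measures can be sketched in a few lines of code. The error values below are invented for illustration, and the exact weighting matrices used in the paper differ, but the sketch shows the basic idea of each measure:

```python
import numpy as np

# Hypothetical forecast errors (forecast minus outcome) for one institution
# across three variables over five years. Not actual data from the paper.
errors = np.array([
    [0.3, -0.2, 0.1],
    [0.5, -0.5, 0.2],
    [-0.1, 0.2, -0.1],
    [0.4, -0.2, 0.3],
    [0.2, -0.1, 0.0],
])

# Trace of the mean-square error matrix (TMSE): the sum of each
# variable's mean squared error, ignoring any cross-variable terms.
mse_matrix = errors.T @ errors / len(errors)
tmse = np.trace(mse_matrix)

# Euclidean distance of the average error vector: treats the variables
# as uncorrelated and equally scaled.
mean_error = errors.mean(axis=0)
euclidean = np.sqrt(mean_error @ mean_error)

# Mahalanobis distance: weights the error vector by an inverse covariance
# matrix, so correlated variables are not double-counted.
cov = np.cov(errors, rowvar=False)
mahalanobis = np.sqrt(mean_error @ np.linalg.inv(cov) @ mean_error)

print(tmse, euclidean, mahalanobis)
```

Ranking institutions then amounts to computing one of these numbers for each forecaster and sorting: a smaller value means a smaller overall forecast error.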
The paper grew out of a forecasting competition organized by the German newspaper Handelsblatt, which, based on the analysis of Sinclair, Stekler, and Müller-Dröge, the Deutsche Bundesbank won. Professor Sinclair presented this work at a conference at the Bank of England in London, co-sponsored by the Centre for Applied Macroeconomic Analysis (CAMA) at the Australian National University.

Certain things are apparent from the data. Overall, preliminary reports closely track the final, updated reports, with the exception of the government surplus variable. Using the TMSE, for example, the preliminary data rank first as a “forecast” of the final outcomes. ING is ranked second under this measure, owing to its performance forecasting exports and imports. Third is a naïve no-change forecast, which simply assumes no change from the previous year. It is interesting to note that while the preliminary data come out on top in all of the joint evaluations, this is not always the case in the rankings for individual variables. This clearly illustrates that the Mahalanobis distance produces different rankings than the TMSE or the Euclidean distance, neither of which takes covariance (how random variables change together) into account. Additionally, it is interesting that a random walk forecast, one that uses the values for 2012 as the forecast for 2013, performed better than 11 of the 25 institutions at predicting GDP. This means that nearly half of the observed institutions, and by association the economists working for them, forecast the path of the German economy at a level below that of a simple statistical rule. Considered the other way around, though, more than half of the institutions outperformed this naïve approach, which highlights the benefit of well-trained economic forecasters.
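A no-change benchmark of this kind is trivial to construct, which is what makes it such a telling yardstick. The sketch below uses made-up GDP growth figures, not the German data:

```python
# Hypothetical year-over-year GDP growth rates (percent); not the German data.
gdp_growth = {2010: 4.1, 2011: 3.7, 2012: 0.4, 2013: 0.3}

# A random-walk (naive no-change) forecast simply carries last year's
# value forward: the forecast for 2013 is the observed 2012 value.
forecast_2013 = gdp_growth[2012]

# Its absolute forecast error is then compared against each institution's
# error; an institution that misses by more than this is beaten by the rule.
naive_error = abs(gdp_growth[2013] - forecast_2013)
print(forecast_2013, naive_error)
```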

The real importance of this research is that it takes into account the relationships between forecast errors across specific sectors of the economy for all organizations. This allows for the possibility that a firm may perform well in a specific area, like forecasting the level of imports, but poorly once the other variables are taken into account. For example, ING was the highest-performing firm in 2013 when using the TMSE; however, it performs worse than the random walk and ranks 12th when using the ten-year robust weighting matrix for the Mahalanobis distance. This is due to cross-variable dependence. As the authors note, both the TMSE and the Euclidean distance assume there is no correlation between the variables in the vector. Thus they are only suitable for ranking forecasters on a variable-by-variable basis, while the Mahalanobis distance is suitable for a multivariate view of several variables because it takes the covariance between them into account.
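Why the weighting matrix can reshuffle a ranking is easiest to see with a toy example. Both the error vectors and the covariance matrix below are invented for illustration:

```python
import numpy as np

# Invented mean error vectors for two institutions on two variables,
# say (exports, imports). Not actual figures from the competition.
inst_a = np.array([0.5, 0.5])   # errors point in the same direction
inst_b = np.array([0.5, -0.5])  # errors point in opposite directions

# Invented covariance matrix: the two variables move together strongly.
cov = np.array([[1.0, 0.9],
                [0.9, 1.0]])
w = np.linalg.inv(cov)

def euclidean(e):
    return float(np.sqrt(e @ e))

def mahalanobis(e, w):
    return float(np.sqrt(e @ w @ e))

# Under the Euclidean distance the two institutions tie...
print(euclidean(inst_a), euclidean(inst_b))
# ...but the Mahalanobis distance penalizes errors that run against the
# covariance pattern, so institution B now looks considerably worse.
print(mahalanobis(inst_a, w), mahalanobis(inst_b, w))
```

This is the mechanism behind ING's fall from first to 12th: errors that look small in isolation can be large once the co-movement of the variables is accounted for.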

In the end, the authors note that many weighting schemes are possible and that, since the true relationship between the variables is unknown, there can be no unique winner. They do reiterate that the institutions’ rankings in forecasting the German economy are sensitive to covariance, which only the Mahalanobis distance captures. Ultimately, the innovation of considering the covariance between a vector of variables has led to some interesting findings about how we can rank forecasters on overall performance rather than variable by variable, as previous measures in the literature have done.

In the future, combined with an appropriate weighting scheme, this research could be used to determine which institutions produce the best overall forecasts for the German or any other economy. The practical application is that policy analysts and decision makers can rank forecasting institutions by total accuracy to determine which reports come comprehensively closest to reality. They can then have more confidence when choosing which information to base decisions on, though they should still consult the variable-by-variable rankings if a decision hinges on a particular metric. Additionally, these rankings may give underperforming institutions an incentive to hire better forecasters in an attempt to bring their results more in line with reality, and to produce forecasts more accurate than simple statistical rules that require no judgement at all.