White Paper Series

Recent Submissions

  • Item
    Economic Forecasting with News Headlines and Natural Language Processing
    (Rice University, 2023) Fuad, Gazi; Kowal, Daniel
    Consumer sentiment, which measures how confident individuals feel in the strength of the economy, is a crucial indicator of the overall health of the economy. However, because collecting the survey responses behind the Index of Consumer Sentiment (ICS) is slow and costly, and because the index is released with a delay, there is motivation to find alternative data sources to the ICS. In this project, we investigated using news headlines as an alternative signal to gauge consumer sentiment in the United States. More specifically, we used natural language processing techniques such as latent Dirichlet allocation (LDA) and sentiment analysis to extract quantifiable topics and sentiments from news headlines on the front page of top publications' websites. We subsequently used that information as predictors for the monthly personal saving and labor force participation rates. The topics and sentiments served as exogenous inputs in a Seasonal Autoregressive Integrated Moving Average with eXogenous regressors (SARIMAX) model to predict the actual rates, and as covariates in classification models to predict the direction of rate movement. Our findings showed that topic-sentiment combinations from news headlines have considerable predictive power in modeling future economic conditions, even when compared with the predictive power of the ICS.
  • Item
    Fake News Detection with Headlines
    (Rice University, 2023-12-12) Ramirez, Gared; Li, Meng
    Fake news has become a growing problem due to the rising use of the Internet and social media. It is important to be able to distinguish sources of fake and misleading news articles to ensure that misinformation does not sow discord, erode trust in credible sources, and negatively impact our personal and societal well-being. Moreover, in an age where many people only skim headlines without delving into the full articles, the ability to discern fake news from headlines alone becomes even more crucial. To detect and classify fake news, we implement and compare five machine learning models (naive Bayes, logistic regression, decision tree, random forest, and support vector machine) on two different datasets: a benchmark dataset and a dataset with full articles and headlines. We use measures such as term frequency-inverse document frequency and sentiment scores as predictors in our models. We find that naive Bayes consistently performs best on both datasets, with accuracies of 64.40% and 92.56%, respectively.
  • Item
    Prediction of WilderHill Clean Energy Index Directional Movement
    (Rice University, 2023-05-08) Du, Yolanda; Lu, Lu; Ding, Hongkai; McGuffey, Elizabeth; Li, Meng
    The popularity of clean energy has risen recently due to concerns about climate change and the exhaustion of traditional energy sources. Stock prices of clean energy companies reflect the public’s attention to the industry’s growth potential, and clean energy stocks are among the riskiest stocks to invest in. Thus, it is important to apply quantitative methods to analyze the financial risks and returns of renewable energy stocks. Prior work on the topic has mainly focused on inference rather than on prediction of renewable energy stock prices. In this investigation, the directional movement of the WilderHill Clean Energy Index is predicted using machine learning methods including logistic regression, random forest, and neural networks. Using data including technical indicators and macroeconomic variables, the aim is to predict the movement of the WilderHill Clean Energy Index with high accuracy. The results suggest that for the classification models with two directions, random forest and neural networks outperform full logistic regression and stepwise logistic regression. For the classification models with a three-category target variable, random forest and neural network models outperform full logistic regression and stepwise logistic regression in overall accuracy; however, the methods give varying results for different outcome classes with regard to sensitivity and specificity. In addition, the relationship between renewable energy stock directional movement and independent variables is investigated. The results suggest that two important macroeconomic variables are West Texas Intermediate crude oil prices and the NYSE Arca Tech 100 Index.
  • Item
    Modeling SPX Volatility to Improve Options Pricing
    (Rice University, 2021) Aiman, Jared; Iglesias, Vicente; Sarkar, Sumit; Ensor, Katherine; Dobelman, John A.
    In this project, we develop a model to predict future stock market volatility and facilitate more accurate options pricing. The Black-Scholes model gives an expected premium for an options contract; however, it relies on a fixed but unknown parameter referred to as volatility. We improve on this by using a modified Glosten-Jagannathan-Runkle Generalized Autoregressive Conditional Heteroskedasticity (GJR-GARCH) model that uses previous returns, as well as the market’s expectation of future volatility, to better predict future volatility. Additionally, we apply an Autoregressive Moving Average (ARMA) model to predict the value of future stock prices. We find that our model captures volatility better than either the market volatility or a traditional GJR-GARCH model alone. This is due in particular to our model’s ability to capture the dependence between the S&P 500 returns and the changes in the market’s expectation of volatility.
  • Item
    Predicting Student Loan Debt: A Hierarchical Time Series Analysis
    (Rice University, 2021) Elsesser, George; Ensor, Katherine
    In recent years, and especially in response to the Covid-19 pandemic, much attention has been brought to the issue of rapidly increasing student debt. Yet in the field of time series analysis, there is a dearth of studies examining trends in student loan debt. This is likely due to the impression of a simple, yet steep, linear increase in student loan debt over the last decade. However, trends in this type of debt are much more complicated when a complete picture of the hierarchical nature of this data is considered. One objective of this project is to generate accurate forecasts, with the impact of Covid-19 in mind, not only for outstanding student loan debt but also for sub-categories of this value based on loan status, such as loans in default or loans in repayment. To facilitate this, traditional hierarchical forecasting methods were compared to newer methods, namely MinT and its recent adaptations. Our findings indicate that MinT forecast reconciliation with the use of structural scaling results in the most accurate forecasts across the aggregation structure. Although not the main focus of this study, the second-level forecasts indicate a forecasted 3 1% increase in default rates between the first quarter of 2020 and the second quarter of 2022 and a 12% decrease in dollars outstanding for enrolled students during the same time period.
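
As an illustration of the forecast reconciliation idea named in the last abstract, the following is a minimal sketch of MinT-style reconciliation with structural scaling on a toy two-level hierarchy. It is not the authors' implementation. The hierarchy (total = default + repayment), the base forecast values, and all function names are hypothetical and chosen only for the example. Under structural scaling, the weight matrix is assumed to be diagonal, with each series' weight proportional to the number of bottom-level series it aggregates.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(map(float, a[i])) + list(map(float, b[i])) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def reconcile_structural(S, base):
    """Reconcile base forecasts so they add up across the hierarchy.

    S is the summing matrix (rows: all series, columns: bottom series).
    Weights follow structural scaling: W = diag(S @ 1), i.e. each series'
    weight is the number of bottom-level series it aggregates (assumption).
    Returns S @ (S' W^-1 S)^-1 S' W^-1 @ base.
    """
    w_inv = [1.0 / sum(row) for row in S]
    # Rows of S' W^-1: each column of S scaled elementwise by the inverse weights.
    st_winv = [[s * wi for s, wi in zip(col, w_inv)] for col in zip(*S)]
    lhs = matmul(st_winv, S)
    rhs = matmul(st_winv, [[v] for v in base])
    bottom = solve(lhs, rhs)  # reconciled bottom-level forecasts
    return [row[0] for row in matmul(S, bottom)]


# Hypothetical hierarchy: total = default + repayment.
S = [
    [1, 1],  # total
    [1, 0],  # default
    [0, 1],  # repayment
]
# Hypothetical, incoherent base forecasts: 30 + 60 does not equal 100.
base_forecasts = [100.0, 30.0, 60.0]

reconciled = reconcile_structural(S, base_forecasts)
total, default, repayment = reconciled
print([round(v, 6) for v in reconciled])
```

In this toy case the 10-unit gap between the total and the sum of its parts is split in proportion to the structural weights, so the reconciled total falls and each bottom-level forecast rises until the three values add up exactly.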