Keywords: Granger causality test; Sharpe ratio; Stock markets; Stock movement; Time series; Topic modeling

Source: DOI:10.7717/peerj-cs.1156

Abstract:
Open text data, such as financial news, are thought to affect or describe stock market behavior. However, there are no widely accepted algorithms for extracting the relationship between stock-quote time series and the fast-growing textual representation of economic information. The field remains challenging and understudied. In particular, topic modeling, a powerful tool for interpretable dimensionality reduction, has hardly ever been used for such tasks. We present a topic modeling framework for assessing the relationship between the financial news stream and stock prices in order to maximize the trader's gain. To do so, we use a dataset drawn from the economic news sections of three Russian national media sources (Kommersant, Vedomosti, and RIA Novosti) containing 197,678 economic articles. They are used to predict 39 time series of the most liquid Russian stocks collected over eight years, from 2013 to 2021. Our approach shows the ability to detect significant return-predictive signals and outperforms 26 existing models in terms of the Sharpe ratio and annual return of a simple long strategy. In particular, it shows a significant Granger causal relationship for more than 70% of portfolio stocks. Furthermore, the approach produces highly interpretable results, requires no domain-specific dictionaries, and, unlike most existing industrial solutions, can be calibrated for individual time series. This makes it directly usable for trading strategies and analytical tasks. Finally, since topic modeling has proven effective for most European languages, our approach is expected to be transferable to European stock markets as well.
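The abstract compares models by Sharpe ratio but does not say how that figure is computed. As a minimal sketch, assuming daily returns, 252 trading days per year, and a zero risk-free rate by default (all three are assumptions, not details from the paper), an annualized Sharpe ratio could be computed as follows. The function name `annualized_sharpe` is hypothetical.

```python
import math
import statistics


def annualized_sharpe(daily_returns, risk_free_daily=0.0, periods_per_year=252):
    """Annualized Sharpe ratio of a series of per-period returns.

    Assumptions (not taken from the paper): returns are simple daily
    returns, the risk-free rate is given per period, and annualization
    scales by the square root of the number of periods per year.
    """
    if len(daily_returns) < 2:
        raise ValueError("need at least two returns to estimate volatility")

    # Excess return over the risk-free rate in each period.
    excess = [r - risk_free_daily for r in daily_returns]

    # Sample standard deviation (n - 1 in the denominator).
    volatility = statistics.stdev(excess)
    if volatility == 0:
        raise ValueError("volatility is zero; Sharpe ratio is undefined")

    return statistics.mean(excess) / volatility * math.sqrt(periods_per_year)


if __name__ == "__main__":
    # Illustrative numbers only, not data from the study.
    sample = [0.01, -0.005, 0.007, 0.002, -0.003]
    print(round(annualized_sharpe(sample), 3))
```

A higher value means more return per unit of volatility, which is why the ratio is reported alongside the raw annual return of the long strategy.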