Firm-specific news sentiment and short-horizon risk-adjusted return predictability: evidence from large language models

Muhammad, Iftikhar

Financial markets operate in a narrative-driven information environment in which news, disclosures, and digital communication shape how investors interpret firms’ future prospects. Although asset-pricing theory emphasizes fundamentals, prices ultimately reflect heterogeneous expectations and evaluative beliefs, commonly referred to as investor sentiment. The economic relevance of sentiment depends critically on how it is defined, measured, and aligned with the asset-pricing unit of analysis. This thesis develops a firm-level framework for analysing narrative sentiment in financial markets and examines whether sentiment embedded in financial news contains economically meaningful information at short horizons. It further investigates whether advances in large language models (LLMs) enhance the extraction of firm-specific sentiment and improve its predictive and economic value. The central premise is that sentiment measurement should correspond to the economic object being evaluated. Financial news articles frequently discuss multiple firms within a single narrative and may convey heterogeneous or contrasting evaluations across entities. While document-level sentiment measures provide useful aggregate signals, they do not explicitly attribute evaluative tone to individual firms. The thesis therefore adopts a target-level approach in which sentiment is systematically linked to the specific firm under analysis, ensuring conceptual alignment between narrative evaluation and firm-level returns. To operationalize this framework, the thesis constructs and publicly releases two expert-annotated Bloomberg-based datasets: one comprising 1,476 firm-specific headlines and another containing 1,162 full-length news articles. Both datasets are manually labelled at the firm level under structured annotation guidelines and achieve high intercoder reliability.
Using these datasets, the thesis benchmarks lexicon-based approaches, discriminative transformer-based models, and state-of-the-art generative LLMs, with the latter evaluated under zero-shot and few-shot prompting strategies. Lexicon-based tools exhibit limited sensitivity to contextual nuance and entity-specific attribution. Discriminative transformer models perform more strongly but remain dependent on training-domain alignment. In contrast, contemporary generative LLMs consistently achieve superior performance in target-level classification tasks without task-specific fine-tuning. The thesis then evaluates whether firm-level sentiment signals improve short-horizon return forecasting and generate economically meaningful performance. Sentiment derived from headlines and full-length articles is integrated with lagged returns and technical indicators to predict one-, two-, and three-day-ahead return directions for large U.S. technology firms. Predictive frameworks include linear benchmarks and nonlinear ensemble methods. Performance is assessed using classification metrics and economic criteria, including Sharpe ratios and risk-adjusted alpha within the Fama-French five-factor framework. Several consistent findings emerge. Linear models relying solely on past returns exhibit limited predictive power. Nonlinear ensemble methods substantially outperform linear benchmarks, indicating that the sentiment-return relationship depends on interactions among predictors that linear models do not capture. Sentiment extracted from full-length articles provides stronger predictive gains than headline-based sentiment, highlighting the importance of narrative depth. Predictive improvements strengthen at multi-day horizons, consistent with gradual information diffusion. Most importantly, models incorporating LLM-derived article-level sentiment generate positive and statistically significant risk-adjusted alpha under out-of-sample evaluation, whereas price-only strategies do not.
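The economic criteria mentioned above (Sharpe ratio and factor-model alpha) can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names `annualized_sharpe` and `factor_alpha` are hypothetical, the data are synthetic, the five factor columns merely stand in for the Fama-French five factors, and the t-statistic uses plain OLS standard errors, whereas an empirical study might well use heteroskedasticity- or autocorrelation-robust ones.

```python
import numpy as np

def annualized_sharpe(excess_returns, periods_per_year=252):
    """Annualized Sharpe ratio from per-period excess returns."""
    r = np.asarray(excess_returns, dtype=float)
    return np.sqrt(periods_per_year) * r.mean() / r.std(ddof=1)

def factor_alpha(excess_returns, factors):
    """OLS intercept (alpha) and its t-statistic from regressing
    strategy excess returns on a matrix of factor returns."""
    y = np.asarray(excess_returns, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(factors, dtype=float)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof            # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)   # classical OLS covariance (assumption)
    return beta[0], beta[0] / np.sqrt(cov[0, 0])

# Synthetic illustration only: 500 daily observations and 5 factor columns.
rng = np.random.default_rng(0)
factors = rng.normal(0.0, 0.01, size=(500, 5))
true_alpha = 0.0005
loadings = np.array([1.0, 0.2, -0.1, 0.05, 0.0])
strategy = true_alpha + factors @ loadings + rng.normal(0.0, 0.002, size=500)

alpha, t_stat = factor_alpha(strategy, factors)
sharpe = annualized_sharpe(strategy)
print(f"daily alpha={alpha:.5f}, t={t_stat:.2f}, annualized Sharpe={sharpe:.2f}")
```

In this sketch, a strategy would count as generating risk-adjusted alpha when the estimated intercept is positive and its t-statistic exceeds a conventional significance threshold on out-of-sample returns.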
Taken together, the evidence demonstrates that firm-specific narrative sentiment embedded in financial news contains economically relevant short-horizon information when measured at the appropriate level of granularity and evaluated under disciplined asset-pricing benchmarks. The thesis contributes to behavioural asset pricing, financial natural language processing, and quantitative finance by establishing a scalable and economically validated framework for linking narrative information to return predictability.