
Benchmarking Large Language Models for Target-Based Financial Sentiment Analysis

Iftikhar Muhammad; Marco Rospocher
2025-01-01

Abstract

Sentiment analysis is vital for understanding market dynamics and formulating informed investment strategies, especially under volatile financial conditions. This study advances target-based financial sentiment analysis (TBFSA) by rigorously evaluating the efficacy of Large Language Models (LLMs) in zero-shot and few-shot learning settings. We compare state-of-the-art generative LLMs (ChatGPT-4o, ChatGPT-4, ChatGPT-o1, DeepSeek-R1, Llama-3-8B, Gemma-2-9B, and Gemma-2-27B) with conventional lexicon-based tools (VADER, TextBlob) and discriminative transformer-based models (FinBERT, FinBERT-Tone, DistilFinRoBERTa, DeBERTa-v3-base-absa-v1.1). Our analysis uses a newly curated dataset of 1,162 manually annotated Bloomberg news articles designed explicitly for TBFSA (owing to copyright constraints, only URLs are publicly released; the full news content is accessible through a Bloomberg Terminal). The findings indicate that generative LLMs, particularly DeepSeek-R1 and the ChatGPT variants (especially ChatGPT-o1), outperform lexicon-based approaches and discriminative transformer-based models across all evaluation metrics, without requiring additional training or task-specific fine-tuning. The study establishes generative LLMs as a scalable and cost-effective method for target-level sentiment analysis, obviating the need for expensive fine-tuning, and offers insights that can help institutions exploit unstructured textual data for improved real-time risk assessment, portfolio management, and algorithmic trading.
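The zero-shot setup the abstract describes — asking a generative LLM for the sentiment expressed toward one specific entity in a news text — can be sketched as below. The prompt wording, the helper names, and the example headline are illustrative assumptions, not the paper's actual prompts or data; the model call itself is omitted, since any chat-style LLM client could be plugged in.

```python
def build_tbfsa_prompt(text: str, target: str) -> str:
    """Compose a zero-shot prompt asking an LLM for the sentiment
    expressed toward a specific target entity in a news snippet."""
    return (
        "Classify the sentiment expressed toward the target entity "
        f"'{target}' in the following financial news text. "
        "Answer with exactly one word: positive, negative, or neutral.\n\n"
        f"Text: {text}"
    )

def parse_label(reply: str) -> str:
    """Map a raw model reply onto the three-class label set,
    defaulting to 'neutral' when no label is recognized."""
    reply = reply.strip().lower()
    for label in ("positive", "negative", "neutral"):
        if label in reply:
            return label
    return "neutral"

# The same sentence can carry opposite sentiment for two targets,
# which is exactly what target-based analysis must capture and what
# document-level lexicon tools like VADER or TextBlob cannot.
headline = "Acme Corp surged after rival Globex cut its full-year outlook."
prompt_acme = build_tbfsa_prompt(headline, "Acme Corp")
prompt_globex = build_tbfsa_prompt(headline, "Globex")
print(prompt_acme)
print(parse_label("  Positive."))
```

A few-shot variant would simply prepend labeled (text, target, label) examples to the same prompt before the query text.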
Large Language Models, Target-Based Sentiment Analysis, Financial Sector
Files in this record:
File: 73_main_long.pdf
Access: open access
License: Creative Commons
Size: 1.45 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11562/1176410