20 Best Financial Datasets for Machine Learning

Why Financial Data Powers ML

Most datasets are static snapshots. Financial data? It's alive.

Markets move. Policies shift. Consumers panic. And buried in all that noise are patterns — some obvious, most not. That’s why financial datasets are a goldmine for machine learning: they’re complex, time-based, and high-stakes. Perfect training ground for systems that learn to detect nuance, predict risk, and spot opportunity before it hits the headlines.

Need to forecast quarterly earnings? Estimate inflation trends? Build credit scoring systems? You need financial data — structured, consistent, and dense with signal. 

But here’s the kicker: not all finance data is built for ML. 

How to Choose the Right Dataset 

Before we dive into the list, let’s talk about filters. Because grabbing random CSVs from the internet isn’t going to cut it, especially when millions are on the line.

Here’s what to actually look for when picking financial data for your model: 

Feature | Why It Matters
Time Resolution | Hourly, daily, quarterly? Match the granularity to your use case.
Completeness | Missing rows or backfilled gaps will wreck forecasting accuracy.
Domain Fit | Don’t train a credit model on stock data. Seriously.
Noise Level | Financial data is messy. But too messy = garbage in, garbage out.
Label Availability | Supervised learning needs ground truth, like buy/sell signals or outcomes.
Licensing | Many financial datasets are behind paywalls or non-commercial licenses.
Update Frequency | For real-time use cases, stale data is worse than none.

You’re not just feeding your model numbers — you’re feeding it context, structure, and assumptions. Choose wisely.
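The completeness row of that checklist is easy to automate before any training run. A minimal sketch, assuming daily equity data given as Python `date` objects; the function names and the 4-day tolerance are illustrative choices, not a standard:

```python
from datetime import date


def find_gaps(dates: list[date], max_calendar_gap: int = 4) -> list[tuple[date, date]]:
    """Return (previous, next) pairs whose spacing exceeds max_calendar_gap days.

    Daily equity data normally skips up to 3 calendar days (Friday -> Monday);
    a default of 4 also tolerates one market holiday next to a weekend.
    """
    ordered = sorted(set(dates))  # de-duplicate and enforce chronological order
    return [
        (prev, nxt)
        for prev, nxt in zip(ordered, ordered[1:])
        if (nxt - prev).days > max_calendar_gap
    ]


def find_duplicates(dates: list[date]) -> list[date]:
    """Return dates that appear more than once (often a sign of a bad merge)."""
    seen: set[date] = set()
    dupes: set[date] = set()
    for d in dates:
        if d in seen:
            dupes.add(d)
        seen.add(d)
    return sorted(dupes)
```

For crypto data, which trades on weekends, lower the tolerance to 1 day.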

Top 20 Financial Datasets for ML

Grouped by type, cut down to the essentials, and optimized for real-world ML use. 

Market & Stock Data

Need to build price predictors, trading signals, or LSTM models that don’t hallucinate? These datasets cover equities, funds, and historical prices — with enough structure to actually train something usable. 

1. Yahoo Finance – S&P 500 Prices

Access: Free for personal use (yfinance is an unofficial wrapper; Yahoo’s terms of use govern the data)
The classic go-to. Daily OHLCV (open, high, low, close, volume) data for thousands of tickers, including the full S&P 500. It’s clean, regularly updated, and widely supported in tools like yfinance for Python.
Just don’t expect ground-truth labels like “buy” or “sell” — this one’s raw prices only. 

Dataset Spotlight
dataset_name: Yahoo Finance – S&P 500
type: Market Data
access: Free for personal use (Yahoo terms of use apply)
format: pandas DataFrame via yfinance (easy to export to CSV or JSON)
ideal_for: LSTM models, trend detection, basic backtesting
notes: No labeled targets (raw prices only)
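Since Yahoo Finance ships no labels, targets have to be derived from prices. A minimal sketch, assuming a chronological list of daily closes (for example the `Close` column returned by yfinance’s `download` function); the helper names are hypothetical:

```python
import math


def log_returns(closes: list[float]) -> list[float]:
    """r_t = ln(close_t / close_{t-1}); the result is one element shorter."""
    return [math.log(curr / prev) for prev, curr in zip(closes, closes[1:])]


def next_day_direction_labels(closes: list[float], threshold: float = 0.0) -> list[int]:
    """Label day t as 1 if the return from close_t to close_{t+1} exceeds threshold.

    The final day has no next close, so it gets no label; labels[t] pairs
    with features computed from data up to and including close_t.
    """
    return [1 if r > threshold else 0 for r in log_returns(closes)]
```

Keep features strictly at or before day t; computing them with any later close leaks the label into the inputs.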
  

2. Alpha Vantage API

Access: Free tier (commercial use allowed, rate-limited) + Paid plans
A developer-friendly API for financial time series. Pull daily or intraday prices, forex, crypto, and even fundamental metrics with a free key.
Ideal if you need to automate data ingestion or work across multiple asset classes.
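A sketch of automated ingestion using only the standard library. It assumes the `TIME_SERIES_DAILY` function and the numbered field names (`"1. open"` through `"5. volume"`) that Alpha Vantage’s JSON responses use at the time of writing; the function names are hypothetical, and no network call is made, so fetch the URL with any HTTP client and pass the decoded JSON to the parser:

```python
from urllib.parse import urlencode

BASE_URL = "https://www.alphavantage.co/query"


def daily_series_url(symbol: str, api_key: str, outputsize: str = "compact") -> str:
    """Build a TIME_SERIES_DAILY request URL ("compact" = latest 100 points)."""
    params = {
        "function": "TIME_SERIES_DAILY",
        "symbol": symbol,
        "outputsize": outputsize,
        "apikey": api_key,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def parse_daily_series(payload: dict) -> list[dict]:
    """Flatten the 'Time Series (Daily)' object into date-sorted rows."""
    series = payload.get("Time Series (Daily)")
    if series is None:
        # Errors and rate-limit notices come back as JSON without the series key
        raise ValueError(f"unexpected response keys: {list(payload)}")
    rows = []
    for day, bar in series.items():
        rows.append({
            "date": day,
            "open": float(bar["1. open"]),
            "high": float(bar["2. high"]),
            "low": float(bar["3. low"]),
            "close": float(bar["4. close"]),
            "volume": int(bar["5. volume"]),
        })
    # The API lists newest dates first; models usually want oldest first
    return sorted(rows, key=lambda r: r["date"])
```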

3. Quandl – Core US Financials

Access: Freemium (check terms for commercial use of premium content)
Quandl (now Nasdaq Data Link) offers curated financial datasets including equities, ETFs, and options. Many sources are premium, but free collections such as “FRED” remain; the “WIKI” end-of-day price collection stopped updating in 2018, so treat it as historical data only.
Perfect for prototyping trading models or economic forecasting.

4. yfinance (Python Wrapper for Yahoo Finance)

Access: Free, open-source library (the data it returns is subject to Yahoo’s terms of use)
Not technically a dataset, but if you’re prototyping in Python, yfinance is one of the fastest ways to get real market data. It supports tickers, dividends, and splits, and integrates smoothly with pandas.
But note: it’s an unofficial wrapper, not affiliated with Yahoo, and it can break when Yahoo changes its endpoints.

5. Global Financial Data (GFD)

Access: Paid / Academic license required
If you’re into historical backtesting or long-horizon forecasting, GFD is the goldmine. It contains stock prices dating back to the 1800s (!) and even includes discontinued tickers.
Not cheap. But for quant research? It’s unmatched.
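Discontinued tickers matter because backtesting only on today’s survivors silently drops failed companies and inflates results (survivorship bias). A minimal sketch, assuming you have listing and delisting dates per ticker; the `Listing` record is hypothetical and not GFD’s actual schema:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Listing:
    """Hypothetical record: when a ticker started and (optionally) stopped trading."""
    ticker: str
    listed: date
    delisted: Optional[date] = None  # None means still trading


def universe_on(listings: list[Listing], as_of: date) -> list[str]:
    """Tickers that were trading on `as_of`, including ones delisted later."""
    return sorted(
        item.ticker
        for item in listings
        if item.listed <= as_of and (item.delisted is None or as_of < item.delisted)
    )
```

Rebuilding the universe at every rebalance date keeps the backtest point-in-time.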

Macroeconomic & Banking Data

Stock prices show what traders think. Macroeconomics shows what’s really happening.
If your model needs to understand recessions, inflation swings, or why one country collapses while another thrives — this is where to look.

6. FRED – Federal Reserve Economic Data

Access: Free (commercial & academic use permitted)
This is the backbone of every serious macro model. Over 800,000 time series, from interest rates to unemployment spikes to business cycles, maintained by the Federal Reserve Bank of St. Louis and compiled from sources such as the BLS, the BEA, and the Fed’s own releases.
The API is clean, the coverage is vast, and the metadata? Rock solid. If you’re forecasting anything related to the US economy, this is non-negotiable. 

Dataset Spotlight
dataset_name: FRED
type: Macroeconomic Indicators
access: Free (commercial and academic use permitted)
format: CSV, JSON via API
ideal_for: Inflation modeling, unemployment prediction, macro signals
notes: Curated by the St. Louis Fed, mostly from official U.S. statistical agencies; clean and reliable
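FRED’s API needs a free API key and returns JSON when `file_type=json` is set. A minimal sketch against the `series/observations` endpoint; note that FRED encodes missing observations as the string `"."`, which breaks naive float parsing. The function names are illustrative, and `CPIAUCSL` in the test is just a placeholder series ID:

```python
from typing import Optional
from urllib.parse import urlencode

FRED_OBSERVATIONS_URL = "https://api.stlouisfed.org/fred/series/observations"


def observations_url(series_id: str, api_key: str, start: Optional[str] = None) -> str:
    """Build a series/observations request URL that asks for JSON output."""
    params = {"series_id": series_id, "api_key": api_key, "file_type": "json"}
    if start is not None:
        params["observation_start"] = start  # YYYY-MM-DD
    return f"{FRED_OBSERVATIONS_URL}?{urlencode(params)}"


def parse_observations(payload: dict) -> list[tuple[str, Optional[float]]]:
    """Return (date, value) pairs, mapping FRED's "." placeholder to None."""
    parsed = []
    for obs in payload["observations"]:
        raw = obs["value"]
        parsed.append((obs["date"], None if raw == "." else float(raw)))
    return parsed
```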
  

7. World Bank Open Data

Access: Free (CC BY 4.0; commercial use permitted with attribution)
Global in scope, surprisingly accessible. The World Bank’s dataset spans GDP, education, trade, climate exposure, and more — all sortable by country, year, or topic.
Perfect for modeling development trajectories or building cross-country comparisons. Just don’t expect minute-level precision — this one’s built for the big picture.
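The World Bank’s v2 API returns JSON as a two-element list: paging metadata first, then the records. A minimal sketch for annual indicators such as `NY.GDP.MKTP.CD` (GDP in current US$); the helper names are hypothetical:

```python
from typing import Optional
from urllib.parse import urlencode


def indicator_url(country: str, indicator: str, per_page: int = 1000) -> str:
    """World Bank API v2 URL for one country and indicator, as JSON."""
    query = urlencode({"format": "json", "per_page": per_page})
    return f"https://api.worldbank.org/v2/country/{country}/indicator/{indicator}?{query}"


def parse_indicator(payload: list) -> dict[tuple[str, int], Optional[float]]:
    """Map (ISO3 country code, year) -> value for an annual indicator.

    payload[0] is paging metadata and payload[1] the records; error
    responses carry only a message element, and years without data
    have a null value.
    """
    if len(payload) < 2 or payload[1] is None:
        return {}
    table: dict[tuple[str, int], Optional[float]] = {}
    for rec in payload[1]:
        value = rec["value"]
        table[(rec["countryiso3code"], int(rec["date"]))] = (
            None if value is None else float(value)
        )
    return table
```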

8. IMF International Financial Statistics

Access: Free with registration (CC BY; commercial use permitted)
Want to see how national debts evolve, currencies crash, or inflation explodes? The IMF has you covered. Their dataset is dense, country-specific, and includes rare indicators like reserve positions and fiscal balance sheets.
Not all countries update equally often — but when they do, the detail’s worth it. 

9. OECD Public Finance & Tax

Access: Free (commercial use permitted under open license)
Budgets, tax rates, social spending: this is the toolbox for anyone modeling policy impact or building sovereign risk models. It covers OECD member economies, many of them European, and includes time series stretching back decades.
Don’t sleep on this one if your model touches the public sector.

10. EU Open Data Portal

Access: Free (commercial use allowed via EU OGD License)
This is Europe’s official open data firehose. It covers structural funds, economic indicators, regional imbalances — you name it.
The data’s clean, machine-readable, and often fills in gaps you won’t find in World Bank or IMF sources. Great if your model needs subnational nuance.

Crypto & Blockchain Data

Traditional finance moves by quarters. Crypto moves by memes. If your model needs to capture volatility, sentiment, or on-chain behavior, these datasets will teach it how to ride the chaos — not drown in it. 

11. CoinMarketCap API

Access: Free tier (commercial use allowed) + Paid plans
This is the closest thing crypto has to a Bloomberg Terminal. It offers real-time prices, market caps, circulating supply, and historical snapshots.
If your model needs up-to-date metrics or coverage across thousands of coins, start here. The free tier is generous enough for most projects. 

Dataset Spotlight
dataset_name: CoinMarketCap API
type: Crypto Market Metrics
access: Free tier (commercial use allowed) + Paid plans
format: JSON API
ideal_for: Volatility tracking, market cap analysis, DeFi metrics
notes: Generous free tier with global token coverage
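CoinMarketCap’s API expects the key in the `X-CMC_PRO_API_KEY` header rather than the query string. A sketch that builds (but does not send) a request to the `/v1/cryptocurrency/listings/latest` endpoint and flattens the USD quote; the field names reflect the documented response as we understand it, so verify them against the current API reference. Function names are hypothetical:

```python
from urllib.parse import urlencode
from urllib.request import Request

CMC_LISTINGS_URL = "https://pro-api.coinmarketcap.com/v1/cryptocurrency/listings/latest"


def listings_request(api_key: str, limit: int = 100) -> Request:
    """Build (but do not send) a request; the key travels in a header."""
    query = urlencode({"limit": limit, "convert": "USD"})
    return Request(
        f"{CMC_LISTINGS_URL}?{query}",
        headers={"X-CMC_PRO_API_KEY": api_key, "Accept": "application/json"},
    )


def parse_listings(payload: dict) -> list[dict]:
    """Flatten each coin's USD quote and sort by market cap (largest first)."""
    rows = []
    for coin in payload["data"]:
        usd = coin["quote"]["USD"]
        rows.append({
            "symbol": coin["symbol"],
            "price": usd["price"],
            "market_cap": usd["market_cap"],
            "pct_change_24h": usd["percent_change_24h"],
        })
    # Some small tokens report a null market cap; treat those as 0 for sorting
    return sorted(rows, key=lambda r: r["market_cap"] or 0, reverse=True)
```

To fetch, pass the request to `urllib.request.urlopen` and decode the body with `json.load`.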
  

12. Cryptocurrency Historical Prices

Access: Free (check dataset-specific terms)
A starter pack for crypto forecasting, typically found as community datasets on Kaggle: Bitcoin, Ethereum, and others, with daily OHLCV (open, high, low, close, volume) records.
It’s great for training basic LSTM models or comparing tokens over time. But be warned: this is exchange-level data, not blockchain-level detail. 
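A common first feature for crypto forecasting is rolling volatility. A minimal sketch, assuming a chronological list of daily closes; because crypto trades every day, it annualizes with 365 periods rather than the roughly 252 trading days typical for equities (a convention, not a rule). The function name is hypothetical:

```python
import math
from statistics import stdev
from typing import Optional


def rolling_annualized_vol(
    closes: list[float], window: int = 30, periods_per_year: int = 365
) -> list[Optional[float]]:
    """Rolling sample std of daily log returns, scaled by sqrt(periods_per_year).

    Element i covers the `window` returns ending at return i; earlier
    positions without a full window are None.
    """
    if window < 2:
        raise ValueError("window must be at least 2 for a sample standard deviation")
    rets = [math.log(curr / prev) for prev, curr in zip(closes, closes[1:])]
    scale = math.sqrt(periods_per_year)
    out: list[Optional[float]] = []
    for i in range(len(rets)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(stdev(rets[i + 1 - window : i + 1]) * scale)
    return out
```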

13. CryptoCompare API

Access: Free + Premium (commercial use may require paid plan)
Looking for normalized data across multiple exchanges? CryptoCompare does the heavy lifting — aggregating, cleaning, and formatting price data for spot and derivative markets.
It’s especially useful when training models that need consistent structure across assets or time zones.
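CryptoCompare timestamps bars in Unix seconds, so converting explicitly to UTC dates keeps assets aligned regardless of your local time zone. A sketch for the `data/v2/histoday` endpoint, assuming the response nests bars under `Data.Data` with `time`, `open`, `high`, `low`, `close`, `volumefrom`, and `volumeto` fields; check the current docs before relying on that layout. Function names are hypothetical:

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def histoday_url(fsym: str, tsym: str = "USD", limit: int = 30) -> str:
    """Daily bars for one pair, e.g. fsym='BTC', tsym='USD'."""
    query = urlencode({"fsym": fsym, "tsym": tsym, "limit": limit})
    return f"https://min-api.cryptocompare.com/data/v2/histoday?{query}"


def parse_histoday(payload: dict) -> list[dict]:
    """Turn the nested bar list into rows keyed by UTC calendar date."""
    if payload.get("Response") != "Success":
        raise ValueError(payload.get("Message", "request failed"))
    rows = []
    for bar in payload["Data"]["Data"]:
        # Interpret the Unix timestamp in UTC, never in local time
        day = datetime.fromtimestamp(bar["time"], tz=timezone.utc).date().isoformat()
        rows.append({
            "date_utc": day,
            "open": bar["open"],
            "high": bar["high"],
            "low": bar["low"],
            "close": bar["close"],
            "volume_base": bar["volumefrom"],   # volume in the base asset
            "volume_quote": bar["volumeto"],    # volume in the quote currency
        })
    return rows
```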

14. Glassnode (on-chain analytics)

Access: Free dashboards; Paid API license for commercial use
This one’s for when you want to go deeper — into wallets, addresses, transaction velocity, and network health. Great for behavioral modeling, anomaly detection, or building smart alerts that trigger when whales move. Just note: the real insights come with a price tag.
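A simple way to turn an on-chain metric into an alert is a trailing z-score. A minimal sketch, assuming you have already pulled a daily series (for example a count of large transfers) into a Python list; the window and threshold are illustrative, and this is a generic anomaly rule rather than anything Glassnode itself provides:

```python
from statistics import mean, pstdev


def zscore_alerts(values: list[float], window: int = 30, k: float = 3.0) -> list[int]:
    """Indices whose value sits more than k standard deviations above the
    mean of the preceding `window` observations. The point itself is
    excluded from the baseline, so no future data leaks in."""
    alerts = []
    for i in range(window, len(values)):
        history = values[i - window : i]
        sd = pstdev(history)
        if sd > 0 and (values[i] - mean(history)) / sd > k:
            alerts.append(i)
    return alerts
```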

15. Ethereum Etherscan Dataset

Access: Free (public domain usage allowed)
Raw blockchain data — gas prices, contract interactions, token transfers — all parsed and downloadable. Ideal for training models that analyze transaction networks, wallet clusters, or DeFi protocols. It’s not clean out of the box, but the detail is unparalleled.
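Raw transaction records store amounts in wei (10^18 wei = 1 ether) as decimal strings. A sketch that turns one record into model-ready features, assuming the `hash`, `from`, `to`, `value`, `gasUsed`, `gasPrice`, and `input` fields found in Etherscan’s `txlist` results; it also assumes `gasPrice` is the price actually paid, which you should confirm for post-EIP-1559 transactions. `Decimal` avoids float rounding on large integers:

```python
from decimal import Decimal

WEI_PER_ETHER = Decimal(10) ** 18


def tx_features(tx: dict) -> dict:
    """Convert one transaction record (wei amounts as decimal strings) to ether features."""
    fee_wei = int(tx["gasUsed"]) * int(tx["gasPrice"])
    return {
        "hash": tx["hash"],
        "from": tx["from"].lower(),  # normalize address case for wallet clustering
        "to": tx["to"].lower(),      # empty for contract creation
        "value_eth": Decimal(tx["value"]) / WEI_PER_ETHER,
        "fee_eth": Decimal(fee_wei) / WEI_PER_ETHER,
        # Non-empty calldata usually means a contract interaction
        "is_contract_call": tx.get("input", "0x") not in ("", "0x"),
    }
```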

Alt-Finance & Research-Grade Datasets

This is where things get niche, complex, and incredibly valuable. These datasets go beyond prices — capturing text, sentiment, ESG, recommendations, and even reasoning chains. Perfect for building multi-input models or testing LLMs in financial settings. 

16. FNSPID (News + Stocks Multimodal)

Access: Free for research (CC BY‑NC; academic use only)
29 million stock price records + 15 million news headlines = one powerful training set. Ideal for models that combine numerical and textual inputs — like transformers that predict price movements based on headlines or event-driven anomalies. 

Dataset Spotlight
dataset_name: FNSPID
type: Multimodal Financial Dataset
access: Free for research (CC BY-NC; academic use only)
format: Tabular + Text (CSV + JSON)
ideal_for: Headline-to-price modeling, transformer fine-tuning, LLMs
notes: Time-aligned text and price data; academic license only
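The hard part of headline-to-price modeling is alignment: each headline’s label must come from a price recorded after it was published. A minimal sketch, assuming sorted close timestamps and headline timestamps in the same time zone; the inputs and function name are hypothetical, not FNSPID’s actual column names:

```python
from bisect import bisect_right
from datetime import datetime
from typing import Optional


def news_labels(
    news_times: list[datetime], close_times: list[datetime], closes: list[float]
) -> list[Optional[float]]:
    """Label each headline with the simple return from the last close at or
    before it to the first close strictly after it."""
    labels: list[Optional[float]] = []
    for t in news_times:
        j = bisect_right(close_times, t)  # index of first close after the headline
        if j == 0 or j >= len(closes):
            labels.append(None)  # no prior close, or the next close is not in the data
        else:
            labels.append(closes[j] / closes[j - 1] - 1.0)
    return labels
```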
  

17. FinBen Benchmark Suite

Access: Free (academic use; commercial license TBD)
This one’s a research gem. A standardized benchmark of 36 datasets spanning tasks like information extraction, question answering, sentiment analysis, and risk modeling.
It’s made for training and evaluating financial NLP systems — and it’s structured enough to plug directly into transformer pipelines. 
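FinBen spans several task types, each scored with its own metrics. For class-imbalanced tasks such as sentiment analysis, macro-averaged F1 is a common choice; here is a dependency-free sketch (scikit-learn’s `f1_score(y_true, y_pred, average="macro")` should give the same number for these inputs). The function name is hypothetical:

```python
def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class F1 over every label seen in either list."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty lists of equal length")
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)
```

Macro averaging weights a rare "negative" class as heavily as a dominant "neutral" one, which plain accuracy does not.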

18. Google Trends – Financial Topics

Access: Free (commercial use allowed)
How often people search “market crash” isn’t just trivia — it’s signal. Google Trends tracks interest over time, giving your model a window into investor psychology.
Use it as a sentiment proxy, an external feature, or part of a multimodal stack. 
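One caveat before using it as a feature: Google Trends scales each request to 0-100 relative to the peak within that time range and region, so separately downloaded windows are not directly comparable. A common workaround, sketched here assuming two windows keyed by ISO date strings that overlap on at least one nonzero date, is to rescale the later window by the ratio of overlapping values. The function name is hypothetical:

```python
def stitch_trends(earlier: dict[str, float], later: dict[str, float]) -> dict[str, float]:
    """Rescale `later` onto `earlier`'s 0-100 scale using overlapping dates."""
    overlap = [d for d in earlier if d in later and later[d] > 0]
    if not overlap:
        raise ValueError("windows must overlap on at least one nonzero date")
    ratio = sum(earlier[d] for d in overlap) / sum(later[d] for d in overlap)
    merged = dict(earlier)  # the earlier window wins on overlapping dates
    for d, v in later.items():
        if d not in merged:
            merged[d] = v * ratio
    return dict(sorted(merged.items()))
```

Because Trends values are rounded to integers, overlaps made of small values give a noisy ratio; longer overlaps help.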

19. FinMultiTime

Access: Free for research (check the license linked from the arXiv paper)
This is next-level multimodal. It includes news articles, stock tick data, candlestick charts, and tabular company data — all synchronized across time. Perfect for training foundation models or building “reasoning” agents that simulate decision-making under uncertainty. 

20. EUROFIDAI – ESG & Event Finance

Access: Academic access (often via subscription)
For those working on sustainable finance, corporate behavior, or event-driven trading, EUROFIDAI offers high-frequency European data on firm actions, ESG disclosures, and more.
It’s clean, structured, and packed with real-world financial signals. Especially strong for event detection tasks. 
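Event-driven work usually starts with abnormal returns around each event. A minimal sketch of the market-adjusted model, where the abnormal return is the stock’s return minus the market’s and the cumulative abnormal return (CAR) sums it over a window around the event. Market-model regressions are the more common research choice, and the inputs here are hypothetical aligned daily returns:

```python
def cumulative_abnormal_return(
    stock_returns: list[float],
    market_returns: list[float],
    event_index: int,
    pre: int = 1,
    post: int = 1,
) -> float:
    """Market-adjusted CAR over [event_index - pre, event_index + post].

    AR_t = R_stock,t - R_market,t; CAR is the sum of AR_t over the window.
    """
    if len(stock_returns) != len(market_returns):
        raise ValueError("return series must be aligned and equally long")
    start, end = event_index - pre, event_index + post
    if start < 0 or end >= len(stock_returns):
        raise ValueError("event window falls outside the data")
    return sum(stock_returns[t] - market_returns[t] for t in range(start, end + 1))
```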

Final Takeaways

Financial data isn’t just numbers — it’s behavior, risk, emotion, and value in motion. The right dataset doesn’t just improve your model’s accuracy — it shapes what the model sees as reality.

Whether you're building a price predictor, a credit scoring system, or a market-aware LLM, the real edge comes from curated, relevant, high-signal data. Use this list as a launchpad — but always stress-test your sources.

Because in finance, assumptions get expensive fast.

📄 Dataset Cheat Sheet (Structured Recap)

- dataset_name: Yahoo Finance – S&P 500
  type: Market Data
  access: Free for personal use (Yahoo terms of use apply)
  format: pandas DataFrame via yfinance
  ideal_for: Price modeling, trend detection, LSTM training

- dataset_name: Alpha Vantage API
  type: Financial Time Series
  access: Free tier (commercial use) + Paid
  format: JSON API
  ideal_for: Automated ingestion, intraday or multi-asset modeling

- dataset_name: Quandl – Core US Financials
  type: Equities, ETFs, Options
  access: Freemium (check terms for commercial use)
  format: CSV / API
  ideal_for: Fundamental analysis, quick prototyping

- dataset_name: yfinance (Yahoo Finance wrapper)
  type: Stock Data Proxy
  access: Free library (Yahoo terms of use apply to the data)
  format: Python wrapper
  ideal_for: Quick tests, academic use, exploratory modeling

- dataset_name: Global Financial Data (GFD)
  type: Historical Markets
  access: Paid / Academic license
  format: CSV / Excel
  ideal_for: Long-range backtesting, deep historical trends

- dataset_name: FRED
  type: Macroeconomic Indicators
  access: Free (commercial use permitted)
  format: CSV, JSON API
  ideal_for: Forecasting inflation, employment, macro trends

- dataset_name: World Bank Open Data
  type: Global Socio-Economic
  access: Free (CC BY commercial use permitted)
  format: CSV
  ideal_for: Development modeling, country-level comparison

- dataset_name: IMF IFS
  type: International Finance
  access: Free (register; CC BY, commercial use permitted)
  format: CSV
  ideal_for: Currency, reserves, public debt, crisis signals

- dataset_name: OECD Public Finance
  type: Tax & Fiscal Data
  access: Free (commercial use permitted)
  format: XLS/CSV
  ideal_for: Sovereign risk, policy impact, EU-specific models

- dataset_name: EU Open Data Portal
  type: Regional Economics
  access: Free (commercial use allowed under EU OGD License)
  format: CSV / RDF
  ideal_for: Subnational modeling, funding analytics

- dataset_name: Kaggle Crypto Prices
  type: Historical Crypto OHLCV
  access: Free (check individual dataset terms)
  format: CSV
  ideal_for: Crypto forecasting, token comparison, basic LSTM

- dataset_name: CoinMarketCap API
  type: Crypto Market Metrics
  access: Free tier (commercial use) + Paid
  format: JSON API
  ideal_for: Real-time dashboards, market cap analysis

- dataset_name: CryptoCompare API
  type: Multi-Exchange Crypto
  access: Free + Premium (check terms for commercial use in free tier)
  format: JSON API
  ideal_for: Normalized pricing, volatility modeling

- dataset_name: Glassnode
  type: On-Chain Analytics
  access: Free dashboards + Paid API (commercial use requires license)
  format: Charts + JSON API
  ideal_for: Behavioral signals, whale tracking, alerts

- dataset_name: Ethereum Etherscan Dataset
  type: Blockchain Transactions
  access: Free (public domain usage)
  format: CSV/JSON (manual export)
  ideal_for: Smart contract modeling, wallet clustering

- dataset_name: Google Trends – Finance
  type: Search Interest Time Series
  access: Free (commercial use allowed)
  format: CSV
  ideal_for: Sentiment proxy, exogenous features, signal fusion

- dataset_name: FinBen Benchmark Suite
  type: Financial NLP
  access: Free (academic use; commercial terms TBD)
  format: JSON/TSV
  ideal_for: Text classification, QA, sentiment, risk modeling

- dataset_name: FNSPID (News + Stocks)
  type: Multimodal Financial Dataset
  access: Free (CC BY-NC academic use only)
  format: CSV + Text
  ideal_for: Headline-driven prediction, transformers, LLM training

- dataset_name: FinMultiTime
  type: Multimodal Financial Dataset
  access: Free (research; check arXiv for license)
  format: Text + Images + Tabular
  ideal_for: Foundation model pretraining, multimodal LLM

- dataset_name: EUROFIDAI ESG & Events
  type: ESG & Corporate Events
  access: Academic (paid subscription)
  format: CSV
  ideal_for: Event-driven models, ESG factor investing
  

Frequently Asked Questions (FAQ)

What are the best free financial datasets for machine learning?
Some of the best free options include Yahoo Finance for market prices, FRED for macroeconomic indicators, World Bank Open Data for global metrics, and Kaggle’s cryptocurrency archives. These datasets are clean, well-documented, and suitable for a wide range of ML tasks — from forecasting to risk modeling.
Which dataset is best for training a stock price prediction model?
If you’re building models like LSTMs or Prophet forecasters, Yahoo Finance and Alpha Vantage are excellent starting points. They provide intraday and historical OHLCV data. For long-term trends or academic research, Global Financial Data (GFD) is unmatched — with stock data dating back to the 1800s.
What financial data is needed for credit scoring?
Credit scoring typically requires labeled data like loan performance, income levels, credit utilization, and repayment history. While proprietary datasets are hard to access, macroeconomic indicators from FRED or IMF can serve as useful proxy features when building risk-aware scoring models.
Are there open financial datasets that support multimodal models?
Yes — datasets like FinBen, FNSPID, and FinMultiTime are built for exactly that. They combine text (e.g. news headlines), numerical data (e.g. prices, fundamentals), and sometimes even visuals (e.g. candlestick charts). Ideal for training transformers, decision-making agents, or LLMs in financial contexts.
