Home Unidata Blog Datasets Dataset Collections Best Retail Datasets for Machine Learning 2026

Dataset Collections

Best Retail Datasets for Machine Learning 2026

9 June, 2026

16 minutes read

Best Retail Datasets for Machine Learning 2026

Retail data is a security camera for your business logic. It quietly records what customers touched, ignored, compared, returned, and bought — and it does not care what your dashboard hoped would happen.

This free-first shortlist covers 20 retail datasets you can use to train models for demand forecasting, recommendation systems, basket analysis, customer segmentation, product recognition, and receipt understanding. Each card calls out the parts that usually break projects: leakage, missing promo context, messy identifiers, and time splits that look clean but lie.

What Counts as a Retail Dataset

A "retail dataset" is rarely one table. It is usually a stack of views on the same reality:

Transactions and baskets: orders, line items, quantities, prices, discounts, returns.
Catalog and taxonomy: SKU metadata, category trees, brand, attributes, images.
Behavior data: page views, sessions, add-to-cart events, searches, clicks.
Operations: inventory, replenishment, store calendars, promotions, stockouts.
Retail documents: receipts, invoices, product labels, shelf tags, planograms.
Context: holidays, weather, macro indicators, store geography, foot traffic proxies.

If you have only one of these layers, you can still build useful models — just be honest about what the data cannot explain.

What You Can Build with Retail Data

ML task	What you need from the data	Typical model type
Demand / inventory forecasting	Time series with promo + calendar context	LightGBM, NHITS, Temporal Fusion Transformer
Recommendations	User–item interaction histories (clicks, purchases)	Matrix factorization, GRU4Rec, SASRec
Basket analysis	Transaction logs with line items	Apriori, co-occurrence embeddings
Customer segmentation	RFM signals, behavioral history	K-means, hierarchical clustering
Retail computer vision	Labeled shelf or checkout images	YOLO, Faster R-CNN, ViT
Receipt / invoice understanding	Annotated scan images with field labels	LayoutLM, Donut, PaddleOCR

How to Choose a Retail Dataset without Wasting a Week

1) What is the prediction target, exactly

"Sales forecasting" can mean daily unit sales per store, next-week revenue per category, or next-basket items per user. Pick a dataset where the target is natural to the data structure, not forced. If you need to engineer the target variable from scratch, you lose a week before a single model runs.

2) Is there a clean time axis

Retail models fail when future and past are mixed. Prefer datasets with timestamps that allow strict temporal splits — train on earlier periods, test on later ones. A random 80/20 split will always inflate metrics because it leaks seasonality and promotion patterns into training.

3) Do you get the context that drives behavior

Promotions, price changes, stockouts, and holidays often explain more variance than your model architecture. If those fields are missing, your model learns a world that does not exist and generalizes to nowhere. Check for supplemental context files before committing to a dataset.

4) Are IDs stable and joinable

Look for stable keys: user_id, session_id, item_id, store_id. If the dataset hashes identifiers without a join table, confirm you can still link events to products. Some datasets look relational until you try to actually join them.

5) What is the license and access friction

Some datasets are free but require accounts. Some are non-commercial only. Some are third-party mirrors of datasets that have since changed access terms. Verify the original source license before building a pipeline — "research-only" may not work if you plan to ship.

Demand Forecasting and Sales Prediction

M5 Forecasting — Accuracy (Walmart)

Volume: 30,490 time series × 1,941 days; item-level daily unit sales across 10 Walmart stores in 3 U.S. states

Access: Free (Kaggle account required)

Format: CSV

Task Fit: Hierarchical demand forecasting, feature engineering for retail, backtesting

M5 is the benchmark for retail demand forecasting. The three-level hierarchy — item, department, store, state — forces you to think about how forecasts should aggregate and reconcile, and there is a large library of published baselines to compare against. Use it to build and validate a forecasting pipeline before moving to proprietary store data. The main gotcha: promotional and calendar effects are in supplemental files that most teams process too late. Build your feature pipeline around those files first, then train the model.

Corporación Favorita Grocery Sales Forecasting

Volume: ~125M rows; 4,100 SKUs across 54 stores; 5 years of daily sales (2013–2017)

Access: Free (Kaggle account required)

Format: CSV

Task Fit: Promo-aware demand forecasting, multi-store modeling, seasonality

Favorita is a messier, more realistic alternative to M5 when you need promo-aware forecasting at scale. The supplemental files — holiday calendars, oil prices, store metadata — reward teams who treat demand as a feature engineering problem, not a time series modeling problem. Grocery sales have heavier substitution effects than general merchandise: if you evaluate on aggregate error only, your model may look fine while failing individual SKUs. Track per-SKU error distributions.

Rossmann Store Sales

Volume: 1,017 stores; 2.5 years of daily data; ~1M rows; store metadata and promo flags included

Access: Free (Kaggle account required)

Format: CSV

Task Fit: Store-level forecasting, calendar effect modeling, feature engineering practice

Rossmann is the entry point for retail forecasting — small enough to train quickly, documented enough to find working baselines, real enough to surface pipeline mistakes. It is commonly used to teach temporal validation because the dataset is approachable and the community has been active for over a decade. The key trap is missing days: store closures appear as gaps, not zeros. Treat missing entries as an explicit feature class, not an imputation problem.

UCI Hierarchical Sales Data

Volume: Compact benchmark; designed for reconciliation testing rather than scale

Access: Free

Format: CSV

Task Fit: Hierarchical forecasting, reconciliation strategies (bottom-up, top-down, MinT), method comparison

This dataset exists specifically for testing reconciliation approaches — whether bottom-up, top-down, or optimal methods like MinT. Its small size is intentional: fast iteration on method comparison, not production-scale training. Use it to confirm your reconciliation strategy does not degrade aggregate accuracy before running M5. The smaller the dataset, the easier it is to overfit with feature-heavy models — keep baselines honest.

U.S. Census Monthly Retail Trade

Volume: Monthly from 1992–present; sector-level aggregate sales and inventory estimates by retail category

Access: Free

Format: CSV via API

Task Fit: Exogenous regressor for demand models, macro context features, regime shift detection

Census Monthly Retail Trade is not a SKU-level dataset — it is aggregate sales data by retail sector. Its value is as an exogenous context signal: macro cycles, regime shifts, and category-level momentum that store-level data does not capture on its own. Use it as a regressor alongside your primary sales dataset. Do not use it as a standalone prediction target; the aggregation hides exactly the store-level and SKU-level variance your model needs to learn.

Transactions, Baskets, and Customer Behavior

Instacart Online Grocery Shopping

Volume: 3.4M orders; 206K users; 50K products; prior order histories included alongside next-basket targets

Access: Free (Kaggle account required)

Format: CSV (relational — orders, products, departments, aisles join tables)

Task Fit: Next-basket prediction, reorder modeling, basket analysis, product affinity

Instacart is the strongest public dataset for next-basket prediction. The relational structure — orders, product metadata, department and aisle hierarchies, reorder flags — lets you build a clean multi-table schema before touching any model code. Reorder flags are genuine signals, not derived. The main constraint is retailer scope: this is one online grocery fulfillment model, so substitution logic, impulse buys, and in-store browse behavior are absent by design. Validate any behavioral assumptions against your own retailer's context before generalizing.

Dunnhumby "The Complete Journey"

Volume: ~2,500 households; 2 years of transactions; 8 tables including campaigns, coupons, products, and household demographics

Access: Free with signup (research and non-commercial use)

Format: CSV (relational — 8 tables, requires joins)

Task Fit: Customer segmentation, CLV modeling, promotion lift, coupon response, basket analysis

Dunnhumby is the richest public retail dataset for customer analytics. Household demographics, coupon redemption records, and campaign exposure logs make promotion-response modeling possible in ways most public datasets cannot support. The dataset is genuinely multi-table: careless joins multiply rows and corrupt aggregates. Validate join cardinality before you build training features. The 2-year window is also modest for long-horizon CLV modeling — supplement with synthetic cohort extension if you need more purchase history per customer.

Online Retail II (UCI)

Volume: ~1M transactions; 5,628 customers; 2009–2011; UK-based online gift wholesaler Access: Free

Format: XLSX (CSV-convertible)

Task Fit: RFM segmentation, basket analysis, customer churn, cohort modeling

Online Retail II is a clean transaction log — line items, quantities, prices, customer IDs, and invoice timestamps across two years. It is standard for RFM analysis and market basket mining. The main limitation is retailer type: a gift wholesaler operates on bulk-order logic with a B2B-skewed customer base, which differs substantially from grocery, fashion, or general merchandise retail. Before drawing behavioral conclusions, confirm whether your domain's purchase patterns hold in this context.

Recommendations and Ranking

YOOCHOOSE (RecSys Challenge 2015)

Volume: 33M click events; 1.15M buy events; ~9M sessions; e-commerce platform (retailer undisclosed)

Access: Free

Format: CSV (two files: clicks log and buy log)

Task Fit: Session-based recommendation, click-through rate prediction, buy conversion modeling

YOOCHOOSE is the standard starting point for session-based recommendation research. The explicit split between click events and buy events lets you model not just what users browse but what they convert on — a more useful training signal for most e-commerce teams. A large library of published baselines exists from 2015 onward. The retailer is not disclosed and item metadata is not provided, which blocks content-aware approaches. Build your session-based baseline here, then validate on a dataset with item features.

RetailRocket E-Commerce Dataset

Volume: 2.8M events; 1.4M unique visitors; ~235K unique items; item category properties included for a subset

Access: Free (Kaggle account required)

Format: CSV (events log + item property tables)

Task Fit: Session-based recommendations, event-level click and purchase modeling, content-aware filtering

RetailRocket is compact enough to train quickly and has the item metadata that YOOCHOOSE lacks — product category and property values for a subset of the catalog. The combination of click, add-to-cart, and transaction events in one log makes funnel-level feature engineering straightforward. The tradeoff is scale: at ~1.4M visitors, the interaction matrix is sparse for long-tail items. Test your model's behavior on those tail items explicitly — they tend to degrade fastest in production.

H&M Personalized Fashion Recommendations

Volume: 31M transactions; 1.37M customers; 105K article IDs; full product images and text descriptions included

Access: Free (Kaggle account required)

Format: CSV + JPEG images

Task Fit: Personalized recommendations, fashion retrieval, next-purchase prediction, cold-start modeling

H&M is one of the few public recommendation datasets that includes both purchase history and full product content — images, text descriptions, and structured attribute metadata. That combination makes it useful for multimodal recommendation research that goes beyond collaborative filtering. Long purchase histories for active customers support cohort and sequence-based modeling. Cold-start is a real problem: a large share of articles have few or no transactions. Explicitly model item cold-start or you will miss a significant product class that matters in production.

Amazon Review Data 2023 (UCSD)

Volume: 571M+ reviews; 2023 release; 33 category subsets available separately; includes user history, item metadata, and review text

Access: Free

Format: JSON/JSONL per category subset

Task Fit: Recommendation research, sentiment classification, review summarization, attribute extraction, demand proxying

Amazon Review Data from UCSD is the largest publicly available review corpus with per-item metadata. The 2023 release adds richer item attributes and user-level review histories compared to the widely used 2018 version. It works well for recommendation research where review text is a content signal and for sentiment or attribute extraction tasks. Reviews are a biased sample of buyers: frequent reviewers and outlier experiences dominate. If you use it for demand proxy modeling, calibrate against actual sales signals before drawing quantitative conclusions.

DIGINETICA — CIKM Cup 2016 Track 2

Volume: ~5.3M clicks; ~1.2M queries; ~204K sessions; product categories and price data included

Access: Free

Format: CSV

Task Fit: Session-based recommendations with search context, query-item ranking, e-commerce search modeling

DIGINETICA adds search and browsing logs to the session recommendation problem — useful when you want to model "query intent + item interaction" rather than pure clickstream. The combination of search queries, click sequences, and purchase events supports hybrid session-recommendation and learning-to-rank approaches. The main risk with search log data is position bias: your model may learn the previous ranking system's behavior rather than genuine customer preference. Log the original display position of items shown and control for it if you can.

E-commerce Operations, Product Catalogs, and Web Analytics

Olist Brazilian E-Commerce Public Dataset

Volume: ~100K orders; 8 relational tables (orders, items, payments, reviews, geolocation, sellers); 2016–2018

Access: Free (Kaggle account required)

Format: CSV (relational)

Task Fit: Delivery time prediction, order outcome analysis, seller analytics, multi-table schema practice

Olist is a realistic multi-table e-commerce dataset with orders, products, payments, logistics estimates, and review scores. It is useful for teams building delivery prediction, marketplace health scoring, or churn models — tasks that require multiple entity types joined together. The Brazilian e-commerce context means logistics patterns and geography play a first-class role. Multi-table joins can silently multiply rows: run a sanity-check query on cardinality before building any training features.

Open Food Facts

Volume: 3M+ products globally; barcodes, nutrition labels, ingredients, packaging, and category metadata

Access: Free (Open Database License — ODbL)

Format: CSV / MongoDB dump / REST API

Task Fit: Product matching, attribute extraction, catalog normalization, grocery recommendation enrichment

Open Food Facts is a community-maintained open product catalog useful for enriching grocery recommendation pipelines with nutrition, category, and packaging metadata. It is particularly valuable for catalog normalization across retailers — matching products by barcode when retailer-specific SKU schemas diverge. The main limitation is uneven completeness: some product attributes are missing for large portions of the catalog. Build quality filters and fallbacks before using it in a production matching pipeline.

Google Analytics Sample Dataset (BigQuery)

Volume: ~900K sessions; Google Merchandise Store data; available as a BigQuery public table

Access: Free (Google Cloud account and BigQuery access required)

Format: BigQuery table (nested JSON schema)

Task Fit: Web analytics ML, funnel analysis, session feature engineering, conversion prediction sandbox

The Google Analytics Sample Dataset is a sandbox for web analytics ML when you cannot use production tracking data. It models e-commerce sessions with pageview sequences, traffic source, device, and transaction outcomes. The BigQuery schema is nested and requires unnesting — plan your feature engineering around that. The dataset reflects a single Google-run store, so attribution patterns and traffic mix may not generalize to other retailers. Use it to build and test session-based feature pipelines before applying them to proprietary data.

UCI Online Shoppers Purchasing Intention

Volume: 12,330 sessions; 18 features including page views, bounce rate, exit rate, product-related duration, and revenue label

Access: Free

Format: CSV

Task Fit: Conversion prediction, session exit rate modeling, feature importance for retail funnels, class imbalance practice

This is a small but well-structured dataset for binary conversion prediction — the target is whether a session resulted in a purchase (about 15.5% positive rate, reflecting realistic class imbalance). It works for testing feature pipelines and imbalance handling strategies before scaling to larger behavioral datasets. The feature set includes web-session signals that are easily available in production logs, which makes models trained here straightforward to port. Its small size means you can overfit quickly — keep baseline models alongside any complex classifier.

Retail Computer Vision: Shelves, Checkout, and Product Images

RPC: Retail Product Checkout Dataset

Volume: 83,739 single-product images; 24,000 checkout images with 200 product categories

Access: Free with signup

Format: JPEG + JSON annotations (COCO-compatible)

Task Fit: Automatic checkout, product recognition in cluttered scenes, instance detection, category-level recognition

RPC is built specifically for automatic checkout — the scenario where a camera at the register needs to recognize multiple products simultaneously, often with partial occlusion and lighting variation. Single-product reference images are clean; checkout images are cluttered. That domain gap is the whole story here: train on clean product images alone and you will still fail on real checkout scenes. Model clutter explicitly, or plan for a domain adaptation step between reference and deployment images.

SKU-110K

Volume: 11,762 images; ~1.7M bounding box annotations; real store shelf images with dense packing

Access: Free

Format: JPEG + CSV annotations

Task Fit: Dense object detection on shelves, planogram compliance monitoring, out-of-stock detection, shelf-level product analysis

SKU-110K is the reference dataset for dense object detection on retail shelves — many small instances per image, heavy occlusion, real store conditions with mixed lighting. It is the right dataset when your use case is shelf monitoring rather than isolated product recognition. Standard detection metrics like mAP can look acceptable while the model still misses rare SKUs on sparsely populated shelves. Track per-category recall, not just overall mAP, to catch this before deployment.

DeepFashion2

Volume: 491,895 images; 801,000 clothing instances; 13 clothing categories with keypoints, segmentation, and retrieval pairs

Access: Free with signup (academic use; application required)

Format: JPEG + JSON annotations

Task Fit: Fashion attribute detection, keypoint estimation, clothing retrieval, catalog enrichment, visual similarity

DeepFashion2 is a comprehensive fashion CV dataset with detection, keypoints, segmentation, and retrieval annotations in one package — useful for fashion search, catalog enrichment, and visual similarity tasks. The retrieval pairs (consumer-to-shop and shop-to-consumer) are particularly valuable for matching customer photos to catalog items. Fashion taxonomies drift: what mattered in the original collection period (2019) may not map cleanly to modern catalog attribute schemas. Validate category and attribute definitions against your actual catalog before using the labels directly.

When Public Retail Data Is Not Enough

Public datasets are the right starting point for baselines and proof-of-concept work. They rarely match the constraints of a production retail system:

You need full coverage across regions and channels — not one retailer's data.
You need consistent identifiers across stores, products, and campaigns — not anonymized keys.
You need timely updates tied to your own product catalog — not 2019 images or 2018 transactions.
You need permissioned use for production — not research-only licenses.

Teams that get past prototyping often discover a different gap: the data they need does not exist as a download. They need product images annotated to their own SKU schema, intent labels trained on their actual chat logs, or a pipeline that handles their specific receipt format. Those are custom annotation projects, not dataset downloads. Annotation at e-commerce scale is straightforward to scope: examples from this kind of work include 150,000 intent-labeled messages for a retail chatbot and 100,000 annotated chat messages for platform safety classification. [1] [2]

If your production requirements have outgrown public datasets, consider custom data collection as the next step — or start from a build vs. buy analysis to clarify what you actually need.

Conclusion

Retail datasets are most useful when they reproduce real retail constraints: time, promos, stock, and messy customer behavior. Start with one dataset that matches your main task, then test your pipeline on a second dataset with different characteristics. If your approach survives that, it is closer to being production-ready.

Frequently Asked Questions (FAQ)

What is a retail dataset in machine learning?

A retail dataset is any structured data that captures shopping behavior or retail operations. The most common forms are transaction logs (orders and line items), clickstream sessions (views, carts, and purchases), product catalogs (SKU metadata and images), and retail images (shelf photos, checkout scenes, product pictures). The right dataset depends on your task: forecasting needs clean time series with promotional context, recommendations need interaction histories, and retail CV needs labeled images tied to real store conditions.

What is the best retail dataset for demand forecasting?

M5 Forecasting is the standard starting point — it covers item, department, store, and state levels with promotional context and a large community of published baselines. Validate your pipeline on a second dataset like Rossmann or Favorita to check whether your feature assumptions generalize. Your choice should depend on whether you need SKU-level signals, store metadata, calendar effects, and promo context — not just a large row count.

What datasets are best for building a recommender system for e-commerce?

For session-based recommenders, YOOCHOOSE (RecSys 2015) and RetailRocket are common starting points. For next-basket prediction and reorder modeling, Instacart includes repeat purchase patterns and product taxonomy. H&M adds full product content — images and text — for multimodal recommendation research. Amazon Review Data from UCSD is the right choice when you need review text as a content signal alongside user histories.

How do I avoid data leakage in retail sales prediction?

Use a strict time cutoff: compute all aggregates on the training window only, and treat promotions, returns, and inventory as time-dependent facts. If a field was not known at prediction time, it cannot be a feature. In retail, leakage is often subtle: a “final price” field may already reflect future discounts, returns may be recorded after the forecast horizon, and inventory snapshots may show future stockout resolutions. Build leakage checks into your feature validation pipeline, not as an afterthought.

Are there free retail datasets for computer vision?

Yes, and access varies by task. RPC (Retail Product Checkout) covers automatic checkout with 83K single-product and 24K checkout scene images — free with signup. SKU-110K covers dense object detection on shelves — 11.7K images, 1.7M bounding boxes, free on GitHub. DeepFashion2 covers fashion detection, keypoints, and retrieval — 491K images, free with an academic application. All three target different hardware setups and label schemas, so match the dataset to the specific scenario your model will face in deployment.

Insights into the Digital World

Best Autonomous Driving Datasets 2026

Autonomous driving models do not learn “driving.” They learn patterns in pixels, point clouds, radar returns, and trajectories. The dataset […]

Read

Why One Model Is Never Enough: A Guide to Ensemble Learning Methods

Your model is only as accurate as its weakest assumption. Train a single model too closely, and it memorizes noise. […]

Read

Imitation Learning: From Basic Concepts to Advanced Implementation

When an AI system learns a task through trial and error, training can take weeks or even months before the […]

Read

Best Retail Datasets for Machine Learning 2026

Retail data is a security camera for your business logic. It quietly records what customers touched, ignored, compared, returned, and […]

Read

A Guide to Sourcing Datasets

High-quality datasets power AI and machine learning. When the data is weak, the model does not get a fair shot. […]

Read

What Is Robot Learning? A Complete Guide

At Unidata, we supply training data for robot learning systems — demonstration datasets, perception labeling, offline RL corpora. Every project […]

Read

20 Best Face Recognition Datasets for ML in 2026

Your model won’t guess a face out of thin air. It learns. From pixels, patterns — and the datasets you […]

Read

Robot Training Data: A Practical Guide to Collection, Annotation, and Pipelines

Most robotics projects don’t fail on the model. They fail on the data — wrong type, wrong distribution, annotation that […]

Read

Data Ingestion Patterns

Data ingestion is the loading dock of your data pipeline. It is how you collect raw data from many sources […]

Read

How to Build a Custom Dataset with Web Scraping

What is Web Scraping and Why Use It? Web scraping (aka data scraping or web crawling) is the automated process […]

Read

Ready to get started?

Tell us what you need — we’ll reply within 24h with a free estimate

What service are you looking for? *

What service are you looking for?

Data Labeling

AI Model Testing

Data Collection

Ready-made Datasets

Human Moderation

Medicine

Other

What's your budget range? *

What's your budget range?

< $5,000

$5,000 – $25,000

$25,000 – $50,000

$50,000 – $100,000

$100,000+

Not sure yet

Where did you hear about Unidata? *

Where did you hear about Unidata?

Google LinkedIn Kaggle / Hugging Face / Github Referral (colleague, partner, client) G2 ChatGPT / AI assistant Other

I agree to the Terms of Service and Privacy Policy. By submitting my contact information, I consent to receive emails, messages, and calls from Unidata and its affiliates.

Andrew: Head of Client Success

— I'll guide you through every step, from your first
message to full project delivery

Thank you for your
message

It has been successfully sent!

We use cookies to enhance your experience, personalize content, ads, and analyze traffic. By clicking 'Accept All', you agree to our Cookie Policy.

Best Retail Datasets for Machine Learning 2026

What Counts as a Retail Dataset

What You Can Build with Retail Data

How to Choose a Retail Dataset without Wasting a Week

Demand Forecasting and Sales Prediction

Transactions, Baskets, and Customer Behavior

Recommendations and Ranking

E-commerce Operations, Product Catalogs, and Web Analytics

Retail Computer Vision: Shelves, Checkout, and Product Images

When Public Retail Data Is Not Enough

Conclusion

Frequently Asked Questions (FAQ)

Insights into the Digital World

Best Autonomous Driving Datasets 2026

Why One Model Is Never Enough: A Guide to Ensemble Learning Methods

Imitation Learning: From Basic Concepts to Advanced Implementation

Best Retail Datasets for Machine Learning 2026

A Guide to Sourcing Datasets

What Is Robot Learning? A Complete Guide

20 Best Face Recognition Datasets for ML in 2026

Robot Training Data: A Practical Guide to Collection, Annotation, and Pipelines

Data Ingestion Patterns

How to Build a Custom Dataset with Web Scraping

Ready to get started?

Thank you for your message

Ready to get started?

Thank you for your
message