
SWE-Bench Coding Tasks Dataset

SWE-Bench Coding Tasks Dataset is an extended, multi-language programming dataset that builds on the original SWE-Bench benchmark with broader language coverage, golden and test patches, and real-world coding tasks such as bug fixing, code completion, and automated code review. It supports coding agents, language models, and developer tools with verified benchmark scores and multi-language test sets.

  • Files: 8,712
  • Programming languages: 6
  • Programming languages
  • Machine Learning
  • Automated Code Review
  • Bug Fixing

Dataset Info

Description: An extended benchmark of real-world software engineering tasks with enhanced artifacts and broader language coverage
Data types: Text
Tasks: Bug fixing, code completion, pull request generation, automated code review
Total number of files: 8,712
Total number of people: 30
Labeling: Annotated with golden patches, test patches, post-patch reference states, and metadata stored in parquet files (e.g., repository name, issue/PR identifier, diffs, test results)
Programming languages: C#, Go, PHP, Rust, Kotlin, Ruby
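
For teams that want to explore the annotations programmatically, here is a minimal sketch of loading the parquet metadata with pandas. The file path and the repo/language column names are illustrative assumptions based on the fields listed above, not a published schema; inspect the columns of the actual files first.

    import pandas as pd

    # Hypothetical path; the dataset ships its metadata as parquet files.
    df = pd.read_parquet("swe_bench_extended/metadata.parquet")

    # Verify the actual schema before relying on any column name below.
    print(df.columns.tolist())

    # Example: task counts per repository, and a per-language slice.
    print(df["repo"].value_counts())           # assumed column name
    rust_tasks = df[df["language"] == "Rust"]  # assumed column name
    print(f"{len(rust_tasks)} Rust tasks")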

Technical Characteristics

File extensions: .parquet (metadata), .patch (golden/test patches), .txt/.xml (reference outputs), .yml (docker-compose), Dockerfile, Makefile, .env
Models: Compatible with the original Multi-SWE-Bench execution tools and with models designed for code understanding and generation
File size: 8.85 GB
Source and collection methodology: Data was collected from permissively licensed, non-utilitarian GitHub repositories to ensure diversity and reduce bias.
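
Because each task ships with golden/test patches and a pinned container environment (docker-compose, Dockerfile, Makefile), reproducing a task's reference state can be scripted. The sketch below is a rough outline only: the directory layout, patch file name, compose service name ("runner"), and "make test" target are assumptions, not documented entry points of this dataset.

    import subprocess
    from pathlib import Path

    task_dir = Path("tasks/example-task")  # hypothetical task directory

    # Apply the golden patch to the checked-out repository.
    subprocess.run(
        ["git", "apply", str(task_dir / "golden.patch")],
        cwd=task_dir / "repo",
        check=True,
    )

    # Run the test suite inside the task's pinned container environment.
    subprocess.run(
        ["docker", "compose", "run", "--rm", "runner", "make", "test"],
        cwd=task_dir,
        check=True,
    )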

Dataset Use Cases

  • Software Development

    Improving Automated Code Generation

    The dataset provides structured test sets from real GitHub issues, enabling evaluation of coding agents on advanced coding tasks. By pairing a multi-language dataset with real-world repositories, it supports developers in benchmarking pass rates (a minimal pass-rate sketch follows this list), analyzing coding benchmarks, and improving accuracy in code generation and bug fixing.

  • Machine Learning & AI

    Training and Evaluating Coding Agents

    This verified SWE-Bench-style dataset helps researchers train large language models on software engineering challenges. With tasks drawn from real repositories and large codebases, the Multi-SWE-Bench framework delivers benchmark scores and evaluation results, making it a reliable resource for building better coding models and testing them against realistic benchmarks in real software environments.

  • Software Engineering Research

    Benchmarking Real-World Engineering Tasks

    The SWE-Bench Coding Tasks Dataset provides a robust benchmark for analyzing engineering tasks across large codebases. By including GitHub repository issues and patches, it creates realistic conditions for testing developer tools, assessing coding tasks, and validating language models against existing benchmarks, enhancing reliability in software engineering research and development practices.

  • Developer Tools & Testing

    Enhancing Reliability in Software Projects

    With nearly 9,000 files and curated annotations, this dataset helps improve developer tools for bug fixing and pull request generation. It supports testing coding agents on real projects, refining evaluation results, and addressing real-world coding challenges, making pass rates a more meaningful signal across software development and testing pipelines.
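
As referenced in the first use case above, pass-rate bookkeeping over per-task outcomes is simple to script. The sketch below uses hypothetical (language, task, resolved) records; a real harness would produce them by running each model patch against the dataset's test patches.

    from collections import defaultdict

    # Hypothetical per-task outcomes: (language, task_id, resolved).
    results = [
        ("Go",   "repo-a#101", True),
        ("Go",   "repo-a#117", False),
        ("Rust", "repo-b#12",  True),
    ]

    totals, passed = defaultdict(int), defaultdict(int)
    for language, _task, resolved in results:
        totals[language] += 1
        passed[language] += resolved

    for language in totals:
        print(f"{language}: {passed[language]}/{totals[language]} "
              f"({passed[language] / totals[language]:.0%} pass rate)")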

FAQs

What makes SWE-Bench Coding Tasks Dataset different from existing benchmarks?
Unlike existing benchmarks limited to one language, this one expands to multiple languages and provides enhanced metadata. This makes it a more robust option for advanced coding evaluation, engineering tasks, and software development research.
Which programming languages are supported?
In addition to Python, covered by the original SWE-Bench, the extended dataset adds C#, Go, PHP, Rust, Kotlin, and Ruby. This broader scope makes it suitable for multilingual coding benchmarks and large codebases.
How large is the dataset and in what format is it available?
The dataset is 8.85 GB, distributed as .parquet, .patch, .yml, .txt, and .xml files, along with Dockerfiles and Makefiles. This structure keeps it compatible with the Multi-SWE-Bench execution tools and reproducible workflows.
What types of annotations are provided?
Annotations include golden patches, test patches, diffs, test results, and repository metadata. This ensures models are evaluated against verified coding benchmarks with clear pass/fail criteria, supporting transparent evaluation results.
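
To illustrate those pass/fail criteria: the original SWE-Bench convention counts a task as resolved when the model's patch makes every previously failing test pass (FAIL_TO_PASS) without breaking any previously passing test (PASS_TO_PASS). Whether this extended dataset reuses those exact field names is an assumption; the sketch below is a minimal grader under that convention.

    def grade(task: dict, outcomes: dict[str, bool]) -> bool:
        """Resolved only if all previously failing tests now pass and
        no previously passing test regresses (SWE-Bench-style criteria)."""
        fixed = all(outcomes.get(t, False) for t in task["FAIL_TO_PASS"])
        intact = all(outcomes.get(t, False) for t in task["PASS_TO_PASS"])
        return fixed and intact

    # Hypothetical task annotation and test-run outcomes.
    task = {"FAIL_TO_PASS": ["test_bug_fixed"], "PASS_TO_PASS": ["test_existing"]}
    print(grade(task, {"test_bug_fixed": True, "test_existing": True}))  # True
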
Still have questions about using Unidata datasets? Read our user guides.


Why Companies Trust Unidata’s Services for ML/AI

Share your project requirements and we handle the rest. Every service is tailored, executed, and compliance-ready, so you can focus on strategy and growth, not operations.

70+ Datasets

  • Finance, IT, E-commerce, Retail, Healthcare, and 14+ other industries
  • Multiple supported formats

Unique & Diverse Data

  • Diversity in ethnicity, age, country, gender, and more
  • Exclusively collected data, not available from open sources

Custom Dataset Solutions

  • No manual collection needed from your side; we handle everything
  • Up to 70% cheaper than in-house

100% Legal, Secure & Compliant

  • Curated and legally sourced
  • AWS ISO 27001/27701

Smooth Collaboration & Fast Delivery

  • 87% of datasets delivered in 3–10 days
  • Dedicated PM, Europe-timezone communication

Need Proof?

See the results we've delivered for leading tech companies and startups.

Explore datasets

What our clients are saying


Paul 2025-02-21

Very Positive Experience!

The team was very responsive when requesting a specific dataset, and was able to work with us on what data we specifically needed and custom pricing for our use case. Overall a great experience, and would recommend them to others!

Thorsten 2025-01-09

Very good experience

We got in touch with UniData to buy several datasets from them. Communication was very cooperative, quick, and friendly. We were able to find contract conditions that suited both parties well. I also appreciate the team's dedication to understand and address the needs of the customer. And the datasets we bought from UniData matched with our expectations.

Max Crous 2024-10-08

Data purchase

Our team got in touch with UniData for purchasing video data. The team at UniData was transparent, timely, and pleasant to communicate and negotiate with. Their samples and descriptions aligned well with the data we received. We will certainly reach out to UniData again if we're in search of 3rd party video data.

Abhijeet Zilpelwar 2025-02-26

Data is well organized and easy to…

Data is well organized and easy to consume. We could download and use it for training within a few hours of receiving the data links.

Trusted by the world's biggest brands

Our Clients Love Us

Enterprise Document Automation

Document AI Lead

The dataset gave us strong value for both pilot and early-stage testing. We plan to broaden coverage as deployment scales.

Identity Verification Lab

Deputy Director

The data was good. We passed PAD level 1 from iBeta.

Ready to get started?

Tell us what you need — we’ll reply within 24h with a free estimate

Andrew, Head of Client Success: "I'll guide you through every step, from your first message to full project delivery."

