
SWE-Bench Coding Tasks Dataset

The SWE-Bench Coding Tasks Dataset extends the original SWE-Bench benchmark with broader language coverage, golden and test patches, and real-world coding tasks such as bug fixing, code completion, and automated code review. It supports coding agents, language models, and developer tools with verified benchmark scores and multi-language test sets.

Request a demo
Fermatix SWE-Bench dataset
  • Files: 8,712
  • Programming languages: 6
  • Tags: Programming Languages, Machine Learning, Automated Code Review, Bug Fixing

Dataset Info

  • Description: An extended benchmark of real-world software engineering tasks with enhanced artifacts and broader language coverage
  • Data types: Text
  • Tasks: Bug fixing, code completion, pull request generation, automated code review
  • Total number of files: 8,712
  • Total number of people: 30
  • Labeling: Annotated with golden patches, test patches, post-patch reference states, and metadata stored in parquet files (e.g., repository name, issue/PR identifier, diffs, test results)
  • Programming languages: C#, Go, PHP, Rust, Kotlin, Ruby
Fermatix SWE-Bench dataset
Download sample
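
As an illustration of how the metadata shipped in the parquet files might be consumed, the sketch below loads one file with pandas and summarizes task instances per language and repository. The file name and column names (repo, language, issue_id, golden_patch, test_results) are assumptions made for illustration; the downloaded sample defines the actual schema.

    # Minimal sketch: inspect the parquet metadata of the SWE-Bench Coding Tasks Dataset.
    # NOTE: the file name and column names below are assumed for illustration only;
    # consult the downloaded sample for the real schema.
    import pandas as pd

    df = pd.read_parquet("swe_bench_tasks.parquet")  # hypothetical metadata file

    print(df.shape)              # (number of task instances, number of metadata columns)
    print(df.columns.tolist())   # e.g. repo, language, issue_id, golden_patch, test_results

    # Count task instances per assumed "language" and "repo" columns
    summary = (
        df.groupby(["language", "repo"])
          .size()
          .sort_values(ascending=False)
          .head(10)
    )
    print(summary)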

Technical Characteristics

  • File extensions: .parquet (metadata), .patch (golden/test patches), .txt/.xml (reference outputs), .yml (docker-compose), Dockerfile, Makefile, .env
  • Models: Compatible with the original Multi-SWE-Bench execution tools and with models designed for code understanding and generation
  • File size: 8.85 GB
  • Source and collection methodology: Data was collected from permissively licensed, non-utilitarian GitHub repositories to ensure diversity and reduce bias
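
To make the execution artifacts above concrete, here is a minimal sketch of how a single task might be evaluated: start the shipped Docker environment, apply the test patch, confirm the tests fail, then apply the golden (or a model-generated) patch and confirm they pass. The service name ("task"), patch paths, and "make test" target are assumptions for illustration; the actual entry points are defined by each task's docker-compose.yml and Makefile.

    # Minimal sketch of a per-task evaluation loop, assuming a docker-compose service
    # named "task", patch files under patches/, and a "make test" target inside the
    # container. These names are illustrative, not the dataset's documented interface.
    import subprocess

    def run(cmd, check=True):
        print("+", " ".join(cmd))
        return subprocess.run(cmd, check=check)

    # Build and start the task environment from the shipped docker-compose.yml
    run(["docker", "compose", "up", "--build", "-d"])

    # Apply the test patch that encodes the expected post-fix behaviour
    run(["docker", "compose", "exec", "task", "git", "apply", "patches/test.patch"])

    # Tests are expected to FAIL before the fix is applied
    pre = run(["docker", "compose", "exec", "task", "make", "test"], check=False)

    # Apply the golden patch (or a model-generated candidate patch)
    run(["docker", "compose", "exec", "task", "git", "apply", "patches/golden.patch"])

    # Tests are expected to PASS after the fix
    post = run(["docker", "compose", "exec", "task", "make", "test"], check=False)

    print("task resolved:", pre.returncode != 0 and post.returncode == 0)

    run(["docker", "compose", "down"], check=False)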

Dataset Use Cases

  • Software Development

    Improving Automated Code Generation

    The SWE-Bench Coding Tasks Dataset provides structured test sets from real GitHub issues, enabling evaluation of coding agents on advanced coding tasks. By offering a programming-languages dataset built from real-world repositories, it supports developers in benchmarking pass rates, analyzing coding benchmarks, and improving accuracy in code generation and bug fixing.

  • Machine Learning & AI

    Training and Evaluating Coding Agents

    This verified SWE-Bench dataset helps researchers train large language models on software engineering challenges. With tasks drawn from real repositories and large codebases, the Multi-SWE-Bench framework delivers benchmark scores and evaluation results, making it a reliable resource for building better coding models and testing new benchmarks in real software environments.

  • Software Engineering Research

    Benchmarking Real-World Engineering Tasks

    SWE-Bench Dataset introduces a robust SWE benchmark for analyzing engineering tasks across large codebases. By including GitHub repository issues and patches, it creates realistic conditions for testing developer tools, assessing coding tasks, and validating language models against existing benchmarks, enhancing reliability in software engineering research and development practices.

  • Developer Tools & Testing

    Enhancing Reliability in Software Projects

    With nearly 9,000 files and curated annotations, this dataset helps improve developer tools for bug fixing and pull request generation. It supports testing coding agents on real-world projects, refining evaluation results, and tracking pass rates across software development and testing pipelines (see the sketch after this list).
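
As referenced in the use cases above, pass rates can be aggregated from per-task evaluation results. The sketch below assumes a simple result record per task (task_id, language, resolved); that format is illustrative, not the dataset's documented output.

    # Minimal sketch: aggregate hypothetical per-task evaluation results into pass rates.
    # The record format (task_id, language, resolved) is assumed for illustration.
    from collections import defaultdict

    results = [
        {"task_id": "go-0001", "language": "Go",   "resolved": True},
        {"task_id": "rs-0002", "language": "Rust", "resolved": False},
        {"task_id": "rb-0003", "language": "Ruby", "resolved": True},
    ]

    totals, passed = defaultdict(int), defaultdict(int)
    for r in results:
        totals[r["language"]] += 1
        passed[r["language"]] += int(r["resolved"])

    for lang in sorted(totals):
        print(f"{lang}: {passed[lang]}/{totals[lang]} resolved ({passed[lang] / totals[lang]:.0%})")

    print(f"overall pass rate: {sum(passed.values()) / sum(totals.values()):.0%}")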

FAQs

What makes SWE-Bench Coding Tasks Dataset different from existing benchmarks?
Unlike existing benchmarks limited to one language, this one expands to multiple languages and provides enhanced metadata. This makes it a more robust option for advanced coding evaluation, engineering tasks, and software development research.
Which programming languages are supported?
Beyond the original benchmark's Python focus, this extension covers six languages: C#, Go, PHP, Rust, Kotlin, and Ruby. This broader scope makes it suitable for multilingual coding benchmarks and large codebases.
How large is the dataset and in what format is it available?
The dataset size is 8.85 GB, with files in .parquet, .patch, .yml, .txt, .xml, along with Dockerfiles and Makefiles. This structure ensures compatibility with Multi-SWE-Bench execution tools and reproducible workflows.
What types of annotations are provided?
Annotations include golden patches, test patches, diffs, test results, and repository metadata. This ensures models are evaluated against verified coding benchmarks with clear pass/fail criteria, supporting transparent evaluation results.
Still have questions about using Unidata datasets? Read our user guides.


What our clients are saying

UniData



Paul 2025-02-21

Very Positive Experience!

The team was very responsive when requesting a specific dataset, and was able to work with us on what data we specifically needed and custom pricing for our use case. Overall a great experience, and would recommend them to others!


Thorsten 2025-01-09

Very good experience

We got in touch with UniData to buy several datasets from them. Communication was very cooperative, quick, and friendly. We were able to find contract conditions that suited both parties well. I also appreciate the team's dedication to understand and address the needs of the customer. And the datasets we bought from UniData matched with our expectations.

Max Crous 2024-10-08

Data purchase

Our team got in touch with UniData for purchasing video data. The team at UniData was transparent, timely, and pleasant to communicate and negotiate with. Their samples and descriptions aligned well with the data we received. We will certainly reach out to UniData again if we're in search of 3rd party video data.

Abhijeet Zilpelwar 2025-02-26

Data is well organized and easy to…

Data is well organized and easy to consume. We could download and use it for training within a few hours of receiving the data links.

Why Choose Us

Unidata offers unparalleled expertise in AI data solutions, delivering superior data quality and optimized workflows

Expertise

Our team consists of industry-leading experts in AI data solutions

Quality

We ensure superior data quality to maximize your AI project's potential

Efficiency

Our optimized workflows accelerate your model training processes

Proven Results

Our track record of case studies demonstrates our ability to deliver outstanding outcomes

Customization

We tailor datasets and workflows to the specific needs of your project

Support

We provide ongoing support and consultation to ensure continuous success
1,000+ full-time assessors

Ready to get started?

Tell us what you need — we’ll reply within 24h with a free estimate

Andrew, Head of Client Success: "I'll guide you through every step, from your first message to full project delivery."
