Commercial

Mammography Segmentation Dataset

This labeled breast X-ray dataset contains over 3,000 digital mammography studies in DICOM format, with pixel-level annotations for 14+ breast pathologies. It is designed for cancer detection, lesion segmentation, and training deep learning models for medical imaging and early breast cancer diagnosis.

  • studies
    3,000+
  • pathologies
    14+
  • Medicine
  • Segmentation
  • Computer Vision
  • Classification
  • Machine Learning


Dataset Info

Characteristic | Data
Description | Breast X-ray (mammography) images for pathology recognition
Data types | DICOM
Markup | Segmentation of pathologies
Tasks | Pathology recognition, computer vision
Number of studies | 3,000+
Number of pathologies | 14+
Labeling | Segmentation of each pathology
List of annotated classes | Skin thickening, papilloma, malignant lesion, benign lesion, malignant calcification cluster, calcified vessel, artifact, calcified cyst, lymph node, fibrocystic changes, single calcification, nipple, retracted nipple, others

Pathologies

  • Skin thickening
  • papilloma
  • malignant lesion
  • benign lesion
  • malignant calcification cluster
  • calcified vessel
  • artifact
  • calcified cyst
  • lymph node
  • fibrocystic changes
  • single calcification
  • nipple
  • retracted nipple
  • others
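
For training, the class names above usually need to be mapped to integer IDs. The following is a minimal sketch of such a mapping. The ID order, the reserved background index 0, and the helper name `normalize_label` are assumptions for illustration, not part of the dataset specification.

```python
# Class names copied from the dataset's list of annotated classes.
CLASS_NAMES = [
    "skin thickening", "papilloma", "malignant lesion", "benign lesion",
    "malignant calcification cluster", "calcified vessel", "artifact",
    "calcified cyst", "lymph node", "fibrocystic changes",
    "single calcification", "nipple", "retracted nipple", "others",
]

# Assumption: index 0 is reserved for background, so class IDs start at 1.
CLASS_TO_ID = {name: i + 1 for i, name in enumerate(CLASS_NAMES)}
ID_TO_CLASS = {i: name for name, i in CLASS_TO_ID.items()}


def normalize_label(raw: str) -> int:
    """Map a raw label string to its integer ID, ignoring case and extra spaces."""
    key = " ".join(raw.lower().split())
    return CLASS_TO_ID[key]  # raises KeyError for labels outside the list


if __name__ == "__main__":
    print(normalize_label("  Skin  Thickening "))
    print(ID_TO_CLASS[14])
```

Normalizing case and whitespace before the lookup guards against small inconsistencies in label spelling, such as the mixed capitalization in the list above.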

Technical Characteristics

Characteristic | Data
File extension | DICOM
Extension of labeling file | JSON
Source and collection methodology | Data was collected by a partner of Unidata.
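
Because the images are DICOM files and the labels are JSON files, a typical first step is to turn each annotation into a per-class binary mask. The sketch below shows one way to do that using only the standard library. The JSON schema (an `objects` list whose items carry `label` and `polygon` keys) is hypothetical, since the page does not document the annotation layout; check the downloadable sample for the real structure. Reading the pixel data itself would need a DICOM library such as pydicom and is left out here.

```python
import json


def point_in_polygon(x, y, polygon):
    """Even-odd rule test of point (x, y) against a list of [px, py] vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray cast to the right of (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(annotation_json, height, width):
    """Return {label: mask}; each mask is a height x width list of 0/1 rows."""
    annotation = json.loads(annotation_json)
    masks = {}
    for obj in annotation["objects"]:   # hypothetical key
        label = obj["label"]            # hypothetical key
        polygon = obj["polygon"]        # hypothetical key: [[x, y], ...]
        mask = masks.setdefault(label, [[0] * width for _ in range(height)])
        for row in range(height):
            for col in range(width):
                # Sample each pixel at its centre.
                if point_in_polygon(col + 0.5, row + 0.5, polygon):
                    mask[row][col] = 1
    return masks


if __name__ == "__main__":
    sample = json.dumps({"objects": [
        {"label": "benign lesion", "polygon": [[1, 1], [4, 1], [4, 3], [1, 3]]},
    ]})
    for mask_row in rasterize(sample, 4, 6)["benign lesion"]:
        print(mask_row)
```

The pure-Python loop is only meant to make the logic visible. Full-resolution mammograms have millions of pixels, so a vectorized rasterizer from an imaging library would be the practical choice.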

Dataset Use Cases

  • Healthcare & Medical Imaging

    Early Detection of Breast Cancer

    The Mammography Segmentation Dataset supports the development of advanced tools for the early detection of breast cancer through digital mammography. With detailed lesion segmentation and mammogram images, this dataset enables accurate identification of malignant lesions and benign tumors. It helps radiologists and data scientists train deep learning models that improve computer-aided detection systems, enhancing early diagnosis and reducing false negatives in mammography screenings.

  • Artificial Intelligence & Machine Learning

    Training Data for Segmentation Algorithms

    This breast X-ray dataset provides rich training data for building and validating segmentation algorithms used in medical imaging. It includes diverse DICOM images and breast lesion annotations, allowing researchers to experiment with semantic segmentation and machine learning techniques for segmenting breast tissue. The dataset strengthens neural network models and aids in fine-tuning segmentation methodology for improved tumor detection and accurate diagnosis across mammographic datasets.

  • Clinical Research & Diagnostics

    Enhancing Computer-Aided Cancer Diagnosis

    In clinical research, the Mammography Segmentation Dataset helps improve cancer detection workflows by providing high-quality digital mammograms with verified pathology information. Researchers can analyze segmentation performance, assess object detection accuracy, and test new feature extraction techniques for malignant cases. The dataset enables the evaluation of segmentation techniques in clinical practice, contributing to better cancer screening and more reliable diagnostic outcomes.

  • Biomedical Data Science & Academic Research

    Benchmarking Models for Breast Lesion Segmentation

    This mammography dataset serves as a benchmark for academic and industrial research on segmentation of breast tumors and breast lesions. It complements public datasets such as CBIS-DDSM, offering consistent ground-truth annotations for evaluating deep learning and machine learning models. The dataset's structured DICOM format and inclusion of anonymized patient information make it ideal for exploring segmentation pipelines, comparing model performance, and advancing the field of medical image analysis for early-stage cancer detection.
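
The use cases above mention analyzing segmentation performance and benchmarking models. Two metrics commonly used for that are the Dice coefficient and intersection over union (IoU). The sketch below computes both for a pair of binary masks. The function name `dice_and_iou` and the convention of scoring two empty masks as 1.0 are assumptions for illustration; the page does not prescribe an evaluation protocol.

```python
def dice_and_iou(pred, truth):
    """Return (dice, iou) for two binary masks given as equal-sized lists of 0/1 rows."""
    intersection = pred_sum = truth_sum = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            intersection += p & t
            pred_sum += p
            truth_sum += t
    union = pred_sum + truth_sum - intersection
    if union == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0, 1.0
    dice = 2 * intersection / (pred_sum + truth_sum)
    iou = intersection / union
    return dice, iou


if __name__ == "__main__":
    predicted = [[1, 1, 0],
                 [0, 1, 0]]
    ground_truth = [[1, 0, 0],
                    [0, 1, 1]]
    dice, iou = dice_and_iou(predicted, ground_truth)
    print(f"Dice = {dice:.3f}, IoU = {iou:.3f}")  # Dice = 0.667, IoU = 0.500
```

In a multi-class setting these scores are typically computed per pathology class and then averaged, so that rare classes are not hidden by frequent ones.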

FAQs

What are the sources of data for Unidata datasets?
All Unidata datasets are derived from legally permissible and ethically sourced medical imaging data collected in partnership with certified healthcare providers. Data is anonymized to remove any patient-identifiable information before annotation and segmentation.
What is included in this dataset?
The dataset includes over 3,000 DICOM mammogram studies with 14+ segmented pathology types, including malignant lesions, benign calcifications, lymph nodes, and skin thickening. Each study includes both mammography images and corresponding JSON annotation files for segmentation masks.
What types of annotations are provided?
Annotations include detailed segmentation masks identifying multiple breast pathology types. Each image is labeled with regions such as malignant lesions, papillomas, calcified cysts, and retracted nipples, enabling precise training for machine learning and computer-aided detection models.
Can I request a sample of the dataset before purchasing or downloading it?
Yes. Unidata provides free dataset samples so users can evaluate image quality, segmentation accuracy, and annotation consistency. The sample includes a small set of breast X-ray images with labeled pathologies to demonstrate the structure and usability of the full dataset.
How are Unidata datasets licensed?
Unidata datasets follow a dual-licensing model: free samples are offered for evaluation and testing, while full datasets are available exclusively through purchase. This ensures that researchers can validate dataset quality and compatibility before obtaining full access.
Do Unidata datasets follow GDPR or other data privacy regulations?
Yes. All Unidata datasets are fully compliant with GDPR and applicable international data protection laws. Data is collected from lawful sources, then anonymized and processed so that no patient-identifiable or sensitive information is included in any medical imaging files.
How are Unidata datasets stored?
All datasets are securely stored on AWS cloud infrastructure, offering high reliability, availability, and data protection. Unidata adheres to ISO 27001 and ISO 27701 standards, maintaining strict information security and privacy management for medical imaging datasets such as this one.
How was the data collected?
All mammography images were collected by a Unidata partner using standard digital mammography procedures. The data was anonymized, curated, and segmented by medical imaging specialists to ensure diagnostic accuracy and high-quality labels suitable for deep learning research.

What our clients are saying

UniData

Paul 2025-02-21

Very Positive Experience!

The team was very responsive when requesting a specific dataset, and was able to work with us on what data we specifically needed and custom pricing for our use case. Overall a great experience, and would recommend them to others!

Thorsten 2025-01-09

Very good experience

We got in touch with UniData to buy several datasets from them. Communication was very cooperative, quick, and friendly. We were able to find contract conditions that suited both parties well. I also appreciate the team's dedication to understand and address the needs of the customer. And the datasets we bought from UniData matched with our expectations.

Max Crous 2024-10-08

Data purchase

Our team got in touch with UniData for purchasing video data. The team at UniData was transparent, timely, and pleasant to communicate and negotiate with. Their samples and descriptions aligned well with the data we received. We will certainly reach out to UniData again if we're in search of 3rd party video data.

Abhijeet Zilpelwar 2025-02-26

Data is well organized and easy to…

Data is well organized and easy to consume. We could download and use it for training within few hours of receiving the data links.

Why Choose Us

Unidata offers expertise in AI data solutions, delivering high data quality and optimized workflows.

Expertise

Our team consists of industry-leading experts in AI data solutions

Quality

We ensure superior data quality to maximize your AI project's potential

Efficiency

Our optimized workflows accelerate your model training processes

Proven Results

Our track record of case studies demonstrates our ability to deliver outstanding outcomes

Customization

Our track record of case studies demonstrates our ability to deliver outstanding outcomes

Support

We provide ongoing support and consultation to ensure continuous success
1,000+ full-time assessors
