English Scenes Text Dataset

Project Overview:


To develop a dataset encompassing images with English text captured from various scenes. The primary intent is to train and improve Optical Character Recognition (OCR) models, AI-based translation tools, and accessibility software.


Compile images featuring English text from diverse settings, such as street signs, storefronts, product labels, book covers, and digital screens. Annotations will pinpoint the text and provide accompanying transcriptions.

English Scenes Text Dataset
English Scenes Text Dataset
English Scenes Text Dataset
English Scenes Text Dataset


  • Tourist attractions and public spaces: Engaged in collaborations to create a carefully collected and successfully curated portfolio of popular destinations.
  • Collaborations with retailers for product packaging: Established partnerships resulting in a thoughtfully collected and successfully curated assortment of product packaging materials.
  • Library archives and bookstores: Utilized materials from library archives and bookstores, contributing to a meticulously collected and diverse range of literary resources.
  • User-submitted photographs: Accepted and thoughtfully curated a collection of user-submitted photographs for a comprehensive visual representation.
  • Screenshots from digital devices: Ethically collected and successfully curated screenshots from digital devices, ensuring a well-organized and comprehensive dataset.
English Scenes Text Dataset
English Scenes Text Dataset

Data Collection Metrics

  • Total Images with Text: 450,000
  • Street Signs: 100,000
  • Storefronts and Billboards: 90,000
  • Product Labels: 80,000
  • Book Covers: 70,000
  • Digital Screens: 60,000
  • Miscellaneous Sources: 50,000

Annotation Process


  1. Image Pre-processing: Adjusting for optimal clarity and brightness.
  2. Text Region Segmentation: Outlining the exact region where the text appears.
  3. Text Transcription: Transcribing the English text as it appears in the scene.
  4. Validation: Cross-checking annotations with a combination of human reviewers and preliminary OCR models.

Annotation Metrics

  • Total Text Region Annotations: 450,000
  • Transcriptions Provided: 450,000
English Scenes Text Dataset
English Scenes Text Dataset
English Scenes Text Dataset
English Scenes Text Dataset

Quality Assurance


Automated OCR Verification: Early-stage OCR models validate the segmented text and transcriptions for consistency.
Peer Review: Annotations undergo a second pass by an alternate group of annotators.
Inter-annotator Agreement: A selection of images is examined by multiple annotators to ensure high consistency.

QA Metrics

  • Annotations Validated using OCR: 225,000 (50% of total images)
  • Peer Reviewed Annotations: 135,000 (30% of total images)
  • Inconsistencies Identified and Rectified: 9,000 (2% of total images)


The English Scenes Text Dataset serves as a robust foundation for the development and refinement of OCR models and other text-recognition software. By representing a wide range of real-world scenes, the dataset ensures that these models can accurately detect and interpret English text in various contexts. This dataset is pivotal for innovations in language translation, augmented reality, and assistive technologies.

quality dataset

Quality Data Creation

Guaranteed TAT‚Äč

Guaranteed TAT

ISO 9001:2015, ISO/IEC 27001:2013 Certified‚Äč

ISO 9001:2015, ISO/IEC 27001:2013 Certified

HIPAA Compliance‚Äč

HIPAA Compliance

GDPR Compliance‚Äč

GDPR Compliance

Compliance and Security‚Äč

Compliance and Security

Let's Discuss your Data collection Requirement With Us

To get a detailed estimation of requirements please reach us.

Scroll to Top