Chinese Handwritten Composition Datasets
Project Overview:
Objective
Our goal was to compile a comprehensive dataset of Chinese handwritten compositions to improve Optical Character Recognition (OCR) for handwritten Chinese. The dataset also supports educators by enabling automated tools for grading and analyzing student compositions.
Scope
We gathered a wide range of handwritten essays and compositions covering various themes and writing styles. Each piece was accompanied by key metadata: grade level, writing style, and a digital text version.
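As a rough illustration of what one entry might look like, the sketch below models a single composition with the metadata elements named above. The class name, field names, and example values are hypothetical; the actual schema of the dataset is not described in this case study.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class CompositionRecord:
    """One handwritten composition with its accompanying metadata.

    All field names here are assumptions made for illustration only.
    """
    image_path: str     # hypothetical: where the scanned page is stored
    grade_level: str    # e.g. "primary", "middle", "high", "university_adult"
    writing_style: str  # hypothetical style label, e.g. "narrative"
    transcription: str  # digital text version of the handwritten content


def metadata_of(record: CompositionRecord) -> dict:
    """Return only the metadata elements, leaving out the image location."""
    fields = asdict(record)
    fields.pop("image_path")
    return fields


example = CompositionRecord(
    image_path="scans/000001.png",
    grade_level="middle",
    writing_style="narrative",
    transcription="我的家乡",
)
```

Keeping the record immutable (`frozen=True`) is a design choice for this sketch: once a composition has been transcribed and annotated, later corrections would create a new record rather than silently change an existing one.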
Sources
- Collaborations with schools across different provinces in China.
- Public essay competitions emphasizing handwritten submissions.
- Archival compositions from educational institutions.
- Crowd-sourced contributions through online platforms.
Data Collection Metrics
- Total Handwritten Compositions Collected: 275,000
- Primary School Submissions: 80,000
- Middle School Essays: 90,000
- High School Compositions: 60,000
- University and Adult Contributions: 25,000
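The figures above can be cross-checked with a few lines of arithmetic. The sketch below only uses the numbers listed in this section; the variable and function names are illustrative. The four listed categories add up to 255,000, so 20,000 of the stated 275,000 compositions are not attributed to a category here, and the source does not say where they belong.

```python
# Figures copied from the Data Collection Metrics list above.
STATED_TOTAL = 275_000
BREAKDOWN = {
    "primary_school": 80_000,
    "middle_school": 90_000,
    "high_school": 60_000,
    "university_and_adult": 25_000,
}


def unattributed(total: int, parts: dict) -> int:
    """Return how many items in `total` are not covered by `parts`."""
    return total - sum(parts.values())


listed_sum = sum(BREAKDOWN.values())
gap = unattributed(STATED_TOTAL, BREAKDOWN)
print(f"Listed categories: {listed_sum:,}; not attributed: {gap:,}")
```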
Annotation Process
Stages
- Image Pre-processing for Enhanced Legibility
- Accurate Digital Transcription of Handwritten Content
- Detailed Metadata Annotation
Annotation Metrics
- Total Digital Transcriptions Completed: 275,000
- Metadata Annotations: 825,000 (three per composition)
Quality Assurance
Stages
- Automated OCR Checks
- Rigorous Peer Review Process
- High Standards of Inter-annotator Agreement
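The case study does not say how inter-annotator agreement was measured. One common choice for categorical labels, such as a writing-style tag assigned by two annotators, is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below is a minimal version under that assumption; the function name and the sample labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each annotator's
    label frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")

    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label; agreement is total.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical writing-style labels from two annotators for four compositions.
annotator_1 = ["narrative", "narrative", "expository", "expository"]
annotator_2 = ["narrative", "narrative", "expository", "narrative"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

A kappa of 1.0 means perfect agreement and 0 means agreement no better than chance. Transcription agreement would usually call for a different measure, such as character error rate between two transcripts.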
QA Metrics
- OCR Validated Annotations: 137,500
- Peer Reviewed Annotations: 82,500
- Identified and Rectified Inconsistencies: 5,500
Conclusion
The Chinese Handwritten Composition Dataset provides a large body of native script that reflects the variation in handwriting across age groups and education levels. Training on this dataset can help OCR systems achieve higher accuracy when reading Chinese handwriting. Educational tools can also benefit, with applications in automated grading, handwriting analysis, and educational feedback.