Our goal was to compile a comprehensive dataset of handwritten Chinese compositions to improve Optical Character Recognition (OCR) for Chinese script. The dataset also serves educators by supporting automated tools for grading and analyzing student compositions.
Scope
We gathered handwritten essays and compositions covering a range of themes and writing styles. Each piece is accompanied by key metadata, including grade level, writing style, and a digital text version.
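One way to picture a single entry is as a small record type. The sketch below is illustrative only: the source does not publish a schema, so every field name, the example path, and the label values are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CompositionRecord:
    """One handwritten composition plus the metadata named in the scope.

    All field names are hypothetical; no official schema is given.
    """
    image_path: str               # scanned page image (assumed storage form)
    transcription: str            # digital text version of the handwriting
    grade_level: str              # e.g. "primary", "middle", "high"
    writing_style: str            # free-text style label (assumed)
    theme: Optional[str] = None   # themes are mentioned, but not as required metadata


# Made-up example values, for illustration only.
record = CompositionRecord(
    image_path="scans/000001.png",
    transcription="我的家乡",
    grade_level="primary",
    writing_style="narrative",
)
print(record.grade_level, len(record.transcription))
```

Keeping the record immutable (`frozen=True`) is a design choice for this sketch, so that an annotated entry cannot be changed silently after review.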
Sources
Collaborations with schools across different provinces in China.
Public essay competitions emphasizing handwritten submissions.
Archival compositions from educational institutions.
Crowd-sourced contributions through online platforms.
Data Collection Metrics
Total Handwritten Compositions Collected: 275,000
Primary School Submissions: 80,000
Middle School Essays: 90,000
High School Compositions: 60,000
University and Adult Contributions: 25,000
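As a quick tally, the listed categories can be summed and expressed as shares of the stated total. The dictionary keys below are shortened labels, not names from the dataset. The four listed counts add up to 255,000, so 20,000 of the 275,000 compositions are not broken down in the figures above.

```python
# Counts copied from the collection metrics; keys are shortened labels.
stated_total = 275_000
by_category = {
    "primary_school": 80_000,
    "middle_school": 90_000,
    "high_school": 60_000,
    "university_and_adult": 25_000,
}

listed_total = sum(by_category.values())
unlisted = stated_total - listed_total  # compositions with no category given

for name, count in by_category.items():
    share = count / stated_total
    print(f"{name}: {count:,} ({share:.1%})")
print(f"listed: {listed_total:,}; not broken down: {unlisted:,}")
```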
Annotation Process
Stages
Image Pre-processing for Enhanced Legibility
Accurate Digital Transcription of Handwritten Content
Detailed Metadata Annotation
Annotation Metrics
Total Digital Transcriptions Completed: 275,000
Metadata Annotations: 825,000 (three per composition)
Quality Assurance
Automated OCR Checks
Rigorous Peer Review Process
High Standards of Inter-annotator Agreement
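The source does not say how inter-annotator agreement was measured. One common option for transcription work is character-level agreement based on edit distance between two annotators' transcriptions of the same page. The sketch below assumes that measure; the function names and sample strings are illustrative.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_agreement(a: str, b: str) -> float:
    """1 minus the normalized edit distance; 1.0 means identical transcriptions."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


# Two hypothetical transcriptions that differ in one character.
annotator_1 = "今天天气很好"
annotator_2 = "今天天汽很好"
score = character_agreement(annotator_1, annotator_2)
print(f"agreement: {score:.3f}")
```

Pages whose score falls below a chosen threshold could then be routed to peer review; the threshold itself would be a project decision and is not stated in the source.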
QA Metrics
OCR Validated Annotations: 137,500
Peer Reviewed Annotations: 82,500
Identified and Rectified Inconsistencies: 5,500
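To put these counts in proportion, they can be expressed as rates. The source does not state the denominator, so the sketch below assumes each figure is measured against the 275,000 completed transcriptions; the variable names are illustrative.

```python
total_transcriptions = 275_000   # from the annotation metrics
metadata_annotations = 825_000   # from the annotation metrics

qa_counts = {
    "ocr_validated": 137_500,
    "peer_reviewed": 82_500,
    "inconsistencies_fixed": 5_500,
}

# Assumption: rates are relative to the number of transcriptions.
rates = {name: count / total_transcriptions for name, count in qa_counts.items()}
per_composition = metadata_annotations / total_transcriptions

for name, rate in rates.items():
    print(f"{name}: {rate:.0%}")
print(f"metadata annotations per composition: {per_composition:.0f}")
```

Under that assumption the three figures come to 50%, 30%, and 2% of transcriptions, and the metadata count matches the stated three annotations per composition.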
Conclusion
The Chinese Handwritten Composition Dataset provides a large body of native handwriting that reflects the variation found across age groups and education levels. OCR systems trained on this dataset can achieve higher accuracy on Chinese handwriting. Educational tools also stand to benefit, with applications in automated grading, handwriting analysis, and educational feedback.