African American Vernacular Call-Center Dataset
Project Overview
Objective
The primary objective of the “African American Vernacular Call-Center Dataset” project is to curate a dataset for developing and improving voice recognition models. Specifically, we aim to create a robust resource for training models to recognize and understand African American Vernacular English (AAVE) in call-center interactions. The dataset is intended to improve the accuracy and inclusivity of voice-based customer service systems.
Scope
Our project scope encompasses both data collection and annotation phases. We collect voice recordings of call-center interactions that feature AAVE speakers and annotate them to provide valuable insights into the language patterns and communication styles within this context.
Sources
- Call-Center Interactions: We gather recordings from real call-center interactions, ensuring that our dataset accurately represents the language used in these scenarios.
- Volunteers: We collaborate with volunteers who have experience speaking in AAVE to provide additional voice recordings, contributing to the dataset’s richness.
- Public Domain Datasets: We draw on publicly available datasets that contain AAVE speech samples to augment our collection.
- Voice Actors: In some cases, we work with skilled voice actors to create controlled call-center dialogues featuring AAVE.
Data Collection Metrics
- Total Call-Center Interaction Recordings: 20,000
- Volunteers’ Contributions: 5,000
- Public Domain Datasets: 7,000
- Voice Actors’ Contributions: 8,000
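As a quick consistency check on the figures above, the three itemized contributions can be summed and compared with the stated total. This is a minimal sketch: the dictionary keys are illustrative names rather than identifiers from the project, and the source does not say whether recordings from real call-center interactions are counted separately.

```python
# Counts as listed under Data Collection Metrics; the key names are illustrative.
source_counts = {
    "volunteers": 5_000,
    "public_domain_datasets": 7_000,
    "voice_actors": 8_000,
}
stated_total = 20_000

itemized_total = sum(source_counts.values())

# Share of the stated total contributed by each itemized source.
shares = {name: count / stated_total for name, count in source_counts.items()}

print(f"itemized total: {itemized_total:,} (stated total: {stated_total:,})")
for name, share in shares.items():
    print(f"{name}: {share:.0%}")
```

The three itemized sources add up to the stated 20,000, which suggests the total is the sum of these contributions, although the page does not confirm that reading.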
Annotation Process
Stages
- Transcription and Language Labeling: Each call-center interaction recording is transcribed, and language elements specific to AAVE are identified and labeled.
- Quality Assessment: We conduct a quality assessment to ensure accurate annotations and to verify the authenticity of the AAVE language elements.
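One way to picture the output of these two stages is a per-recording record that holds the transcript, the labeled AAVE spans, and a quality-assessment flag. The sketch below is hypothetical: the class names, field names, the `habitual_be` label, and the sample sentence are all illustrative assumptions, since the project's actual annotation schema is not described here.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureLabel:
    """A labeled span of the transcript (character offsets, end exclusive)."""
    start: int
    end: int
    label: str  # hypothetical label name, not the project's tag set


@dataclass
class AnnotatedRecording:
    """Hypothetical record for one call-center recording."""
    recording_id: str
    transcript: str
    features: list[FeatureLabel] = field(default_factory=list)
    quality_checked: bool = False  # set once the quality assessment stage is done

    def labeled_text(self) -> list[tuple[str, str]]:
        """Return (label, text) pairs for every labeled span."""
        return [(f.label, self.transcript[f.start:f.end]) for f in self.features]


# Illustrative example; the sentence and label are made up for this sketch.
record = AnnotatedRecording(
    recording_id="rec-0001",
    transcript="He be calling every Monday about the bill.",
)
record.features.append(FeatureLabel(start=3, end=5, label="habitual_be"))
record.quality_checked = True

print(record.labeled_text())
```

Storing labels as character offsets keeps the transcript text untouched, so a reviewer in the quality assessment stage can correct a span or a label without re-transcribing the recording.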
Annotation Metrics
- Call-Center Interaction Recordings with AAVE Annotations: 20,000
- Quality Assessment: 15,000
Quality Assurance
Stages
- Annotation Verification: We implement a rigorous validation process, involving experts in AAVE linguistics, to review and verify the accuracy of our annotations.
- Data Quality Control: To maintain data quality, we meticulously remove any low-quality or noisy recordings from the dataset.
- Data Security: We adhere to strict privacy regulations, obtain necessary user consent, and employ robust security measures to protect sensitive voice data.
QA Metrics
- Annotation Validation Cases: 3,000
- Data Cleansing: Removal of low-quality or irrelevant recordings
Conclusion
The “African American Vernacular Call-Center Dataset” serves as a vital resource for advancing voice recognition technology in the context of call-center interactions. With accurately annotated call-center dialogues and a commitment to data quality, this dataset empowers the development of more inclusive and effective customer service systems. It contributes to bridging language gaps and ensuring that voice recognition technology caters to diverse linguistic backgrounds, ultimately enhancing the customer experience in call centers and beyond.