African American Vernacular Call-Center Dataset

Project Overview

Objective

The primary objective of the “African American Vernacular Call-Center Dataset” project is to curate a dataset that facilitates the development and enhancement of voice recognition models. Specifically, our aim is to create a robust resource for training models to recognize and understand African American Vernacular English (AAVE) in call-center interactions. This dataset holds great potential for improving the accuracy and inclusivity of voice-based customer service systems.

Scope

Our project scope encompasses both data collection and annotation phases. We collect voice recordings of call-center interactions that feature AAVE speakers and annotate them to provide valuable insights into the language patterns and communication styles within this context.


Sources

  • Call-Center Interactions: We gather recordings from real call-center interactions, ensuring that our dataset accurately represents the language used in these scenarios.
  • Volunteers: We collaborate with volunteers who have experience speaking in AAVE to provide additional voice recordings, contributing to the dataset’s richness.
  • Public Domain Datasets: We draw on publicly available datasets containing AAVE speech samples to augment our collection.
  • Voice Actors: In some cases, we work with skilled voice actors to create controlled call-center dialogues featuring AAVE.

Data Collection Metrics

  • Total Call-Center Interaction Recordings: 20,000
  • Volunteers’ Contributions: 5,000
  • Public Domain Datasets: 7,000
  • Voice Actors’ Contributions: 8,000
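
As a quick consistency check on the figures above, the sketch below sums the three contribution counts and compares the result with the stated total. It assumes the contributions are the components of the 20,000 total, which the source does not state explicitly; the variable names are illustrative only.

```python
# Counts as listed under Data Collection Metrics.
STATED_TOTAL = 20_000
contributions = {
    "volunteers": 5_000,
    "public_domain_datasets": 7_000,
    "voice_actors": 8_000,
}

# Assumption: the three contributions are meant to add up to the stated total.
computed_total = sum(contributions.values())

# Share of the stated total attributed to each source.
shares = {source: count / STATED_TOTAL for source, count in contributions.items()}

if __name__ == "__main__":
    print(f"computed total: {computed_total} (stated: {STATED_TOTAL})")
    for source, share in shares.items():
        print(f"{source}: {share:.0%}")
```

Under that assumption the counts are consistent: the three contributions sum to exactly 20,000.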

Annotation Process

Stages

  1. Transcription and Language Labeling: Each call-center interaction recording is transcribed, and language elements specific to AAVE are identified and labeled.
  2. Quality Assessment: We conduct a quality assessment to ensure accurate annotations and to verify the authenticity of the AAVE language elements.
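
The two stages above could be represented with a record like the one sketched below. This is a minimal illustration, not the project's actual schema: the class names, field names, and the `habitual_be` tag are all hypothetical, and the example transcript is made up rather than taken from the dataset.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureLabel:
    """A labeled span in the transcript (character offsets, end exclusive)."""
    start: int
    end: int
    feature: str  # hypothetical tag name, e.g. "habitual_be"


@dataclass
class AnnotatedRecording:
    """One transcribed recording plus its AAVE feature labels (stage 1)."""
    recording_id: str
    transcript: str
    labels: list[FeatureLabel] = field(default_factory=list)
    quality_checked: bool = False  # set once stage 2 has reviewed the labels

    def labeled_text(self) -> list[tuple[str, str]]:
        """Return (feature, text) pairs for every labeled span."""
        return [
            (label.feature, self.transcript[label.start:label.end])
            for label in self.labels
        ]


# Made-up example for illustration; not drawn from the dataset.
example = AnnotatedRecording(
    recording_id="rec-0001",
    transcript="she be calling every week about the bill",
    labels=[FeatureLabel(start=4, end=6, feature="habitual_be")],
)

if __name__ == "__main__":
    print(example.labeled_text())
```

Storing labels as character offsets rather than copied strings keeps each label tied to its exact position in the transcript, which makes it easier for a second reviewer to verify during quality assessment.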

Annotation Metrics

  • Call-Center Interaction Recordings with AAVE Annotations: 20,000
  • Quality Assessment: 15,000

Quality Assurance

Stages

  • Annotation Verification: We implement a rigorous validation process, involving experts in AAVE linguistics, to review and verify the accuracy of our annotations.
  • Data Quality Control: To maintain data quality, we meticulously remove any low-quality or noisy recordings from the dataset.
  • Data Security: We adhere to strict privacy regulations, obtain necessary user consent, and employ robust security measures to protect sensitive voice data.

QA Metrics

  • Annotation Validation Cases: 3,000
  • Data Cleansing: Removal of low-quality or irrelevant recordings
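
To put the review counts in proportion, the following sketch expresses them as fractions of the annotated recordings. It assumes, without confirmation from the source, that the 15,000 quality-assessed items and the 3,000 validation cases are both drawn from the 20,000 annotated recordings; the constant and function names are illustrative.

```python
ANNOTATED = 20_000         # recordings with AAVE annotations
QUALITY_ASSESSED = 15_000  # Annotation Metrics: Quality Assessment
VALIDATED = 3_000          # QA Metrics: Annotation Validation Cases


def coverage(part: int, whole: int) -> float:
    """Return part/whole as a fraction, rejecting a non-positive denominator."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole


# Assumption: both counts are subsets of the annotated recordings.
quality_coverage = coverage(QUALITY_ASSESSED, ANNOTATED)
validation_coverage = coverage(VALIDATED, ANNOTATED)

if __name__ == "__main__":
    print(f"quality assessment coverage: {quality_coverage:.0%}")
    print(f"expert validation coverage: {validation_coverage:.0%}")
```

If the assumption holds, three quarters of the annotated recordings went through quality assessment, and a smaller sample went through expert validation.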

Conclusion

The “African American Vernacular Call-Center Dataset” serves as a vital resource for advancing voice recognition technology in the context of call-center interactions. With accurately annotated call-center dialogues and a commitment to data quality, this dataset empowers the development of more inclusive and effective customer service systems. It contributes to bridging language gaps and ensuring that voice recognition technology caters to diverse linguistic backgrounds, ultimately enhancing the customer experience in call centers and beyond.
