Alexa Wake Words in Canadian French (Adults)
Project Overview:
Objective
Scope
We collected a wide range of audio recordings from native Canadian French speakers. The dataset covers varied recording environments and accents, and every clip is annotated with precise wake word markers.
Sources
- Voice Assistant Users: We collaborated with Alexa users fluent in Canadian French, gathering audio clips of them using the wake word in different contexts.
- Voice Actors: We employed professional voice actors proficient in Canadian French to record scripted wake word utterances, thereby enriching the dataset’s diversity.
- Public Domain Recordings: We sourced publicly available audio containing the “Alexa” wake word in Canadian French.
Data Collection Metrics
- Total Audio Clips Collected: 30,000
- User Contributions: 15,000
- Voice Actor Recordings: 10,000
- Public Domain Extracts: 5,000
- Total Audio Clips Annotated: 30,000
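As a quick consistency check, the per-source counts above can be summed and expressed as shares of the total. This is a minimal sketch; the dictionary keys are illustrative labels chosen here, not field names from the dataset.

```python
# Clip counts per source, as reported in the metrics above.
# The key names are hypothetical labels.
clips_by_source = {
    "user_contributions": 15_000,
    "voice_actor_recordings": 10_000,
    "public_domain_extracts": 5_000,
}

total = sum(clips_by_source.values())

# Share of the full dataset contributed by each source.
shares = {source: count / total for source, count in clips_by_source.items()}

for source, share in shares.items():
    print(f"{source}: {share:.1%}")
print(f"total: {total}")
```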
Annotation Process
Stages
- Wake Word Annotation: We meticulously marked the temporal boundaries of the “Alexa” wake word in each audio clip.
- Speaker Demographics: We gathered metadata about the contributors, including age, accent, and gender.
- Recording Conditions: We documented the recording conditions such as ambient noise levels and the devices used.
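The three annotation stages above could be captured in a single per-clip record along the following lines. This is only a sketch: the class name, every field name, and the example values are assumptions made for illustration, since the actual schema is not described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WakeWordAnnotation:
    """One annotated clip. All field names are hypothetical."""

    clip_id: str
    # Wake word annotation: temporal boundaries of "Alexa", in seconds.
    wake_word_start_s: float
    wake_word_end_s: float
    clip_duration_s: float
    # Speaker demographics.
    age_group: str
    accent: str
    gender: str
    # Recording conditions.
    ambient_noise_db: Optional[float]  # None when no level was measured
    device: str

    def wake_word_duration_s(self) -> float:
        """Length of the annotated wake word span, in seconds."""
        return self.wake_word_end_s - self.wake_word_start_s


# Made-up example values, not taken from the dataset.
example = WakeWordAnnotation(
    clip_id="clip-000001",
    wake_word_start_s=0.5,
    wake_word_end_s=1.25,
    clip_duration_s=3.0,
    age_group="25-34",
    accent="Quebec",
    gender="female",
    ambient_noise_db=42.0,
    device="smartphone",
)
print(example.wake_word_duration_s())
```

Keeping the boundaries, demographics, and recording conditions in one record means each of the three annotation counts can be derived from the same collection of records.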
Annotation Metrics
- Audio Clips with Wake Word Annotations: 30,000
- Speaker Demographics: 30,000
- Recording Condition Metadata: 30,000
Quality Assurance
Stages
- Annotation Verification: We implemented a rigorous validation process using both automated tools and human reviewers to ensure the accuracy of wake word annotations.
- User Consent: We ensured that all user-contributed audio clips included explicit consent for usage in the dataset, with all personally identifiable information anonymized.
- Privacy Compliance: We adhered strictly to privacy regulations, including data protection policies, and provided mechanisms for contributors to opt out or request data removal.
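The automated side of annotation verification might include simple sanity checks on the wake word boundaries before a clip reaches a human reviewer. The sketch below is one possible check, not the tooling actually used; the function name and the minimum and maximum span lengths are assumptions.

```python
from typing import List


def boundary_problems(
    start_s: float,
    end_s: float,
    clip_duration_s: float,
    min_len_s: float = 0.2,  # assumed lower bound for a spoken "Alexa"
    max_len_s: float = 1.5,  # assumed upper bound
) -> List[str]:
    """Return a list of problems found in one wake word annotation."""
    problems = []
    if start_s < 0 or end_s > clip_duration_s:
        problems.append("boundary outside clip")
    if end_s <= start_s:
        problems.append("end not after start")
    else:
        length_s = end_s - start_s
        if length_s < min_len_s:
            problems.append("wake word span too short")
        elif length_s > max_len_s:
            problems.append("wake word span too long")
    return problems


print(boundary_problems(0.5, 1.25, 3.0))  # plausible annotation
print(boundary_problems(1.0, 0.5, 3.0))   # start and end swapped
```

Clips that return a non-empty list could be routed to human reviewers first, so manual effort goes to the annotations most likely to be wrong.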
QA Metrics
- Annotation Validation Cases: 3,000 (10% of total)
- Privacy Audits: 15,000 (for user-contributed data)
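The 3,000 validation cases correspond to 10% of the 30,000 annotated clips. How those cases were selected is not stated; the sketch below assumes simple random sampling with a fixed seed so the selection is reproducible. The function name and the clip ID format are hypothetical.

```python
import random
from typing import List

TOTAL_CLIPS = 30_000
VALIDATION_FRACTION = 0.10


def pick_validation_sample(
    clip_ids: List[str], fraction: float, seed: int = 0
) -> List[str]:
    """Draw a reproducible random sample of clips without replacement."""
    rng = random.Random(seed)
    sample_size = round(len(clip_ids) * fraction)
    return rng.sample(clip_ids, sample_size)


# Hypothetical clip IDs standing in for the real dataset index.
clip_ids = [f"clip-{i:06d}" for i in range(1, TOTAL_CLIPS + 1)]
validation_sample = pick_validation_sample(clip_ids, VALIDATION_FRACTION)
print(len(validation_sample))
```

A stratified sample (for example by source or accent) would be a reasonable alternative if some groups are small, but that would be a design choice beyond what is described here.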
Conclusion
This dataset, the Alexa Wake Words Dataset in Canadian French (Adults), demonstrates our capability in collecting and annotating high-quality speech data. It stands as a valuable resource for enhancing voice recognition and natural language processing technologies, specifically catering to the Canadian French-speaking adult demographic. Our commitment to diversity, meticulous annotation, and stringent privacy compliance underscores our dedication to advancing machine learning research and development.