Alexa Wake Words in Canadian French (Youth)

Project Overview

Objective

We built a dataset of 25,000 audio clips of the “Alexa” wake word spoken in Canadian French by youth. The dataset supports the development of wake word detection systems and voice assistants for the Canadian French-speaking youth demographic.

Scope

Our team gathered a comprehensive and varied collection of audio recordings from Canadian French-speaking youth, covering a range of environments and accents. We meticulously annotated these recordings with accurate wake word timestamps, ensuring high utility for voice recognition technologies.

Sources

  • Youth Contributors: We partnered with young individuals eager to contribute audio clips, capturing the “Alexa” wake word in diverse Canadian French contexts.
  • Youth Voice Actors: We engaged young, fluent Canadian French voice actors to produce scripted wake word recordings, adding breadth to the dataset.
  • Public Domain Recordings: We leveraged available public domain audio that contained the “Alexa” wake word in Canadian French.

Data Collection Metrics

  • Total Audio Clips Collected: 25,000
  • Youth Contributors’ Recordings: 15,000 clips
  • Youth Voice Actor Recordings: 7,500 clips
  • Public Domain Extracts: 2,500 clips
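
As a quick consistency check on the figures above, the per-source counts can be tallied and turned into shares of the total. This is a minimal sketch: the counts come from the list above, while the dictionary keys and the `source_shares` helper are illustrative names, not part of the project tooling.

```python
# Clip counts per source, taken from the metrics listed above.
# The key names are hypothetical labels chosen for this sketch.
CLIPS_BY_SOURCE = {
    "youth_contributors": 15_000,
    "youth_voice_actors": 7_500,
    "public_domain": 2_500,
}


def source_shares(counts):
    """Return each source's share of the total as a fraction between 0 and 1."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


TOTAL_CLIPS = sum(CLIPS_BY_SOURCE.values())
SHARES = source_shares(CLIPS_BY_SOURCE)
```

The three sources add up to the stated total of 25,000 clips, split 60% youth contributors, 30% youth voice actors, and 10% public domain extracts.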

Annotation Process

Stages

  1. Wake Word Annotation: Our team precisely identified the start and end of the “Alexa” wake word in each audio clip.
  2. Contributor Demographics: We gathered extensive metadata on our youth contributors, including age, accent, and gender.
  3. Recording Conditions: We documented varied recording conditions like ambient noise levels and the types of recording devices used.
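
The three stages above suggest one record per clip that combines wake word timestamps, contributor demographics, and recording conditions. The sketch below shows one possible shape for such a record with basic sanity checks. All field names, units, and the `validation_errors` helper are assumptions made for illustration; the actual annotation schema is not described in this case study.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class WakeWordAnnotation:
    """One annotated clip. Every field name here is hypothetical."""
    clip_id: str
    wake_word_start_s: float  # start of "Alexa" in seconds from clip start
    wake_word_end_s: float    # end of "Alexa" in seconds from clip start
    clip_duration_s: float
    age: int                  # contributor demographics
    accent: str
    gender: str
    ambient_noise: str        # recording conditions, e.g. "quiet" or "street"
    device_type: str          # e.g. "smartphone" or "laptop"


def validation_errors(a: WakeWordAnnotation) -> List[str]:
    """Return a list of problems found in one record (empty if none)."""
    errors = []
    if a.wake_word_start_s < 0:
        errors.append("wake word starts before the clip begins")
    if a.wake_word_end_s <= a.wake_word_start_s:
        errors.append("wake word end is not after its start")
    if a.wake_word_end_s > a.clip_duration_s:
        errors.append("wake word ends after the clip ends")
    if not a.accent or not a.device_type:
        errors.append("missing accent or device metadata")
    return errors


example = WakeWordAnnotation(
    clip_id="clip_00001",
    wake_word_start_s=0.42,
    wake_word_end_s=1.05,
    clip_duration_s=2.50,
    age=15,
    accent="Quebec",
    gender="female",
    ambient_noise="quiet",
    device_type="smartphone",
)
```

Calling `validation_errors(example)` returns an empty list, while a record whose end timestamp precedes its start would be flagged for review.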

Annotation Metrics

  • Audio Clips with Wake Word Annotations: 25,000
  • Contributor Demographics: 25,000
  • Recording Condition Metadata: 25,000

Quality Assurance

  • Annotation Verification: We employed automated tools and youth reviewers for a thorough validation process, ensuring the accuracy of wake word annotations.
  • Youth Consent and Parental Consent: We ensured all youth-contributed audio clips had explicit consent for use, with parental consent obtained where necessary. All personally identifiable information was anonymized.
  • Privacy Compliance: Our approach adhered strictly to privacy regulations, including data protection policies. We also provided options for youth contributors or their guardians to opt out or request data removal.

QA Metrics

  • Annotation Validation Cases: 2,500 (10% of total)
  • Privacy Audits: 15,000 (for youth-contributed data)
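
The validation figure above corresponds to a 10% sample of the 25,000 annotated clips. How that sample was chosen is not stated here, so the sketch below assumes simple random sampling with a fixed seed for reproducibility. The `select_validation_sample` function and the clip ID format are hypothetical.

```python
import random


def select_validation_sample(clip_ids, fraction=0.10, seed=0):
    """Pick a reproducible random subset of clips for manual validation.

    Assumes simple random sampling without replacement; the project's
    actual selection method is not described in this case study.
    """
    rng = random.Random(seed)  # local generator, so global state is untouched
    sample_size = round(len(clip_ids) * fraction)
    return rng.sample(clip_ids, sample_size)


# Hypothetical clip IDs standing in for the 25,000 annotated clips.
all_clip_ids = [f"clip_{i:05d}" for i in range(25_000)]
validation_sample = select_validation_sample(all_clip_ids)
```

With 25,000 clips and a fraction of 0.10, the sample contains 2,500 distinct clips, matching the validation count listed above. A stratified sample by source or recording condition would be a reasonable alternative if some groups are small.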

Conclusion

Our Alexa Wake Words Dataset in Canadian French (Youth) provides 25,000 annotated clips to support wake word detection and voice assistant systems for the Canadian French-speaking youth demographic. The project combines diverse youth recordings, detailed annotations, and stringent privacy compliance, and reflects our experience in data collection and annotation for AI and machine learning.
