Phone Conversations in Irish English

Project Overview


Our project set out to build a comprehensive dataset of Irish English phone conversations. The dataset is intended for training and improving speech recognition models so that they handle the nuances of Irish English dialects more accurately.


We collected and annotated a wide range of phone conversations in Irish English, with the goal of capturing the diverse accents, colloquialisms, and speech patterns of Irish English speakers.

  • We gathered audio recordings of phone conversations across various contexts, including customer service calls, business communications, casual conversations, and emergency services.
  • We sought a diverse representation of speakers in terms of age, gender, socioeconomic background, and regional accent within Ireland.
  • The resulting set of recordings offers a rich and varied representation of phone conversations, covering diverse contexts and speaker characteristics in Ireland.

Data Collection Metrics

  • Total Conversations Recorded: 120,000
  • From Urban Areas: 70,000
  • From Rural Regions: 50,000

Annotation Process


  1. Transcription Accuracy: Each conversation was transcribed in full and with close attention to accuracy, so that the transcript reflects what was actually said.
  2. Dialect Identification: Conversations were classified based on regional dialects, enriching the dataset with linguistic diversity.
  3. Contextual Tagging: We tagged conversations with contextual data such as the topic, emotional tone, and speaking pace.
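
The three annotation layers above (transcript, dialect label, contextual tags) could be stored as one record per conversation. The following is a minimal sketch only: the field names, the placeholder dialect labels, and the words-per-minute measure of speaking pace are assumptions made for illustration, since the project description does not specify its schema or name the five dialect categories.

```python
from dataclasses import dataclass, asdict

# Placeholder labels (assumption): the project reports 5 major dialects
# but does not name them, so generic identifiers are used here.
DIALECT_LABELS = {"dialect_1", "dialect_2", "dialect_3", "dialect_4", "dialect_5"}


def words_per_minute(transcript: str, duration_seconds: float) -> float:
    """Estimate speaking pace from a transcript and the call duration."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return len(transcript.split()) * 60.0 / duration_seconds


@dataclass
class ConversationAnnotation:
    """One annotated conversation (hypothetical schema)."""
    conversation_id: str
    transcript: str          # step 1: transcription
    dialect: str             # step 2: dialect identification
    topic: str               # step 3: contextual tags
    emotional_tone: str
    speaking_pace_wpm: float

    def __post_init__(self) -> None:
        # Reject records that would make the dataset inconsistent.
        if not self.transcript.strip():
            raise ValueError("transcript must not be empty")
        if self.dialect not in DIALECT_LABELS:
            raise ValueError(f"unknown dialect label: {self.dialect}")
        if self.speaking_pace_wpm <= 0:
            raise ValueError("speaking_pace_wpm must be positive")


# Example usage with made-up values.
text = "hello there how are you"
record = ConversationAnnotation(
    conversation_id="conv-000001",
    transcript=text,
    dialect="dialect_3",
    topic="customer service",
    emotional_tone="neutral",
    speaking_pace_wpm=words_per_minute(text, duration_seconds=2.0),
)
print(asdict(record))
```

Validating each record at creation time is one way to keep dialect labels and tags consistent across a large number of conversations.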

Annotation Metrics

  • Conversations Annotated: 120,000
  • Dialect Categories Identified: 5 major dialects
  • Contextual Tags Applied: 120,000

Quality Assurance


  • Continuous Evaluation: We regularly assessed how effectively the dataset trained speech recognition models.
  • Privacy Compliance: All conversations were anonymized and handled in compliance with privacy standards.
  • Feedback Integration: Feedback from linguists and speech recognition experts was incorporated to refine the dataset.

QA Metrics

  • Accuracy in Speech Recognition Models: 95%
  • Annotation Consistency Rate: 99%
  • User Satisfaction Rate: 96%
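
The project description does not say how the annotation consistency rate was measured. One common approach, shown here purely as an assumption, is simple percent agreement: the share of items on which two independent annotation passes assign the same label. The function name and sample labels below are hypothetical.

```python
from typing import Sequence


def consistency_rate(first_pass: Sequence[str], second_pass: Sequence[str]) -> float:
    """Fraction of items that received the same label in both annotation passes."""
    if len(first_pass) != len(second_pass):
        raise ValueError("both passes must label the same number of items")
    if not first_pass:
        raise ValueError("at least one labeled item is required")
    matches = sum(a == b for a, b in zip(first_pass, second_pass))
    return matches / len(first_pass)


# Made-up dialect labels from two annotators for the same five conversations.
annotator_a = ["dialect_1", "dialect_2", "dialect_2", "dialect_4", "dialect_5"]
annotator_b = ["dialect_1", "dialect_2", "dialect_3", "dialect_4", "dialect_5"]
rate = consistency_rate(annotator_a, annotator_b)
print(f"{rate:.0%}")
```

Percent agreement is easy to interpret but does not correct for agreement that would occur by chance; a chance-corrected statistic such as Cohen's kappa is a stricter alternative.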


Our Irish English phone conversation dataset has significantly improved the performance of speech recognition models. Models trained with the dataset recognize Irish English accents and dialects more accurately. This improvement matters for businesses and services that want to offer better voice recognition to customers in Ireland.

