New York English Call-Center Dataset

Project Overview:

Objective

Our project, “New York English Call-Center Dataset”, is designed to improve how machine learning models understand and process English-language conversations in call-center environments. The dataset is tailored for customer service automation, voice recognition, and sentiment analysis.

Scope

We focus on gathering and annotating a rich collection of call-center conversation recordings. These recordings are sourced from diverse call centers across New York, ensuring a wide range of dialects, speaking styles, and conversation types.


Sources

  • We collected call-center audio recordings from industries such as banking, retail, telecommunications, and healthcare, focusing on interactions in the New York region.
  • The recordings cover a diverse range of conversation types: customer inquiries, support requests, complaints, and sales calls.
  • The result is a varied and representative sample of communication scenarios in the New York region across these industries and conversation types.

Data Collection Metrics

  • Total Call-Center Conversations Recorded: 20,000
  • Conversations from Customer Service Centers: 10,000
  • Conversations from Technical Support Centers: 5,000
  • Miscellaneous Conversations: 5,000
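
The figures above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch for illustration only; the dictionary keys are hypothetical labels chosen here, not field names from the dataset.

```python
# Counts copied from the Data Collection Metrics list above.
# The dictionary keys are hypothetical labels chosen for this sketch.
TOTAL_CONVERSATIONS = 20_000
conversations_by_source = {
    "customer_service": 10_000,
    "technical_support": 5_000,
    "miscellaneous": 5_000,
}

# The per-source counts should add up to the reported total.
subtotal = sum(conversations_by_source.values())
assert subtotal == TOTAL_CONVERSATIONS

# Share of the dataset contributed by each source.
shares = {
    name: count / TOTAL_CONVERSATIONS
    for name, count in conversations_by_source.items()
}
for name, share in shares.items():
    print(f"{name}: {share:.0%}")
```

Under these numbers, customer service centers account for half of the recordings, and technical support and miscellaneous conversations for a quarter each.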

Annotation Process

Stages

  1. Conversation Categorization: Classifying conversations based on their nature (e.g., complaint, inquiry, technical support).
  2. Speaker Identification: Annotating speakers as customer or representative and noting any changeovers.
  3. Sentiment Analysis: Tagging segments of the conversation with sentiment labels (positive, negative, neutral).
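
To make the three stages concrete, here is one way a single annotated conversation could be represented. This is a sketch under stated assumptions, not the dataset's actual schema: the class names, field names, and timestamps are hypothetical, and "changeover" is interpreted here as a change of speaker between consecutive segments. Only the speaker roles and sentiment labels come from the stage descriptions above.

```python
from dataclasses import dataclass, field
from typing import List

# Label sets taken from the stage descriptions above.
SPEAKER_ROLES = {"customer", "representative"}
SENTIMENTS = {"positive", "negative", "neutral"}


@dataclass
class Segment:
    """One stretch of speech by a single speaker (hypothetical structure)."""
    start_sec: float
    end_sec: float
    speaker: str    # stage 2: "customer" or "representative"
    sentiment: str  # stage 3: "positive", "negative", or "neutral"

    def __post_init__(self) -> None:
        # Reject labels outside the sets named in the annotation stages.
        if self.speaker not in SPEAKER_ROLES:
            raise ValueError(f"unknown speaker role: {self.speaker!r}")
        if self.sentiment not in SENTIMENTS:
            raise ValueError(f"unknown sentiment: {self.sentiment!r}")
        if self.end_sec <= self.start_sec:
            raise ValueError("segment must end after it starts")


@dataclass
class ConversationAnnotation:
    """All annotations for one recording (hypothetical structure)."""
    conversation_id: str
    category: str  # stage 1, e.g. "complaint", "inquiry", "technical support"
    segments: List[Segment] = field(default_factory=list)

    def speaker_changeovers(self) -> int:
        """Count the points where the speaker differs from the previous segment."""
        pairs = zip(self.segments, self.segments[1:])
        return sum(1 for prev, cur in pairs if prev.speaker != cur.speaker)


# Usage with made-up values.
example = ConversationAnnotation(
    conversation_id="conv-00001",
    category="complaint",
    segments=[
        Segment(0.0, 6.5, "customer", "negative"),
        Segment(6.5, 14.0, "representative", "neutral"),
        Segment(14.0, 20.0, "representative", "positive"),
    ],
)
print(example.speaker_changeovers())
```

Validating labels at construction time keeps mislabeled segments out of the records early, which is one reasonable design choice among several.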

Annotation Metrics

  • Conversations with Detailed Category Labels: 20,000
  • Speaker Identification Annotations: 20,000
  • Sentiment Analysis Tags: 40,000

Quality Assurance

Stages

  1. Annotation Verification: Each annotated conversation undergoes a review process by our linguistic experts.
  2. Data Quality Control: Regular audits are conducted to eliminate any recordings that do not meet our quality standards.
  3. Data Security and Privacy Compliance: Adherence to stringent data protection protocols.

QA Metrics

  • Annotation Verification Cases: 3,000
  • Data Cleansing: Continuous quality checks and removal of subpar recordings
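
The source does not say how the 3,000 verification cases relate to the 20,000 conversations. Assuming they are conversations drawn from the full set, the sketch below computes the resulting coverage and shows one possible way to draw such a review sample. Uniform random sampling and the conversation ID format are assumptions made for illustration, not the documented procedure.

```python
import random

# Figures from the metrics above.
TOTAL_CONVERSATIONS = 20_000
VERIFICATION_CASES = 3_000

# Fraction of conversations covered if each case is one distinct conversation.
coverage = VERIFICATION_CASES / TOTAL_CONVERSATIONS
print(f"verification coverage: {coverage:.0%}")

# Hypothetical conversation IDs; the real identifiers are not given in the source.
conversation_ids = [f"conv-{i:05d}" for i in range(TOTAL_CONVERSATIONS)]

# Assumption: a uniform random sample without replacement.
# A fixed seed makes the selection reproducible for auditing.
rng = random.Random(0)
review_sample = rng.sample(conversation_ids, VERIFICATION_CASES)
print(len(review_sample))
```

A stratified sample (for example, by conversation category) would be another option if some categories are rare; the source does not indicate which approach was used.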

Conclusion

Our “New York English Call-Center Dataset” serves as a robust resource for developing advanced machine learning models in customer service and voice processing fields. The diverse, accurately annotated, and quality-assured dataset stands as a testament to our commitment to delivering exceptional data solutions for AI advancements.

Quality Data Creation

  • Guaranteed TAT
  • ISO 9001:2015, ISO/IEC 27001:2013 Certified
  • HIPAA Compliance
  • GDPR Compliance
  • Compliance and Security
