Mute Motion: WLASL Media Pipe Encoded

Use Case

Computer Vision

Description

An improved encoding method that restarts the landmark detection process independently for each video. This improves accuracy because no detection state carries over from previously processed videos.

About Dataset

Context

This dataset is derived from the “MuteMotion: WLASL Translation Model” notebook and uses the WLASL datasets available on Kaggle. It encodes the videos into landmarks using MediaPipe, drawing from two primary sources:

wlasl2000-resized: used as a backup for missing videos, although its video quality is somewhat lower.

Content

This dataset includes:

WLASL_parsed_data.json:

A JSON file containing the WLASL data parsed into a list of dictionaries. Each dictionary represents a single example with the following fields:
gloss: the word being expressed
video_path: the path to the video in the datasets
frame_start: the frame number where the word starts
frame_end: the frame number where the word ends
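As a minimal sketch, the parsed file can be loaded with Python's standard `json` module and grouped by gloss. The helper names below are hypothetical, and `clip_length` assumes the original WLASL conventions (1-indexed frames, with `frame_end` set to -1 when the sign runs to the end of the video); check these against the actual file before relying on them.

```python
import json
from collections import defaultdict


def load_parsed_data(path: str) -> list[dict]:
    """Load WLASL_parsed_data.json as a list of example dictionaries."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def group_by_gloss(examples: list[dict]) -> dict[str, list[str]]:
    """Map each gloss to the video paths of the examples that express it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for ex in examples:
        groups[ex["gloss"]].append(ex["video_path"])
    return dict(groups)


def clip_length(ex: dict, total_frames: int) -> int:
    """Number of frames covered by [frame_start, frame_end], inclusive.

    Assumption: frames are 1-indexed and frame_end == -1 means
    "until the last frame", as in the original WLASL metadata.
    """
    end = total_frames if ex["frame_end"] == -1 else ex["frame_end"]
    return end - ex["frame_start"] + 1
```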

filtered_labels.txt:

We used the FastText library to filter out labels unrelated to our project.
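The FastText filtering step itself is not described further. Assuming filtered_labels.txt stores one label per line (an assumption about the file layout), a minimal sketch for keeping only the examples whose gloss survived the filter could look like this; the function names are hypothetical.

```python
from typing import Iterable


def parse_labels(lines: Iterable[str]) -> set[str]:
    """Collect non-empty, whitespace-stripped labels (one label per line assumed)."""
    return {line.strip() for line in lines if line.strip()}


def read_labels(path: str) -> set[str]:
    """Read filtered_labels.txt into a set of allowed glosses."""
    with open(path, encoding="utf-8") as f:
        return parse_labels(f)


def keep_allowed(examples: list[dict], allowed: set[str]) -> list[dict]:
    """Keep only the examples whose gloss is in the filtered label set."""
    return [ex for ex in examples if ex["gloss"] in allowed]
```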

labels.npz:

All labels are encoded as vectors using the FastText library.
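One common way to use such label vectors is to map a predicted vector back to the closest label by cosine similarity. The sketch below assumes labels.npz stores one vector per label, keyed by the label string; that layout is a guess, so inspect `np.load(path).files` on the real file first. The helper names are hypothetical.

```python
import numpy as np


def load_label_vectors(path: str) -> dict[str, np.ndarray]:
    """Read every array in the archive (assumed: one vector per label name)."""
    with np.load(path) as data:
        return {name: data[name] for name in data.files}


def nearest_label(query: np.ndarray, label_vectors: dict[str, np.ndarray]) -> str:
    """Return the label whose vector has the highest cosine similarity to query."""
    names = list(label_vectors)
    matrix = np.stack([label_vectors[n] for n in names]).astype(np.float64)
    q = np.asarray(query, dtype=np.float64)
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(q)
    sims = matrix @ q / np.maximum(norms, 1e-12)  # guard against zero-length vectors
    return names[int(np.argmax(sims))]
```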

landmarks_V1.npz:

Videos encoded as NumPy arrays of landmark coordinates. Each example has shape (f, 180, 3), where f is the number of frames, 180 is the number of selected landmarks, and 3 holds the x, y, and z coordinates of each point.
Pose: 6 upper-body landmarks (shoulders, elbows, and wrists), excluding the face, since the face has its own dedicated process.
filtered_pose = [11, 12, 13, 14, 15, 16]

Face: out of the 478 face landmarks, 132 are selected, focusing on the lips, eyes, eyebrows, and the outline of the face.

filtered_face = [0, 4, 7, 8, 10, 13, 14, 17, 21, 33, 37, 39, 40, 46, 52, 53, 54, 55, 58,
61, 63, 65, 66, 67, 70, 78, 80, 81, 82, 84, 87, 88, 91, 93, 95, 103, 105,
107, 109, 127, 132, 133, 136, 144, 145, 146, 148, 149, 150, 152, 153, 154,
155, 157, 158, 159, 160, 161, 162, 163, 172, 173, 176, 178, 181, 185, 191,
234, 246, 249, 251, 263, 267, 269, 270, 276, 282, 283, 284, 285, 288, 291,
293, 295, 296, 297, 300, 308, 310, 311, 312, 314, 317, 318, 321, 323, 324,
332, 334, 336, 338, 356, 361, 362, 365, 373, 374, 375, 377, 378, 379, 380,
381, 382, 384, 385, 386, 387, 388, 389, 390, 397, 398, 400, 402, 405, 409,
415, 454, 466, 468, 473]
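The count of 180 is consistent with 21 + 21 hand landmarks plus the 6 pose and 132 face landmarks listed above (42 + 6 + 132 = 180); the hand part is an inference, since the hands are not listed explicitly. The following is a minimal sketch of building one frame from full per-part MediaPipe arrays. The concatenation order (right hand, left hand, pose, face) is borrowed from the V3 description and is an assumption for V1, as is filling an undetected hand with zeros; `build_v1_frame` is a hypothetical helper.

```python
import numpy as np

filtered_pose = [11, 12, 13, 14, 15, 16]


def build_v1_frame(right_hand, left_hand, pose, face, face_idx):
    """Concatenate the selected landmarks of one frame into a (180, 3) array.

    right_hand, left_hand: (21, 3) arrays, or None if the hand was not detected.
    pose: (33, 3) array; face: (478, 3) array.
    face_idx: the 132 face indices (filtered_face).
    """
    missing_hand = np.zeros((21, 3))  # assumption: undetected hands become zeros
    parts = [
        missing_hand if right_hand is None else np.asarray(right_hand),
        missing_hand if left_hand is None else np.asarray(left_hand),
        np.asarray(pose)[filtered_pose],  # shoulders, elbows, wrists
        np.asarray(face)[face_idx],       # lips, eyes, eyebrows, outline
    ]
    return np.concatenate(parts, axis=0)
```

Stacking the per-frame results with `np.stack(frames)` then gives the (f, 180, 3) array described above.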

landmarks_V2.npz:

Encoded with an improved method that restarts the detection process independently for each video, so the landmarks of one video are not influenced by previously processed videos.
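The idea can be illustrated with a small sketch in which a fresh detector is created for every video, so no tracking state from one clip can leak into the next. The tracker interface (`process` and `close`) mirrors MediaPipe's legacy Holistic solution, which could be plugged in through a factory such as `lambda: mp.solutions.holistic.Holistic(refine_face_landmarks=True)`. `encode_video`, `encode_all`, and the toy `FrameCounter` are hypothetical; the counter simply stands in for any detector whose output depends on the frames it has already seen.

```python
from typing import Callable, Iterable

import numpy as np


class FrameCounter:
    """Toy stateful 'detector': its output depends on how many frames it has seen."""

    def __init__(self) -> None:
        self.seen = 0

    def process(self, frame):
        self.seen += 1
        return self.seen

    def close(self) -> None:
        pass


def encode_video(frames: Iterable, make_tracker: Callable, to_array: Callable) -> np.ndarray:
    """Encode one (non-empty) video with a brand-new tracker, closed afterwards."""
    tracker = make_tracker()  # fresh instance: no state carried over from earlier videos
    try:
        return np.stack([to_array(tracker.process(frame)) for frame in frames])
    finally:
        tracker.close()


def encode_all(videos: Iterable[Iterable], make_tracker: Callable, to_array: Callable) -> list[np.ndarray]:
    """Encode each video independently, restarting detection every time."""
    return [encode_video(frames, make_tracker, to_array) for frames in videos]
```

With a single shared tracker, the second video would start from the state left by the first (the counter would continue at 4 instead of restarting at 1), which is the kind of cross-video dependency that restarting detection avoids.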

landmarks_V3.npz:

This version encodes all 553 landmarks (21 + 21 + 33 + 478), so users can filter out unwanted landmarks themselves.
The order is [ Right Hand (21), Left Hand (21), Pose (33), Face (478) ].
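Because V3 keeps every landmark in a fixed order, a V1-style selection can be recovered by index. The offsets follow from the stated order: right hand at 0 to 20, left hand at 21 to 41, pose at 42 to 74, and face at 75 to 552. This sketch assumes V1 used the same part order and kept both hands in full, which the 180-landmark count suggests but the description does not state; `v1_indices` and `reduce_v3` are hypothetical names.

```python
import numpy as np

filtered_pose = [11, 12, 13, 14, 15, 16]
filtered_face = [0, 4, 7, 8, 10, 13, 14, 17, 21, 33, 37, 39, 40, 46, 52, 53, 54, 55, 58,
                 61, 63, 65, 66, 67, 70, 78, 80, 81, 82, 84, 87, 88, 91, 93, 95, 103, 105,
                 107, 109, 127, 132, 133, 136, 144, 145, 146, 148, 149, 150, 152, 153, 154,
                 155, 157, 158, 159, 160, 161, 162, 163, 172, 173, 176, 178, 181, 185, 191,
                 234, 246, 249, 251, 263, 267, 269, 270, 276, 282, 283, 284, 285, 288, 291,
                 293, 295, 296, 297, 300, 308, 310, 311, 312, 314, 317, 318, 321, 323, 324,
                 332, 334, 336, 338, 356, 361, 362, 365, 373, 374, 375, 377, 378, 379, 380,
                 381, 382, 384, 385, 386, 387, 388, 389, 390, 397, 398, 400, 402, 405, 409,
                 415, 454, 466, 468, 473]

# Offsets implied by the V3 order [Right Hand (21), Left Hand (21), Pose (33), Face (478)]
POSE_OFFSET = 21 + 21
FACE_OFFSET = POSE_OFFSET + 33


def v1_indices() -> np.ndarray:
    """Indices into a 553-landmark V3 frame that reproduce the 180-landmark selection."""
    hands = np.arange(POSE_OFFSET)                 # both hands, all 42 landmarks
    pose = POSE_OFFSET + np.array(filtered_pose)   # 6 upper-body landmarks
    face = FACE_OFFSET + np.array(filtered_face)   # 132 face landmarks
    return np.concatenate([hands, pose, face])


def reduce_v3(video: np.ndarray) -> np.ndarray:
    """Reduce a (f, 553, 3) V3 example to (f, 180, 3)."""
    return video[:, v1_indices(), :]
```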

This dataset is sourced from Kaggle.
