10 Big Cats of the Wild - Image Classification

Use Case

Computer Vision

About Dataset

I gathered the images from Google searches and downloaded them with the 'download all images' app. I highly recommend this app: it is very fast and returns a zip file of the images, which you can then unzip to a specific directory.

I have developed a custom set of tools to create datasets. The first tool creates a dataset framework in a directory I call Datasets. It takes the name of the new dataset, creates a directory with that name, and within it creates 4 subdirectories: train, test, valid and storage. The storage directory is where the unzipped downloaded images are placed.

Downloaded images can be a crazy mix of ungodly file names and image formats, so I wrote a Python program called order_by_size. It operates on the downloaded images in the storage directory. It removes files whose extensions are not jpg, png or bmp and deletes images that are below a user-specified size. Then it converts the remaining files to jpg format and renames them sequentially with zero-padded numbers, ordered so that the first file is the largest image, the 2nd file is the next largest, and so on. You want to start with large images because they will later be cropped to a region of interest, and the cropped images need enough pixels for your classification model to extract features.

Now that the files are sequentially ordered and have jpg extensions, I use another program called duplicate delete. It uses file hashing to detect duplicate images and deletes them. This prevents the train, test and validation sets from having images in common when the files are partitioned.

Now, when you do a Google search you get a lot of what you want and also a lot of junk.
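As a rough illustration of the file-hashing step described above, here is a minimal sketch using only the Python standard library. It is not the actual duplicate delete program. The function names `file_digest` and `delete_duplicates`, the choice of SHA-256, and the rule of keeping the first file in sorted order are all assumptions made for this example. Note that hashing file contents only catches byte-identical files, not the same picture saved at a different size or quality.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 16):
    """Return the SHA-256 hex digest of a file's bytes (hash choice is an assumption)."""
    hasher = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read in chunks so large images are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def delete_duplicates(storage_dir):
    """Delete files whose bytes match an earlier file; return the deleted paths."""
    seen = {}
    deleted = []
    # Sorted order means the lowest-numbered copy of each image is the one kept.
    for path in sorted(Path(storage_dir).iterdir()):
        if not path.is_file():
            continue
        digest = file_digest(path)
        if digest in seen:
            path.unlink()
            deleted.append(path)
        else:
            seen[digest] = path
    return deleted
```

Calling `delete_duplicates("Datasets/big_cats/storage")` (a hypothetical path) would remove the later copies and return the list of deleted files so you can check what was dropped.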
I wrote another Python program called review_images that shows each image in the storage directory in sequence and lets you keep or delete it, depending on whether it is the type of image you want. This eliminates unwanted images from the storage directory.

Then comes the hard part. If you want to build a high quality dataset you should crop your images so that the resulting image has a high ratio of pixels in the region of interest to the total number of pixels. For that I use Paint Shop Pro version 9. If you examine the dataset images you will see that in most cases the cat takes up at least 50% of the pixels in the image.

After all that is done I run the order_by_size program again with different parameters, which resizes all the images to a specified size. For this dataset I used 224 X 224 X 3 as the image size. Now we have a uniform, ordered and properly pruned set of images for a specific class, tigers for example.

I wrote another Python program called make_class. It takes the new class name (tiger, for example) and creates a new class subdirectory in the train, test and valid directories. Then it partitions the images in the storage directory into train, test and validation images and stores them in the class subdirectory of the train, test and valid directories. Finally, I wrote another Python program that creates a dataset csv file.

Making a high quality dataset takes a lot of work, but the tools I have built help to reduce the workload.
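To make the partitioning step concrete, here is a minimal sketch of how images in the storage directory could be split into train, test and valid class subdirectories. It is not the actual make_class program. The function name `partition_class`, the 10% test and 10% valid fractions, the fixed random seed, and the decision to copy rather than move the files are all assumptions made for this example.

```python
import random
import shutil
from pathlib import Path


def partition_class(dataset_dir, class_name, test_frac=0.1, valid_frac=0.1, seed=0):
    """Copy jpg files from storage/ into train/, test/ and valid/ class subdirectories.

    Returns a dict mapping each split name to the number of images it received.
    """
    root = Path(dataset_dir)
    images = sorted(
        p for p in (root / "storage").iterdir() if p.suffix.lower() == ".jpg"
    )
    # Shuffle with a fixed seed so the split is reproducible (an assumption).
    random.Random(seed).shuffle(images)

    n_test = int(len(images) * test_frac)
    n_valid = int(len(images) * valid_frac)
    # Slices do not overlap, so no image lands in more than one split.
    splits = {
        "test": images[:n_test],
        "valid": images[n_test:n_test + n_valid],
        "train": images[n_test + n_valid:],
    }
    for split, files in splits.items():
        class_dir = root / split / class_name
        class_dir.mkdir(parents=True, exist_ok=True)
        for src in files:
            shutil.copy2(src, class_dir / src.name)
    return {split: len(files) for split, files in splits.items()}
```

For example, `partition_class("Datasets/big_cats", "tiger")` (a hypothetical path) would create `train/tiger`, `test/tiger` and `valid/tiger` and fill them from `storage`. Because the duplicates were removed beforehand, the three splits share no images.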
