
Can someone describe how to properly obtain the ImageNet dataset (to be precise, the ImageNet 2012 classification dataset)?

What I attempted so far

The ImageNet webpage directs users to download the dataset from Kaggle. However, the Kaggle page it links to belongs to the Image Localization (not classification) challenge.

I have also requested a download from the ImageNet webpage, but the request has been pending for almost a year.

2 Answers


These are the detailed steps I followed to obtain ImageNet and run a PyTorch example training on it:

        1. Go to https://www.image-net.org/download.php
        2. Request to download ImageNet 
        3. Wait about 5 days for approval; write to them if you have not heard back by then.
        4. [I think you can skip this step] Download the Development Kit from the ILSVRC2017 page
        5. Download the images from the ILSVRC2012 page
            a. Training images (Task 1 & 2) 138 GB
            b. Validation images (all tasks) 6.3 GB
            c. Test images (all tasks) 13 GB
        6. [I think you can skip this step if you use the script from step 8!] Unpack the tar files 
            a. mkdir val
            b. tar -C val/ -xvf ILSVRC2012_img_val*.tar 
            c. mkdir test
            d. tar -C test/ -xvf ILSVRC2012_img_test_v10102019.tar 
            e. mkdir train
            f. tar -C train/ -xvf ILSVRC2012_img_train.tar
        7. Confirm the number of images in each folder
            a. ls val/ | wc -l # should give 50,000
            b. ls test/ | wc -l # should give 100,000
        8. Run the script extract_ILSVRC.sh from the PyTorch examples GitHub repository [https://github.com/pytorch/examples/blob/main/imagenet/extract_ILSVRC.sh]. It arranges the images into one folder per class:
        #  imagenet/train/
        #  ├── n01440764
        #  │   ├── n01440764_10026.JPEG
        #  │   ├── n01440764_10027.JPEG
        #  │   ├── ......
        #  ├── ......
        #  imagenet/val/
        #  ├── n01440764
        #  │   ├── ILSVRC2012_val_00000293.JPEG
        #  │   ├── ILSVRC2012_val_00002138.JPEG
        #  │   ├── ......
        #  ├── ......
        9. Run a PyTorch example training on your ImageNet dataset [e.g. from the PyTorch examples GitHub repository https://github.com/pytorch/examples/blob/main/imagenet/main.py]
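
Before starting a long training run, it can help to check that the per-class layout from step 8 looks complete. The following is a minimal sketch using only the Python standard library. The helper names `summarize_split` and `check_split` are made up for illustration, and the expected counts (1,000 classes, 1,281,167 training images, 50,000 validation images) are the commonly cited ILSVRC2012 figures, so treat them as assumptions to verify against your own download.

```python
from pathlib import Path


def summarize_split(split_dir):
    """Return (number of class folders, number of .JPEG files) for one split.

    Assumes the layout shown in step 8: one sub-folder per class,
    with the images directly inside it.
    """
    root = Path(split_dir)
    class_dirs = [d for d in root.iterdir() if d.is_dir()]
    n_images = sum(
        1
        for d in class_dirs
        for f in d.iterdir()
        if f.suffix.upper() == ".JPEG"
    )
    return len(class_dirs), n_images


def check_split(split_dir, expected_classes, expected_images):
    """Return True if the split has exactly the expected counts."""
    n_classes, n_images = summarize_split(split_dir)
    return n_classes == expected_classes and n_images == expected_images


# Example usage (paths are hypothetical, adjust to your setup):
#   check_split("imagenet/train", 1000, 1_281_167)
#   check_split("imagenet/val", 1000, 50_000)
```

Counting a full training split this way walks more than a million files, so it may take a few minutes on a slow disk.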

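ImageNet is available through torchvision.datasets.ImageNet: https://pytorch.org/vision/stable/generated/torchvision.datasets.ImageNet.html. Note that this class does not download the data for you; the archives still have to be obtained from the ImageNet website and placed in the root directory.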
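
As a rough sketch of how this could look in practice, assuming the ILSVRC2012 archives have already been downloaded manually into a single `root` directory: the helper names `missing_archives` and `load_imagenet` are hypothetical, and the archive file names are the ones I understand torchvision to look for, so check them against the linked documentation for your torchvision version.

```python
from pathlib import Path

# Archive names torchvision is assumed to expect inside `root`.
REQUIRED_ARCHIVES = {
    "train": "ILSVRC2012_img_train.tar",
    "val": "ILSVRC2012_img_val.tar",
    "devkit": "ILSVRC2012_devkit_t12.tar.gz",
}


def missing_archives(root, split="train"):
    """List the archives for `split` (plus the devkit) not found in `root`."""
    needed = [REQUIRED_ARCHIVES["devkit"], REQUIRED_ARCHIVES[split]]
    return [name for name in needed if not (Path(root) / name).is_file()]


def load_imagenet(root, split="train", transform=None):
    """First-run helper: verify the archives exist, then build the dataset.

    This pre-check is meant for the first run only; once torchvision has
    extracted the archives it reads the extracted folders instead.
    """
    missing = missing_archives(root, split)
    if missing:
        raise FileNotFoundError(
            f"Place these files in {root} first: {', '.join(missing)}"
        )
    # Imported lazily so the check above works without torchvision installed.
    from torchvision.datasets import ImageNet

    return ImageNet(root, split=split, transform=transform)


# Example usage (path is hypothetical):
#   val_set = load_imagenet("/data/imagenet", split="val")
```

The first call for a split is slow because the archive has to be extracted; later calls reuse the extracted folders.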
