CASIA v1.0

Image splicing is one of the most popular techniques used in digital photomontage. It crops regions from an image and pastes them into the same or another image. Splicing is also considered a basic and popular operation for image tampering. With powerful digital image editors such as Adobe Photoshop, almost anyone can make a convincing tampered image. Several cases of tampered images being abused in newspapers have been reported and have raised public anxiety about justice. To verify the authenticity of image content, passive-blind image tampering detection is called for, and an open benchmark dataset is also needed. In this report, we describe a natural color image tampering detection evaluation dataset with realistic splicing operations, intended to provide a common platform for researchers to compare and evaluate the performance of different proposed algorithms.

Our dataset contains 800 authentic and 925 spliced color images of size 384x256 pixels in JPEG format. Most of the authentic images were collected from the Corel image dataset; the others were taken with our own digital cameras. We further divided the authentic images into several categories (scene, animal, architecture, character, plant, article, nature and texture) according to image content, and we took this category information into account when making spliced images.

Brief Descriptions on Design Principles

All tampered images in this database are made by splicing operations only. Spliced images are generated from authentic images by crop-and-paste operations using Adobe Photoshop CS3 version 10.0.1 on Windows XP. The spliced region(s) come either from the same authentic image or from another image. The following points were emphasized while generating the dataset.

Content diversity: all authentic images in this dataset are natural images, grouped into 8 categories (scene, animal, architecture, character, plant, article, nature and texture). The people making the spliced images were told to pick candidate images at random from all categories.

Realistic operation: we simulate the process of splicing in the following ways:

  1. Image region(s) of different shapes (circle, triangle, rectangle and arbitrary boundaries) are randomly cropped and pasted.
  2. Cropped image region(s) can be resized, rotated or otherwise distorted before being pasted to generate a spliced image.
  3. Different sizes (small, medium and large) of spliced regions are considered.
  4. Most generated spliced images look realistic to the human eye.
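
The crop-and-paste idea in point 1 can be illustrated with a toy sketch. This is not how the dataset was produced (the images were spliced by hand in Photoshop); it only shows, under simplifying assumptions, what pasting a circular region and recording a ground-truth mask looks like. Images are plain nested lists, and the function name `splice_circle` and its parameters are hypothetical.

```python
def splice_circle(source, target, center, radius, offset):
    """Paste the disc around `center` in `source` into a copy of `target`.

    Images are lists of rows of pixel values. `center` and `offset` are
    (row, col) pairs; `offset` shifts where the disc lands in the target.
    Pixels that would land outside the target are skipped.
    Returns (spliced image, binary mask of the pasted pixels).
    """
    rows, cols = len(target), len(target[0])
    out = [row[:] for row in target]          # leave the target untouched
    mask = [[0] * cols for _ in range(rows)]
    cr, cc = center
    dr, dc = offset
    for r in range(max(0, cr - radius), min(len(source), cr + radius + 1)):
        for c in range(max(0, cc - radius), min(len(source[0]), cc + radius + 1)):
            # Keep only the pixels inside the circular region.
            if (r - cr) ** 2 + (c - cc) ** 2 > radius ** 2:
                continue
            tr, tc = r + dr, c + dc
            if 0 <= tr < rows and 0 <= tc < cols:
                out[tr][tc] = source[r][c]
                mask[tr][tc] = 1
    return out, mask


# Toy 6x8 "images": the source is all ones, the target all zeros.
source = [[1] * 8 for _ in range(6)]
target = [[0] * 8 for _ in range(6)]
spliced, mask = splice_circle(source, target, center=(2, 2), radius=1, offset=(1, 3))
print(sum(map(sum, mask)))  # number of pasted pixels
```

The returned mask plays the role of a pixel-level ground truth; the dataset itself encodes its ground truth in the filenames instead.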

The Structure and Image Format

The dataset consists of 1725 images of size 384x256 pixels. There are two main subsets, the authentic set and the spliced set: the authentic set contains 800 authentic color images and the spliced set contains 925 spliced color images. We named the images in both sets according to the following principles.

(Image Set)_(Filename).(File Format)

Image Set: 2 letters (e.g. "Au" for the authentic set, "Sp" for the spliced set)

File Format: 3 letters (e.g. "jpg")

Filename structure:

1) We denote our authentic set filename with the following format:

Filename: (Content Category) _ (File Index)

Content Category: 3 letters

File Index: 4 digits

(e.g.) the filename "Au_ani_0021.jpg" refers to the image with index "0021", in JPEG format, from the animal category of the authentic set.
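
As a minimal sketch of how this convention might be read programmatically, the snippet below parses an authentic filename and counts images per category. The regular expression and the helper names (`parse_authentic`, `tally_categories`) are assumptions made for illustration. The mapping from three-letter codes to category names is inferred from the category list, apart from "ani", which the example above confirms.

```python
import re
from collections import Counter

# Three-letter category codes. Only "ani" -> animal is confirmed by the
# example filename; the other expansions are inferred and may be wrong.
CATEGORIES = {
    "sec": "scene", "ani": "animal", "arc": "architecture", "cha": "character",
    "pla": "plant", "art": "article", "nat": "nature", "txt": "texture",
}

# Au_<3-letter category>_<4-digit index>.jpg
AUTHENTIC_RE = re.compile(r"^Au_([a-z]{3})_(\d{4})\.jpg$")


def parse_authentic(name):
    """Return (category code, file index), or None if the name does not match."""
    m = AUTHENTIC_RE.match(name)
    if m is None:
        return None
    return m.group(1), m.group(2)


def tally_categories(names):
    """Count authentic images per category code, skipping non-matching names."""
    parsed = (parse_authentic(n) for n in names)
    return Counter(p[0] for p in parsed if p is not None)


print(parse_authentic("Au_ani_0021.jpg"))  # ('ani', '0021')
```

The index is kept as a string so that leading zeros survive a round trip back to a filename.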

2) We denote our spliced set filename with the following format:

Filename: (Tampered Region Type)_(Operation Type)_(Spliced Region Shape)_(Source Image Index1)_(Source Image Index2)_(File Index)

Tampered Region Type: 1 letter

Operation Type: 3 letters

The operations applied to the spliced region before it is pasted into the final image are denoted as follows:

R - Resize;    D - Deform;    C - Rotate;    N - Do nothing, keep the original;

Spliced Region Shape: 1 letter

The shape of the spliced region is denoted by a single letter; for instance, "A" marks an arbitrary boundary in the example below.

(Source Image Index1)_(Source Image Index2): 14 letters

The two image indices identify the candidate image(s) used to generate the final spliced image. If the spliced region(s) come from a single image, the two indices are identical; otherwise they differ.

File Index: 4 digits

(e.g.) "Sp_04_CNN_A_nat0071_ani0024_0270.jpg" denotes a spliced JPEG image (no. 0270) generated by cutting an arbitrary region (the contour of a buck) out of a region of interest in an authentic animal image (no. 0024), rotating it, and pasting it into an authentic nature image (no. 0071). (See Figure 1 (b).)

With spliced images named according to the above principles, one can trace back the splicing operation and recover the ground truth for splicing detection.
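
Tracing a spliced filename back to its ground truth could be sketched as follows. The regular expression, the `SplicedName` type and the `parse_spliced` helper are hypothetical. Two assumptions are worth flagging: each of the three operation letters is decoded independently, and the tampered region type is kept as an opaque token because its codes are not listed here.

```python
import re
from typing import NamedTuple, Optional

# Operation letters as listed in the naming convention.
OPERATIONS = {"R": "resize", "D": "deform", "C": "rotate", "N": "none"}


class SplicedName(NamedTuple):
    region_type: str   # opaque token; its codes are not documented here
    operations: tuple  # one decoded entry per operation letter
    shape: str         # single-letter shape code, e.g. "A" for arbitrary
    source1: str
    source2: str
    file_index: str
    same_source: bool  # True when both source indices are identical


SPLICED_RE = re.compile(
    r"^Sp_([A-Za-z0-9]+)_([RDCN]{3})_([A-Za-z])_"
    r"([a-z]{3}\d{4})_([a-z]{3}\d{4})_(\d{4})\.jpg$"
)


def parse_spliced(name) -> Optional[SplicedName]:
    """Decode a spliced-set filename, or return None if it does not match."""
    m = SPLICED_RE.match(name)
    if m is None:
        return None
    region, ops, shape, s1, s2, idx = m.groups()
    decoded = tuple(OPERATIONS[letter] for letter in ops)
    return SplicedName(region, decoded, shape, s1, s2, idx, s1 == s2)


info = parse_spliced("Sp_04_CNN_A_nat0071_ani0024_0270.jpg")
print(info.operations)   # ('rotate', 'none', 'none')
print(info.same_source)  # False
```

The `same_source` flag separates splices within one image from splices across two images, which is often the first split wanted when evaluating a detector.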

Examples

Figure 1. Procedure of generating spliced images: panels (a) and (b).

Examples of dataset

Authentic Set: Sec, Ani, Arc, Cha, Art, Pla, Nat, Txt

Tampered Set: Simple Splicing, Complex Splicing

Contacts

Ms. Jing Dong

National Lab of Pattern Recognition

Institute of Automation, Chinese Academy of Sciences

P.O.Box 2728

Beijing 100190, China

jdong@nlpr.ia.ac.cn

Mr. Wei Wang

National Lab of Pattern Recognition

Institute of Automation, Chinese Academy of Sciences

P.O.Box 2728

Beijing 100190, China

wwang@nlpr.ia.ac.cn

Dataset Download

Please fill out the download request form here to download the CASIA v1.0 image dataset.

Citation

For any work that makes use of the dataset, please include the following citation in the acknowledgements section or a footnote of the resulting research publication.

"Credits for the use of the CASIA Image Tampering Detection Evaluation Database (CASIA TIDE) V1.0 are given to the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Corel Image Database and the photographers. http://forensics.idealtest.org"

Acknowledgment

We thank the Corel image database and all the photographers who contributed their own pictures for generating this dataset. We also greatly appreciate the efforts of the following people, who created the tampered image dataset manually: Peng Gao, Hui Zhang, Ruier Guo, Jingli Liu, Lihu Ma, Jin Zhang and Qian He.