CASIA v2.0

Image splicing without any post-processing of the resulting fake image is considered the simplest form of image tampering. However, image tampering is not limited to splicing; it also involves more complex operations, especially post-processing after splicing. With the many image editing tools now available, tampered images are easy to create. We previously released an image splicing database (CASIA TIDEv1.0) for evaluating splicing detection, which researchers and investigators can use as an open benchmark for image authentication. To further advance this area, we then constructed a more challenging image tampering evaluation database (CASIA TIDEv2.0). Compared to V1.0, V2.0 is larger and contains more realistic and challenging fake images, produced by post-processing across the tampered regions.

Our dataset V2.0 contains 7491 authentic and 5123 tampered color images. The images in this database vary in size, from 240×160 to 900×600 pixels. Whereas V1.0 contains only JPEG images, V2.0 adds uncompressed image samples and JPEG images with different Q factors. The authentic images were collected from the Corel image dataset, from websites that authorized use of their images, and from our own photographs captured with digital cameras. As in V1.0, we divided the authentic images into several categories, and we added an extra "indoor" category to account for the impact of illumination. Moreover, this database includes post-processing of the boundaries of the spliced region(s).

Brief Descriptions on Design Principles

All tampered images are generated using Adobe Photoshop CS3 version 10.0.1 on Windows XP. Tampered region(s) are either from the same authentic image or from another image. The following points are emphasized while generating the dataset.

Content diversity: all authentic images in this dataset are natural images, divided into 9 categories (scene, animal, architecture, character, plant, article, nature, indoor and texture). The people who made the tampered images were told to choose candidate images at random from all categories when generating spliced images.

Realistic operation: we simulate the tampering process in the following ways:

  1. Image region(s) are randomly cropped and pasted;
  2. Cropped image region(s) can be resized, rotated, or otherwise distorted before being pasted to generate a spliced image;
  3. Post-processing (such as blurring) may be applied after the crop-and-paste operation to finish the fake image;
  4. Spliced regions of different sizes (small, medium and large) are included;
  5. Most generated spliced images are judged to be realistic by human eyes.

The Structure and Image Format

The dataset consists of 12,323 color images of different sizes, from 240×160 to 900×600 pixels. There are two main subsets: the authentic set, with 7200 images, and the tampered set, with 5123 images. We named our images according to the following principles.

(Image Set)_(Filename).(File Format)

Image Set: 2 letters

File Format: 3 letters

Filename structure:

1) We denote our authentic set filename with the following format:

Filename: (Content Category) _ (File Index)

Content Category: 3 letters

File Index: 5 digits

(e.g.) The filename "Au_cha_00076.jpg" refers to the image with index "00076", in JPEG format, from the character category of the authentic set.
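The authentic naming rule can be checked mechanically. The following is a minimal sketch in Python, assuming that the image set is always "Au", that category codes are three lowercase letters, and that the file index is five digits, as in the example above. The name parse_authentic_name is hypothetical and is not part of the dataset distribution.

```python
import re

# Assumed layout, taken from the example "Au_cha_00076.jpg":
# "Au" _ 3-letter category code _ 5-digit file index . 3-letter file format
AUTHENTIC_RE = re.compile(
    r"^(?P<image_set>Au)_(?P<category>[a-z]{3})_(?P<file_index>\d{5})"
    r"\.(?P<file_format>[A-Za-z]{3})$"
)

def parse_authentic_name(name):
    """Split an authentic-set filename into its fields, or return None."""
    match = AUTHENTIC_RE.match(name)
    if match is None:
        return None
    return match.groupdict()

print(parse_authentic_name("Au_cha_00076.jpg"))
```

Filenames that do not follow the assumed pattern return None, so files that deviate from the rule can be listed and inspected rather than silently misread.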

2) We denote our spliced set filename with the following format:

Filename: (Image source)_(Operation Type)_(Tampered Region Size)_(Post-processing)_(Source Image Index1)_(Source Image Index2)_(File Index)

Operation Type: 3 letters

We denote the operation applied to the spliced region before it is pasted into the final image as follows:

R - Resize;    D - Deform;    C - Rotate;    N - Do nothing, keep the original;

Image source: 1 letter

Tampered Region Size: 1 letter

Post-processing: 1 letter

(Source Image Index1)_(Source Image Index2): 16 characters (excluding the underscore)

The two image indexes refer to the candidate image(s) used to generate the final tampered image. If the tampered region(s) come from the same image, the two parts are identical; otherwise they differ.

File Index: 5 digits

(e.g.) "Tp_D_NRN_S_N_sec00100_ani00005_00708.tif" denotes tampered TIFF image no. 00708, generated by copying the contour of a bird from an image in the animal category of the authentic set (no. 00005), resizing it, and pasting it into an authentic image in the scene category (no.10139), without rotation, deformation, or blurring. (See Figure 1 b)

Because the spliced images are named according to the above principle, users can trace back the tampering operation and find the ground truth for detection.
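Tracing the operation back from a filename can be sketched in Python as follows. The field order is an assumption taken from the example filename "Tp_D_NRN_S_N_sec00100_ani00005_00708.tif", in which the one-letter image source precedes the three-letter operation code. The name parse_tampered_name and the returned keys are hypothetical. Only the operation letters R, D, C and N are decoded, because those are the only codes defined above; the other one-letter fields are returned as-is.

```python
import re

# Assumed layout, taken from the example filename:
# Tp _ source _ operation(3) _ region size _ post-processing
#    _ source index 1 _ source index 2 _ file index . file format
TAMPERED_RE = re.compile(
    r"^(?P<image_set>Tp)_(?P<image_source>[A-Z])_(?P<operation>[A-Z]{3})_"
    r"(?P<region_size>[A-Z])_(?P<post_processing>[A-Z])_"
    r"(?P<source_index1>[a-z]{3}\d{5})_(?P<source_index2>[a-z]{3}\d{5})_"
    r"(?P<file_index>\d{5})\.(?P<file_format>[A-Za-z]{3})$"
)

# Operation letters as listed above; "N" means nothing was done.
OPERATIONS = {"R": "Resize", "D": "Deform", "C": "Rotate"}

def parse_tampered_name(name):
    """Split a tampered-set filename into its fields, or return None."""
    match = TAMPERED_RE.match(name)
    if match is None:
        return None
    fields = match.groupdict()
    # Keep only the operations that were actually applied.
    fields["operations"] = [
        OPERATIONS[c] for c in fields["operation"] if c in OPERATIONS
    ]
    # Identical source indexes mean the region came from the same image.
    fields["same_source"] = fields["source_index1"] == fields["source_index2"]
    return fields

info = parse_tampered_name("Tp_D_NRN_S_N_sec00100_ani00005_00708.tif")
print(info["operations"], info["same_source"])
```

For the example filename, the sketch reports a single applied operation (Resize) and two different source images, which agrees with the description of that image.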


Procedure of generating tampered images:


Examples of dataset

Authentic Set
Tampered Set


Ms. Jing Dong

National Lab of Pattern Recognition

Institute of Automation, Chinese Academy of Sciences

P.O.Box 2728

Beijing 100190, China

Mr. Wei Wang

National Lab of Pattern Recognition

Institute of Automation, Chinese Academy of Sciences

P.O.Box 2728

Beijing 100190, China

Dataset Download


For any work that makes use of the dataset, please include the following citation in the acknowledgements section or a footnote of any research publication.

"Credits for the use of the CASIA Image Tampering Detection Evaluation Database (CASIA TIDE) V2.0 are given to the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Corel Image Database and the photographers."


We thank the Corel image database and all the photographers who contributed their own pictures for use in generating this dataset. We are also very grateful for the great efforts of the following people, who created the tampered images manually: Peng Gao, Hui Zhang, Ruier Guo, Jingli Liu, Lihu Ma, Jin Zhang and Qian He.