3D in the 3D CrossTalk (3DCT) test plan is limited to stereoscopic content; that is, multi-view displays are not considered.
Crosstalk is defined as incomplete isolation of the left and right image channels so that the luminance dedicated to one channel leaks into the other.
Leakage is the raw amount of light which leaks from one channel to another.
System crosstalk is the percentage of the image intended for one eye that unintentionally leaks into the view of the other eye. It is content independent.
Viewer crosstalk is the crosstalk perceived by the viewer; it is a function of the system crosstalk, the image contrast, and the disparity.
Intended frame rate is defined as the number of video frames per second physically stored for some representation of a video sequence. The intended frame rate may be constant or may change with time. Two examples of constant intended frame rates are a Betacam SP tape containing 25 fps and a VQEG FR-TV Phase I compliant 625-line YUV file containing 25 fps; both have an intended frame rate of 25 fps. One example of a variable intended frame rate is a computer file containing only new frames; in this case the intended frame rate exactly matches the effective frame rate. The content of video frames is not considered when determining intended frame rate.
Frame rate is the number of (progressive) frames displayed per second (fps).
Refresh rate is defined as the rate at which the computer monitor is updated.
Source frame rate (SFR) is the intended frame rate of the original source video sequences. The source frame rate is constant.
Stereoscopic content is a sequence with two separate views, one for each eye.
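To make the relation between leakage and system crosstalk concrete, the following is a minimal sketch. It assumes a common measurement convention that is not stated in this plan: crosstalk is the leaked luminance divided by the intended signal luminance, both with the display black level subtracted. The function name, its parameters, and the luminance readings are hypothetical.

```python
def system_crosstalk_percent(leak_luminance, signal_luminance, black_luminance=0.0):
    """Return leakage as a percentage of the intended signal.

    Assumed convention (not taken from the test plan): both luminances are
    measured at the same eye position and the black level is subtracted.
    """
    signal = signal_luminance - black_luminance
    if signal <= 0:
        raise ValueError("signal luminance must exceed the black level")
    leakage = leak_luminance - black_luminance
    return 100.0 * leakage / signal


if __name__ == "__main__":
    # Hypothetical luminance readings in cd/m^2.
    value = system_crosstalk_percent(leak_luminance=2.6,
                                     signal_luminance=100.1,
                                     black_luminance=0.1)
    print(f"system crosstalk: {value:.2f} %")
```

Because the ratio uses only display measurements, the result does not depend on the content shown, which matches the definition of system crosstalk above.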
Three-dimensional (3D) broadcasting and delivery services over networks have become widespread. An adequate assessment method is therefore needed in order to effectively design and optimize 3D services. However, several issues have been pointed out in regard to the conventional assessment method. Furthermore, the investigations have not been sufficient for revising the Recommendation.
Therefore, in this test, several labs will collaborate on an experiment to explore these issues, and we will prepare technical reports so that the findings can be reflected in the ITU Recommendation.
There are several issues concerning the subjective methodologies for 3D services, as follows.
・source video sequence (depth, quality, range of contents)
However, there are too many issues to be solved in one subjective test carried out collaboratively by the labs. Therefore, we plan to conduct more than one test according to various conditions, and we have divided the conditions into two categories. One is for the conditions that are common between all labs in which the test is run, and the other is for additional conditions that are introduced in the experiments by individual labs based on their interests.
The procedure is as follows.
STEP 1: Review study items; assign priority (select common study)
STEP 2: Fix the test plan for common conditions; share the test plan for additional conditions
STEP 3: Run the experiment
STEP 4: Analyze the results; prepare a report
STEP 5: Reflect the results in the Recommendation, if necessary.
2. Independent Lab Group
Several organisations volunteered to participate in the 3DTV group: T-Labs, Intel, Yonsei, AGH, NIT, Orange-Labs, FUB, Technicolor, IRCCyN, Acreo, and NTT.
The following labs volunteered to participate in the 3DTV test plan. Depending on the equipment available in the different labs, different tests can be organized. Therefore, we have to collect information on the available equipment.
Current equipment status: Ordered equipment (when?): Future Plans (when, any additional conditions):
shutter glasses system: NVidia PC (Dell Alienware), Panasonic VIERA (TH-P42VT2), Sony BRAVIA (KDL-40LX900)
Ordered equipment (when?): Future Plans (when, any additional conditions):
3. Release of Subjective Data, Objective Data, and the Official Data Analysis
VQEG will publish the MOSs of all video sequences one year after acceptance of the test plan's final report. Anyone who uses the subjective data should clearly cite the final report and VQEG.
4. Subjective Rating Tests
Subjective tests will be performed on different stereoscopic displays with a resolution of 1920 x 1080. This is the display resolution, not the per-view resolution, because some displays are line- or column-interleaved. The tests will assess the subjective quality of video material presented in a simulated viewing environment, and will deploy a variety of display technologies.
4.1. Test Design and Common Set
The 3D test designs are not expected to be the same across labs. Nevertheless, we would like to collect as much information about each subjective experiment as possible, such as:
display width and pixel width,
display mode (actual size or full screen),
display condition (brightness, minimum black level),
field of view, as well as
room condition (illumination level, ambient black level).
Recording at least some part of the experiment is recommended. The following constraints are already known:
Each lab will test the same number of PVSs (169); this includes the hidden reference and the common set.
The number of SRCs in each test is 9.
The number of HRCs in each test is 16, including the hidden reference. (15 HRCs, 1 Reference)
The test design matrix need not be rectangular (“full factorial”) and will not necessarily be the same across tests.
A common set of 24 video sequences will be included in every experiment. This common set will evenly span the full range of quality described in this test plan (i.e., including the best and worst quality expected). This set of video sequences will include 4 SRC. Each SRC will be paired with 6 HRCs (including the SRC), and each common set HRC may be unique. The common set should include HRCs that are commonly used by the experiments (e.g., typical conditions that avoid unusual codec settings and exotic coder responses). Likewise, the SRC should represent general video sequences and not include unusual or uncommon characteristics.
The instructions given to subjects will ask them to maintain a specified viewing distance from the display device. The minimum viewing distance is limited by two factors. The first is the spatial resolution of the eye, for which a distance of 3H has been agreed for HDTV displays, where H = picture height (the picture is defined as the size of the video window, not the physical display). The second is related to avoiding visual fatigue; it is a function of the content properties and the display capabilities, and needs to be calculated for each display system separately. The maximum and minimum disparity that must be displayed within the comfortable viewing zone of 0.2 diopters have to be determined for each content (this should be listed in the content section). It should be noted that “pop-out” effects of limited duration may have a disparity outside that range. Using the minimum and maximum disparity (Dmin, Dmax) and the horizontal inter-pixel distance on the screen (hp), the minimum viewing distance can be calculated as follows:
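The plan's own formula is not reproduced in this section. As a rough sketch only, the two limits can be combined under a simple viewing geometry, which is an assumption and not the plan's official calculation. A screen disparity d = D * hp seen from distance V with eye separation e places the convergence point at Z = V*e/(e - d), so the mismatch between accommodation and convergence is |1/V - 1/Z| = |d|/(V*e) diopters. Requiring this to stay within the comfort zone gives V >= |d|/(zone * e). The eye separation of 0.065 m, the function names, and the example values are hypothetical.

```python
def min_distance_resolution(picture_height_m, heights=3.0):
    """Resolution limit: 3H for HDTV, H being the picture height in metres."""
    return heights * picture_height_m


def min_distance_comfort(d_min_px, d_max_px, pixel_pitch_m,
                         eye_separation_m=0.065, zone_diopters=0.2):
    """Comfort limit under the assumed geometry: V >= |d| / (zone * e)."""
    worst_disparity_m = max(abs(d_min_px), abs(d_max_px)) * pixel_pitch_m
    return worst_disparity_m / (zone_diopters * eye_separation_m)


def min_viewing_distance(picture_height_m, d_min_px, d_max_px, pixel_pitch_m):
    """The larger of the two limits applies."""
    return max(min_distance_resolution(picture_height_m),
               min_distance_comfort(d_min_px, d_max_px, pixel_pitch_m))


if __name__ == "__main__":
    # Hypothetical display: 0.57 m picture height, 0.53 mm pixel pitch,
    # content disparity between -30 and +20 pixels.
    print(round(min_viewing_distance(0.57, -30, 20, 0.00053), 2), "m")
```

In this sketch the comfort limit grows linearly with the largest disparity, so content with strong depth can push the required distance beyond 3H.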
Two major factors peculiar to stereoscopic display should be taken into consideration, namely the display frame effect and inconsistency between accommodation and convergence.
Stereoscopic pictures appear highly unnatural when objects positioned in front of the screen approach the screen frame.
This unnatural effect is called “the frame effect”. The effect is generally reduced with a larger screen, because observers are less conscious of the existence of the frame when the screen is larger.
The human eye focuses on an object according to the distance to that object. At the same time, we also control the convergence point (gaze point) on the object. Therefore, there is no inconsistency between accommodation and convergence in our everyday life. However when viewing stereoscopic images, the focus point (accommodation) must always be fixed on the screen, independent of the convergence point which is derived from the disparity of the signals.
Otherwise, the observer cannot focus clearly. Thus, an inconsistency between accommodation and convergence is introduced in stereoscopic systems.
It is generally said that the minimum value for depth of field of the human eye is ±0.3 D (Diopter: reciprocal value of distance (m)) [Hiruma and Fukuda, 1990]. This means that we can perceive the image without defocusing when the object is located within ±0.3 D. When viewing stereoscopic television, the accommodation point is fixed on the screen, and therefore stereoscopic pictures should preferably be displayed within this range. Since ordinary television programmes include images at infinite distance (that is D = 0), the desirable range of depth to be displayed with stereoscopic systems is considered to be within 0 to 0.6 D. Therefore, 0.3 D, i.e. 3.3 m, is considered to be the optimum viewing distance.
Camera parameters (camera separation, camera convergence angle, focal length of lens), resolution of the system and the frame effect should be taken into account in determining viewing conditions (screen size). In the case of HDTV when watching at the standard viewing distance of 3 H (H denotes picture height), the viewing distance of 3.3 m corresponds to a 90-inch screen. In the case of standard definition television (SDTV) when watching at the standard viewing distance of 6 H, this distance corresponds to a 36-inch screen. A subjective assessment of the relationship between screen size and depth perception was carried out with stereoscopic HDTV system, and the results showed that the most natural depth perception was obtained with a screen size of 120 inches, which corresponds to viewing distance of 2.2 H [Yamanoue et al., 1997].
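The screen sizes quoted above follow from simple arithmetic, sketched here. The aspect ratios (16:9 for HDTV, 4:3 for SDTV) are assumptions, since the text does not state them, and the function names are hypothetical.

```python
import math

INCH_IN_METRES = 0.0254


def distance_from_diopters(diopters):
    """A dioptre is the reciprocal of distance in metres."""
    return 1.0 / diopters


def diagonal_inches(viewing_distance_m, distance_in_heights, aspect_w, aspect_h):
    """Screen diagonal for which the viewing distance equals the given number of picture heights."""
    height_m = viewing_distance_m / distance_in_heights
    diagonal_m = height_m * math.hypot(aspect_w, aspect_h) / aspect_h
    return diagonal_m / INCH_IN_METRES


if __name__ == "__main__":
    distance = distance_from_diopters(0.3)            # about 3.3 m
    hdtv = diagonal_inches(distance, 3.0, 16, 9)      # close to the quoted 90 inches
    sdtv = diagonal_inches(distance, 6.0, 4, 3)       # close to the quoted 36 inches
    print(f"{distance:.2f} m, HDTV {hdtv:.0f} in, SDTV {sdtv:.0f} in")
```

The same relation run in reverse shows that a 120-inch 16:9 screen viewed from about 3.3 m corresponds to roughly 2.2 H.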
Option 2 - The Same Field of View (30 degrees)
Proposal by Liyuan?
4.3.2. Viewing Conditions
4.3.2.1. Common condition - 1, 2 or 3 Subjects per Video Display
Preferably, each test subject will have his/her own video display. The test room will conform to ITU-R Rec. BT.500-11 requirements.
It is recommended that subjects be seated facing the center of the video display at the specified viewing distance. That means that the subject's eyes are positioned opposite the center of the video display (i.e., if possible, centered both vertically and horizontally). If two or three viewers are run simultaneously using a single display, then the subjects' eyes should, if possible, be centered vertically, and the viewers should be distributed evenly around the horizontal center of the monitor.
4.3.2.2. Option 1 - 1 Subject per Video Display
Each test subject will have his/her own video display. The test room will conform to ITU-R Rec. BT.500-11 requirements.
It is recommended that subjects be seated facing the center of the video display at the specified viewing distance. That means that the subject's eyes are positioned opposite the center of the video display (i.e., if possible, centered both vertically and horizontally).
4.3.2.3. Option 2 - 2 Subjects per Video Display
A typical scenario in the home environment is a couple watching TV side by side on a sofa. Therefore, the effect of side viewing positions on 3D perception needs to be investigated. Each pair of test subjects will have its own video display. The test room will conform to ITU-R Rec. BT.500-11 requirements.
The subjects’ eyes are centered vertically, and the viewers should be distributed evenly around the horizontal center of the monitor.
4.3.2.4. Option 3 - 3 Subjects per Video Display
The effect of side viewing positions on 3D perception needs to be investigated. Each group of three test subjects will have its own video display. The test room will conform to ITU-R Rec. BT.500-11 requirements.
The subjects’ eyes are centered vertically, and the viewers should be distributed evenly around the horizontal center of the monitor.
4.3.3. Display Specification and Set-up
4.4. Subjective Test Method
The test will be composed of two parts: the first part evaluates the impact of crosstalk on the perceived quality, and the second part evaluates the binary (yes or no) visibility of crosstalk.
4.4.1. Subjective Test Method for Quality
4.4.1.1. Common condition – ACR/ACR-HR
The subjective tests will be performed using the Absolute Category Rating with Hidden Reference (ACR-HR) method, which is derived from the standard Absolute Category Rating (ACR) method [ITU-T Recommendation P.910, 1999]. The 5-point ACR scale will be used.
Hidden Reference has been added to the method more recently to address a disadvantage of ACR for use in studies in which objective models must predict the subjective data: If the original video material (SRC) is of poor quality, or if the content is simply unappealing to viewers, such a PVS could be rated low by humans and yet not appear to be degraded to an objective video quality model, especially a full-reference model. In the HR addition to ACR, the original version of each SRC is presented for rating somewhere in the test, without identifying it as the original. Viewers rate the original as they rate any other PVS. The rating score for any PVS is computed as the difference in rating between the processed version and the original of the given SRC. Effects due to esthetic quality of the scene or to original filming quality are “differenced” out of the final PVS subjective ratings.
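The differencing step described above can be sketched as follows. The data layout (a dictionary keyed by SRC and HRC numbers, with HRC 0 as the hidden reference) and the ratings are hypothetical. Some analyses add a constant offset so that the result stays on the 1 to 5 scale; that offset is exposed here as an optional parameter and is an assumption, not something this section specifies.

```python
def differential_scores(ratings, reference_hrc=0, offset=0):
    """Per-viewer difference between each PVS and the hidden reference of the same SRC.

    ratings maps (src, hrc) to a list of scores, one per viewer, in the same
    viewer order for every entry.
    """
    result = {}
    for (src, hrc), scores in ratings.items():
        if hrc == reference_hrc:
            continue  # the reference itself gets no differential score
        reference = ratings[(src, reference_hrc)]
        result[(src, hrc)] = [p - r + offset for p, r in zip(scores, reference)]
    return result


def mean(values):
    return sum(values) / len(values)


if __name__ == "__main__":
    # Hypothetical ratings from three viewers for one SRC.
    ratings = {
        (1, 0): [5, 4, 5],   # hidden reference
        (1, 1): [3, 3, 4],   # processed version
    }
    for key, diffs in differential_scores(ratings).items():
        print(key, diffs, round(mean(diffs), 2))
```

Because the subtraction is done per viewer, a viewer who rates an unappealing source low also rates its processed versions relative to that lower baseline.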
In the ACR-HR test method, each test condition is presented once for subjective assessment. The test presentation order is randomized according to standard procedures (e.g., Latin or Graeco-Latin square or via computer). Subjective ratings are reported on the five-point scale:
5 Excellent
4 Good
3 Fair
2 Poor
1 Bad
4.4.1.1.1 Assessment questions (Dimensions)
There are several assessment dimensions in 3D systems. Three basic perceptual dimensions are well known: picture quality, depth quality, and visual (dis)comfort. In assessing these dimensions, one issue is whether or not people can judge several qualities simultaneously. Overall 3D quality will therefore generally include all three dimensions, and we will then investigate the factors (dimensions) that make up overall quality.
a) common condition: overall quality
Compared to 2D quality, depth quality is the most important factor in a 3D system. Consequently, overall quality initially consists of picture quality and depth quality.
b) option 1: picture quality
c) option 2: depth quality
d) option 3: (dis)comfort
If we include the third dimension, we may require a substantially different test procedure (long time viewing etc.).
4.4.1.2. Option 1 - DSCQS
The test method used is the Double Stimulus Continuous Quality Scale (DSCQS).
For the crosstalk visibility part, the same set of video sequences (PVSs) will be presented to the same subjects in a different order. Here the question will be binary: “Do you see crosstalk in the video sequence?”. This will allow crosstalk visibility to be correlated with the previously acquired quality scores.
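One possible way to relate the binary visibility votes to the quality scores is sketched below: compute, for each PVS, the fraction of subjects who answered "yes", then correlate those fractions with the per-PVS mean quality scores. The use of the Pearson coefficient, the function names, and the data are assumptions made for illustration; the plan does not prescribe a particular statistic here.

```python
import math


def visibility_rate(votes):
    """Fraction of subjects who reported seeing crosstalk in one PVS."""
    return sum(1 for v in votes if v) / len(votes)


def pearson(x, y):
    """Pearson linear correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical data: yes/no votes of four subjects and the MOS for three PVSs.
    votes_per_pvs = [[True, True, True, True],
                     [True, True, False, False],
                     [False, False, False, False]]
    mos_per_pvs = [1.5, 3.0, 4.5]
    rates = [visibility_rate(v) for v in votes_per_pvs]
    print(rates, pearson(rates, mos_per_pvs))
```

A strongly negative coefficient would indicate that sequences in which more subjects see crosstalk also receive lower quality scores.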
4.4. Training Session
The purpose of the training session is to make the observers familiar with the viewing of 3D content. In particular, the advantages and typical artifacts of 3D displays should be clearly understood by the observers before the actual session. Thus, the training session contains a comparison between 2D and 3D as well as crosstalk examples with large amounts of crosstalk.
4.4.1. Option - More Explanations
Written explanations and illustrated examples of each level will be presented as needed to anchor and harmonize the use of the rating scale among subjects. In addition, dummy test stimuli will be presented at the beginning of the test session.
4.5. Length of Sessions
The time spent actively viewing videos and voting will be limited to 50 minutes per session. Total session time, including instructions, warm-up, voting, pauses, and payment, will be limited to 1.5 hours. The time spent viewing 3D sequences is limited to 45 minutes, so the accumulated duration of the PVSs should not exceed 45 minutes.
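A quick budget check against these limits can be sketched as below. The number of PVSs (169) and the 10-second sequence duration come from elsewhere in this plan; the voting time per PVS is a hypothetical parameter that each lab would have to measure for its own interface.

```python
def session_minutes(n_pvs, clip_seconds, vote_seconds):
    """Return (pure viewing time, viewing plus voting time) in minutes."""
    viewing = n_pvs * clip_seconds / 60.0
    active = n_pvs * (clip_seconds + vote_seconds) / 60.0
    return viewing, active


def fits_limits(n_pvs, clip_seconds, vote_seconds,
                max_viewing_min=45.0, max_active_min=50.0):
    """True if both the 3D viewing limit and the viewing-and-voting limit are met."""
    viewing, active = session_minutes(n_pvs, clip_seconds, vote_seconds)
    return viewing <= max_viewing_min and active <= max_active_min


if __name__ == "__main__":
    # 169 PVSs of 10 s each; 7 s of voting per PVS is an assumed value.
    viewing, active = session_minutes(169, 10, 7)
    print(f"viewing {viewing:.1f} min, viewing and voting {active:.1f} min,",
          "ok" if fits_limits(169, 10, 7) else "too long")
```

If the check fails for a lab's measured voting time, the PVSs would need to be split over more than one session.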
4.6. Subjects and Subjective Test Control
Each test will require exactly 24 subjects. (Note: more subjects are needed to achieve the same accuracy as conventional 2D tests.)
The 3D subjective testing will be conducted using dedicated computers or players. Whatever technology is used must ensure that (1) the playback mechanism is guaranteed to play at the full frame rate without dropping frames, (2) the playback mechanism does not impose any additional distortion (e.g., compression artifacts), and (3) the monitor criteria (including synchronisation) are respected.
It is preferred that each subject be given a different randomized order of video sequences where possible. Otherwise, the viewers will be assigned to sub-groups, which will see the test sessions in different randomized orders. At least two different randomized presentations of clips (A & B) will be created for each subjective test. If multiple sessions are conducted (e.g., A1 and A2), then subjects will view the sessions in different orders (e.g., A1-A2, A2-A1). Each lab should have approximately equal numbers of subjects at each randomized presentation and each ordering.
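The balancing requirement can be sketched as a simple round-robin assignment of subjects to the combinations of randomized presentation and session order. The labels A and B and the two session orders follow the example in the text; the function name is hypothetical, and a lab could equally shuffle the resulting list before assigning it to arriving subjects.

```python
from collections import Counter
from itertools import product


def assign_conditions(n_subjects,
                      presentations=("A", "B"),
                      session_orders=((1, 2), (2, 1))):
    """Cycle through every (presentation, session order) combination.

    With a subject count that is a multiple of the number of combinations,
    each combination is used equally often.
    """
    cells = list(product(presentations, session_orders))
    return [cells[i % len(cells)] for i in range(n_subjects)]


if __name__ == "__main__":
    plan = assign_conditions(24)
    for cell, count in sorted(Counter(plan).items()):
        print(cell, count)
```

With 24 subjects and four combinations, each combination is seen by six subjects.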
Only non-expert viewers will participate. The term non-expert is used in the sense that the viewers’ work does not involve video picture quality and they are not experienced assessors. They must not have participated in a subjective quality test during the preceding six months. All viewers will be screened prior to participation for the following criteria:
normal (20/30) visual acuity with or without corrective glasses (per Snellen test or equivalent) - visual acuity impairments are acceptable as long as they are compensated using corrective glasses.
normal color vision (per Ishihara test or equivalent) - 4 charts should be presented and all 4 should be correctly read,
depth vision test - VT-04 and VT-07 in BT.1438 are one choice for a stereo acuity test common across labs,
familiarity with the language sufficient to comprehend instruction and to provide valid responses using the semantic judgment terms expressed in that language.
These requirements should be communicated when subjects are recruited for the test, in order to avoid unnecessary late rejection of their participation.
4.7. Instructions for Subjects and Failure to Follow Instructions
For many labs, obtaining a reasonably representative sample of subjects is difficult. Therefore, obtaining and retaining a valid data set from each subject is important. The following procedures are highly recommended to ensure valid subjective data:
Write out a set of instructions that the experimenter will read to each test subject. The instructions should clearly explain why the test is being run, what the subject will see, and what the subject should do. Pre-test the instructions with non-experts to make sure they are clear; revise as necessary.
Explain that it is important for subjects to pay attention to the video on each trial.
There are no “correct” ratings. The instructions should not suggest that there is a correct rating or provide any feedback as to the “correctness” of any response. The instructions should emphasize that the test is being conducted to learn viewers’ judgments of the quality of the samples, and that it is the subject’s opinion that determines the appropriate rating.
Paying subjects helps keep them motivated. Gifts (like cinema tickets or gift value cards) are a choice as well. Comment: some labs may consider gifts as a better choice than cash if they can buy them in wholesale so they can pay lower price for each (like in “group-on”-like services); for other labs, providing gifts may be difficult for legal reasons.
Subjects should be instructed to watch the entire 10-second sequence before voting. The screen should say when to vote (e.g., “vote now”). In addition, the number of the current vote should be shown, so that the score is marked at the correct position on the answer sheet.
4.8. Failure to Follow the Test as Instructed
If it is suspected that a subject is not responding to the video stimuli or is responding in a manner contrary to the instructions, their data may be discarded and a replacement subject can be tested. The experimenter will report the number of subjects’ data-sets discarded and the criteria for doing so. Example criteria for discarding subjective data sets are:
The same rating is used for all or most of the PVSs.
The subject’s ratings correlate poorly with the average ratings from the other subjects (see Annex II).
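The two example criteria can be sketched as an automated check. This is only an illustration: the Pearson correlation against the mean of the other subjects and the threshold of 0.75 are assumptions, and the authoritative procedure is the one in the annex referenced above. A subject who gives the same rating to every PVS has zero variance, so the correlation is undefined and the subject is flagged directly.

```python
import math


def pearson(x, y):
    """Pearson correlation, or None when either sequence is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


def screen_subjects(ratings, threshold=0.75):
    """Return the ids of subjects to inspect for rejection.

    ratings maps a subject id to a list of scores, one per PVS, in the same
    PVS order for every subject. The threshold is a hypothetical value.
    """
    flagged = []
    for subject, scores in ratings.items():
        others = [r for s, r in ratings.items() if s != subject]
        mean_of_others = [sum(col) / len(col) for col in zip(*others)]
        r = pearson(scores, mean_of_others)
        if r is None or r < threshold:
            flagged.append(subject)
    return flagged


if __name__ == "__main__":
    # Hypothetical ratings of five PVSs by six subjects.
    ratings = {
        "a": [1, 2, 3, 4, 5], "b": [2, 2, 3, 4, 5],
        "c": [1, 2, 3, 5, 5], "d": [1, 3, 3, 4, 5],
        "inverted": [5, 4, 3, 2, 1], "flat": [3, 3, 3, 3, 3],
    }
    print(screen_subjects(ratings))
```

The flagged list is meant as input to the experimenter's decision; the number of discarded data sets and the criteria used still have to be reported.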
Different subjective experiments will be conducted by several test laboratories. Exactly 24 valid viewers per experiment will be used for data analysis. A valid viewer means a viewer whose ratings are accepted after post-experiment results screening. Post-experiment results screening is necessary to discard viewers who are suspected to have voted randomly. The rejection criteria verify the level of consistency of the scores of one viewer according to the mean score of all observers over the entire experiment. The method for post-experiment results screening is described in Annex VI. Only scores from valid viewers will be reported.
The following procedure is suggested to obtain ratings for 24 valid observers:
1. Conduct the experiment with 24 viewers.
2. Apply post-experiment screening to discard any viewers who are suspected to have voted randomly (see Annex I).
3. If n viewers are rejected, run n additional subjects.
4. Repeat steps 2 and 3 until valid results for 24 viewers are obtained.
The same number of subjects across labs is important for post-experiment statistical analysis.
For each subjective test, a randomization process will be used to generate orders of presentation (playlists) of video sequences. Each subjective test must use a minimum of two randomized viewer orderings. Subjects must be evenly distributed among these randomizations. Randomization refers to a random permutation of the set of PVSs used in that test.
Note: The purpose of randomization is to average out order effects, i.e., contrast effects and other influences of one specific sample being played after another specific sample. Thus, cyclically shifting a playlist does not produce a new random order, e.g.:
Subject1 = [PVS4 PVS2 PVS1 PVS3]
Subject2 = [PVS2 PVS1 PVS3 PVS4]
Subject3 = [PVS1 PVS3 PVS4 PVS2]
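The three orders above are cyclic shifts of one another, so every PVS always follows the same predecessor. A small check for this, together with a generator of independent permutations, is sketched below; the function names and the seed value are hypothetical.

```python
import random


def is_cyclic_shift(a, b):
    """True if list b is list a rotated by some number of positions."""
    if len(a) != len(b):
        return False
    if not a:
        return True
    doubled = list(a) + list(a)
    n = len(a)
    return any(doubled[i:i + n] == list(b) for i in range(n))


def make_playlists(pvs_ids, n_subjects, seed):
    """One independent random permutation of the PVSs per subject.

    The seed should differ between tests so that playlists are not reused.
    """
    rng = random.Random(seed)
    playlists = []
    for _ in range(n_subjects):
        order = list(pvs_ids)
        rng.shuffle(order)
        playlists.append(order)
    return playlists


if __name__ == "__main__":
    subject1 = [4, 2, 1, 3]
    subject2 = [2, 1, 3, 4]
    print(is_cyclic_shift(subject1, subject2))  # these two orders are shifts
    for playlist in make_playlists(range(1, 9), n_subjects=3, seed=2011):
        print(playlist)
```

For playlists of realistic length, independently drawn permutations are very unlikely to be shifts of each other, but the check is cheap enough to run on every generated set.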
If a random number generator is used (as stated in section 4.1.1), it is necessary to use a different starting seed for different tests.
An example script in Matlab that creates playlists (i.e., randomized orders of presentation) is given below:
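rand('state',sum(100*clock)); % generates a random starting seed
Npvs=200; % number of PVSs in the test
Nsubj=24; % number of subjects in the test
playlists=zeros(Nsubj,Npvs); % assumed completion: one playlist (row) per subject
for s=1:Nsubj, playlists(s,:)=randperm(Npvs); end % independent random permutation per subject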
4.10. Subjective Data File Format
Subjective data should NOT be submitted in archival form (i.e., every piece of data possible in one file). The working file should be a spreadsheet listing only the following necessary information:
Source ID Number
HRC ID Number
Each Viewer’s Rating in a separate column (Viewer ID identified in header row)
All other information should be in a separate file that can later be merged for archiving (if desired). This second file should have all the other "nice to know" information indexed to the subject IDs: date, demographics of subject, eye exam results, etc. A third file, possibly also indexed to lab or subject, should have ACCURATE information about the design of the HRCs and possibly something about the SRCs.
An example table is shown below (where HRC “0” is the original video sequence).
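The plan's own example table is not reproduced in this section. As an illustration of the layout only, the sketch below writes a working file with one row per PVS, the Source ID and HRC ID in the first two columns, and one column per viewer. The CSV format, the column labels, the viewer IDs, and all ratings are hypothetical.

```python
import csv
import io


def write_ratings_csv(rows, viewer_ids, stream):
    """Write one row per PVS: source ID, HRC ID, then one rating per viewer."""
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(["SRC", "HRC"] + list(viewer_ids))
    for src, hrc, scores in rows:
        if len(scores) != len(viewer_ids):
            raise ValueError(f"SRC {src} / HRC {hrc}: one rating per viewer is required")
        writer.writerow([src, hrc] + list(scores))


if __name__ == "__main__":
    viewers = ["V01", "V02", "V03"]
    rows = [
        (1, 0, [5, 4, 5]),   # HRC 0 is the original video sequence
        (1, 1, [3, 3, 4]),
        (2, 0, [4, 5, 4]),
    ]
    buffer = io.StringIO()
    write_ratings_csv(rows, viewers, buffer)
    print(buffer.getvalue())
```

Keeping demographics and HRC design details out of this file, as required above, means it can be read directly into an analysis script without further filtering.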
5. Source Video Sequences
This section addresses purchased source sequences, requirements for camera and SRC quality, content, scene cuts, scene duration as well as source scene selection criteria.
5.1. Purchased Source Sequences
Datasets that will not be made public may use source video that must be purchased (i.e., source video sequences that proponents must purchase prior to receiving that subjective dataset). Because the appropriateness of purchased source may depend upon the price of those sequences, the total cost must be openly discussed before the ILG chooses to use purchased source sequences (e.g., VQEG reflector, audio conference); and the seller must be identified.
3D video database prepared for the Atlanta VQEG meeting by David Juszka:
Please check out the 3D video database at: http://sp.cs.tut.fi/mobile3dtv/stereo-video/
From David Juszka:
Please check out this 3D video database: http://3dmovies.page.tl/3D-Videos.htm
5.2. Requirements for Camera and SRC Quality
The source video can only be used in the testing if an expert in the field considers the quality to be good or excellent on an ACR-scale. The source video should have no visible coding artifacts.
At least ½ of the SRC in each experiment must have been shot originally at that experiment’s target resolution (e.g., not de-interlaced, not enlarged).
The ILG will view the scene pools from all proponents and confirm that all source video sequences have sufficient quality. The ILG will also ensure that there is a sufficient range of source material and that individual SRCs are not over-used. After the approval of the ILG, all scenes will be considered final. No scene may be discarded or replaced after this point for any technical reason.
For each SRC, the camera used should be identified. The camera specification should include at least the fps setting, the sensor array dimension, and the recording format and bit-rate.
All sequences shall be aligned between the right and the left view to allow stress-free viewing. This includes temporal, spatial and color registration. The minimum and the maximum disparity in pixels need to be specified. In addition, the minimum and the maximum disparity during normal display, thus excluding “pop-out” effects, should be specified.
The source sequences will be representative of a range of content and applications. The list below identifies the types of test material that form the basis for selection of sequences.
1) movies, movie trailers
3) music video
6) broadcasting news (business and current events)
7) home video
8) general TV material (e.g., documentary, sitcom, serial television shows)
5.4. Scene Cuts
Scene cuts shall occur at a frequency that is typical for each content category.
Source video sequences selected for each test should adhere to the following criteria:
All sources must have the same frame rate (25 fps or 30 fps).
Either all sources must be interlaced, or all sources must be progressive.
At least one scene must be very difficult to code.
At least one scene must be very easy to code.
At least one scene must contain high spatial detail.
At least one scene must contain high motion and/or rapid scene cuts (e.g., an object or the background moves 50+ pixels from one frame to the next).
If possible, one scene should have multiple objects moving in a random, unpredictable manner.
At least one scene must be very colorful.
If possible, one scene should contain some animation or animation overlay (e.g., cartoon, scrolling text).
If possible, at least one scene should contain low contrast (e.g., soft or blurred edges).
If possible, at least one scene should contain high contrast (e.g., hard or clearly focused edges, such as the SMPTE birches scene).
If possible, at least one scene should contain low brightness (e.g., dim lighting, mostly dark).
If possible, at least one scene should contain high brightness (e.g., predominantly white or nearly white).
At least one scene should present a (physically) deep scene with a narrow depth of field focused on one object. Example: ‘Videoblog’ from Avatar. Focusing one's sight on the background rather than on the object the director had in mind (the actor) caused an unpleasant sensation.
I think the scenes should contain different kinds of depth structure, for example with the ROI at near, middle, or far depth, and moving within the same or across different depth planes.
What is more, I think that rapid motion of objects between depth planes should be considered; e.g., there are a lot of such scenes in the movie Step Up 3D, where fast-moving dancers pop out in front of the audience.
6. Video Format and Naming Conventions
This section addresses storage of video material, video file format as well as naming conventions.
6.1. Storage of Video Material
Video material will be stored, rather than being presented from a live broadcast. The most practical storage medium at the time of this Test Plan is a computer hard disk. Hard disk drives will be used as the main storage medium for distribution of video sequences among labs.
6.2. Video File Format
All SRCs and PVSs will be stored as uncompressed AVI files in the UYVY color space with 8 bits per sample.
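For planning hard disk capacity, the payload size of such files can be estimated as sketched below. UYVY is a packed 4:2:2 format in which two neighbouring pixels share one U and one V sample, so 8-bit video needs 2 bytes per pixel. The frame size of 1920 x 1080, the 25 fps rate and the 10-second duration are taken from elsewhere in this plan; whether the two views share one frame or are stored separately is not specified here, and AVI container overhead is ignored.

```python
def uyvy_frame_bytes(width, height):
    """8-bit UYVY (4:2:2 packed): 4 bytes per pair of pixels, i.e. 2 bytes per pixel."""
    return width * height * 2


def clip_bytes(width, height, fps, seconds):
    """Payload size of an uncompressed clip, without container overhead."""
    frames = round(fps * seconds)
    return uyvy_frame_bytes(width, height) * frames


if __name__ == "__main__":
    size = clip_bytes(1920, 1080, fps=25, seconds=10)
    print(f"{size} bytes per 10 s clip, about {size / 1e9:.2f} GB")
    # Rough total for one test of 169 PVSs, one 1920 x 1080 stream each.
    print(f"about {169 * size / 1e9:.0f} GB for 169 PVSs")
```

If the left and right views are stored as two full-resolution streams, the figures double.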
6.3. Naming Conventions
All source video sequences should be numbered (e.g., SRC 1, SRC 2). All HRCs should be numbered, and the original video sequence must be numbered “0” (e.g., SRC 1 / HRC 0 is the original video sequence #1). All files must be named: