Proposal of draft new Recommendation on methods for the subjective assessment of stereoscopic three-dimensional television (3DTV) systems [J.3DTV-sma].
This Recommendation provides methodologies for the assessment of stereoscopic television systems including general test methods, the grading scales and the viewing conditions.
Table of Contents
1 Scope
1.1 Applications
1.2 Limitations
2 References
Normative References
Informative References
3 Definitions
3.1 Terms defined elsewhere
3.2 Terms defined in this Recommendation
4 Abbreviations and acronyms
5 Conventions
6 Selection of 3D Source Content
6.1 Visual comfort
6.2 Spatial and temporal information
[Editor’s note: Further studies required on threshold values.]
6.4 Discrepancies between left and right images
7 Test Methods and Experimental Design
7.1 Absolute Category Rating (ACR) Method
7.2 Degradation Category Rating (DCR) Method
7.3 Comparison Category Rating (CCR) Method
7.5 Acceptable Changes to the Methods
7.5.1 Changes to Level Labels
7.5.2 ACR with Hidden Reference (ACR-HR)
7.6 Unacceptable Changes to the Methods
7.6.1 Do Not Increase the Number of Levels
8 Environment
8.1 Maximum display crosstalk
8.2 Screen Brightness
8.3 Viewing distance and angle
8.4 Viewing conditions
8.5 Color temperature of 3D displays
9 Subjects
10 Experiment design
10.1 Inclusion of reference conditions within the experiment
11 Experiment implementation
11.1 Informed consent
11.2 Viewer screening
11.2.1 Eye vision test
11.2.2 Color blindness test
11.2.3 Stereoscopic acuity test
11.2.4 Inter-pupillary distance
11.3 Instructions and training
11.4 Voting sessions
11.5 Questionnaire or interview
12 Data analysis
Table of Contents
1. Introduction
2. General viewing conditions
3 Test material
3.1 Visual comfort limits
4 Experimental apparatus
5 Observers
5.1 Sample size
5.2 Screening
6 Instruction to observers
7 Session duration
8 Subjective methodologies
9 Use of reference test sequences
10 Statistical analysis and viewers’ rejection criteria
Points of investigation for developing P.3D-sam
1. Introduction
Stereoscopic three-dimensional television exploits the characteristics of the human binocular visual system by recreating the conditions that bring about the perception of the relative depth of objects in the visual scene. The main requirement of current stereoscopic imaging is the capture of at least two views of the same scene from two horizontally aligned cameras. The images of the objects depicted in the scene will have different relative positions in the left and right views. This difference in relative positions is typically called parallax or disparity, and is usually expressed in pixels, as a physical distance, or as a percentage of screen width. Parallax should not be confused with angular (retinal) disparity: the same parallax produces different angular (retinal) disparities at different viewing distances. The magnitude and direction of the perceived depth are determined by the magnitude and direction of the retinal disparities elicited by the stereoscopic image.
Assessment factors generally applied to monoscopic television pictures, such as resolution, colour rendition, motion portrayal, overall quality and sharpness, could also be applied to stereoscopic television systems. In addition, there are many factors peculiar to stereoscopic television systems. These might include depth resolution (the spatial resolution in the depth direction), depth motion (whether motion along the depth direction is reproduced smoothly), and spatial distortions. Two well-known examples of the latter are the “puppet theatre effect”, i.e. when objects are perceived as unnaturally large or small, and the “cardboard effect”, i.e. when objects are perceived stereoscopically but appear unnaturally thin.
We can identify three basic perceptual dimensions which collectively affect the quality of experience provided by a stereoscopic system: picture quality, depth quality, and visual comfort. Some researchers have argued that the psychological impact of stereoscopic imaging technologies could be better measured in terms of more general concepts such as naturalness and sense of presence.
Picture quality refers to the perceived quality of the picture provided by the system. This is a main determinant of the performance of any video system. Picture quality is mainly affected by technical parameters and by errors introduced by, for example, encoding and/or transmission processes.
Depth quality refers to the ability of the system to deliver an enhanced sensation of depth. The presence of monocular cues, such as linear perspective, blur, gradients, etc., conveys some sensation of depth even in standard 2D images. However, stereoscopic 3D images also contain disparity information, which provides additional depth information and thus an enhanced sense of depth compared to 2D.
Visual (dis)comfort refers to the subjective sensation of (dis)comfort that can be associated with the viewing of stereoscopic images. Improperly captured or improperly displayed stereoscopic images could be a serious source of discomfort.
Naturalness refers to the perception of the stereoscopic image as being a truthful representation of reality (i.e. perceptual realism). The stereoscopic image may present different types of distortion which make it less natural. For example, stereoscopic objects are sometimes perceived as unnaturally large or small (puppet theatre effect), or they appear unnaturally thin (cardboard effect).
Sense of presence refers to “the subjective experience of being in one place or environment even when one is situated in another”.
This Recommendation presents information regarding methods and procedures for the assessment of the three primary dimensions: picture quality, depth quality, and visual comfort, outlined above. Methodologies for the assessment of naturalness and sense of presence will be added at a later stage.
2. General viewing conditions
The viewing conditions (including screen luminance, contrast, background illumination, viewing distance, etc.) should generally match those used for 2D as described in the proposed preliminary draft new Recommendation ITU‑R BT.[GVC] “General viewing conditions for subjective assessment of quality of television pictures”, which is under discussion in ITU-R WP6C. That Recommendation specifies two possible criteria for the selection of the viewing distance; here, the Design Viewing Distance (DVD) is to be selected. The DVD for a digital system is the distance at which two adjacent pixels subtend an angle of 1 arc-min at the viewer’s eye, and the horizontal design viewing angle is the angle under which an image is seen at its optimal viewing distance.
For example, when expressed in multiples of the picture height (H), the DVD for the 1280 × 720 image resolution system is 4.8H, and that for the 1920 × 1080 family of HDTV image resolution systems is 3.1H (static images).
For illustrative purposes, Table I reports the design viewing distance in metres for a representative sample of TV set diagonal sizes.
TABLE I - Design viewing distance in metres for various TV set diagonal sizes
Diagonal size (inches) | 1920 × 1080 image system: design viewing distance (metres) | 1280 × 720 image system: design viewing distance (metres)
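The definition of the design viewing distance can be turned into a short calculation. The following Python sketch is illustrative only: it assumes square pixels, a 16:9 picture, and the exact 1 arc-min-per-pixel criterion, and the function names and diagonal sizes are hypothetical choices, not values taken from Table I. With these assumptions the 720-row result rounds to 4.8H, while the 1080-row result comes out near 3.2H, close to the 3.1H quoted above.

```python
import math

ARCMIN_RAD = math.radians(1 / 60)  # one arc-minute expressed in radians


def dvd_in_picture_heights(rows: int) -> float:
    """Distance (in picture heights H) at which one pixel subtends 1 arc-min.

    Assumes square pixels, so the pixel pitch is H / rows.
    """
    return 1.0 / (rows * math.tan(ARCMIN_RAD))


def dvd_in_metres(diagonal_inches: float, rows: int, aspect: float = 16 / 9) -> float:
    """Design viewing distance in metres for a display of a given diagonal."""
    height_m = diagonal_inches * 0.0254 / math.sqrt(1 + aspect ** 2)
    return dvd_in_picture_heights(rows) * height_m


if __name__ == "__main__":
    print(round(dvd_in_picture_heights(720), 2))   # about 4.77
    print(round(dvd_in_picture_heights(1080), 2))  # about 3.18
    for diagonal in (32, 42, 50):  # hypothetical diagonal sizes in inches
        print(diagonal, round(dvd_in_metres(diagonal, 1080), 2))
```

A usage note: because the result scales linearly with picture height, the distance in metres for any other diagonal can be obtained by calling dvd_in_metres with that diagonal.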
It should be noted that, since two adjacent pixels subtend an angle of 1 arc-min at the viewer’s eye at the design viewing distance, the smallest angular (retinal) disparity that can be represented by the system (i.e. the depth resolution) is equal to 1 arc-min (or, equivalently, 60 arc-sec). This value is about twice the human disparity threshold, which is about 30 arc-sec. Therefore, most viewers should have no difficulty resolving the smallest disparity represented by the 3D system. (This is true for all systems in Table I when presented at the design viewing distance.)
3 Test material
The selection of the test material should be motivated by the experimental question addressed in the study: e.g. the content of the test sequences (sport, drama, film, etc.) and their spatiotemporal characteristics should be representative of the programmes delivered by the service under study.
In addition, the content of the selected stereoscopic test sequences should normally be comfortable to watch. The visual comfort of stereoscopic images depends critically upon the disparity contained in the image. Accordingly, care should be taken to ensure that the disparity does not exceed the limits outlined in the following section, unless the study is specifically aimed at measuring visual comfort. Moreover, whenever possible, the statistics (mean, standard deviation, and range (min/max)) of the disparity distribution (screen parallax in pixels) of the test sequences should be measured and reported.
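As an illustration of the reporting suggested above, the following Python sketch computes the mean, standard deviation, and range of a set of screen-parallax values in pixels. How the disparity values are obtained (for example, from a disparity estimation tool) is outside its scope; the function name and the sample values are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator to report.

```python
import statistics
from typing import Dict, Iterable


def disparity_stats(disparities_px: Iterable[float]) -> Dict[str, float]:
    """Summarise a disparity (screen parallax, in pixels) distribution.

    Negative values are taken to be crossed disparities and positive values
    uncrossed disparities; this sign convention is an assumption.
    """
    values = list(disparities_px)
    if not values:
        raise ValueError("at least one disparity value is required")
    return {
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),  # population standard deviation (assumed)
        "min": min(values),
        "max": max(values),
    }


if __name__ == "__main__":
    sample = [-10, 0, 10, 20]  # hypothetical parallax values in pixels
    print(disparity_stats(sample))
```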
3.1 Visual comfort limits
Excessive disparity/parallax causes visual discomfort, possibly because it worsens the conflict between accommodation and vergence. It has therefore been suggested that, to minimize the accommodation-vergence conflict, the disparities in the stereoscopic image should be small enough that the perceived depths of objects fall within a “comfort zone”. Several limits have been proposed. One approach uses a measure of the screen parallax, expressed as a percentage of the horizontal screen size, to specify the limits of comfortable viewing. Values of 1% for crossed/negative disparities and 2% for uncrossed/positive disparities (for a total value of about 3%) have been suggested. According to another approach, the comfort zone is delimited by the depth of field of the eye. For the viewing conditions typical of television broadcast, researchers have assumed a depth of field between ±0.2D (diopters) and ±0.3D (diopters). For a 1920×1080 HDTV image resolution system watched from the design viewing distance of 3.1H, these values correspond approximately to ±2% and ±3% of screen parallax.
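The correspondence between the depth-of-field limits and the parallax percentages can be checked with simple viewing geometry. The sketch below is an illustration under stated assumptions, not a method given in this document: for a viewer at distance d with interpupillary distance e, a point perceived at distance D has screen parallax p = e(1 - d/D) = e·d·(1/d - 1/D), so a limit of ΔD diopters gives p/W = e·ΔD·(d/W), where W is the screen width. The 65 mm interpupillary distance, the 16:9 aspect ratio, and the function name are assumptions.

```python
def parallax_fraction_from_dof(
    dof_diopters: float,
    distance_in_heights: float = 3.1,  # design viewing distance quoted for 1920x1080
    aspect: float = 16 / 9,            # assumed picture aspect ratio (W / H)
    ipd_m: float = 0.065,              # assumed average interpupillary distance, metres
) -> float:
    """Screen parallax, as a fraction of screen width, for a depth-of-field limit.

    Uses p / W = e * dD * (d / W); the picture height H cancels because both
    the viewing distance and the screen width are expressed in multiples of H.
    """
    distance_over_width = distance_in_heights / aspect
    return ipd_m * dof_diopters * distance_over_width


if __name__ == "__main__":
    for dof in (0.2, 0.3):
        print(dof, round(100 * parallax_fraction_from_dof(dof), 1), "% of screen width")
```

Under these assumptions the two limits evaluate to roughly 2.3% and 3.4%, which is consistent with the approximate ±2% and ±3% figures stated above.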
Recall that at the design viewing distance two adjacent pixels subtend an angle of 1 arc-min at the viewer’s eye. Thus, 60 pixels correspond to 1 degree of visual angle. This allows us to easily specify the comfort limits in terms of retinal disparity (for an average viewer). For example, for 1920×1080 HDTV image resolution systems, 1% (~19.2 pixels) corresponds approximately to 20 arc-min, 2% to ~40 arc-min and 3% to ~60 arc-min.
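The conversion described above is a matter of simple arithmetic, sketched here in Python. The function name is hypothetical, and the conversion assumes the small-angle approximation of exactly 1 arc-min per pixel at the design viewing distance.

```python
from typing import Tuple


def parallax_percent_to_arcmin(
    percent: float,
    columns: int = 1920,
    arcmin_per_pixel: float = 1.0,  # holds at the design viewing distance
) -> Tuple[float, float]:
    """Convert parallax in percent of screen width to (pixels, arc-min)."""
    pixels = columns * percent / 100.0
    return pixels, pixels * arcmin_per_pixel


if __name__ == "__main__":
    for pct in (1, 2, 3):
        px, arcmin = parallax_percent_to_arcmin(pct)
        print(f"{pct}% -> {px:.1f} pixels -> {arcmin:.1f} arc-min")
```

The exact results (19.2, 38.4 and 57.6 arc-min) are slightly below the rounded values of 20, 40 and 60 arc-min used in the text.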
It should be noted that even though two adjacent pixels always subtend an angle of 1 arc-min at the design viewing distance, the physical separation (e.g. in mm) between those pixels increases with larger displays (the number of pixels remains the same, but the physical size of the screen increases). Therefore, on larger displays, the higher limits (e.g. ±3%) could result in a physical distance between corresponding points (i.e. the parallax between the two views in mm) that exceeds the interpupillary distance of the average viewer (~63-65 mm). This could result in increased discomfort.
In general, since studies using stereoscopic test sequences could elicit some degree of visual discomfort, it is recommended to use, whenever possible, test material whose disparity does not exceed the lower limits, although occasional excursions above these limits might be allowed.
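To illustrate the effect of display size, the following Python sketch converts a parallax limit given in percent of screen width into millimetres and compares it with an interpupillary distance. It assumes a 16:9 display and a 65 mm interpupillary distance; the function names and the diagonal sizes are hypothetical examples.

```python
import math


def screen_width_mm(diagonal_inches: float, aspect: float = 16 / 9) -> float:
    """Physical screen width in mm for a given diagonal (assumed 16:9)."""
    return diagonal_inches * 25.4 * aspect / math.sqrt(1 + aspect ** 2)


def parallax_mm(diagonal_inches: float, percent: float, aspect: float = 16 / 9) -> float:
    """Physical parallax in mm for a parallax given in percent of screen width."""
    return screen_width_mm(diagonal_inches, aspect) * percent / 100.0


def exceeds_ipd(diagonal_inches: float, percent: float, ipd_mm: float = 65.0) -> bool:
    """True if the physical parallax is larger than the interpupillary distance."""
    return parallax_mm(diagonal_inches, percent) > ipd_mm


if __name__ == "__main__":
    for diagonal in (50, 100):  # hypothetical diagonal sizes in inches
        print(diagonal, round(parallax_mm(diagonal, 3.0), 1), exceeds_ipd(diagonal, 3.0))
```

Under these assumptions, a 3% parallax stays well below 65 mm on a 50-inch display but exceeds it on a display of about 100 inches.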
4 Experimental apparatus
The experimental apparatus (video server, display, etc.) should be capable of displaying full-resolution HD test sequences, for example using an HDMI frame-packing format (unfortunately, for now this is only possible for a 720p target). This would allow greater flexibility in the range of studies that can be carried out.
As of today, there is no reference display for 3DTV assessment. The display should exhibit very low crosstalk (ideally below the human threshold) and should be capable of receiving a variety of input formats (without settings having to be changed manually).
5 Observers
5.1 Sample size
Sample size considerations for 3D studies are not different from those for 2D studies.
5.2 Screening
Observers should be screened for visual acuity, colour vision, and stereoscopic vision. The latter could be assessed using clinical tests, such as the Randot, Titmus, or Frisby stereo tests. These clinical tests usually measure retinal disparities from 20 to 400 arc-sec. The test materials should be similar to those for stereoscopic TV, given in ITU-R Recommendation..
6 Instruction to observers
Instructions should be tailored to the dimension (e.g. depth quality, comfort, etc.) under investigation. Furthermore, ethical guidelines are more stringent than those typically used in image quality assessment, since participants might experience visual discomfort. In general, these studies require more care in informing the participant of the motivations of the study as well as of any possible negative effects resulting from exposure to the stimuli used in the study.
7 Session duration
If the viewing material is deemed comfortable, then the session duration might be as long as that used for 2D studies (i.e. ~20-40 minutes intermixed with breaks). If the material is known to contain excessive parallax, and thus known to be potentially uncomfortable, then the duration should be limited.
8 Subjective methodologies
Many of the standard methods outlined in Recommendation ITU‑R BT.500 could be used, occasionally in a slightly modified form (e.g. different scales), for the assessment of stereoscopic systems. A few methods that could be used for the assessment of picture quality, depth quality, and visual comfort are the following:
Single-stimulus (SS) methods, sequence duration 8-10 sec.
9 Use of reference test sequences
All the methods listed in Section 8 should include a “reference” sequence, whenever available, as part of the set of test sequences. The “reference” is usually a version of the test sequence that has not undergone any processing (i.e. the original source sequence). For 3DTV studies, the main “reference” is the original unprocessed stereoscopic sequence. The experimental plan might also include the monoscopic version of the “reference” (i.e. only one view of the original source sequence); for example, in visual comfort studies it might be useful to use the visual comfort of the monoscopic reference as the baseline.
10 Statistical analysis and viewers’ rejection criteria
The statistical analyses and the viewers’ rejection criteria should be the same as for 2D studies.
Points of investigation for developing P.3D-sam
The points of investigation that are specific to subjective quality assessment of stereoscopic 3D video include:
Repeatability of a given test methodology:
This is a crucial point for any subjective testing methodology. Empirical data are needed to prove that a given methodology can produce repeatable and reproducible data.
Repetition of the same experiment (same test set with the same methodology) can provide such empirical evidence.
Ability to separately assess the different basic perceptual attributes related to 3D quality (picture quality, viewing comfort and depth quality). An analogy can be made to audio-visual quality where cross-modal interaction between audio and video has been documented. In the same way, the question is whether subjects are able to assess independently visual quality, depth quality and visual comfort. If not, then is it relevant to ask them to judge these separate attributes? See also the point on “role of instructions”.
Necessity to use anchors (2D and 3D anchors) in the test stimuli:
The potential of 3D lies in the increased quality of experience compared to 2D. Viewers will only embrace 3D if it provides a better viewing experience than 2D. The underlying question is whether or not subjects are more able to judge 3D quality if they are asked to compare it to 2D, instead of simply judging a 3D stimulus on its own (or even in comparison to some 3D reference).
On the hypothesis that subjects find it easier to judge a 2D video stimulus, one adaptation of the 2D methodologies could make explicit reference to a 2D version of the stimulus. The explicit comparison can be made in the stimulus presentation and/or in the rating scale.
Viewing conditions (e.g., viewing angle):
Currently 3 simultaneous viewers are allowed in front of a 2D HDTV screen in a subjective test. Because of the increase of crosstalk with viewing angle (angular position), this number may need modification (e.g., is a maximum of 1 or 2 viewers a more appropriate number for 3D tests?)
What is the influence of stereoscopic display characteristics (mainly crosstalk level/characteristics) on quality judgment?
Method to characterize and select a stereoscopic display for conducting subjective experiments (e.g., maximum crosstalk ≤ crosstalk threshold).
Short (10-sec) videos have been traditionally used in 2D video subjective testing with overall rating to avoid problems of recency effects. Literature has shown that subjects can confidently provide a judgment of image quality for this range of duration.
The underlying question is whether or not such a short video duration is suitable for assessing visual comfort and depth quality. Some works, based on surveys rather than empirical data, have suggested that a longer duration may be needed.
Role of instructions and more elaborate practice sessions: These two points may need more emphasis for 3D than for 2D.
Most subjects have little experience of viewing 3D content. Most have perhaps viewed a few 3D movies, but this experience is far from comparable to their exposure to 2DTV. As a consequence, subjects may not fully understand how they should judge the three basic perceptual attributes, for two reasons:
Firstly, they may not fully understand the meaning of the attribute to be judged.
Secondly, they may not know if they need to consider this attribute alone or not. For example, in judging visual quality, should the perception of depth (depth quality) be taken into account? Should visual comfort be taken into account?
Clear definition of depth quality and visual comfort:
Depth quality: from experience, this is usually the most difficult attribute to be judged. As viewers are not so experienced with viewing of 3D content, they usually find it difficult to know how to provide a judgment.
Visual comfort: although there is a natural sense in knowing what is and is not comfortable viewing, precise description of symptoms may be necessary.
Use of additional questionnaires (besides the quality rating):
The use of ad-hoc additional questionnaires (similar to the simulator sickness questionnaire) should also be investigated to gain a better understanding of how people judge 3D and react to it.
Which questions are relevant in which context? When should these questions be asked?
Attention: This is not a publication made available to the public, but an internal ITU-T Document intended only for use by the Member States of ITU, by ITU-T Sector Members and Associates, and their respective staff and collaborators in their ITU related work. It shall not be made available to, and used by, any other persons or entities without the prior written consent of ITU-T.