ITU-R policy on IPR is described in the Common Patent Policy for ITU-T/ITU-R/ISO/IEC referenced in Annex 1 of Resolution ITU-R 1. Forms to be used for the submission of patent statements and licensing declarations by patent holders are available from http://www.itu.int/ITU-R/go/patents/en where the Guidelines for Implementation of the Common Patent Policy for ITU‑T/ITU‑R/ISO/IEC and the ITU-R patent information database can also be found.
Note: This ITU-R Report was approved in English by the Study Group under the procedure detailed in Resolution ITU-R 1.
© ITU 2012
All rights reserved. No part of this publication may be reproduced, by any means whatsoever, without written permission of ITU.
REPORT ITU-R BT.2160-2
Features of three-dimensional television video systems for broadcasting (2009-2010-2011)
TABLE OF CONTENTS
1 Motivations for the introduction of 3DTV broadcasting
2 Background to possible 3DTV systems
3 A hierarchical structure
3.1 Technology generations
3.2 Compatibility levels
3.3 Matrix points
4 First-generation 3DTV
5 Future generations of 3DTV
6 Expected bandwidth requirements for a first-generation system
7 The 3DTV broadcasting chain
7.1 Image source methods
7.2 Characteristics of signals in the studio
7.3 Programme production
7.4 Emission
7.5 Display
8 Production grammar
9 The viewing environment
10 Principles for comfortable viewing of stereoscopic three-dimensional images
10.1 Composite factors in perception of stereoscopic 3D images
10.2 Measures to enable comfortable viewing of stereoscopic three-dimensional images
11 Psychophysical aspects of viewing stereoscopic images
11.1 Psychophysical aspects
11.2 Examples of safety guidelines
12 Assessment methodology
13 User requirements
14 Performance requirements
15 Organizations with initiatives in 3DTV
16 Conclusions
Annex 1 – Organizations with current initiatives in 3DTV
1 ISO/IEC JTC1/SC29/WG11
2 ITU-T Study Group 9
3 ITU-T Study Group 16
4 3DTV – Network of Excellence
5 3D4You – Content generation and delivery for 3D television
6 SMPTE
7 The Digital Video Broadcasting Project
8 The Blu-ray disc Association (BDA)
9 HDMI Licensing, LLC, has announced the release of HDMI specification 1.4
10 Consumer Electronics Association
11 The 3D@Home Consortium
12 Association of Radio Industries and Businesses
13 Ultra-realistic communications forum
14 3D Consortium
15 Consortium of 3-D image business promotion
16 Japanese Ergonomics National Committee
17 Telecommunications Technology Association
18 European Broadcasting Union (EBU)
19 MUSCADE
20 3D VIVANT
Annex 2 – Historical background on the development of stereoscopic and 3D television systems
Annex 3 – Introduction to free viewpoint television
Annex 4 – Psychophysical studies on three dimensional television systems
1 Key items for psychophysical studies
1.1 Naturalness and unnaturalness of images
1.2 Viewing comfort and discomfort
1.3 Visual fatigue caused by parallax 3DTV viewing
1.4 Individual differences in the stereopsis function
1.5 Effect on young people
2 Naturalness and unnaturalness of stereoscopic images − Geometrical analysis of spaces reproduced by stereoscopic images
2.1 Theoretical analysis of reproduced spaces
2.2 Size distortion
2.3 Depth distortion
3 Viewing comfort and discomfort of stereoscopic images
3.1 Parallax distribution and visual comfort of stereoscopic images
3.2 Visual comfort and discomfort in viewing stereoscopic images
4 Visual fatigue in viewing stereoscopic images
4.1 Experimental results on inconsistency between vergence and accommodation
4.2 Experimental results on parallax amount and lateral/depth motion
5 Spatial distortion prediction system for 3DTV
5.1 Introduction
5.2 Spatial distortion in 3DTV
5.3 Spatial distortion prediction system for 3DTV
Annex 5 – 3DTV Broadcasting Safety Guideline in Korea
1 Necessity of Safety Guideline
1.1 Necessity
1.2 Typical discomforts
2 Viewing circumstances guideline
2.1 Viewing time and rest time
2.2 Viewing distance
2.3 Viewing position
2.3.1 Viewing position
2.3.2 Horizontal viewing position
2.4 Others
3 Viewer guideline
3.1 Symptoms caused by 3D viewing on viewers
3.2 Stereo blindness and stereo abnormality
3.3 Chronic diseases
3.4 Age
4 Content guideline
4.1 Setting stereo cameras
4.2 Taking stereoscopic images
4.3 Caption
4.4 Screen disparity
5 Display guideline
5.1 Crosstalk of display
5.2 Display refresh rate
5.3 3D glasses
Annex 6 – Italian Health Ministry Circular Letters
Annex 7 – Example notifications given to viewers in Japan
1 Notifications when 3D programmes are broadcast on same channel as 2D programmes
2 Notifications that should be broadcast with 3DTV programmes
3 Notifications may inform viewers, even though some of these are basically in the product manual
Annex 8
Annex 9
The technology needed for a first-generation three-dimensional television (3DTV) two-channel stereoscopic system already exists, although so far there have been no announced plans for the general introduction of regular free-to-air broadcasting services. A number of broadcasting organizations nevertheless continue to carry out experiments in stereoscopic 3DTV production, while pay-television operator BSkyB introduced a stereoscopic 3DTV channel in the United Kingdom in October 2010. Several consumer electronics manufacturers introduced stereoscopic television receivers during 2010.
An essential aim of this Report is to present a framework for a study of the various aspects of digital three‑dimensional (3D) TV broadcasting systems as outlined in Question ITU‑R 128/6. It is intended to identify the issues that need to be addressed, and to encourage further contributions to WP 6C.
1 Motivations for the introduction of 3DTV broadcasting
Interest in the possibility of 3DTV in the home may be due in part to a new wave of 3D movies reaching the cinema. In spite of the need to wear glasses, 3D movies have proved to be popular, attracting large audiences who are prepared to pay a premium for the 3D experience.
This in turn has created expectations of the imminent arrival of 3D movies in the home through packaged media, such as DVD and Blu-ray. Movies are an important part of television broadcasting, and so it is natural to consider whether 3D movies might in due course be made available through broadcast means.
On the other hand, while the need to wear glasses has not been an impediment to the success of 3D‑cinema, questions are raised about the suitability of glasses in the home environment. The current state of development of autostereoscopic displays for glasses-free viewing leaves much to be desired, although it is hoped that ongoing research will eventually lead to improved or even new forms of glasses-free display.
Today’s motivation to explore the possibility of introducing 3DTV broadcasting may therefore be seen partly as an exploitation of the natural evolution of the phased delivery chain used for movies, in which feature films are first screened in the theatre, then go to the home in packaged media, and finally are made available on broadcast television. In addition, a pay television operator may also have an interest in offering premium content in 3D, whether movies or live events.
Lastly, although 3DTV might not currently be seen as a “future alternative” to, or development of, high-definition television (HDTV), it is certainly possible that it could at least have a complementary role to other forms of 3D experience that are likely to become available in the home in the not too distant future.
2 Background to possible 3DTV systems
The fundamental means by which a 3DTV broadcast system today is capable of enhancing the user’s visual experience of three-dimensionality, compared to the broadcast of HDTV images, is by delivering stereoscopic image information to viewers in the home. 3DTV broadcasts must provide the signals necessary for generating images with different views of a scene to the two eyes of a viewer. By means of binocular fusion of the stereoscopic images, the 3DTV viewer can obtain an enhanced sensation of depth and an improved sensation of “presence” and “reality”.
It is envisioned that the technology of 3DTV systems, as with all media systems, will develop and advance from one generation to the next, over a period of possibly many years. It may be anticipated that future generations will be likely to increase the amount of visual information provided, reduce the restrictive need for eyewear, and increase the freedom of movement allowed without negatively affecting the quality of the stereoscopic depth.
Thus, one method of classifying the various 3DTV systems is as follows:
Those systems that are based on or targeted for “plano-stereoscopic” displays, whereby left and right eye images are presented independently to the two eyes using various methods that require eyewear to isolate the two views of a given scene.
Multiview autostereoscopic systems:
Those systems that are targeted for “plano-stereoscopic” (or non-volumetric) displays whereby left and right eye images are presented independently to the two eyes, using various methods that allow two views of a given scene to be isolated without the need for eyewear. In addition, this generation of systems may provide multiple views of a scene such that viewers can freely change their viewing angle and have access to parts of the visual scene behind objects.
Integral imaging or holographic systems:
Those systems that are based on object-wave recording (holography) or integral imaging and are targeted at the simulation of a light field generated by an actual scene. Thus, freedom of viewing position without the hindrance of eyewear is provided. In addition, the light field provides the visual information (focus cues) for adjusting the ocular lens so as to focus correctly at the same distance as the convergence distance. This provides more natural viewing than the systems of the previous generations, which require maintenance of focus at the display screen irrespective of convergence distance.
3 A hierarchical structure
Current proposals for 3DTV signal formats can be seen as forming a hierarchical structure, in which each format corresponds to different constraints and requirements. This is shown in diagrammatic form in Fig. 1. Should a Recommendation for 3DTV be found to be required, this hierarchy might be used by ITU‑R in a future draft.
FIGURE 1: Matrix of signal formats for 3DTV
The principle of the hierarchy is that each box in the matrix in the diagram defines a type of signal, and this would correspond to the needs of a generic type of receiver. This is somewhat similar to the concept used for ISO/IEC JTC1 MPEG standards, though there are differences. Upper levels are intended to be “backward compatible” with lower levels, with one exception which is explained later.
Though different 3DTV display technologies today have different advantages and disadvantages, the hierarchy is essentially independent of the type of display used. Research and development, and market forces should allow 3DTV displays and technology to evolve and improve, while preserving the public interest for interoperability.
The hierarchy needs to cope with a range of circumstances, from where existing receiving equipment must be used intact (though glasses are used), to where some new elements (displays) are acceptable, to where both new receivers and displays are acceptable.
The quality of the 3DTV will be influenced by the quality of the individual left-eye and right-eye signals, and because of this, 3DTV may be most effective for the higher quality environment rather than the SD-TV environment.
Broadcasters may choose to use available 3DTV technology, and find the limitations acceptable, bearing in mind the gains, or they may prefer to wait for future technology which will have fewer limitations. It seems desirable that ITU‑R should provide guidance for both.
3.1 Technology generations
In Fig. 1, the x-axis relates to the system “generation”. We may expect basic 3DTV technology to evolve in the decades ahead. The pattern of evolution is that we move from viewing a single stereo view with glasses, then to viewing with greater freedom for head movement, finally to viewing as we do normally (“natural vision”).
Broadcasters may decide to begin broadcasting with earlier technology generations (with their limitations), or to wait for future generations.
First-generation technology is based on the capture and delivery of two views, one for the left eye, and one for the right eye. There is a single “binocular disparity” or binocular parallax. There are limitations with such systems, compared to “natural vision”. With careful production, delivery, and display, effective results can be achieved. Usually, special glasses are used for viewing, though viewing without glasses (auto-stereoscopic viewing) is also possible.
Second-generation technology is based on capture and delivery of multiple views. This allows multiple binocular disparities, which makes the viewing experience closer to “natural vision”. Normally, viewing will be done without glasses on auto-stereoscopic displays.
Third-generation technology is based on the capture and delivery of the “object wave”, as is done in a simple way today with holography. The development of such systems is many years away at the moment.
We cannot predict with certainty whether or when the higher generations will be developed. However, we may note that generation steps often occur about every ten years or so, and that there can be a long lead time from idea to commercial exploitation.
3.2 Compatibility levels
The y-axis in Fig. 1 relates to compatibility levels.
Level 1 relates to signals which provide for a system which does not require any new equipment by the viewer with the exception of glasses. This level is said to be HD conventional display compatible (CDC).
Level 2 relates to signals which provide for systems that require a new display but not a new set top box. This level is said to be conventional HD frame compatible (CFC). The 3DTV signal appears as a single HD signal to the set top box, which passes it through to the (new) display, where it is decoded and displayed as left and right pictures. If a 2D service of the same channel or programme is needed, it can in principle be provided as a conventional HD signal simulcast, provided there is sufficient spectrum. The left-eye and right-eye signals do not have the same “spectral occupancy” as conventional HD signals – some has to be sacrificed.
Levels 3 and 4 relate to systems which require a new set top box and a new display, but which offer a normal HD spectral occupancy left-eye and right-eye 3D service. Level 3 is said to be frame-compatible compatible (FCC), because it is an extension of Level 2. Level 4 is said to be conventional HD service compatible (CSC) because an existing 2D set top box will find, in the incoming multiplex, a conventional 2D HD signal which it can pass to a conventional display as a 2D picture.
3.3 Matrix points
Level 1/first-generation profile
The generic receiver here, for which the signal is intended, is a conventional HDTV receiver. The signals transmitted are based on a wavelength division multiplex and matrixing of the left-eye and right-eye signals, and a choice of complementary primary colour separation. For example, the “ColorCode” system has been broadcast in Europe and North America using Red/Green in one eye, and Blue for the other eye. Other sets trialled have been Red vs. Green/Blue, or Green vs. Red/Blue. The exact matrixing and choice of complementary colours can be left to market developments because a conventional receiver and 3-primary display is used, though in the light of experience ITU‑R may be able to report on options.
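As an illustration only, the colour separation described above can be sketched as a per-pixel channel selection. The sketch below takes the red and green components from one view and the blue component from the other. It is not the actual ColorCode matrixing, which is not specified in this Report; the assignment of red/green to the left eye and the function names are assumptions made for the example.

```python
# Illustrative sketch only: plain channel selection, NOT the actual
# ColorCode matrixing. Pixels are (R, G, B) tuples; a frame is a list of rows.
# Assumption: red/green come from the left-eye view, blue from the right-eye view.

def anaglyph_pixel(left, right):
    """Combine one left-eye and one right-eye pixel into a single RGB pixel."""
    return (left[0], left[1], right[2])


def anaglyph_frame(left, right):
    """Combine two equally sized views into one conventional RGB frame."""
    if len(left) != len(right) or any(
        len(l_row) != len(r_row) for l_row, r_row in zip(left, right)
    ):
        raise ValueError("left and right views must have the same dimensions")
    return [
        [anaglyph_pixel(l, r) for l, r in zip(l_row, r_row)]
        for l_row, r_row in zip(left, right)
    ]
```

The result is an ordinary three-primary frame, which is why a conventional receiver and display can be used unchanged at this matrix point.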
Level 2/first-generation profile
The generic set top box here for which the signal is intended is a conventional HDTV set top box. But the 3D display needed is new and must have the capability to interpret an HD frame as left-eye and right-eye pictures. There are alternative ways to arrange the left-eye and right-eye signals so that they appear to the STB to be a single frame. The three principal methods (which involve sub-sampling) are the side-by-side (SbS) method, the over-and-under (OaU) method, and the interleaved sample (IS), checkerboard or Quincunx method (of which there are variants). BSkyB in the United Kingdom uses the SbS method (2x1080i/960). It would be very valuable to the public to identify a single CFC method for broadcasting. At a minimum, a common method of signalling the format is needed, such as has been developed by DVB.
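The side-by-side and over-and-under arrangements can be sketched as follows. This is a minimal illustration under stated assumptions: sub-sampling is done by simply discarding every other column or row (a real system would low-pass filter first), frames are lists of rows of sample values, and all function names are hypothetical.

```python
# Minimal sketch of frame-compatible packing. Assumption: naive decimation
# (drop every other column or row) with no anti-alias filtering.

def _check_same_shape(left, right):
    if len(left) != len(right) or any(
        len(l_row) != len(r_row) for l_row, r_row in zip(left, right)
    ):
        raise ValueError("left and right views must have the same dimensions")


def pack_side_by_side(left, right):
    """Halve each view horizontally; left view goes in the left half."""
    _check_same_shape(left, right)
    return [l_row[::2] + r_row[::2] for l_row, r_row in zip(left, right)]


def pack_over_under(left, right):
    """Halve each view vertically; left view goes in the top half."""
    _check_same_shape(left, right)
    return [row[:] for row in left[::2]] + [row[:] for row in right[::2]]


def unpack_side_by_side(frame):
    """Split a side-by-side frame back into two half-width views."""
    half = len(frame[0]) // 2
    return [row[:half] for row in frame], [row[half:] for row in frame]
```

For views with an even number of columns and rows, the packed frame has the same dimensions as a single view, which is why the set top box can treat it as a conventional HD frame. With 1 920-sample lines, each view occupies 960 samples per line after side-by-side packing.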
This matrix point may be of particular value to broadcasters who manage a large existing population of set top boxes which must not be disenfranchised by the 3DTV broadcasts, and for whom additional delivery channels are available which can be used for 3DTV.
Simultaneous delivery of a 2D version of the same programme, if needed, requires a simulcast of a conventional HDTV signal.
For this and other Levels, the issue of “creative compatibility” of a 3DTV signal and a 2D TV version needs to be considered.
Level 3/first-generation profile
The generic set top box (or IRD) for which the signal is intended here is a new set top box which is able to decode a Level 2, Frame Compatible image, and also decode a resolution enhancement layer, using for example, MPEG SVC (scalable video coding), yielding normal spectral occupancy L and R HD images for output to the display. This approach would allow existing Level 2 transmissions to be compatibly improved to normal HD spectral occupancy, with the improvement becoming available by replacing the population of conventional set top boxes with the new set top boxes that include a Level 5 H.264 decoder; interlace content would not be supported. Note that unless all set top boxes are replaced, it could still be necessary to simulcast a 2D version of the programme for the 2D audience. Set top boxes (or IRDs) for this level would decode Levels 1 and 2 also.
Level 4/first-generation profile
The generic set top box here is also a new set top box (or IRD) which is able to decode an MPEG MVC signal conforming to the ISO/IEC JTC1 MPEG specification. The signal is arranged so that a conventional set top box sees a single 2D HD signal which can be passed to a conventional display as a 2D service. New set top boxes (or integrated receiver/displays) recognize the additional information in order to decode a second view and provide two output signals, L and R, to the display. Set top boxes for Level 4/first-generation profile include capability for Level 2 decoding (but, depending on market conditions, not complete Level 3 decoding including extension).
This matrix point may be particularly valuable to operators of terrestrial broadcasting services, where channels are scarce, and where it is necessary to provide both a 3D and 2D service from the same channel.
Level 4/second-generation profile
The generic receiver here for which the signal is intended is also a new set top box which is able to decode the 2D HD plus depth format as specified by the ISO/IEC JTC1 MPEG specification. The display is normally a multiview auto-stereoscopic display. Such set top boxes would also decode Levels 1, 2, and 4 of the first-generation profile.
Other matrix points are left empty for the time being.
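The populated matrix points and the decoding relationships stated above can be summarized in a small lookup structure. This is a sketch only: the keys are (level, generation) pairs, the table records only relationships stated explicitly in this section, and the names MATRIX, ALSO_DECODES and stated_to_decode are hypothetical.

```python
# Sketch of the matrix of signal formats, keyed by (level, generation).
MATRIX = {
    (1, 1): "HD conventional display compatible (CDC)",
    (2, 1): "conventional HD frame compatible (CFC)",
    (3, 1): "frame-compatible compatible (FCC)",
    (4, 1): "conventional HD service compatible (CSC)",
    (4, 2): "2D HD plus depth",
}

# Other matrix points a receiver for a given point is stated to decode.
# Absence from this table means "not stated in the text", not "incompatible".
ALSO_DECODES = {
    (3, 1): {(1, 1), (2, 1)},
    (4, 1): {(2, 1)},
    (4, 2): {(1, 1), (2, 1), (4, 1)},
}


def stated_to_decode(receiver, signal):
    """Return True if the text states that `receiver` decodes `signal`."""
    if receiver not in MATRIX or signal not in MATRIX:
        raise KeyError("matrix point is empty or unknown")
    return signal == receiver or signal in ALSO_DECODES.get(receiver, set())
```

For example, stated_to_decode((4, 1), (3, 1)) is False, reflecting that a Level 4 first-generation set top box is not required to include complete Level 3 decoding.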
4 First-generation 3DTV
It is not currently envisaged that a complete transition from 2D to 3DTV broadcasting will take place in the foreseeable future.
Rather, there is a need to first properly assess the viability of first-generation 3DTV broadcasting. This might perhaps take the form of various 3DTV programme content being made available to the public in a limited ad hoc manner, perhaps just a few hours per week. This could align with other research that is required, such as on the possible effects of eye strain and to assess whether there is acceptance of prolonged stereoscopic viewing. This may be considered as a test phase.
The business models are not the same for pay television and for free-to-air broadcasters, and so the acceptable solutions for first-generation 3DTV broadcasting are anticipated to be different, as explained in § 3 above.
Two variants of first-generation systems may therefore be required for use in different situations: first, where a service is to be delivered only to viewers with 3D displays; second, where the primary audience continues to be viewers receiving an existing 2D service, and it is desired to make use of the same transmission channel to deliver at least some programmes in 3D.
Two techniques are available to satisfy the above conditions:
1 A “frame-based” approach: package the left and right images into an existing HDTV frame.
There are several possible permutations of placement of the left-eye and right-eye images within the frame:
– line/column interleaved;
There is also the potential to add layering techniques to restore the resolution that would otherwise be lost by the placement of two images within a single frame.
A frame-based service would not be directly viewable by existing 2D viewers.
For a multichannel pay television operator, the priority is likely to be to exploit the existing infrastructure in order to deliver 3DTV content to a group of subscribers. Indeed, such an operator could be in a position to do so without impact on services already being delivered to viewers. In this situation, a frame-based solution may be attractive.
A free-to-air operator with access to only limited transmission capacity might need to continue to use existing transmission channels to reach the general 2D audience. In this situation, a frame-based approach would not be suitable.
2 A “2D compatible” approach: this approach requires that additional information be conveyed in order to reconstruct the second image for suitably equipped 3D receivers.
There are several possibilities for making available the additional information needed to reconstruct the second image:
– 2D + “delta” (data coded to represent the difference information between left-eye and right-eye images);
– 2D + DOT (data to represent depth, occlusion and transparency information).
A “2D + depth” coding scheme could allow multiple views to be generated for presentation on autostereoscopic displays.
The 2D compatible approach allows existing viewers to continue to watch a 2D service. Those viewers wishing to receive 3DTV transmissions would need specially equipped receivers.
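The “2D + delta” option listed above can be illustrated with a very simple sketch. It assumes that the left-eye image is the one carried as the 2D service and that the delta is a plain sample-by-sample difference. A practical system would use predictive, compressed coding rather than raw differences, and the function names are hypothetical.

```python
# Sketch of "2D + delta" under stated assumptions: the left view is the
# 2D-compatible base, and delta is an uncompressed per-sample difference.
# Frames are lists of rows of integer sample values.

def encode_delta(left, right):
    """Difference information needed to rebuild the right view from the left."""
    return [
        [r - l for l, r in zip(l_row, r_row)]
        for l_row, r_row in zip(left, right)
    ]


def reconstruct_right(left, delta):
    """Rebuild the right view in a 3D receiver from the 2D view plus delta."""
    return [
        [l + d for l, d in zip(l_row, d_row)]
        for l_row, d_row in zip(left, delta)
    ]
```

A conventional 2D receiver in this sketch would use only the left view and ignore the delta data, while a suitably equipped 3D receiver would call reconstruct_right to obtain the second view.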
As an example, Korea’s terrestrial broadcasters KBS, MBC (Munhwa Broadcasting Corp.), SBS and EBS (Educational Broadcasting System) have prepared for 3D trial broadcasting from October 2010 using dual-stream coding (left image with MPEG-2, right image with AVC/H.264) at a resolution of 1 920 × 1 080, interlaced, at 30 frames per second. Unlike some countries that have already tested 3DTV broadcasting, they will offer the service through terrestrial networks. Furthermore, Korea will be the first country in the world to offer a full HD 3D broadcasting service. In addition, cable broadcasters CJHelloVision and HCN, and Korea Digital Satellite Broadcasting, will also take part in the 3D trial broadcasting service.
5 Future generations of 3DTV
Advanced forms of autostereoscopic display in conjunction with multiple camera systems are under study with the intention of allowing viewers to set their preferred viewpoint and to change it continuously in a range determined by the number of cameras and their allocation, for example so‑called “free viewpoint television”, see Annex 3. This approach can retain backwards compatibility with the displays used for first-generation 3DTV.
There are also studies on possible new forms of “object wave recording” that could allow three‑dimensional television images to be presented in a way that represents viewing the physical light in a virtual space, perhaps using an advanced “integral” method or a holographic system. Such schemes are in the research phase. These studies are to be encouraged, as they promise to lead to the eventual realization of the ultimate goal of presenting images to viewers that are virtually indistinguishable from natural real-world surroundings. To achieve this, new types of advanced volumetric display will be required. It is currently uncertain when this technology might become available: it is likely to be many years in the future.
6 Expected bandwidth requirements for a first-generation system
In the case of a first-generation “2D compatible” system of broadcasting, some additional bit rate will certainly be required. In the extreme, 100% extra would be required for a second simulcast video channel. In practice this would be likely to be somewhat lower using a supplementary data stream for reconstruction of the second video image.
In the case of a “frame compatible” system, if it is accepted that the L and R images contain less spatial resolution than for a 2D system, then in principle no extra bit rate is required compared to the transmission of a normal 2D service. In practice, it is understood that operators plan to use broadcast bit-rates which are at the high end of current practice. It may nevertheless be argued that because this approach does not provide a 2D-compatible service, a completely new transmission channel is required, i.e. 100% extra capacity. However, in the circumstances of a multichannel operator this might not necessarily be a constraint.
It is currently unclear what the quality differences would be between these approaches, and there will inevitably be a trade-off between bit-rate and quality. Independent testing would be desirable.
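The capacity arithmetic in the preceding paragraphs can be made concrete with a small sketch. The 2D bit rate and the size of the supplementary stream are placeholders chosen for the example, not measured values, and the function and parameter names are hypothetical. The sketch also ignores the point that operators may choose bit rates at the high end of current practice.

```python
# Sketch of the capacity comparison. All numeric inputs are placeholders.

def total_capacity_mbps(base_2d_mbps, approach,
                        keep_2d_service=True, supplementary_fraction=0.5):
    """Total capacity needed, in Mbit/s, relative to one 2D HD service.

    approach:
      "simulcast"        2D compatible, second view sent as a full extra channel
      "supplementary"    2D compatible, second view rebuilt from a data stream
                         costing supplementary_fraction of the 2D bit rate
                         (placeholder value, not a measurement)
      "frame_compatible" L and R packed into one HD frame; a separate 2D
                         channel is counted only if keep_2d_service is True
    """
    if approach == "simulcast":
        return base_2d_mbps * 2
    if approach == "supplementary":
        return base_2d_mbps * (1 + supplementary_fraction)
    if approach == "frame_compatible":
        return base_2d_mbps * (2 if keep_2d_service else 1)
    raise ValueError("unknown approach: " + approach)
```

With a hypothetical 10 Mbit/s 2D service, simulcast needs 20 Mbit/s, and a frame-compatible service needs 10 Mbit/s on its own or 20 Mbit/s if the 2D service must also be maintained. The supplementary-stream case falls between these figures, at a point that only independent testing can establish.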
Properties of human visual perception can be exploited to reduce bandwidth requirements. For example:
– filtering (blurring) in one eye (switching on scene cuts);
– asymmetrical coding.
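The first technique in the list above can be sketched as follows. The sketch reads “switching on scene cuts” as meaning that the blurred eye alternates at each scene cut, which is an interpretation rather than something stated in this Report. The 3-tap horizontal filter, the choice of which eye is blurred first, and all function names are likewise assumptions.

```python
# Sketch: low-pass filter one view only, alternating the filtered eye at cuts.
# Frames are lists of rows of integer sample values.

def blur_row(row):
    """3-tap horizontal box filter with edge replication (integer average)."""
    n = len(row)
    return [
        (row[max(i - 1, 0)] + row[i] + row[min(i + 1, n - 1)]) // 3
        for i in range(n)
    ]


def blur_frame(frame):
    return [blur_row(row) for row in frame]


def filter_one_eye(stereo_frames, scene_cut_flags):
    """stereo_frames: list of (left, right) pairs.
    scene_cut_flags: list of bools, True where a new scene starts.
    Returns the pairs with exactly one view blurred in each pair."""
    blur_left = False  # assumption: the right eye is blurred first
    out = []
    for (left, right), is_cut in zip(stereo_frames, scene_cut_flags):
        if is_cut:
            blur_left = not blur_left
        if blur_left:
            out.append((blur_frame(left), right))
        else:
            out.append((left, blur_frame(right)))
    return out
```

The blurred view contains less high-frequency detail and can therefore be expected to need fewer bits when coded. How much is saved, and at what cost in perceived quality, would need to be established by testing.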
The 2D + depth approach offers the prospect of considerable bit-rate saving. However, a cost-effective method for depth map creation is not easy to obtain and is still an active area of research.
In the case of more advanced multiview schemes, multiview coding requires multiple synchronized video signals to show the same scene from different viewpoints. This leads to large amounts of data, but typically there are also more inter-view statistical dependencies to exploit than in the stereo case.
Last, but not least, independent testing using a standardized testing methodology is needed in order to accurately quantify how much extra bit rate would be needed, using a range of representative 3DTV source material.