This section of the Recommendation is intended to be a living document. The methods and techniques described in this section cannot, by their very nature, account for the needs of every subjective experiment. It is expected that the experimenter may need to modify the test method to suit a particular experiment. Such modifications fall within the scope of this Recommendation.
The following acceptable changes have been evaluated systematically. Subjective tests that use these modifications are known to produce repeatable results.
7.5.1 Changes to Level Labels
Translating labels into a different language does not result in a significant change to the MOS. Although the perceptual magnitude of the labels may change, the resulting MOS are not impacted.
An unlabeled scale may be used. For example, the ends of the scale can be labeled with the symbols “+” and “-”.
A scale with numbers but no words may be used.
Numbers may be included or excluded at the preference of the experimenter.
Alternate wordings of the labels may be used when the rating labels do not meet the needs of the experimenter. One example is using the DCR method with the ACR labels. Another is using the ACR method with a listening-effort scale, as mentioned in ITU-T Rec. P.800. An example specific to 3D is to present the following five levels when assessing visual fatigue and asking about focusing difficulty:
7.5.2 ACR with Hidden Reference (ACR-HR)
An acceptable variant of the ACR method is ACR with Hidden Reference (ACR-HR). With ACR-HR, the experiment includes a reference version of each video segment, not as part of a pair, but as a freestanding stimulus that is rated like any other. During data analysis, the score of the hidden reference is subtracted from the ACR score of each corresponding processed sequence to obtain a DMOS. This procedure is known as “hidden reference removal.”
Differential viewer scores (DV) are calculated on a per subject per processed video sequence (PVS) basis. The appropriate hidden reference (REF) is used to calculate DV using the following formula:
DV(PVS) = V(PVS) – V(REF) + 5
where V is the viewer’s ACR score. In using this formula, a DV of 5 indicates ‘Excellent’ quality and a DV of 1 indicates ‘Bad’ quality. Any DV values greater than 5 (i.e. where the processed sequence is rated better quality than its associated hidden reference sequence) will generally be considered valid. Alternatively, a 2-point crushing function may be applied to prevent these individual ACR-HR viewer scores (DV) from unduly influencing the overall mean opinion score:
crushed_DV = (7*DV)/(2+DV) when DV > 5.
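The two formulas above can be illustrated with a minimal Python sketch. The function names, the parallel-list data layout and the sample ratings are assumptions made for illustration only; they are not part of this Recommendation. The sketch also assumes that the DMOS of a PVS is the arithmetic mean of its per-viewer DV values.

```python
from statistics import mean


def differential_score(v_pvs, v_ref):
    """DV(PVS) = V(PVS) - V(REF) + 5 for a single viewer."""
    return v_pvs - v_ref + 5


def crush(dv):
    """Optional 2-point crushing function; only changes values with DV > 5."""
    return (7 * dv) / (2 + dv) if dv > 5 else dv


def dmos(pvs_scores, ref_scores, use_crushing=False):
    """Average the per-viewer DV values for one PVS.

    pvs_scores[i] and ref_scores[i] are viewer i's ACR scores for the PVS
    and for its hidden reference (hypothetical data layout).
    """
    dvs = [differential_score(p, r) for p, r in zip(pvs_scores, ref_scores)]
    if use_crushing:
        dvs = [crush(dv) for dv in dvs]
    return mean(dvs)


# Made-up ratings from four viewers; the third viewer rates the PVS above
# its hidden reference, which yields DV = 6.
pvs = [3, 4, 5, 2]
ref = [5, 5, 4, 4]

print(dmos(pvs, ref))
print(dmos(pvs, ref, use_crushing=True))
```

With these sample ratings the per-viewer DV values are 3, 4, 6 and 3. The crushing function maps the single DV of 6 to 5.25 and leaves the other values unchanged, so the mean drops from 4 to 3.8125.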
ACR-HR will result in larger confidence intervals than ACR, CCR or DCR.
The ACR-HR method removes some of the influence of content from the ACR ratings, but to a lesser extent than CCR or DCR.
ACR-HR should not be used when the reference sequences are of fair, poor or bad quality. The problem is that the range of attainable DV values diminishes. For example, if the reference video is rated ‘Poor’ on the ACR scale (V(REF) = 2), then DV = V(PVS) + 3, so DV must be 4 or greater.
7.6 Unacceptable Changes to the Methods
The following changes have been evaluated systematically. These modifications are not allowed.
The number of levels should not be increased. Tests into the replicability and accuracy of subjective methods indicate that the accuracy of the resulting MOS does not improve when levels are added, while the method becomes more difficult for subjects.
Experiments that compare discrete scales (e.g., 5-point, 9-point, 11-point) with continuous scales (e.g., 100-point scales) all indicate that continuous scales contain more levels than people can differentiate. Subjects treat a continuous scale as if it were a discrete scale with fewer options (e.g., using five to nine levels).
Prohibited examples include changing ACR from a discrete 5-level scale to a discrete 9-level scale, a discrete 11-level scale, or a continuous scale.
This issue is currently being investigated in J.3D-disp-req. P.3D-sam may refer to J.3D-disp-req and provide the maximum allowed display crosstalk rate.
8.2 Screen Brightness
For 3D displays that use glasses, the perceived brightness may be reduced by the glasses. This should be considered when setting the picture brightness for 3D subjective testing. All measurements, including screen brightness measurements, need to be carried out through the glasses appropriate to the 3D display technology.
[Editor’s note: This section needs further studies.]
In general, the viewing distance is about 3H (three times picture height) for TV environments. For PC monitors, 1H to 3H is recommended. For multimedia applications (e.g., mobile devices), 6H to 10H is recommended.
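As a rough illustration of how a viewing distance expressed in multiples of picture height (H) translates into a physical distance, the following Python sketch derives H from a screen diagonal and an aspect ratio. The function names, the 16:9 default and the sample diagonal are assumptions for illustration; the multiples themselves are the ones quoted above.

```python
import math


def picture_height(diagonal, aspect_w=16, aspect_h=9):
    """Picture height H of a screen with the given diagonal and aspect ratio.

    The result is in the same unit as the diagonal.
    """
    return diagonal * aspect_h / math.hypot(aspect_w, aspect_h)


def viewing_distance(diagonal, multiple_of_h, aspect_w=16, aspect_h=9):
    """Viewing distance for a multiple of H (e.g., 3 for a 3H distance)."""
    return multiple_of_h * picture_height(diagonal, aspect_w, aspect_h)


# Hypothetical 16:9 TV with a 100 cm diagonal, viewed at 3H.
print(round(viewing_distance(100, 3), 1), "cm")
```

For a range such as 1H to 3H, the same function can be called once for each end of the range.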
To optimize the 3D viewing environment, some additional details may be necessary, such as suggesting the optimal distance between the display and the back wall and the optimal viewing distance.
8.5 Color temperature of 3D displays
[Editor’s note: This section needs further studies.]
Most 3D monitors use LCD displays. Setting the 3D display to a certain color temperature may not be desirable, because such operations may result in a color shift. In general, factory settings may be used provided that such settings provide a natural color appearance.
[Editor’s note: Further studies required on this topic. For 3D studies, the number of subjects used in the experiment is not different from that for 2D studies.]