Mean opinion score (MOS) is a measure used in the domain of Quality of Experience and telecommunications engineering, representing the overall quality of a stimulus or system. It is the arithmetic mean over all individual "values on a predefined scale that a subject assigns to his opinion of the performance of a system quality". Such ratings are usually gathered in subjective quality evaluation tests, but they can also be estimated algorithmically.
MOS is a commonly used measure for video, audio, and audiovisual quality evaluation, but it is not restricted to those modalities. ITU-T has defined several ways of referring to a MOS in Recommendation ITU-T P.800.1, depending on whether the score was obtained from audiovisual, conversational, listening, talking, or video quality tests.
Rating scale and mathematical definition
MOS is expressed as a single rational number, typically in the range 1 to 5, where 1 is the lowest perceived quality and 5 is the highest. Other MOS ranges are also possible, depending on the rating scale used in the underlying test. The Absolute Category Rating (ACR) scale is very commonly used; it maps ratings between Bad and Excellent to numbers between 1 and 5: 5 = Excellent, 4 = Good, 3 = Fair, 2 = Poor, 1 = Bad.
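In practice, the ACR mapping is just a lookup from the category label a subject picks to the number that is later averaged. The following Python sketch assumes English labels; `ACR_SCALE` and `acr_to_scores` are illustrative names, not part of any standard or library.

```python
# Five-point Absolute Category Rating (ACR) scale: label -> numeric score.
# Assumption: subjects answered with these English labels.
ACR_SCALE = {
    "Excellent": 5,
    "Good": 4,
    "Fair": 3,
    "Poor": 2,
    "Bad": 1,
}


def acr_to_scores(labels: list[str]) -> list[int]:
    """Convert ACR category labels collected from subjects into numbers 1-5.

    Raises KeyError for any label that is not on the ACR scale.
    """
    return [ACR_SCALE[label] for label in labels]


votes = ["Good", "Excellent", "Fair", "Good"]
print(acr_to_scores(votes))  # [4, 5, 3, 4]
```

When a test is run in another language, the dictionary keys would be the translated labels while the numeric values stay the same.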
Other standardized quality rating scales are defined in ITU-T Recommendations (such as P.800 or P.910). For example, one could use a continuous scale ranging from 1 to 100. Which scale is used depends on the purpose of the test. In certain contexts, there are no statistically significant differences between ratings for the same stimuli when they are obtained using different scales.
MOS is calculated as the arithmetic mean over the single ratings given by human subjects to a given stimulus in a subjective quality evaluation test. Thus:
$$\mathrm{MOS} = \frac{1}{N}\sum_{n=1}^{N} R_n$$
where $R_n$ is the individual rating given to the stimulus by subject $n$, and $N$ is the number of subjects.
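The formula carries over directly to code. Here is a minimal Python sketch, assuming the ratings for one stimulus have already been collected as numbers; the function name `mean_opinion_score` and the ratings are hypothetical.

```python
from typing import Sequence


def mean_opinion_score(ratings: Sequence[float]) -> float:
    """Arithmetic mean of the N individual ratings R_1..R_N for one stimulus."""
    if not ratings:
        raise ValueError("at least one rating is required")
    return sum(ratings) / len(ratings)


# Hypothetical ACR ratings from five subjects for a single stimulus.
ratings = [4, 5, 3, 4, 4]
print(mean_opinion_score(ratings))  # 4.0
```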
Properties of MOS
MOS is subject to certain mathematical properties and biases. In general, there is an ongoing debate about the usefulness of MOS for quantifying Quality of Experience as a single scalar value.
When MOS is obtained using a category rating scale, it is based on an ordinal scale, similar to a Likert scale. In this case, the ranking of the scale items is known, but the intervals between them are not. It is therefore mathematically incorrect to calculate the mean of individual ratings as a measure of central tendency; the median should be used instead. However, in practice and in the definition of MOS, calculating the arithmetic mean is considered acceptable.
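The gap between the two measures of central tendency is easy to see on a small, skewed set of ratings. The ratings in this Python sketch are made up: two low scores pull the arithmetic mean (the conventional MOS) well below the median, which depends only on the order of the categories.

```python
import statistics

# Hypothetical ACR ratings (1 = Bad ... 5 = Excellent) for one stimulus.
ratings = [5, 5, 4, 4, 4, 2, 1]

# Arithmetic mean: what MOS conventionally reports, treating the scale as interval data.
mos = statistics.mean(ratings)

# Median: a central-tendency measure that relies only on the ordering of categories.
median = statistics.median(ratings)

print(f"MOS (mean) = {mos:.2f}, median = {median}")  # MOS (mean) = 3.57, median = 4
```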
It has been shown that for categorical rating scales (such as ACR), the individual items are not perceived as equally spaced by subjects. For example, there may be a larger "gap" between Good and Fair than between Good and Excellent. The perceived distance may also depend on the language into which the scale is translated. However, there are studies that could not prove a significant impact of scale translation on the obtained results.
Several other biases are present in the way MOS ratings are typically obtained. In addition to the above-mentioned problems with scales that are perceived non-linearly, there is the so-called "range-equalizing bias": over the course of a subjective experiment, subjects tend to give scores that span the entire rating scale. This makes it impossible to compare two different subjective tests if the range of presented quality differs. In other words, the MOS is never an absolute measure of quality, but only relative to the test in which it has been obtained.
For the above reasons, and because of several other contextual factors that influence perceived quality in subjective tests, MOS values should only be reported if the context in which they were collected is known and reported as well. MOS values gathered in different contexts and with different test designs should therefore not be directly compared. Recommendation ITU-T P.800.2 prescribes how MOS values should be reported. Specifically, P.800.2 says:
It is not meaningful to directly compare MOS values produced by separate experiments, unless those experiments were explicitly designed to be compared, and even then the data should be statistically analyzed to ensure that such a comparison is valid.
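The text above does not say which statistical analysis to apply. As one illustrative option (an assumption, not a procedure taken from P.800.2), the Python sketch below checks whether two conditions rated within the same experiment differ in MOS, using a Welch-style standard error and a normal approximation. The function name `compare_mos` and the ratings are hypothetical. With small panels, or given the ordinal nature of category ratings, a t-test or a non-parametric test may be more appropriate.

```python
import math
from statistics import NormalDist, mean, stdev


def compare_mos(a: list[float], b: list[float]) -> tuple[float, float]:
    """Rough two-sided test of whether two conditions differ in MOS.

    Assumption: a normal approximation with a Welch-style standard error,
    which is only reasonable for fairly large panels of subjects.
    Returns (MOS of a minus MOS of b, approximate two-sided p-value).
    """
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each condition needs at least two ratings")
    diff = mean(a) - mean(b)
    # Standard error of the difference between the two sample means.
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    if se == 0:
        # No spread within either condition: the difference is exact.
        return diff, (0.0 if diff != 0 else 1.0)
    z = diff / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, p


# Hypothetical ACR ratings for two conditions rated in the SAME experiment.
cond_a = [4, 4, 5, 3, 4, 4, 5, 4, 3, 4]
cond_b = [3, 4, 3, 3, 4, 2, 3, 4, 3, 3]
diff, p = compare_mos(cond_a, cond_b)
print(f"MOS difference = {diff:.2f}, approximate p = {p:.3f}")
```

Such a check only makes sense for conditions from one experiment; it does not remove the contextual biases that make MOS values from separate experiments incomparable.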
MOS for speech and audio quality estimation
MOS historically originates from subjective measurements in which listeners sat in a "quiet room" and scored the quality of a telephone call as they perceived it. This kind of test methodology has been used in the telephony industry for decades and is standardized in ITU-T Recommendation P.800. It specifies that "the talker should be seated in a quiet room with volume between 30 and 120 m³ and a reverberation time less than 500 ms (preferably in the range 200-300 ms). The room noise level must be below 30 dBA with no dominant peaks in the spectrum." Requirements for other modalities were similarly specified in later ITU-T Recommendations.
MOS estimation using quality models
Obtaining MOS ratings may be time-consuming and expensive, as it requires the recruitment of human assessors. For various use cases, such as codec development or service quality monitoring, where quality must be estimated repeatedly and automatically, MOS can also be predicted by objective quality models, which are typically developed and trained using human MOS ratings.
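To illustrate the idea of training a model on human ratings, the Python sketch below fits a straight line from a single objective measurement to subjective MOS and then predicts MOS for an unseen condition. Everything here is hypothetical: the packet-loss feature, the training numbers, the linear form, and the names `fit_mos_model` and `predict_mos`. Practical quality models are typically far more elaborate.

```python
from statistics import mean


def fit_mos_model(features: list[float], mos: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of MOS ~ slope * feature + intercept.

    `features` holds one objective measurement per test condition and `mos`
    the subjective MOS that human raters gave to the same conditions.
    """
    fx, fy = mean(features), mean(mos)
    sxx = sum((x - fx) ** 2 for x in features)
    sxy = sum((x - fx) * (y - fy) for x, y in zip(features, mos))
    if sxx == 0:
        raise ValueError("features must not all be identical")
    slope = sxy / sxx
    return slope, fy - slope * fx


def predict_mos(feature: float, slope: float, intercept: float) -> float:
    """Predict a MOS for a new condition, clamped to the 1-5 ACR range."""
    return min(5.0, max(1.0, slope * feature + intercept))


# Hypothetical training data: packet loss (%) vs. MOS from a listening test.
loss = [0.0, 1.0, 2.0, 5.0, 10.0]
human_mos = [4.5, 4.1, 3.8, 3.0, 1.9]
slope, intercept = fit_mos_model(loss, human_mos)
print(f"Predicted MOS at 3% loss: {predict_mos(3.0, slope, intercept):.2f}")
```

Clamping keeps predictions on the rating scale used for training; a model trained on one test inherits that test's context, for the same reasons discussed under the properties of MOS.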
See also
- Absolute Category Rating
- Likert Scale
- MUSHRA (Recommendation ITU-R BS.1534)
- Objective video quality
- Subjective video quality
Source of the article: Wikipedia