The multimodal emotion analysis outputs

The Digital Emotions system generates multimodal emotion analysis results from three modalities: visual-based, speech-based and text/language-based emotion analyses.

Both single-modality and multi-modality results are provided as system outputs.

For more on the conceptual meaning of the output variables, please visit the Sciences Behind section.

Visual (only)-based emotion analysis

The following codebook describes the outputs of the visual-based (i.e., facial expression features) emotion analysis.

  1. vValence is a continuous variable ranging from -1 to 1, where -1 indicates that the facial expressions express the most unpleasant feelings, and 1 indicates that the facial expressions express the most pleasant feelings

  2. vArousal is a continuous variable ranging from -1 to 1, where -1 indicates that the facial expressions express no physiological activity, and 1 indicates the facial expressions express the highest degree of physiological activity

  3. vIntensity is a continuous variable ranging from -1 to 1, where -1 indicates that the facial expressions express the least intense feelings, and 1 indicates the facial expressions express the most intense feelings

  4. vCategory is a categorical variable predicting one out of the following 25 emotional or emotion-related states: neutral, afraid, alarmed, annoyed, aroused, astonished, bored, calm, content, delighted, depressed, distressed, droopy, excited, frustrated, gloomy, happy, miserable, pleased, sad, satisfied, serene, sleepy, tensed, and tired
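The ranges and category labels above lend themselves to a simple sanity check. The following is a minimal sketch, assuming a visual result is available as a Python dict keyed by the codebook variable names; the dict layout and the helper name `validate_visual` are illustrative assumptions, not part of the Digital Emotions system.

```python
# Hypothetical in-memory form of one visual-analysis result. The key names
# follow the codebook; the dict layout itself is an assumption.
V_CATEGORIES = {
    "neutral", "afraid", "alarmed", "annoyed", "aroused", "astonished",
    "bored", "calm", "content", "delighted", "depressed", "distressed",
    "droopy", "excited", "frustrated", "gloomy", "happy", "miserable",
    "pleased", "sad", "satisfied", "serene", "sleepy", "tensed", "tired",
}


def validate_visual(record):
    """Return a list of problems found in a visual-analysis record."""
    problems = []
    # vValence, vArousal and vIntensity are all documented as lying in [-1, 1].
    for name in ("vValence", "vArousal", "vIntensity"):
        value = record.get(name)
        if not isinstance(value, (int, float)) or not -1.0 <= value <= 1.0:
            problems.append(f"{name} must be a number in [-1, 1], got {value!r}")
    # vCategory must be one of the 25 listed states.
    category = record.get("vCategory")
    if category not in V_CATEGORIES:
        problems.append(f"vCategory is not one of the 25 states: {category!r}")
    return problems
```

An empty list means the record is consistent with the codebook; each string in a non-empty list names one variable that falls outside its documented range or label set.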

Text (only)-based emotion analysis

The following codebook describes the five emotion intensity scores from the text-based (i.e., semantic-level meaning of the words and the linguistic context of the expression) emotion analysis.

  1. tFear is a continuous variable ranging from 0 to 1, where 0 indicates that this text does not express the fear emotion at all, and 1 indicates that this text expresses an extremely high intensity of the fear emotion.

  2. tAnger is a continuous variable ranging from 0 to 1, where 0 indicates that this text does not express the anger emotion at all, and 1 indicates that this text expresses an extremely high intensity of the anger emotion.

  3. tJoy is a continuous variable ranging from 0 to 1, where 0 indicates that this text does not express the joy emotion at all, and 1 indicates that this text expresses an extremely high intensity of the joy emotion.

  4. tSadness is a continuous variable ranging from 0 to 1, where 0 indicates that this text does not express the sadness emotion at all, and 1 indicates that this text expresses an extremely high intensity of the sadness emotion.

  5. tValence is a continuous variable ranging from 0 to 1, where 0 indicates that this text expresses extremely negative or unpleasant feelings, and 1 indicates that this text expresses extremely positive or pleasant feelings.
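Note that tValence runs from 0 to 1, whereas vValence runs from -1 to 1. The sketch below, assuming the text scores arrive as a Python dict keyed by the codebook names, picks the highest of the four intensity scores and shows one possible way to put tValence on the [-1, 1] scale. The helper names are hypothetical, and the linear rescaling is an illustrative assumption rather than something the system is stated to perform.

```python
TEXT_EMOTIONS = ("tFear", "tAnger", "tJoy", "tSadness")


def strongest_text_emotion(scores):
    """Return (variable name, score) for the highest of the four intensity scores."""
    # Every text-based score is documented as lying in [0, 1].
    for name in TEXT_EMOTIONS + ("tValence",):
        if not 0.0 <= scores[name] <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {scores[name]}")
    name = max(TEXT_EMOTIONS, key=lambda n: scores[n])
    return name, scores[name]


def centered_valence(t_valence):
    """Linearly map a tValence in [0, 1] onto [-1, 1] (assumed rescaling)."""
    return 2.0 * t_valence - 1.0
```

With this mapping a tValence of 0.5 corresponds to 0 on the [-1, 1] scale; whether that midpoint is comparable to a vValence of 0 is a modelling question the codebook does not settle.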

Speech (only)-based emotion analysis

There are two language-specific speech-based emotion analysis variations.

The following codebook describes the ten output variables (five binary emotion classifications and five continuous scores) from the English speech-based (i.e., tonal, acoustic features of speech) emotion analysis.

  1. aFear is a binary variable indicating whether or not the audio expresses fear [0: no; 1: yes]

  2. aAnger is a binary variable indicating whether or not the audio expresses anger [0: no; 1: yes]

  3. aHappiness is a binary variable indicating whether or not the audio expresses happiness [0: no; 1: yes]

  4. aSadness is a binary variable indicating whether or not the audio expresses sadness [0: no; 1: yes]

  5. aNeutral is a binary variable indicating whether or not the audio expresses a neutral state [0: no; 1: yes]

  6. aValence is a continuous variable ranging from -1 to 1, indicating how pleasant the feelings expressed in the audio are [-1: most unpleasant feelings; 1: most pleasant feelings]

  7. aArousal is a continuous variable ranging from -1 to 1, indicating the degree of physiological activity expressed in the audio [-1: no heightened physiological activity; 1: highly heightened physiological activity]

  8. aPower is a continuous variable ranging from -1 to 1, indicating the sense of power expressed in the audio [-1: least sense of power; 1: most sense of power]

  9. aExpenctancy is a continuous variable ranging from -1 to 1, indicating the sense of expectancy expressed in the audio [-1: least sense of expectancy; 1: most sense of expectancy]

  10. aIntensity is a continuous variable ranging from 0 to 1, indicating how intense the feelings expressed in the audio are [0: least intense feelings; 1: most intense feelings]
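The English speech outputs mix binary flags with continuous scores on two different ranges, so it can help to write the ranges down once and check against them. This is a minimal sketch, assuming a result is a Python dict keyed by the codebook names; the helper names are hypothetical. It does not assume that exactly one binary flag is set, since the codebook does not say whether the flags are mutually exclusive.

```python
BINARY_FLAGS = ("aFear", "aAnger", "aHappiness", "aSadness", "aNeutral")

# Documented ranges of the continuous variables.
CONTINUOUS_RANGES = {
    "aValence": (-1.0, 1.0),
    "aArousal": (-1.0, 1.0),
    "aPower": (-1.0, 1.0),
    "aExpenctancy": (-1.0, 1.0),  # spelled as it appears in the codebook
    "aIntensity": (0.0, 1.0),     # note: 0 to 1, unlike the other four
}


def flagged_emotions(record):
    """List the binary variables set to 1, in codebook order."""
    for name in BINARY_FLAGS:
        if record[name] not in (0, 1):
            raise ValueError(f"{name} must be 0 or 1, got {record[name]!r}")
    return [name for name in BINARY_FLAGS if record[name] == 1]


def out_of_range(record):
    """Return the continuous variables whose value is outside its documented range."""
    return [
        name
        for name, (low, high) in CONTINUOUS_RANGES.items()
        if not low <= record[name] <= high
    ]
```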

The following codebook describes the three emotion classification scores from the Chinese speech-based (i.e., tonal, acoustic features of speech) emotion analysis.

  1. sAnger is a continuous variable ranging from 0 to 1, where 0 indicates that this speaker is extremely unlikely to express any anger, and 1 indicates that this speaker is highly likely to express anger.

  2. sJoy is a continuous variable ranging from 0 to 1, where 0 indicates that this speaker is extremely unlikely to express any joy, and 1 indicates that this speaker is highly likely to express joy.

  3. sSadness is a continuous variable ranging from 0 to 1, where 0 indicates that this speaker is extremely unlikely to express any sadness, and 1 indicates that this speaker is highly likely to express sadness.
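Because the Chinese speech scores are likelihoods rather than intensities, one natural way to read them is against a cutoff. The sketch below assumes the scores are held in a Python dict keyed by the codebook names; the helper name and the default threshold of 0.5 are illustrative assumptions and not a value defined by the system.

```python
def likely_emotions(scores, threshold=0.5):
    """Return the variables whose likelihood is at or above threshold, highest first."""
    names = ("sAnger", "sJoy", "sSadness")
    picked = [n for n in names if scores[n] >= threshold]
    # Order the selected variables from most to least likely.
    return sorted(picked, key=lambda n: scores[n], reverse=True)
```

An empty result means no emotion reached the chosen cutoff, which is not the same as the speaker being neutral; the Chinese variation has no neutral variable in this codebook.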

Multimodal (combining visual, speech and text-based analysis features) emotion analysis summary

The following describes the output codebook for outputs from multimodal features-based emotion analysis. The summary variable is also re-trainable and customizable given application domain-specific needs and training data.

  1. emotionSummary is a summary variable that describes the predicted dominant emotion type (fear, anger, joy, sadness, no specific emotion) and its related intensity level (low, moderate, high, extremely high intensity) in a qualitative sense. For example: “Overall, the person in this video is expressing high-intensity Joy.”
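To make the shape of emotionSummary concrete, the sketch below composes a sentence in the style of the example from an emotion type and an intensity level. It is not the system's actual generator: the function name is hypothetical, the template is inferred from the single example given, and the wording for "no specific emotion" is an assumption because the codebook shows no example for that case.

```python
EMOTION_TYPES = ("fear", "anger", "joy", "sadness", "no specific emotion")
INTENSITY_LEVELS = ("low", "moderate", "high", "extremely high")


def format_summary(emotion, level=None):
    """Compose a summary sentence modelled on the codebook's example."""
    if emotion not in EMOTION_TYPES:
        raise ValueError(f"unknown emotion type: {emotion!r}")
    if emotion == "no specific emotion":
        # Assumed wording; the codebook gives no example for this case.
        return "Overall, the person in this video is expressing no specific emotion."
    if level not in INTENSITY_LEVELS:
        raise ValueError(f"unknown intensity level: {level!r}")
    # The emotion name is capitalized to match the example ("Joy").
    return (
        "Overall, the person in this video is expressing "
        f"{level}-intensity {emotion.capitalize()}."
    )
```

For instance, `format_summary("joy", "high")` reproduces the example sentence from the codebook.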