Adoption of Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) Technical Specification Multimodal Conversation Version 1.1
Last updated: 7 Jan 2025
Scope
Multimodal Conversation (MPAI-MMC) is an MPAI Standard comprising five Use Cases, all sharing the use of artificial intelligence (AI) to enable a form of human-machine conversation that emulates human-human conversation in completeness and intensity:
1. “Conversation with Emotion” (CWE), supporting audio-visual conversation with a machine impersonated by a synthetic voice and an animated face.
2. “Multimodal Question Answering” (MQA), supporting requests for information about a displayed object.
3. Three Use Cases supporting conversational translation applications. In each Use Case, users can specify whether speech or text is used as input and, if the input is speech, whether their speech features are preserved in the interpreted speech:
a. “Unidirectional Speech Translation” (UST).
b. “Bidirectional Speech Translation” (BST).
c. “One-to-Many Speech Translation” (MST).

©IEEE 2022. All rights reserved.