Multimodal News Reporter

Upload an audio recording and/or a relevant image; the AI will generate a news report you can revise and save.
Output is capped at 128 tokens to keep inference fast.
Note: This demo currently runs on CPU only.
Sample audio is trimmed to 10 seconds for faster inference.
Combined audio + image inference takes about 250–350 seconds; audio-only and image-only runs are much faster.
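The 10-second audio trim can be illustrated with a minimal sketch. It assumes the audio arrives as a sample sequence plus a sample rate, which is how Gradio's `gr.Audio(type="numpy")` delivers it as a `(sample_rate, data)` tuple. The helper name `trim_audio` is hypothetical and is not taken from this app's code.

```python
def trim_audio(samples, sample_rate: int, max_seconds: float = 10.0):
    """Return at most `max_seconds` of audio from the start of `samples`.

    Hypothetical helper: `samples` is any sliceable sequence with time along
    the first axis (a list, or a 1-D / (n, channels) NumPy array).
    """
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    # Number of samples that fit in the time budget.
    max_samples = int(sample_rate * max_seconds)
    # Slicing past the end is safe: shorter clips are returned unchanged.
    return samples[:max_samples]


if __name__ == "__main__":
    sr = 16000
    audio = [0.0] * (sr * 30)  # 30 s of silence as a stand-in recording
    clip = trim_audio(audio, sr)
    print(len(clip) / sr)  # 10.0
```

Slicing a NumPy array works the same way as slicing a list, so the helper needs no special case for either type. Clips shorter than the limit are passed through as-is.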
