Extraction of Important Scenes by Multimodal LLM Using Video and Speech Transcription Data
-- A Study on the Accurate Understanding of Timestamp Information --
Tomoki Haruyama, Cheng Zhou (NTT DOCOMO)
(To be available after the conference date)