Juy996enjavhdtoday12152021015941 Min New [ 2025-2027 ]
Dataset
We curated 1,200 short clips (15–120 s) from publicly available Creative Commons sources across five categories: news, sports, tutorials, interviews, and user-generated content. Each clip has:
For information about the production itself, see industry databases such as the Adult Video Database (search for code JUY-996) or the official studio listings at Alice Japan.
Introduction
Short-form video content has exploded on social platforms, and users prefer concise summaries that highlight salient moments. Existing summarization approaches often target longer videos and focus on visual features alone. This work proposes a lightweight multi-modal model optimized for clips around one minute in length. It combines frame-level visual embeddings, audio features, and automatic speech recognition (ASR) transcripts via a cross-modal attention mechanism.
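The introduction names a cross-modal attention mechanism but gives no details, so the following is only a minimal sketch of the general idea, not the model proposed here. It assumes that each visual frame embedding acts as a query over a shared context of audio and transcript embeddings, that all three modalities already live in the same embedding dimension, and that no learned projections are used. The function names (`cross_attend`, `fuse_clip`) and the toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_attend(queries, context):
    """Scaled dot-product attention of each query over the context vectors.

    Assumption: context vectors serve as both keys and values
    (no learned projection matrices in this sketch).
    Returns (attended vectors, attention weights).
    """
    dim = len(queries[0])
    scale = math.sqrt(dim)
    attended, weights = [], []
    for q in queries:
        w = softmax([dot(q, k) / scale for k in context])
        # Weighted sum of the context vectors, one output dimension at a time.
        out = [sum(w_j * v[i] for w_j, v in zip(w, context)) for i in range(dim)]
        attended.append(out)
        weights.append(w)
    return attended, weights


def fuse_clip(visual, audio, text):
    """Let visual frames attend over audio and transcript embeddings.

    Assumption: fusion is a simple residual add of the attended context
    onto each frame embedding.
    """
    context = audio + text  # list concatenation: audio steps, then text tokens
    attended, weights = cross_attend(visual, context)
    fused = [[f + a for f, a in zip(frame, att)]
             for frame, att in zip(visual, attended)]
    return fused, weights


# Toy 4-dimensional embeddings (hypothetical values, for illustration only).
visual = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]  # 2 frames
audio = [[0.5, 0.5, 0.0, 0.0]]                          # 1 audio step
text = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]     # 2 transcript tokens

fused, weights = fuse_clip(visual, audio, text)
```

Each row of `weights` shows how strongly one frame attends to each audio step and transcript token. A trained model would typically add learned query, key, and value projections and multiple attention heads on top of this.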