Video Scene Captioning


The Pursuit of a Rapidly Advancing Captioner

What is the objective?  
Create a near-real-time scene-captioning artificial intelligence algorithm for use in full-motion streaming video analysis that automatically captures unique, anomalous, or unexpected context; the emphasis is on raw, live, or otherwise unanalyzed video.

What problem are we trying to solve?  
Intelligence analysts, vehicular traffic analysts, sports analysts, and professionals in related fields are overloaded with real-time video feeds; they do not have time to analyze all incoming information, so important, relevant events (criminal activity, life-safety incidents) may be missed. This capability does not currently exist: the majority of labeling and captioning tasks are performed manually by humans after events have occurred.

What outcome do we hope to achieve?  
Leverage computer vision, machine learning (including deep neural networks), and natural language generation to identify and interpret scenes of interest and convey them in a user-understandable manner, no matter the industry.

What resources could the lab provide?  
A single point of contact for access to highly educated and experienced professionals with decades of experience in AI/ML/DL and strategic insight into how to deliver a relevant product; access to high-performance, cutting-edge computing; and semi-prepared, nearly labeled data with accompanying software for assisting in the creation of video captioning datasets.

What would success look like?  
Ultimate success would be production of an efficient, intuitive, and user-friendly software tool that automatically recognizes, analyzes, and labels video, potentially with bounding boxes, highlights, text captions, and/or text summaries describing user-defined areas of focus in the scene. Threshold success would be addressing a few key problematic research challenges and making notable progress on them.
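To make the desired outputs concrete, the annotations described above (bounding boxes, captions, summaries tied to a span of video) could be represented by a record like the following. This is a minimal sketch; the `SceneAnnotation` name and its fields are illustrative assumptions, not a format specified by the solicitation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class SceneAnnotation:
    """One analyzed video segment (hypothetical schema)."""
    start_s: float   # segment start time, in seconds
    end_s: float     # segment end time, in seconds
    caption: str     # natural-language scene caption
    # Optional (x, y, width, height) highlight boxes in pixel coordinates.
    boxes: List[Tuple[int, int, int, int]] = field(default_factory=list)
    summary: str = ""  # optional longer text summary of the segment

# Example: a captioned event with one highlighted region.
ann = SceneAnnotation(
    start_s=12.0,
    end_s=15.5,
    caption="vehicle stops abruptly at intersection",
    boxes=[(310, 120, 80, 60)],
)
```

A flat record like this is easy to serialize to JSON, which matters for the open-architecture interoperability goal discussed below.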

What types of solutions would we expect?  
A scene-captioning/scene-summarizing algorithm with defined accuracy metrics and near-real-time performance, enabling the end user to work more efficiently and effectively. The solution should be packaged using open architecture standards so it can work with a large number of commercial and government applications, both in use and in development.
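One common way to approach the near-real-time requirement is to caption a sampled subset of frames rather than every frame of the stream. The sketch below illustrates that shape only; `caption_stream`, `stub_captioner`, and the sampling rate are assumptions for illustration, and the captioning model itself is stubbed out where a trained vision-language model would go.

```python
from typing import Callable, Iterable, Iterator, Tuple

def caption_stream(
    frames: Iterable,
    captioner: Callable[[object], str],
    sample_every: int = 30,
) -> Iterator[Tuple[int, str]]:
    """Yield (frame_index, caption) pairs, captioning every Nth frame.

    Sampling keeps throughput near real time by skipping redundant
    frames between captioning calls.
    """
    for i, frame in enumerate(frames):
        if i % sample_every == 0:
            yield i, captioner(frame)

# Stand-in for a real captioning model (hypothetical).
def stub_captioner(frame) -> str:
    return f"frame {frame}: no anomaly detected"

# Simulate a 90-frame clip at ~30 fps: one caption per second of video.
results = list(caption_stream(range(90), stub_captioner, sample_every=30))
```

In a real deployment the sampling rate would be tuned against the accuracy metrics the solicitation asks for, trading caption density against latency.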

What's in it for industry?  
Innovare-enabled pursuits on this topic are an avenue for industry to uncover the problems facing today's technology leaders in the computer vision realm, as well as an opportunity to compete on future Air Force solicitations and work alongside government, academia, and fellow industry-leading partners in the field.

The Request for Partnership Submission Period Has Now Ended.

POWERED BY GRIFFISS INSTITUTE