Tackling the Avalanche of UGC Videos: Harnessing AI to Streamline Video Review Processes
In the digital age, the volume of User-Generated Content (UGC) has grown exponentially. At Hypergro, our platform receives thousands of video submissions every day. UGC is a rich source of content, but it poses significant challenges for curation and compliance.

The Daunting Challenge: Manual Reviews
With so many incoming videos, ensuring that each one aligns with our platform's guidelines is paramount. Two requirements stand out:
- Mandatory Face Display: For reasons ranging from user verification to engagement metrics, it's essential that faces are visible in these UGC videos.
- Brand Disclaimers: Certain content, especially videos affiliated with brands, requires a disclaimer appended at the end to ensure transparency and adherence to advertising standards.
However, manually scrutinizing every video is a mammoth task: it is time-consuming and prone to human error. Imagine sifting through hours of footage every day, checking each clip for faces and confirming that disclaimers are in place. At this scale, manual review alone is untenable.
The AI Revolution: Face and Disclaimer Detection
To address these challenges, we turned to AI, focusing on two tasks:
- Face Detection: Using state-of-the-art AI models, we automatically verify that faces are present in UGC videos. This accelerates the review process and applies the same criteria to every video, improving accuracy and consistency.
- Disclaimer Detection: Similarly, AI-driven processes scan each video to detect whether the required disclaimer is present and append it where needed, ensuring brand compliance and transparency for users.
The integration of these AI models has drastically reduced the manual intervention required, optimizing our workflow and ensuring a more consistent review process.
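The review flow described above can be sketched as a small routing function. This is a minimal, hypothetical sketch, not Hypergro's actual pipeline: `review_video`, `ReviewResult`, and all thresholds (sampling every 10th frame, a 3-second closing window for disclaimers, faces in at least half of the sampled frames) are assumptions, and the two check functions are stand-ins for a BlazeFace-based face detector and a MobileNet-based disclaimer classifier.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence

# Hypothetical stand-ins: in production these would wrap a BlazeFace-based face
# detector and a MobileNet-based disclaimer classifier. Here they are any
# function that takes one decoded frame and returns True or False.
FrameCheck = Callable[[Any], bool]


@dataclass
class ReviewResult:
    has_face: bool
    has_disclaimer: bool

    @property
    def needs_manual_review(self) -> bool:
        # Only videos that fail an automated check are routed to a human.
        return not (self.has_face and self.has_disclaimer)


def review_video(
    frames: Sequence[Any],
    fps: int,
    face_check: FrameCheck,
    disclaimer_check: FrameCheck,
    sample_step: int = 10,        # assumed: run the face model on every 10th frame
    tail_seconds: int = 3,        # assumed: disclaimers appear in the last 3 s
    min_face_ratio: float = 0.5,  # assumed: faces visible in half the samples
) -> ReviewResult:
    if not frames:
        return ReviewResult(has_face=False, has_disclaimer=False)

    # Face check: sample frames across the whole video to bound inference cost.
    sampled = frames[::sample_step]
    face_hits = sum(1 for frame in sampled if face_check(frame))
    has_face = face_hits / len(sampled) >= min_face_ratio

    # Disclaimer check: only the closing seconds, where disclaimers are appended.
    tail_len = max(1, tail_seconds * fps)
    tail = frames[-tail_len:]
    has_disclaimer = any(disclaimer_check(frame) for frame in tail)

    return ReviewResult(has_face=has_face, has_disclaimer=has_disclaimer)


# Usage with toy "frames" (dicts) and trivial checks standing in for the models.
video = [{"face": True, "disclaimer": False}] * 90 + [{"face": True, "disclaimer": True}] * 30
result = review_video(video, fps=30,
                      face_check=lambda f: f["face"],
                      disclaimer_check=lambda f: f["disclaimer"])
print(result.needs_manual_review)  # False
```

Keeping the model calls behind plain callables means either detector can be swapped or retrained without touching the routing logic, and only videos that fail a check reach a human reviewer.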

Diving Deeper: The Tech Behind the Magic
At the heart of this transformation lie two sophisticated algorithms:
- BlazeFace for Face Detection: Built on convolutional neural networks (CNNs), BlazeFace offers real-time face detection and can identify faces across widely varied video content.
- MobileNet for Disclaimer Detection: Relying on depthwise separable convolutions, MobileNet provides both speed and precision, making it invaluable for quickly spotting and affixing disclaimers in UGC videos.
A Deeper Look: BlazeFace & MobileNet in Action
Modern AI techniques, particularly in the realm of deep learning, have unlocked capabilities that were previously deemed unattainable. Two such marvels, pivotal to our operations at Hypergro, are BlazeFace and MobileNet.
BlazeFace: The Facial Detection Maestro
BlazeFace is a face detection model optimized for real-time inference on mobile devices. Here’s how it excels:
- Convolutional Neural Networks (CNNs): At its core, BlazeFace harnesses the power of CNNs. These are multilayered artificial neural networks designed to recognize patterns in images. CNNs automatically and adaptively learn spatial hierarchies of features, from simple edges up to object parts, which makes them well suited to image classification and object detection.
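- Simplified Architecture for Speed: BlazeFace uses a compact, streamlined architecture built from lightweight convolutional blocks, which keeps computational cost low; its authors report sub-millisecond inference on flagship smartphone GPUs. This efficiency is vital for real-time applications, especially on devices with limited computational capacity, such as smartphones.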
- Anchors and Pyramids: The model places reference boxes, or "anchors," of different sizes at multiple scales within the image, akin to a pyramid structure. This multi-scale approach helps the model detect faces of various sizes efficiently.
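To make the CNN point concrete, the basic operation a convolutional layer performs is sliding a small filter over the image and summing the element-wise products. The sketch below is an illustration only, using NumPy, a single hand-written vertical-edge filter, and no padding or stride; in a real CNN such as BlazeFace, the filter values are learned from data and many layers are stacked. The function name `conv2d_valid` is hypothetical.

```python
import numpy as np


def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide `kernel` over `image` (stride 1, no padding) and sum the products.

    Like most deep-learning frameworks, this is technically cross-correlation
    (the kernel is not flipped), which is what CNN layers call "convolution".
    """
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


# A 5x5 grayscale image: dark on the left, bright on the right.
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)
# A hand-written vertical-edge filter; a CNN learns filters like this from data.
kernel = np.array([[-1, 0, 1]] * 3, dtype=float)

response = conv2d_valid(image, kernel)
print(response)
```

Each row of the 3x3 response is [3, 3, 0]: the filter fires wherever its window straddles the dark-to-bright boundary and stays silent over the uniformly bright region. Stacking such layers, with nonlinearities between them, is what lets a CNN build up the hierarchy of features described above.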
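The anchor idea can be sketched in a few lines. This is a simplified illustration, not BlazeFace's actual anchor code: it places square anchors at the centre of every cell of two grids over a 128x128 input, a fine 16x16 grid with 2 anchors per cell for small faces and a coarse 8x8 grid with 6 anchors per cell for larger faces. That grid layout matches the one commonly described for BlazeFace's front-camera model (896 anchors in total), but the anchor sizes and the function names below are illustrative assumptions.

```python
from typing import List, Tuple

# (center_x, center_y, size), all normalised to the [0, 1] image range.
Anchor = Tuple[float, float, float]


def grid_anchors(grid_size: int, sizes: List[float]) -> List[Anchor]:
    """Place one square anchor per entry in `sizes` at the centre of every grid cell."""
    anchors: List[Anchor] = []
    for row in range(grid_size):
        for col in range(grid_size):
            cx = (col + 0.5) / grid_size
            cy = (row + 0.5) / grid_size
            for size in sizes:
                anchors.append((cx, cy, size))
    return anchors


def multiscale_anchors() -> List[Anchor]:
    # Fine 16x16 grid (stride 8 on a 128x128 input): 2 small anchors per cell.
    fine = grid_anchors(16, [0.10, 0.15])
    # Coarse 8x8 grid (stride 16): 6 progressively larger anchors per cell.
    # The sizes are illustrative only, not BlazeFace's trained configuration.
    coarse = grid_anchors(8, [0.25, 0.35, 0.45, 0.55, 0.65, 0.75])
    return fine + coarse


anchors = multiscale_anchors()
print(len(anchors))  # 16*16*2 + 8*8*6 = 512 + 384 = 896
```

For each anchor, a detector of this kind predicts a confidence score and box offsets relative to that anchor. Anchors on the fine grid are better placed to catch small faces, and anchors on the coarse grid to catch large ones.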
MobileNet: Precision with Agility
MobileNet is a versatile model used for a plethora of vision applications, including our specific need for disclaimer detection. Its strength lies in its balance between efficiency and accuracy:
- Depthwise Separable Convolutions: Standard convolutions are computationally expensive because every filter spans all input channels. MobileNet’s key idea is to factor each standard convolution into two steps: a depthwise convolution, which applies a single spatial filter to each input channel separately, followed by a pointwise (1x1) convolution, which combines the channels. This greatly reduces computation without a significant drop in accuracy.
- Optimized for Mobile: As its name implies, MobileNet is designed for mobile and embedded vision applications. It’s lightweight, meaning it requires fewer parameters and less computation, making it ideal for devices with limited processing power.
- Flexible and Scalable: MobileNet can be adapted to the available resources through two hyperparameters: a width multiplier, which thins the number of channels in every layer, and a resolution multiplier, which reduces the input image size. On devices with more computational resources, a larger configuration can be used for higher accuracy; in more constrained environments, the model can be scaled down.
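The depthwise separable factorization can be sketched in NumPy. This is a deliberately naive, loop-based illustration (stride 1, no padding, and without the batch normalization and ReLU that MobileNet applies after each step), not MobileNet's actual implementation; the function names are hypothetical. The final check shows that the two-step version computes exactly what a standard convolution would compute if its kernel were constrained to the product of a per-channel spatial filter and a channel-mixing matrix.

```python
import numpy as np


def standard_conv(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """x: (H, W, M) input; w: (K, K, M, N) filters, each spanning all M channels."""
    k, n = w.shape[0], w.shape[3]
    h_out, w_out = x.shape[0] - k + 1, x.shape[1] - k + 1
    out = np.zeros((h_out, w_out, n))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[i:i + k, j:j + k, :]  # (K, K, M)
            out[i, j] = np.tensordot(patch, w, axes=([0, 1, 2], [0, 1, 2]))
    return out


def depthwise_conv(x: np.ndarray, dw: np.ndarray) -> np.ndarray:
    """dw: (K, K, M), one K x K filter per input channel; channels are not mixed."""
    k = dw.shape[0]
    h_out, w_out = x.shape[0] - k + 1, x.shape[1] - k + 1
    out = np.zeros((h_out, w_out, x.shape[2]))
    for i in range(h_out):
        for j in range(w_out):
            out[i, j] = np.sum(x[i:i + k, j:j + k, :] * dw, axis=(0, 1))
    return out


def pointwise_conv(x: np.ndarray, pw: np.ndarray) -> np.ndarray:
    """pw: (M, N), a 1x1 convolution that mixes channels independently at each pixel."""
    return x @ pw


def depthwise_separable_conv(x: np.ndarray, dw: np.ndarray, pw: np.ndarray) -> np.ndarray:
    return pointwise_conv(depthwise_conv(x, dw), pw)


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 4))   # 8x8 feature map, M = 4 channels
dw = rng.standard_normal((3, 3, 4))  # K = 3
pw = rng.standard_normal((4, 6))     # N = 6 output channels

y = depthwise_separable_conv(x, dw, pw)
# A standard kernel built as the product of the two factors gives the same output.
w_factored = dw[:, :, :, None] * pw[None, None, :, :]
print(y.shape, np.allclose(y, standard_conv(x, w_factored)))  # (6, 6, 6) True
```

The equivalence is why the factorization loses little expressiveness in practice, while the separate depthwise and pointwise steps need far fewer multiplications than a full K x K x M x N kernel.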
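The computational savings of depthwise separable convolutions can be worked out directly. Using the notation of the original MobileNet paper, D_K is the kernel size, M the number of input channels, N the number of output channels, and D_F the spatial size of a square feature map, assuming stride 1 and "same" padding:

```latex
\begin{aligned}
\text{Standard:} \quad & D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F \\
\text{Depthwise separable:} \quad & \underbrace{D_K \cdot D_K \cdot M \cdot D_F \cdot D_F}_{\text{depthwise}}
  + \underbrace{M \cdot N \cdot D_F \cdot D_F}_{\text{pointwise}} \\
\text{Ratio:} \quad & \frac{D_K^2 M D_F^2 + M N D_F^2}{D_K^2 M N D_F^2} = \frac{1}{N} + \frac{1}{D_K^2}
\end{aligned}
```

With 3x3 kernels (D_K = 3) and output channel counts in the tens to hundreds, the ratio is close to 1/9, i.e. roughly 8 to 9 times fewer multiply-adds. Parameters follow the same pattern: D_K · D_K · M · N weights for a standard layer versus D_K · D_K · M + M · N for the separable version.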
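The effect of MobileNet's two scaling knobs can be estimated with a short cost model. This sketch counts multiply-adds for a single depthwise separable layer, assuming a square D_F x D_F feature map, D_K x D_K kernels, M input and N output channels, stride 1, and "same" padding; `alpha` is the width multiplier applied to the channel counts and `rho` the resolution multiplier applied to the feature-map size. The function name and example layer sizes are illustrative.

```python
def separable_layer_mult_adds(d_k: int, m: int, n: int, d_f: int,
                              alpha: float = 1.0, rho: float = 1.0) -> float:
    """Multiply-adds of one depthwise separable layer under width multiplier
    `alpha` and resolution multiplier `rho`."""
    m_scaled, n_scaled = alpha * m, alpha * n
    spatial = (rho * d_f) ** 2               # number of output positions
    depthwise = d_k * d_k * m_scaled * spatial
    pointwise = m_scaled * n_scaled * spatial
    return depthwise + pointwise


# Example layer: 3x3 kernels, 512 -> 512 channels, 14x14 feature map.
base = separable_layer_mult_adds(3, 512, 512, 14)
thinner = separable_layer_mult_adds(3, 512, 512, 14, alpha=0.75)
smaller_input = separable_layer_mult_adds(3, 512, 512, 14, rho=0.5)

print(f"{base:,.0f}")           # 52,283,392
print(f"{thinner:,.0f}")        # 29,578,752
print(f"{smaller_input:,.0f}")  # 13,070,848
```

Both terms scale with rho squared, so halving the resolution cuts the cost exactly fourfold. The pointwise term, which dominates here, scales with alpha squared, so alpha = 0.75 brings the layer to roughly 57% of its baseline cost.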
Conclusion: Marrying Scale with Efficiency
At Hypergro, our journey reflects our goal of combining scale with efficiency. By leveraging cutting-edge AI, we have transformed what was once a daunting challenge into a streamlined, efficient process. As we continue to grow and take in more content, we are confident that our tech-driven approach will keep us at the forefront of the digital content arena.