AI-generated video technology has advanced at a staggering pace. What was once easy to spot — robotic faces, garbled text, jittery movement — now passes casual inspection. In 2026, the gap between real and AI-generated footage has narrowed dramatically, making detection a genuinely important skill for journalists, content moderators, researchers, and everyday viewers.
This guide distills the practical knowledge needed to evaluate whether a video is AI-generated or authentic. We present 12 concrete checkpoints, each targeting a specific weakness in how current AI models generate video. Rather than relying on gut feeling, you will learn a systematic, repeatable approach to deepfake detection.
Whether you are verifying a breaking-news clip, reviewing user-generated content, or simply curious about the limits of generative AI, these checkpoints will sharpen your eye. Some techniques take seconds; others require pausing and zooming in. Together, they form a layered defence against deception.
You do not need to check every single item for every video. Start with the highest-reliability checkpoints (hands, text, physics) and escalate only if the result is inconclusive. The Pro Detection Workflow section at the end shows exactly how to prioritise.
12 Checkpoints Quick Reference
The table below summarises all 12 checkpoints at a glance; each is covered in detail in its own section below.
| No. | Checkpoint | What to Check | Reliability (★–★★★★★) | Difficulty |
|---|---|---|---|---|
| 1 | Fine Structures | Hair, eyelashes, fabric weave, jewellery edges | ★★★★☆ | Medium |
| 2 | Hands and Fingers | Finger count, joint angles, palm lines | ★★★★★ | Easy |
| 3 | Shadows and Light Sources | Shadow direction consistency, light-source count | ★★★★☆ | Medium |
| 4 | Text and Logos | Readable text, logo accuracy, letter consistency | ★★★★★ | Easy |
| 5 | Physics of Motion | Gravity, inertia, fluid dynamics, cloth simulation | ★★★★☆ | Medium |
| 6 | Background Semantic Consistency | Logical placement of objects, architectural sense | ★★★☆☆ | Medium |
| 7 | Object/Person Deformation | Identity drift, morphing between frames | ★★★★☆ | Medium |
| 8 | Inter-frame Differences | Temporal flickering, texture pop-in | ★★★★☆ | Hard |
| 9 | Eyes and Pupils | Pupil shape, reflection consistency, blink timing | ★★★★☆ | Medium |
| 10 | Suspiciously Perfect Footage | Absence of sensor noise, lens distortion, motion blur | ★★★☆☆ | Hard |
| 11 | Camera Work | Physically impossible moves, unnatural stabilisation | ★★★☆☆ | Hard |
| 12 | Pause and Inspect | Frame-by-frame scrubbing, zoom to 200 %+ | ★★★★★ | Easy |
The Fundamental Principle — Statistical vs Physical Generation
Before diving into individual checkpoints, it helps to understand why AI-generated videos fail. The core issue is that generative models produce frames statistically, predicting plausible pixel values learned from training data, rather than simulating real-world physics. This fundamental gap is what every checkpoint exploits.
| Dimension | Real Video (Physical World) | AI-Generated Video (Statistical Model) |
|---|---|---|
| Generation principle | Light captured by a physical sensor; governed by optics and physics | Pixel values predicted by a neural network trained on large datasets |
| Consistency | Inherently consistent — objects obey the same physical laws across frames | Consistency is only approximate; the model has no persistent world state |
| Detail | Infinite resolution in the real world; sensor is the bottleneck | Detail is bounded by model capacity; fine structures often degrade |
| Temporal coherence | Each frame is a direct continuation of physical reality | Frames are generated sequentially or in batches; drift accumulates over time |
Whenever you are unsure about a specific frame, ask yourself: “Could this plausibly result from a physical camera recording a physical scene?” If the answer is no, you have found an artefact.
① Fine Structures
Fine structures — individual hairs, eyelashes, fabric weave, lace patterns, jewellery edges — are extremely difficult for generative models to render accurately. These high-frequency details are often the first to break down, even in state-of-the-art systems.
| Structure | Anomaly to Watch |
|---|---|
| Hair | Strands merge into a painted texture instead of individual fibres; hairline shifts between frames |
| Eyelashes | Unnatural uniformity; lashes may appear fused or change length mid-blink |
| Fabric weave | Repeating pattern breaks, moiré-like artefacts that shift unnaturally |
| Jewellery / accessories | Edges shimmer or dissolve; gemstone facets flicker; chain links merge |
| Teeth | Count changes between frames; teeth appear blurred or fused together |
| Skin pores | Unnaturally smooth skin at close range or AI-hallucinated pore patterns |
Low-resolution or heavily compressed real video can also lack fine detail. Always consider the source resolution and compression level before concluding that missing detail equals AI generation.
② Hands and Fingers
Hands remain one of the most reliable indicators of AI-generated video. The complex articulation of five fingers with multiple joints, overlapping and foreshortening, is notoriously difficult for generative models.
| Anomaly Pattern | Description |
|---|---|
| Extra or missing fingers | The most classic tell — six fingers, four fingers, or fingers that branch mid-way |
| Impossible joint angles | Fingers bending backwards or at anatomically impossible points |
| Fused fingers | Two or more fingers merging into a single mass, especially in motion |
| Disappearing fingers | Fingers that exist in one frame and vanish in the next |
| Inconsistent palm lines | Palm creases that shift, disappear, or reconfigure between frames |
| Nail anomalies | Fingernails appearing on the wrong side, changing shape, or missing entirely |
Pause the video at any frame where hands are prominent and count the fingers carefully. This single check catches a surprising number of AI-generated clips, even in 2026.
③ Shadows and Light Sources
In the physical world, every shadow has a corresponding light source, and all shadows in a scene are geometrically consistent. AI models frequently fail to maintain this global consistency because they lack a true 3D scene representation.
| Anomaly | What to Look For |
|---|---|
| Contradictory shadow directions | Shadows of different objects pointing in incompatible directions |
| Missing shadows | Objects that should cast a shadow on nearby surfaces but do not |
| Shadow shape mismatch | Shadow outline that does not match the object’s silhouette |
| Inconsistent specular highlights | Reflections on shiny surfaces that imply a different light position than the shadows |
| Flickering shadows | Shadow intensity or direction changing erratically between frames |
Multiple real light sources (e.g., stage lighting) can create genuinely complex shadow patterns. Make sure you are not mistaking multi-light setups for AI artefacts.
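If you want to go beyond eyeballing, the parallel-shadow test can be made explicit. Below is a minimal sketch, assuming you manually mark each object's base point and the tip of its shadow in a paused frame; the coordinates shown are invented for illustration. It applies only to a single distant light source such as the sun, since nearby lights and perspective convergence legitimately bend the parallel rule.

```python
import math

# Manually annotated (x, y) pixel pairs: (object base, shadow tip).
# These coordinates are made up for illustration.
annotations = [
    ((420, 610), (520, 660)),   # person
    ((800, 590), (905, 638)),   # lamppost
    ((150, 640), (242, 695)),   # rubbish bin
]

def shadow_angle(base, tip):
    """Angle of the shadow direction vector in degrees (image coordinates)."""
    dx, dy = tip[0] - base[0], tip[1] - base[1]
    return math.degrees(math.atan2(dy, dx))

angles = [shadow_angle(b, t) for b, t in annotations]
# Note: angles close to the ±180° wrap-around need extra care.
spread = max(angles) - min(angles)
print(f"Shadow angles: {[round(a, 1) for a in angles]}")
# Under one distant light source (sunlight), angles should agree within a
# few degrees; a large spread is worth a closer look.
print("Suspicious" if spread > 15 else "Consistent", f"(spread = {spread:.1f} deg)")
```

The 15-degree threshold is a rule of thumb, not a calibrated value; tighten or loosen it based on how far apart the objects are in the frame.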
④ Text and Logos
Generating readable, consistent text is one of the hardest challenges for video AI models. Letters, numbers, and logos frequently contain errors that are immediately obvious to a literate viewer.
| Anomaly | What to Look For |
|---|---|
| Garbled text | Words that look plausible at a glance but are actually nonsensical letter combinations |
| Shifting text | Letters on a sign or label that change between frames |
| Inconsistent font | Characters within the same word rendered in different typefaces or sizes |
| Logo distortion | Well-known logos with wrong proportions, missing elements, or extra strokes |
| Mirrored or inverted text | Text that reads backwards or is partially flipped |
| Disappearing text | Text visible in one frame that vanishes or transforms in the next |
Zoom into any visible text — street signs, T-shirt prints, book covers, product labels. If you can read it clearly and it makes perfect sense across multiple frames, that is meaningful evidence the footage is real, though not proof on its own.
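The text check can be semi-automated by running OCR on a paused frame and reading the output yourself. A minimal sketch, assuming OpenCV and pytesseract are installed (pytesseract also needs the Tesseract binary); the file name and frame number are hypothetical. Treat garbled OCR output only as a prompt to zoom in, since OCR misreads real text too.

```python
import cv2                      # pip install opencv-python
import pytesseract              # pip install pytesseract (requires Tesseract binary)

# Hypothetical file name: point this at the clip you are checking.
cap = cv2.VideoCapture("suspect_clip.mp4")
cap.set(cv2.CAP_PROP_POS_FRAMES, 120)   # jump to a frame with visible signage
ok, frame = cap.read()
cap.release()

if ok:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    text = pytesseract.image_to_string(gray)
    print(text)   # read the output yourself: real signage yields real words
```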
⑤ Physics of Motion
Real-world motion obeys Newton’s laws: gravity accelerates falling objects at 9.8 m/s², inertia resists changes in velocity, and fluids flow according to well-understood dynamics. AI models approximate these patterns statistically but frequently produce physically impossible results.
| Physics Domain | Anomaly to Watch |
|---|---|
| Gravity | Objects falling too slowly, too quickly, or pausing mid-air unnaturally |
| Inertia / momentum | Moving objects stopping instantly or changing direction without deceleration |
| Fluid dynamics | Water, smoke, or fire behaving in visually appealing but physically wrong ways |
| Cloth simulation | Fabric clipping through the body, folding in impossible patterns, or moving without wind |
| Collision response | Objects passing through each other or reacting to collisions inconsistently |
| Weight and impact | Heavy objects bouncing like rubber or light objects moving as if leaden |
Stylised or slow-motion footage can look physically unusual even when it is real. Consider the context and whether the video is intended to be cinematic before flagging physics anomalies.
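When a clear drop is visible, the gravity check can be made quantitative with nothing but frame numbers. A minimal sketch, assuming a 30 fps clip and a drop height you estimate from objects of known size in the scene; all of the numbers below are invented for illustration.

```python
import math

FPS = 30.0           # frame rate of the clip (check the file's metadata)
G = 9.8              # gravitational acceleration, m/s^2

release_frame = 91   # frame where the object is let go (found by scrubbing)
impact_frame = 106   # frame where it lands
drop_height_m = 1.2  # estimated from objects of known size in the scene

observed_t = (impact_frame - release_frame) / FPS
expected_t = math.sqrt(2 * drop_height_m / G)   # t = sqrt(2h/g) for free fall

print(f"observed fall time: {observed_t:.2f} s, expected: {expected_t:.2f} s")
# Agreement within ~20% is plausible given estimation error; a 2x mismatch
# suggests the motion was generated, slowed down, or sped up.
```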
⑥ Background Semantic Consistency
While AI models excel at generating visually plausible backgrounds, they often fail at semantic consistency — ensuring that objects in the background make logical sense in relation to each other and the setting.
| Anomaly | What to Look For |
|---|---|
| Impossible architecture | Buildings with non-functional doors, windows that lead nowhere, stairs that loop |
| Semantic mismatch | Objects that do not belong in the scene (e.g., a fire hydrant indoors, tropical plants in a snow scene) |
| Floating objects | Background items that are not anchored to any surface |
| Inconsistent scale | Objects in the background that are disproportionately large or small relative to their surroundings |
| Morphing background | Background elements that subtly change shape or position as the camera moves |
Intentionally shift your focus away from the main subject and study only the background. AI models allocate most of their capacity to the foreground, so background anomalies are often more pronounced.
⑦ Object/Person Deformation — Identity Drift
Identity drift occurs when a person’s or object’s appearance gradually changes over the course of a video. Because AI models lack a persistent 3D model of each entity, features can morph subtly — or dramatically — between frames.
| Anomaly | What to Look For |
|---|---|
| Facial feature drift | Nose shape, jaw line, or ear position changing gradually over a few seconds |
| Clothing transformation | Garment colour, pattern, or style shifting mid-clip |
| Accessory inconsistency | Glasses, earrings, or hats appearing, disappearing, or changing design |
| Body proportion shift | Shoulder width, limb length, or torso ratio changing between shots |
| Object morphing | Inanimate objects (cars, furniture) subtly changing shape over time |
Genuine videos with multiple camera angles can show different perspectives of the same face, which may look like “drift” at first glance. Compare the same angle across time, not different angles at different times.
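For longer clips, identity drift can be quantified by comparing face embeddings over time. A minimal sketch using the open-source face_recognition library; the file name and one-per-second sampling rate are assumptions. Rising distances are a hint, not a verdict, since pose and lighting changes also move embeddings.

```python
import cv2
import face_recognition   # pip install face_recognition
import numpy as np

cap = cv2.VideoCapture("suspect_clip.mp4")   # hypothetical file name
reference = None
frame_idx = 0

while True:
    ok, frame = cap.read()
    if not ok:
        break
    if frame_idx % 30 == 0:                      # sample roughly once per second
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        encodings = face_recognition.face_encodings(rgb)
        if encodings:
            if reference is None:
                reference = encodings[0]          # first detected face = baseline
            else:
                dist = np.linalg.norm(reference - encodings[0])
                print(f"frame {frame_idx}: distance from baseline = {dist:.3f}")
                # The same person, same angle usually stays below ~0.6;
                # a steady upward trend over time hints at identity drift.
    frame_idx += 1
cap.release()
```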
⑧ Inter-frame Differences — Temporal Flickering
Temporal flickering is a hallmark of AI video. Because each frame is generated semi-independently, small inconsistencies accumulate and manifest as rapid changes in texture, colour, or shape that would not occur in optically captured footage.
| Anomaly | What to Look For |
|---|---|
| Texture flickering | Surface textures (skin, fabric, walls) that shimmer or shift rapidly between frames |
| Colour banding | Sudden shifts in colour tone that ripple across the image |
| Edge instability | Object outlines that vibrate or jitter even when the subject is stationary |
| Detail pop-in | Fine details that appear and disappear from frame to frame |
| Ghosting artefacts | Faint remnants of objects or features from adjacent frames bleeding through |
Slow the playback speed to 0.25× and watch a fixed region of the frame. Temporal flickering that is invisible at normal speed becomes glaringly obvious in slow motion.
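The same idea can be automated: compute the mean absolute difference of a fixed region between consecutive frames and look for spikes. A minimal sketch with OpenCV, assuming a hypothetical file name and a hand-picked region; a real static background yields a flat, low curve, while AI flicker shows up as repeated spikes.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("suspect_clip.mp4")   # hypothetical file name
prev = None
scores = []

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    roi = gray[100:300, 200:400]   # fixed region to watch (adjust to your clip)
    if prev is not None:
        scores.append(np.mean(cv2.absdiff(roi, prev)))
    prev = roi
cap.release()

if scores:
    mean, std = np.mean(scores), np.std(scores)
    spikes = [i for i, s in enumerate(scores) if s > mean + 3 * std]
    print(f"{len(spikes)} flicker spikes at frame offsets: {spikes[:10]}")
```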
⑨ Eyes and Pupils
The eyes are among the most scrutinised features in deepfake detection. Pupil shape, reflection patterns, and blink timing all carry strong signals of authenticity — or the lack thereof.
| Anomaly | What to Look For |
|---|---|
| Asymmetric pupils | Pupils of different sizes or shapes that are not explained by medical conditions or lighting |
| Inconsistent reflections | The reflection in the left eye showing a different scene or light source than the right |
| Non-circular pupils | Pupils that are oval, irregular, or have rough edges |
| Abnormal blink rate | Blinking too rarely, too frequently, or the two eyes blinking out of sync |
| Iris detail loss | Iris patterns that are blurry, symmetric, or lack the natural randomness of real irises |
Eye reflections in real video can also be asymmetric if the person is near a window or a complex light source. Use this checkpoint alongside others rather than in isolation.
⑩ Suspiciously Perfect Footage
Real cameras introduce imperfections: sensor noise in low light, lens distortion at wide angles, motion blur on fast-moving subjects. AI-generated video often lacks these natural artefacts, resulting in footage that looks “too clean.”
| Missing Imperfection | What to Look For |
|---|---|
| Sensor noise | Uniformly clean image even in low-light scenes where real cameras would produce grain |
| Lens distortion | Perfectly straight lines at the frame edges where barrel distortion would normally appear |
| Motion blur | Fast-moving objects rendered in perfect sharpness without any directional blur |
| Depth of field | Entire scene in focus when a real lens would produce bokeh at that focal length |
| Chromatic aberration | Absence of colour fringing at high-contrast edges, which real lenses typically produce |
If a video looks like it was shot on a “perfect” camera that does not exist — no noise, no distortion, no aberration — treat that very perfection as a red flag.
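Noise levels can be estimated directly from a frame grab. A minimal sketch, assuming a hypothetical screenshot file: it measures the high-frequency residual in dark, flat areas, where real sensors are grainiest. The brightness cut-off of 60 is an assumption to tune per clip.

```python
import cv2
import numpy as np

# Hypothetical frame grab saved from a paused, dim scene.
frame = cv2.imread("paused_frame.png", cv2.IMREAD_GRAYSCALE)
if frame is None:
    raise SystemExit("frame grab not found")

# Estimate high-frequency noise as the residual after a light blur.
blurred = cv2.GaussianBlur(frame, (5, 5), 0)
residual = frame.astype(np.float32) - blurred.astype(np.float32)

# Measure only in dark regions, where real sensor grain is most visible.
dark = frame < 60
noise_std = residual[dark].std() if dark.any() else float("nan")

print(f"noise std in dark regions: {noise_std:.2f}")
# Real low-light footage typically shows clear grain here; a value near
# zero in a supposedly dim scene is the "too clean" red flag.
```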
⑪ Camera Work
AI-generated camera movements often betray their synthetic origin. Real cameras have physical constraints — they sit on tripods, are handheld by humans, or are mounted on drones — and each introduces characteristic motion patterns.
| Anomaly | What to Look For |
|---|---|
| Impossible trajectories | Camera paths that would require passing through walls or solid objects |
| Unnaturally smooth movement | Gliding motion with zero vibration — even gimbal-stabilised footage has subtle shake |
| Scale inconsistency during zoom | Objects changing relative size in ways inconsistent with optical zoom |
| Parallax errors | Foreground and background not shifting correctly as the camera moves laterally |
| No rolling shutter effect | Fast panning without the skewing that CMOS sensors typically produce |
High-end cinema cameras with global shutters and advanced stabilisation can produce very smooth footage. Consider the alleged source of the video before concluding the camera work is AI-generated.
⑫ Pause and Inspect (Most Important Technique)
The single most powerful technique for detecting AI-generated video requires no specialised tools: pause the video and zoom in. AI artefacts that are invisible at normal playback speed and resolution become unmistakable when you freeze a frame and enlarge it to 200 % or more.
This works because our brains are optimised for motion perception — we instinctively track movement and miss static details. When you pause, you switch from motion-processing mode to detail-processing mode, and artefacts leap out.
Frame-by-frame scrubbing is particularly effective for catching temporal anomalies. Use your video player’s arrow keys or frame-advance feature to step through suspicious sections one frame at a time. Look for sudden changes in detail, identity drift, and texture flickering.
On YouTube (and some desktop players such as mpv), pressing the period key (.) advances one frame and the comma key (,) steps back one frame while paused. Use this to scrub through suspicious moments methodically.
Video compression (especially at low bitrates) creates its own artefacts — blocky regions, colour banding, and blurred edges. Learn to distinguish compression artefacts from AI generation artefacts; the former tend to be blocky and uniform, while the latter are organic and inconsistent.
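If your player lacks frame stepping, a few lines of OpenCV will dump a suspicious stretch to disk along with pixel-accurate 2× blow-ups for inspection. A minimal sketch; the file name, frame range, and crop region are all assumptions.

```python
import cv2

cap = cv2.VideoCapture("suspect_clip.mp4")     # hypothetical file name
start, count = 240, 10                          # suspicious stretch found by scrubbing
cap.set(cv2.CAP_PROP_POS_FRAMES, start)

for i in range(count):
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imwrite(f"frame_{start + i:05d}.png", frame)
    # Also save a 2x blow-up of a region of interest (here: top-left quadrant).
    h, w = frame.shape[:2]
    crop = frame[0:h // 2, 0:w // 2]
    zoomed = cv2.resize(crop, None, fx=2, fy=2, interpolation=cv2.INTER_NEAREST)
    cv2.imwrite(f"frame_{start + i:05d}_zoom.png", zoomed)
cap.release()
```

Nearest-neighbour scaling is deliberate: it preserves the very artefacts that smooth interpolation would blur away.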
Pro Detection Workflow
Experienced fact-checkers do not check all 12 points in order. They follow a priority-based workflow that maximises detection accuracy while minimising time spent. Here is the recommended approach:
| Priority | Checkpoint | Reason | Approx. Time |
|---|---|---|---|
| 1 | ④ Text and Logos | Near-instant check — if text is garbled, the case is closed | 5 seconds |
| 2 | ② Hands and Fingers | Still the single most reliable structural tell in 2026 | 10 seconds |
| 3 | ⑫ Pause and Inspect | Reveals artefacts invisible during playback | 30 seconds |
| 4 | ⑤ Physics of Motion | Gravity and inertia errors are conclusive when present | 15 seconds |
| 5 | ③ Shadows and Light Sources | Global illumination consistency is hard for AI to fake | 15 seconds |
| 6 | ⑧ Inter-frame Differences | Slow-motion playback catches temporal artefacts | 30 seconds |
| 7 | ① Fine Structures | Zoom into hair, fabric, and jewellery for detail loss | 20 seconds |
| 8 | ⑨ Eyes and Pupils | Check pupil symmetry and reflection consistency | 10 seconds |
| 9 | ⑦ Object/Person Deformation | Identity drift becomes visible in longer clips | 20 seconds |
| 10 | ⑥ Background Consistency | Look for semantic errors in the environment | 15 seconds |
| 11 | ⑩ Suspiciously Perfect Footage | Absence of natural imperfections | 10 seconds |
| 12 | ⑪ Camera Work | Check for impossible camera trajectories | 10 seconds |
In practice, most AI-generated videos will fail within the first three checks (text, hands, pause-and-zoom). If a video passes all 12 checks, you are dealing with either a real video or an exceptionally sophisticated fake — at which point, reach for automated detection tools.
Why AI Videos Break Down — Technical Background
Understanding the technical reasons behind AI video failures makes you a better detector. There are three fundamental gaps that current models have not fully bridged.
The Physics Gap
Current video generation models — whether based on diffusion, autoregressive transformers, or hybrid architectures — do not simulate physics. They learn statistical correlations from training data: “when an object is released, it tends to move downward.” But they do not compute gravitational acceleration, air resistance, or elastic collisions. This means they can produce plausible-looking motion for common scenarios while failing spectacularly on edge cases.
For example, a ball dropping straight down may look correct, but a ball bouncing off an angled surface will often follow an impossible trajectory because the model has not learned the law of reflection — only an approximation of what bouncing “usually looks like.”
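For reference, the rule the model never computes is a single line of vector algebra. A toy sketch, with invented 2-D numbers rather than measurements from any real clip:

```python
import numpy as np

def reflect(velocity, surface_normal):
    """Law of reflection: v' = v - 2 (v . n) n, where n is a unit normal."""
    n = surface_normal / np.linalg.norm(surface_normal)
    return velocity - 2 * np.dot(velocity, n) * n

# A ball moving down and to the right hits a 45-degree ramp.
v_in = np.array([3.0, -4.0])
n = np.array([1.0, 1.0])          # outward normal of the ramp surface
print(reflect(v_in, n))           # -> [ 4. -3.]  (velocity mirrored about the ramp)
```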
Temporal Consistency Limits
Video generation models typically process a limited number of frames at once — often 16 to 64 frames in a single generation window. For longer videos, they must stitch together multiple windows, leading to subtle or not-so-subtle discontinuities at the boundaries. Even within a single window, the model lacks a persistent world state. It cannot “remember” that a character had five fingers in frame 1 and enforce that constraint in frame 48.
This is fundamentally different from reality, where temporal consistency is guaranteed by the laws of physics — an object cannot spontaneously change shape between one millisecond and the next.
The Structural Understanding Gap
Humans understand that a hand has five fingers, each with three joints, connected to a palm. We know that text is composed of specific characters arranged in a meaningful order. AI models do not possess this structural knowledge explicitly — they learn it implicitly from pixel patterns. This means they can generate a convincing hand at a glance, but when pressed for detail, the underlying lack of structural understanding becomes apparent.
This gap is particularly stark for text generation. A model might learn that “EXIT” signs are common above doors, but it has no character-level language model to ensure the letters are correct — it is simply painting pixels that look like they could be text.
Will AI Videos Become Undetectable in the Future?
This is the question everyone asks, and the honest answer is nuanced. AI video quality is improving rapidly, and some artefacts that were obvious in 2024 are now rare in 2026. Let us consider both sides.
Factors That Are Making Detection Harder
Model architectures are scaling up, with larger transformer-based models generating higher-resolution, longer-duration videos. Physics-aware training techniques are closing the motion-plausibility gap. Fine-tuning on specific domains (faces, nature, urban scenes) is eliminating many domain-specific artefacts. And post-processing pipelines can now apply realistic sensor noise, lens distortion, and compression artefacts to AI-generated footage, removing the “too perfect” signal.
Why Complete Undetectability Remains Unlikely
Despite these advances, several factors suggest that AI video will remain detectable for the foreseeable future. First, the computational cost of truly physics-accurate generation is enormous: accurately ray-tracing even a single frame is expensive, let alone producing thousands of physically consistent frames. Second, structural understanding (text, hands, complex mechanical objects) requires explicit reasoning that current architectures handle poorly. Third, as AI generators improve, so do AI detectors; the result is a continuing arms race in which detection methods keep pace with generation improvements.
Most importantly, the human eye remains remarkably good at spotting “something off” even when it cannot articulate what. Training your visual intuition through the checkpoints in this guide gives you a lasting advantage, even as the specific artefacts evolve.
Stay updated with the latest AI video models and their known weaknesses. Detection is not a one-time skill — it is an ongoing practice. Follow our LLM model size guide and AI prompt design guide to keep your knowledge current.
AI Video Detection Tools and Services
While manual inspection is essential, automated tools can provide an additional layer of confidence. Here is an overview of the current detection landscape:
| Category | Overview | Examples |
|---|---|---|
| Browser-based detectors | Upload a video and receive a probability score. Easy to use but accuracy varies by model. | Sensity AI, Deepware Scanner, AI or Not |
| Forensic analysis suites | Professional tools that perform metadata analysis, error-level analysis (ELA), and frame-level inspection. | FotoForensics, Amped Authenticate, Griffeye |
| Open-source models | Research-grade detection models you can run locally. Require technical setup but offer transparency. | Microsoft Video Authenticator (research), DFDC models, DeepfakeBench |
| Blockchain / provenance | Content authenticity initiatives that embed cryptographic provenance data at capture time. | C2PA (Coalition for Content Provenance and Authenticity), Adobe Content Credentials |
| Social media platform tools | Built-in labels and detection systems on major platforms. | YouTube synthetic media labels, Meta AI-generated content labels, TikTok AI label |
No single automated tool is 100 % accurate. Treat tool outputs as one data point among many, and always combine them with manual inspection using the checkpoints in this guide.
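One deterministic check worth running before any probabilistic detector is a C2PA provenance lookup. A minimal sketch that shells out to the official c2patool CLI, assuming it is installed and supports the file's container format; the file name is hypothetical. Remember that the absence of a manifest proves nothing, since most genuine footage carries none.

```python
import subprocess

# c2patool is the C2PA reference CLI: https://github.com/contentauth/c2patool
try:
    result = subprocess.run(
        ["c2patool", "suspect_clip.mp4"],    # hypothetical file name
        capture_output=True, text=True,
    )
except FileNotFoundError:
    raise SystemExit("c2patool is not installed")

if result.returncode == 0 and result.stdout.strip():
    print("Provenance manifest found:")
    print(result.stdout)                     # manifest data: who/what created the file
else:
    print("No C2PA manifest (common for real footage too; proves nothing).")
```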
Quick 5-Step Detection Method
When you need a fast answer and cannot run through all 12 checkpoints, use this condensed 5-step method:
| Step | Action | What to Check |
|---|---|---|
| 1 | Read the Text | Zoom into any visible text or logos — garbled text is the fastest tell |
| 2 | Count the Fingers | Pause on any frame with visible hands and count fingers on each hand |
| 3 | Pause and Zoom | Freeze on a detail-rich frame and zoom to 200 %+ — look for texture breakdown |
| 4 | Watch in Slow Motion | Play at 0.25× speed and look for flickering, morphing, or physics violations |
| 5 | Check the Shadows | Verify that all shadows point in a consistent direction from a plausible light source |
These five steps can be completed in under 60 seconds and will catch the vast majority of AI-generated videos in circulation as of 2026.
Frequently Asked Questions
Can AI-generated videos be detected with 100 % certainty?
No single technique guarantees 100 % detection. However, combining multiple checkpoints from this guide dramatically increases your accuracy. In practice, the layered approach described in the Pro Detection Workflow catches the overwhelming majority of current AI-generated videos. For high-stakes situations, supplement manual checks with automated detection tools and metadata analysis.
How long does it take to verify a video?
Using the Quick 5-Step Method, you can reach an initial assessment in under 60 seconds. A thorough analysis using all 12 checkpoints typically takes 3–5 minutes. For professional forensic analysis with automated tools, allow 15–30 minutes depending on the video length and complexity.
Do these techniques work on face-swap deepfakes as well as fully generated videos?
Yes, with some differences. Face-swap deepfakes replace only the face region, so background and body checks are less useful — focus instead on the boundary between the swapped face and the original neck/hair, inconsistent lighting on the face versus the body, and eye reflection mismatches. Fully generated videos are vulnerable to all 12 checkpoints.
Are AI-generated audio deepfakes covered here?
This guide focuses on visual detection. Audio deepfakes — cloned voices, synthetic speech — require a different set of techniques, including spectral analysis, prosody evaluation, and phoneme-level inspection. However, audio-visual mismatch (lip movements not matching speech) is a visual cue that you can check using the Pause and Inspect technique.
What should I do if I find a deepfake in the wild?
First, do not share or amplify the video. Report it to the platform where you found it using their deepfake / synthetic media reporting mechanism. If the deepfake targets a specific individual, inform them if possible. For deepfakes related to news events or elections, contact fact-checking organisations in your region. Document your detection evidence (screenshots, specific frame numbers, anomalies found) in case it is needed for further investigation.
Conclusion
AI video generation technology will continue to improve, but so will your ability to detect it — if you practice. The 12 checkpoints in this guide target fundamental weaknesses in how AI models generate video: the physics gap, the temporal consistency problem, and the structural understanding deficit. These are not superficial bugs that will be patched away; they are deep architectural limitations.
Start with the Quick 5-Step Method for everyday use, graduate to the full 12-checkpoint analysis when the stakes are high, and supplement with automated tools when available. The more you practise, the faster and more accurate your detection becomes.
The battle between AI generation and AI detection is an ongoing arms race, but an informed human viewer remains the most versatile detector. Stay curious, stay sceptical, and keep your checkpoints sharp.
Related Articles
Deepen your understanding of AI with these related guides:
👉 Understanding LLM Model Sizes — A Practical Guide
👉 AI Prompt Design Guide — Write Better Prompts, Get Better Results
