We asked Gemini 2.5 Flash to draw on everything it knows (including the latest research and the common limitations of current generative AI) and tell us how to spot deepfakes that are too good for the human eye to detect.
Gemini had a three-second think and then said that the giveaways often lie in subtle, systemic inconsistencies in physiological and environmental details, ones that betray a lack of genuine understanding of physics and human biology.
Here are its findings:
- Physiological Anomalies (Eyes & Mouth are key):
  - Eyes (Reflections, Pupils, Blinking): even when a face looks incredibly real, AI often struggles with the nuanced, consistent reflections in both eyeballs, the perfect circularity/ellipticity of pupils, and realistic, spontaneous blinking patterns (e.g., blinking too little, too much, or unnaturally synchronously). The human visual system is highly attuned to these subtle facial cues.
  - Mouth/Teeth: while lip-syncing has improved, the full range of natural mouth movements during speech, especially the appearance of the tongue and the specific shapes of teeth, can still be inconsistent or slightly “off” in high-fidelity deepfakes.
- Environmental & Physical Inconsistencies (Beyond the Face):
  - Lighting and Shadows: this is a major challenge for AI. Deepfakes often struggle to keep lighting and shadow interactions consistent across the entire scene, especially when the subject moves or the light source is complex. You might see subtle discrepancies in how light falls on the face versus the rest of the body or background.
  - Hair & Background Artifacts: the edges of hair can sometimes appear unnaturally blurry, “shimmery,” or slightly “pasted on.” Backgrounds may contain subtle distortions, flickers, or lack consistent detail and movement, particularly around the edges of the manipulated subject.
- Micro-expressions and Body Language: AI models are getting better at generating broad expressions, but they still struggle to replicate the fleeting micro-expressions and fluid body language that humans display unconsciously. A deepfake might look fine in a static frame but feel “off” in motion because these minute details are missing or unnatural.
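The blinking cue above is one of the few that is easy to sketch in code. Here is a minimal, hypothetical example in plain Python: it assumes you already have a per-frame eye-aspect-ratio (EAR) signal from a facial-landmark detector (the values below are hand-made, and the 8–30 blinks-per-minute band is a rough rule of thumb, not a hard threshold):

```python
def count_blinks(ear_values, threshold=0.2):
    """Count blinks as runs of frames where EAR drops below threshold."""
    blinks, in_blink = 0, False
    for ear in ear_values:
        if ear < threshold and not in_blink:
            blinks += 1
            in_blink = True
        elif ear >= threshold:
            in_blink = False
    return blinks

def blink_rate_suspicious(ear_values, fps=30, lo=8, hi=30):
    """Resting humans blink very roughly 8-30 times per minute; rates
    far outside that band are one (weak) deepfake signal."""
    minutes = len(ear_values) / fps / 60
    rate = count_blinks(ear_values) / minutes
    return (rate < lo or rate > hi), rate

# 10 seconds of synthetic "video" at 30 fps: eyes open (EAR ~0.3)
# with two brief blinks (EAR dips to ~0.1)
ear = [0.3] * 300
ear[50:53] = [0.1, 0.08, 0.1]
ear[200:203] = [0.1, 0.08, 0.1]

suspicious, rate = blink_rate_suspicious(ear)
print(suspicious, round(rate))  # two blinks in 10 s is ~12/min: plausible
```

A real pipeline would compute EAR from eye landmarks frame by frame; this toy only shows why unnaturally sparse or rapid blinking stands out once you have that signal.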
The reason these are often the “last bastions” of detection for advanced deepfakes is that generating them requires not just replicating pixels, but accurately simulating complex real-world physics, biological processes, and nuanced human behaviour, all of which current generative AI still finds challenging. Dedicated AI detection tools are trained to spot these specific, often microscopic, anomalies that are invisible to the naked eye.
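As a toy illustration of the kind of microscopic cue an automated detector can exploit, here is a hedged numpy sketch. It compares how much of an image’s spectral energy sits outside the low-frequency band; some generated or blended imagery shows unusual energy distributions there. Real detectors are trained models, and the images and comparison below are purely synthetic:

```python
import numpy as np

def high_freq_ratio(gray):
    """Fraction of spectral energy outside the central (low-frequency)
    quarter of the shifted 2-D FFT of a grayscale image."""
    spec = np.abs(np.fft.fftshift(np.fft.fft2(gray))) ** 2
    h, w = spec.shape
    ch, cw = h // 4, w // 4
    low = spec[h//2-ch:h//2+ch, w//2-cw:w//2+cw].sum()
    return 1 - low / spec.sum()

rng = np.random.default_rng(0)
smooth = rng.random((64, 64))
# cheap blur: averaging with shifted copies suppresses high frequencies
smooth = (smooth + np.roll(smooth, 1, 0) + np.roll(smooth, 1, 1)) / 3
noisy = rng.random((64, 64))  # white noise spreads energy across all bands

print(high_freq_ratio(smooth) < high_freq_ratio(noisy))  # True
```

The point is not that this ratio detects deepfakes (it does not on its own), but that statistical fingerprints like it, far below what the eye can see, are exactly what trained detectors aggregate at scale.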