Wait a minute, who are you?
A report on fake-news-generating software
By Yves Wienecke
January 20, 2019
A deepfake is a type of digital impersonation that takes a source video as input
and produces an output video in which the face of one subject is replaced with a machine-learning
generated face of a different subject. It’s similar to the face swap feature popularized by Snapchat,
but it does a bit more work to make the swap appear realistic, such as matching face angle and
blinking. Take this video of American actress Jennifer Lawrence with her face swapped with that of actor
Steve Buscemi, for example:
[12] A stunningly realistic video of Jennifer Lawrence with Steve Buscemi’s face.
The first step in creating a deepfake is to gather a database of images of a subject (using Google or Bing image search, for instance) and perform a sequence of transformations on the images to detect and crop out the face in each one. This is repeated for a second subject, so that there are two sets of faces. Then two convolutional networks are trained, one on each respective set. Both networks share the same encoder, which transforms an input face into a basic representation of the face. Each network then uses its own dedicated decoder to regenerate the original face from that representation as realistically as possible.
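To make that first step concrete, here is a minimal sketch of the face-cropping stage using OpenCV’s bundled Haar cascade detector. The directory paths and the 64x64 output size are assumptions for illustration; real deepfake tools add face alignment on top of this and typically use stronger detectors.

```python
import os
import cv2  # OpenCV: pip install opencv-python

# Haar cascade face detector that ships with OpenCV.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def extract_faces(src_dir, dst_dir, size=(64, 64)):
    """Crop the largest detected face from every image in src_dir."""
    os.makedirs(dst_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        img = cv2.imread(os.path.join(src_dir, name))
        if img is None:
            continue  # skip files that are not readable images
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        faces = detector.detectMultiScale(gray, scaleFactor=1.1,
                                          minNeighbors=5)
        if len(faces) == 0:
            continue
        # Keep the largest detection, crop it, resize to a fixed shape.
        x, y, w, h = max(faces, key=lambda f: f[2] * f[3])
        cv2.imwrite(os.path.join(dst_dir, name),
                    cv2.resize(img[y:y + h, x:x + w], size))

# One cropped-face dataset per subject (paths are hypothetical):
extract_faces("images/subject_a", "faces/subject_a")
extract_faces("images/subject_b", "faces/subject_b")
```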
After training is complete, an image of the first subject’s face can be encoded by the shared encoder and then decoded by the second subject’s decoder to generate the second subject’s face. In this way, the basic representation of a face acts as a transitional language, much like translating from hex to binary to decimal, or from Java to bytecode to assembly, with binary and bytecode being the transitional languages.
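The shared-encoder, two-decoder idea can be sketched in a few lines of PyTorch. This is a toy, fully connected version with arbitrary layer sizes (actual deepfake models are convolutional and far larger), but it shows how one encoder and two decoders are trained together and then mixed to perform the swap.

```python
import torch
import torch.nn as nn

LATENT = 256  # size of the shared "basic representation" (assumed)

# One encoder shared by both subjects: 64x64 RGB face -> latent code.
encoder = nn.Sequential(
    nn.Flatten(),
    nn.Linear(64 * 64 * 3, 1024), nn.ReLU(),
    nn.Linear(1024, LATENT),
)

def make_decoder():
    # latent code -> reconstructed 64x64 RGB face
    return nn.Sequential(
        nn.Linear(LATENT, 1024), nn.ReLU(),
        nn.Linear(1024, 64 * 64 * 3), nn.Sigmoid(),
        nn.Unflatten(1, (3, 64, 64)),
    )

decoder_a, decoder_b = make_decoder(), make_decoder()

loss_fn = nn.MSELoss()
opt = torch.optim.Adam(
    list(encoder.parameters()) +
    list(decoder_a.parameters()) +
    list(decoder_b.parameters()), lr=1e-4)

def train_step(faces_a, faces_b):
    """Each decoder learns to reconstruct its own subject's faces
    from the shared latent representation."""
    loss = (loss_fn(decoder_a(encoder(faces_a)), faces_a) +
            loss_fn(decoder_b(encoder(faces_b)), faces_b))
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

def swap_a_to_b(face_a):
    """The swap: encode subject A's face, decode it as subject B."""
    with torch.no_grad():
        return decoder_b(encoder(face_a))
```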
Finally, the convolutional neural network can be given a video of only the first subject and transform every single
frame to generate an output version in which the first subject’s face is swapped with the second
subject’s.
[10] Explanation of how deepfakes work.
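That final stage is little more than a loop over video frames. Below is a rough OpenCV sketch of it; swap_frame is a hypothetical stand-in for the detect-encode-decode step built from the pieces above, and pasting the generated face back into the frame (alignment, color correction, blending) is glossed over entirely.

```python
import cv2

def convert_video(src_path, dst_path, swap_frame):
    """Run swap_frame over every frame of src_path, write dst_path."""
    reader = cv2.VideoCapture(src_path)
    fps = reader.get(cv2.CAP_PROP_FPS)
    w = int(reader.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(reader.get(cv2.CAP_PROP_FRAME_HEIGHT))
    writer = cv2.VideoWriter(dst_path,
                             cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    while True:
        ok, frame = reader.read()
        if not ok:
            break  # end of video
        # swap_frame detects the face, runs it through the networks,
        # and blends the generated face back into the frame.
        writer.write(swap_frame(frame))
    reader.release()
    writer.release()

# Hypothetical usage with the model sketched earlier:
# convert_video("subject_a.mp4", "a_with_b_face.mp4", my_swap_fn)
```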
Researchers in the field of computer vision have made baffling progress over just the past two years. For me, it
all started in 2017, when I first learned about ‘pix2pix,’ which takes a crude drawing of something like
a cat, face, or shoe and tries to create a realistic version of it. The generated images, however, were
commonly hideous and nightmare-inducing, so the results weren’t particularly realistic [5].
A couple of months later, researchers were able to produce a video of Obama ‘saying’ something by syncing his lip
movement to an audio input. Why Obama? Well, there is a lot of training data - namely, his weekly address
videos.
[6] Generated video of Obama ‘saying’ things.
About a year later, researchers developed video-to-video synthesis, in which a semantic label map of an input
video - marking regions as trees, buildings, roads, and so on - is translated into a photorealistic output video. This
allows for creating variations on a video by changing the backdrop or time of day.
[7] Video-to-video synthesis.
Fast forward to a few months ago: machine learning could take a 2D input image and map it to
a 3D model, allowing the input image to be manipulated throughout 3D space. This opens up a world of
possibilities for creative inspiration, but at the same time it is a reminder of the power that software holds.
[8] Making a 3D model from a 2D image.
With this in mind, the creation of deepfakes does not surprise me. What does surprise me, however, is how hard
it is to spot a deepfake - they certainly do a good job of resembling the training input. This point
was brought to the public’s attention after BuzzFeed released a video of Obama
supposedly giving a statement:
[2] Top comment: “Wow, Barack Obama does a mean Jordan Peele impersonation”
This technology has the potential to cause a lot of harm. Widespread usage of these deepfake video imitations will be an impetus for the proliferation of fake news. Not only will viewers be tricked into believing that a person said something misconstrued or inappropriate, but they will also begin to question the validity of legitimate videos. If there is no method for checking whether a video is a deepfake, then there is no reason to believe what is shown. Take, for instance, a deepfake in which a president declares war against a country, or one in which a crime suspect confesses to a crime. Either of these, if realistic enough, could have very grave consequences [3].
In response to this danger, blockchain technology has been proposed. As described by Siraj Raval in his (non-deepfake) Youtube video discussing the benefits of combining blockchain with artificial intelligence, a blockchain represents a deterministic and permanent source of validation, while artificial intelligence is a dynamic source of data generated by computer algorithms. The former attempts to record life while the latter tries to mimic it. In this way, blockchain technology could validate whether or not a video is a deepfake [9].
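As a minimal sketch of that validation idea: hash a video when it is published and anchor the digest somewhere immutable (a blockchain transaction, in Raval’s framing); viewers later re-hash their copy and check it against the record. The dict below merely stands in for an actual chain.

```python
import hashlib

def video_digest(path, chunk=1 << 20):
    """SHA-256 of a video file, streamed in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

# Stand-in for an append-only blockchain record of authentic videos.
ledger = {}

def register(path):
    ledger[video_digest(path)] = path  # done at recording/publish time

def is_authentic(path):
    # A deepfaked (or otherwise altered) copy won't match any record.
    return video_digest(path) in ledger
```

Note the limitation: this only proves a file is byte-for-byte identical to a registered original, so any legitimate re-encode would fail the check too - detecting deepfakes in general remains an open problem.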
It is important to mention that this technology is neither groundbreaking nor particularly new. Snapchat, for
instance, uses the same technology for its face filters. In fact,
some of the systems shown earlier in this blog are more complicated and impressive than deepfakes.
The reason this technology is garnering so much attention from news sites and the media is that
deepfakes - unlike other, better technologies - are widely available to the general public at no cost.
Here is a video tutorial on how to do it:
[11] Video tutorial on making deepfakes.
While the aforementioned political consequences of deepfakes are a huge concern, that is not the use case for the general public. Instead of focusing on creating fake news and misrepresenting political figures, the general public is interested in two things: pornography and memes.
Are we really surprised by this? Deepfakes have been used to generate videos of celebrities - and everyday, regular people - in compromising situations. Now, people can realize their dreams of exploring Shrek’s layers with deepfakes! However, creating such a deepfake of a person is considered nonconsensual pornography, a punishable offense in the same category as revenge porn. In the midst of a crackdown on spy cam porn in South Korea, the prospect of not even needing spy cams to produce customized porn is alarming [1].
In the cultural context of South Korea, the existence of such pornography of a person, be it consensual or nonconsensual, can destroy that person’s career and social life. For similar reasons, Pornhub has declared a ban on deepfakes. Although I have not done any research myself, there are claims that deepfakes are still prevalent on the adult video sharing website in spite of the ban [4].
On the other, safe-for-work side of things, people have been creating deepfakes of Nicolas Cage, effectively pasting him into movies in which he never played a role! Evidently, there is much meme value in deepfakes.
[Deepfakes] Compilation of Nic Cage deepfakes.
Deepfake technology raises a host of privacy and consent issues that have never been seen before. How should law enforcement and politicians react to this technology? How will people be punished for breaking laws in this area? Will the media begin to use deepfakes in movies and television? These are all questions to consider when looking toward the future of deepfakes and video generation software in general.
[1] BBC News. “Trailblazers: Fighting South Korea’s spy cam porn - BBC News.” Youtube, 3 Dec. 2018, https://www.youtube.com/watch?v=bHNTnaOR_YU.
[2] BuzzFeedVideos. “You Won’t Believe What Obama Says In This Video! 😉.” Youtube, 17 Apr. 2018, https://www.youtube.com/watch?v=cQ54GDm1eL0.
[3] Chesney, Robert, Danielle Citron. “Deep Fakes: A Looming Crisis for National Security, Democracy and Privacy?” Lawfare, 21 Feb. 2018, https://www.lawfareblog.com/deep-fakes-looming-crisis-national-security-democracy-and-privacy.
[4] Clark, Bryan. “Pornhub promised to ban ‘deepfakes’ videos. And it failed miserably.” The Next Web, 19 Apr. 2018, https://thenextweb.com/insider/2018/04/19/pornhub-promised-to-ban-deepfakes-videos-and-it-failed-miserably/.
[5] Hesse, Christopher. “Image-to-Image Demo.” Affine Layer, 19 Feb. 2017, https://affinelayer.com/pixsrv/.
[6] Two Minute Papers. “Audio To Obama: AI Learns Lip Sync from Audio | Two Minute Papers #194.” Youtube, 4 Oct. 2017, https://www.youtube.com/watch?v=nsuAQcvafCs.
[7] Two Minute Papers. “AI-Based Video-to-Video Synthesis.” Youtube, 9 Sep. 2018, https://www.youtube.com/watch?v=GRQuRcpf5Gc.
[8] Two Minute Papers. “This AI Learns Human Movement From Videos.” Youtube, 6 Dec. 2018, https://www.youtube.com/watch?v=AGm3hF_BlYM.
[9] Raval, Siraj. “Blockchain Consensus Algorithms and Artificial Intelligence.” Youtube, 10 Oct. 2017, https://www.youtube.com/watch?v=5Tr13l0O1Ws.
[10] Raval, Siraj. “DeepFakes Explained.” Youtube, 2 Feb. 2019.
[11] Voice, Irrelevant. “DEEPFAKES Tutorial (FakeApp) (Fake adult videos of celebrities).” Youtube, 28 Jan. 2018, https://www.youtube.com/watch?v=ghTb2kZSpZE.
[12] Vecanoi. “Original video + Stennifer Lawrscemi.” Youtube, 31 Jan. 2019, https://www.youtube.com/watch?v=O7JOD-qytb8.