Microsoft’s VASA-1 can deepfake a person with one photo and one audio track

Chris Remington@beehaw.org · 10 months ago

Handles@leminal.space · 10 months ago

What can possibly go wrong?

casmael@lemm.ee · 10 months ago

Why would you develop this technology? I simply don’t understand. All involved should be sent to jail. What the fuck.

Even_Adder@lemmy.dbzer0.com · 10 months ago

They worded the headline that way to scare you into that reaction. They’re only interested in telling you about the negative uses because that drives engagement.

BolexForSoup@kbin.social · edit-2 10 months ago

I understand AI evangelists - which you may or may not be idk - look down on us Luddites who have the gall to ask questions, but you seriously can’t see any potential issue with this technology without some sort of restrictions in place?

You can’t see why people are a little hesitant in an era where massive international corporations are endlessly scraping anything and everything on the Internet to dump into LLMs et al. to use against us to make an extra dollar?

You can’t see why people are worried about governments and otherwise bad actors having access to this technology at scale?

I don’t think these people should be locked up or all AI usage banned. But there is definitely a middle ground between absolute prohibition and no restrictions at all.

Even_Adder@lemmy.dbzer0.com · 10 months ago

This is unnecessarily aggressive, I don’t need this today.

BolexForSoup@kbin.social · edit-2 10 months ago

And your comment was unnecessarily patronizing IMO. Do you think they needed that today?

If you don’t want people to respond to your takes then don’t post them in public forums. I am critiquing your stance. If it’s overly aggressive then I apologize for the tone.

Even_Adder@lemmy.dbzer0.com · 10 months ago

I saw what you wrote before your edits. I’m not going to engage with people who talk like that. Good day.

BolexForSoup@kbin.social · edit-2 10 months ago

I can’t control that you saw my comments seconds after they were posted but before the 20-30s it takes for me to edit them. I changed nothing drastic enough for you to imply I was being deceptive.

Have a good one.

algorithmae@lemmy.sdf.org · 10 months ago

That was one of the tamest comments I’ve seen on the internet. There weren’t even remarks about your mom.

barsoap@lemm.ee · edit-2 10 months ago

None of those concerns are new in principle: AI is the current thing that makes people worry about corporate and government BS but corporate and government BS isn’t new.

Then: The cat is out of the bag, you won’t be able to put it in again. If those things worry you the strategic move isn’t to hope that suddenly, out of pretty much nowhere, capitalism and authoritarianism will fall never to be seen again, but to a) try our best to get sensible regulations in place, the EU has done a good job IMO, and b) own the tech. As in: Develop and use tech and models that can be self-hosted, that enable people to have control over AI, instead of being beholden to what corporate or government actors deem we should be using. It’s FLOSS all over again.

Or, to be an edgelord to some of the artists out there: If you don’t want your creative process to end up being dependent on Adobe’s AI stuff then help train models that aren’t owned by big CGI. No tech knowledge necessary, this would be about providing a trained eye as well as data (i.e. pictures) that allow the model to understand what it did wrong, according to your eye.

BolexForSoup@kbin.social · edit-2 10 months ago

I said:

I don’t think these people should be locked up or all AI usage banned. But there is definitely a middle ground between absolute prohibition and no restrictions at all.

I have used AI tools as a shooter/editor for years so I don’t need a lecture on this, and I did not say any of the concerns are new. Obviously, the implication is AI greatly enables all of these actions to a degree we’ve never seen before. Just like cell phones didn’t invent distracted driving but made it exponentially worse and necessitated more specific direction/intervention.

CanadaPlus@lemmy.sdf.org · 10 months ago

Honestly that’s a good rule of thumb for all headlines at this point.

casmael@lemm.ee · 10 months ago

Good point good point

some_guy@lemmy.sdf.org · 10 months ago

They mentioned one potential use that I thought has value and that I hadn’t considered. For video conferencing, this could transmit data without sending video and greatly reduce the amount of bandwidth needed by rendering people’s faces locally. I don’t think that outweighs the massive harms this technology will unleash. But at least there was some use that would be legit and beneficial.

I’m someone who has a moral compass and I don’t like that scammers will abuse this shit so I hate it. But there’s no keeping it locked away. It’s here to stay. I hate the future / now.

flora_explora@beehaw.org · 10 months ago

Wouldn’t you then have to run the AI locally on a machine (which probably draws a lot of power and memory) or use it via cloud (which depends on bandwidth just like a video call)? I don’t really see where this technology could actually be useful. Sure, if it is only a minor computation just like if you take a picture/video with any modern smartphone. But computing an entire face and voice seems much more complicated than that and not really feasible for the usual home device.

Markaos@lemmy.one · 10 months ago

Yeah, it’s not practical right now, but in 10 years? Who knows, we might finally have some built-in AI accelerator capable of running big neural networks on consumer CPUs by then (we do have AI accelerators in a large chunk of current CPUs, but they’re not up to the task yet). The system memory should also go up now that memory-hungry AI is inching closer to mainstream use.

Sure, Internet bandwidth will also increase, meaning this compression will be less important, but on the other hand, it’s not like we’ve stopped improving video codecs after h.264 because it was good enough - there are better codecs now even though we have the resources to handle bigger h.264 videos.

The technology doesn’t have to be useful right now - for example, neural networks capable of learning have been studied since the 1940s, even though there would be no way to run them for many decades, and it would take even longer to run them in a useful capacity. But now that we have the technology to do so, they enjoy rapid progress building on top of that original foundation.

flora_explora@beehaw.org · 10 months ago

Fair point, I agree.

barsoap@lemm.ee · edit-2 10 months ago

A model that can only generate frontal to profile views of heads would be quite small, I can totally see that kind of thing running on current consumer GPUs, in real time. Near real time is already possible with SDXL-based models with some speedup tricks applied as long as you have a mid-range gaming GPU and those models are significantly more general. It’s not like the model would need to generate spaghetti and sports cars alongside the head.

Lem Jukes@lemm.ee · 10 months ago

Also I would argue sending the actual video of what is happening in front of the camera is kind of the entire point of having a video call. I don’t see any utility in having a simulated face to face interaction where neither of you is even looking at an actual image of the other person.

henfredemars@infosec.pub · 10 months ago

You can’t simply not develop a technology. Progress is going to move forward. If they don’t do it, somebody else is going to figure out how. The tools are out there. The math works. Better for researchers to do it now and scare us into finding solutions than for criminals to develop it first.

notfromhere@lemmy.ml · 10 months ago

Other than the obvious malicious uses of this technology, it could be great for multimedia, great for creative control for cast, great for virtual meetings to always look “your best” (as determined by each individual, e.g. clean-cut pristine, and/or preferred gender, and/or favorite anime, etc.). There are also use cases to hear letters spoken by a lost loved one, or replace the Three Stooges with politicians. Tons of “safe” use cases that I am looking forward to.

henfredemars@infosec.pub · 10 months ago

This is a really positive take. I would love to create such an AI of myself in my likeness so that if one day I pass before my wife, she could enjoy having that comfort. I imagine it speaking like: while I’m not your husband, here’s what I think he would’ve said.

Deep faking myself so I don’t have to use my camera in meetings? I would pay for that feature.

floofloof@lemmy.ca · edit-2 10 months ago

I’m not convinced any of these uses are actually beneficial. They mostly range from creepy to pointless.

notfromhere@lemmy.ml · 10 months ago

Entertainment might be pointless to some. I dream of having an on-demand Netflix that will generate whatever type of content I can imagine on demand, or better yet already know my preferences and all I have to do is tell it my mood and it will start playing something I would like.

floofloof@lemmy.ca · edit-2 10 months ago

A difference in goals, I guess. Having programs generated just to pander to my existing tastes sounds horrible to me. I want to be challenged and surprised and have my tastes tested and changed in unpredictable ways. I also want to watch stuff that’s written by humans and acted by humans, because there’s a sense of shared life there that there isn’t in an AI-generated video.

Pete Hahnloser@beehaw.org · edit-2 10 months ago

It’s also then just one step removed from refusing to accept any friends or romantic partners who don’t do exactly what you want at all times because life is supposed to be tailored to you.

notfromhere@lemmy.ml · edit-2 10 months ago

Actually I’m not sold on that logic. You could say that about anything at that point. The food that you order, the school you attend, your shoes.

meseek #2982@lemmy.ca · 10 months ago

Because bags of money. And MS is a hyper toxic entity that’s been siphoning the data of every Windows user for decades now. That company is basically IBM during WW2.

BraveSirZaphod@kbin.social · 10 months ago

If something is possible, and this indeed is, someone is going to develop it regardless of how we feel about it, so it’s important for non-malicious actors to make people aware of the potential negative impacts so we can start to develop ways to handle them before actively malicious actors start deploying it.

Critical businesses and governments need to know that identity verification via video and voice is much less trustworthy than it used to be, and so if you’re currently doing that, you need to mitigate these risks. There are tools, namely public-private key cryptography, that can be used to verify identity in a much tighter way, and we’re probably going to need to start implementing them in more places.
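
To make the public-private key idea concrete, here is a minimal challenge-response sketch: the verifier sends a fresh random challenge, the person proves who they are by signing it with a private key, and the verifier checks the signature against the known public key. A faked face or voice doesn't help an impostor who lacks the key. Everything here is an illustrative assumption: the function names (`sign`, `verify`, `new_challenge`) are hypothetical, and the toy textbook-RSA key built from two small, publicly known primes is not secure. A real system would use a vetted cryptography library and a modern signature scheme.

```python
import hashlib
import secrets

# Toy RSA key pair built from two publicly known Mersenne primes.
# NOT secure: the primes are public and there is no padding. Illustration only.
P = 2**127 - 1
Q = 2**89 - 1
N = P * Q                            # public modulus
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (needs Python 3.8+)


def digest(message: bytes) -> int:
    """Hash the message and reduce it into the range of the modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Prover side: sign the message using the private exponent."""
    return pow(digest(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Verifier side: check the signature using only the public key (N, E)."""
    return pow(signature, E, N) == digest(message)


def new_challenge() -> bytes:
    """Verifier side: a fresh random challenge so old responses can't be replayed."""
    return secrets.token_bytes(16)


if __name__ == "__main__":
    challenge = new_challenge()      # verifier -> prover
    response = sign(challenge)       # prover -> verifier
    print("identity confirmed:", verify(challenge, response))
```

The point of the random challenge is that a recording of an earlier exchange is useless, which is exactly the property video and voice verification no longer have.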

PM_ME_VINTAGE_30S [he/him]@lemmy.sdf.org · 10 months ago

Would be great for me and others who have trouble with body language. I could deepfake a version of myself with neurotypical body language and offload the effort of “acting normal” to the AI for interviews and video calls. Genuinely I’m super pumped for this.

BolexForSoup@kbin.social · 10 months ago

Now that is interesting, I’ve never heard this consideration before.

CanadaPlus@lemmy.sdf.org · 10 months ago

They’re also releasing a detector, for what it’s worth.

Yeah, this one seems like it will have more negative applications than positive. Usually you’ll have a lot more content from someone you want to copy for non-deceptive reasons. It’s inevitable all video will be easily fake-able one day soon, but why hasten it?

luciole (he/him)@beehaw.org · 10 months ago

The actual research page is so awkward. The TLDR at the top goes:

single portrait photo + speech audio = hyper-realistic talking face video

Then a little lower comes the big red warning:

We are exploring visual affective skill generation for virtual, interactive characters, NOT impersonating any person in the real world.

No siree! Big “not what it looks like” vibes.

perishthethought@lemm.ee · 10 months ago

Someone help me out please. Who was the 90s sci-fi author who predicted actors would go away and all movies would be made using cgi /ai? She had characters in the book, watching movies starring Humphrey Bogart and John Wayne, as detectives solving crimes (and so on). She also predicted “ractors”, people who act in front of a camera, so a computer can use their motion and expressions to animate a character on screen in real time.

My feeble brain, I swear… In any case, thanks to her, I knew this day was coming. Gonna be a wild ride though.

notfromhere@lemmy.ml · edit-2 10 months ago

According to Le Chat,

The author you’re thinking of is Neal Stephenson, and the book is “Snow Crash” published in 1992. In the book, he coined the term “ractors” for actors who perform in front of motion-capture cameras to create lifelike animations. He also predicted the use of CGI and AI in filmmaking to create movies with long-dead actors.

I haven’t read it and the Wikipedia article doesn’t seem to mention virtual actors, so it could be wrong. At least it didn’t hallucinate a fake book.

Handles@leminal.space · 10 months ago

It just cited the wrong Neal Stephenson book, so not way off.

perishthethought@lemm.ee · 10 months ago

Great book, eh. So many interesting ideas presented in there.

14th_cylon@lemm.ee · 10 months ago

https://archive.is/ZTU90

The Atlantic | Neal Stephenson’s Most Stunning Prediction

The sci-fi legend coined the term metaverse. But he was most prescient about our AI age. By Matteo Wong

perishthethought@lemm.ee · 10 months ago

<cough> William Gibson <cough>

perishthethought@lemm.ee · 10 months ago

Oh snap, thanks - I was mixing up The Diamond Age with another book, yes. Ractors are from Stephenson, but I also had another author’s books in my head. See? Feeble mind. There’s still another woman author I need to track down and re-read here.

perishthethought@lemm.ee · edit-2 10 months ago

Aha! I found it –

Remake by Connie Willis

https://www.worldswithoutend.com/novel.asp?ID=86

Remake is a 1995 science fiction novel by Connie Willis. It was nominated for the Hugo Award for Best Novel in 1996. The book displays a dystopic near future, when computer animation and sampling have reduced the movie industry to software manipulation.

Great book, well worth reading still.

Gamma@beehaw.org · edit-2 10 months ago

I asked Perplexity with “What is the scifi book from the 90s that had “ractors,” where a person would act in front of a camera and a computer would animate a CG model?” and got what other commenters are saying is the correct answer:

The science fiction book from the 90s that featured “ractors,” where a person would act in front of a camera and a computer would animate a CG model, is not directly mentioned in the provided search results. However, based on the description of “ractors” and the context of computer animation and CG models, it seems you might be referring to “The Diamond Age” by Neal Stephenson, published in 1995. In this novel, the term “ractor” (short for “interactive actor”) is used to describe performers who participate in interactive theater through virtual reality environments, which could align with the concept of acting in front of a camera to animate a CG model. However, since this specific detail is not found in the search results, this answer is based on existing knowledge outside of the provided sources.

some_guy@lemmy.sdf.org · 10 months ago

The eyes still have uncanny valley vibes, but that’s because I’m looking for it. If I wasn’t watching demo videos about generated video, I might not have noticed.

davehtaylor@beehaw.org · edit-2 10 months ago

And that’s the problem. The average person isn’t looking for it, and will absolutely not see it. As long as it’s good enough, that’s all that matters. A plausible enough video of Joe Biden talking about rounding up Christians into internment camps that gets shared on Facebook, or something like that which panders to right-wing bigotry, is enough to get people going. Even real images and videos that are miscaptioned are enough, and even when a link is there that disproves the caption.

People seriously underestimate just how horrifying the possibilities are with this shit. And as high stakes as this election cycle is, and the state of politics in this country, the tendency for people to latch on to anything that affirms their preexisting ideals creates a fucking minefield.

Pete Hahnloser@beehaw.org · 10 months ago

This is an education problem as much as – if not more so than – a tech problem. Before the GOP gutted critical thinking wherever they held a majority and two generations were able to grow up under those circumstances, a video of any current president rounding up Christians would have been roundly rejected as either satirical or disinformation by the vast majority of the population, owing to the absurdity of the idea.

Once we got to the point of a not-insignificant minority of the population believing that the true power in the United States lies in the basement of a pizza shop with no basement …

thingsiplay@beehaw.org · 10 months ago

Trained on YouTube clips

It could have been worse. Imagine it trained on TikTok clips.

P03 Locke@lemmy.dbzer0.com · 10 months ago

Sigh, not this article again. No, they can’t “deepfake a person with one photo”. They can create a bad uncanny-valley 75% accurate version of one.

thingsiplay@beehaw.org · 10 months ago

a bad uncanny-valley 75% accurate version of one

Actually a perfect description of what a deepfake is.

DdCno1@beehaw.org · 10 months ago

I’ve seen far more convincing deepfakes, to the point I couldn’t tell until I was told. I’ve experimented with this myself. After a bit of trial and error, almost anyone can easily create shockingly convincing deepfakes. One interesting method is using 3D rendered characters with deepfake faces.

flango@lemmy.eco.br · 10 months ago

Well, just watch the documentary “The Masked Scammer” and you’ll see how this can (and definitely will) go wrong. For a summary, there’s this article on Wikipedia: Gilbert Chikli.

esaru@beehaw.org · 10 months ago

I think this has an effect most people don’t think of: Media will just lose its value as a trusted source for information. We’ll just lose the ability of broadcasting media as anything could be faked. Humanity is back to “word of mouth”, I guess.

arglebargle@lemm.ee · 10 months ago

This milestone was reached a long time ago. For some reason Uncle Bob’s Facebook post has been just as reliable a media source as any other for a lot of people already.

grrgyle@slrpnk.net · 10 months ago

Omg stop what are you guys thinking

thingsiplay@beehaw.org · 10 months ago

Money.

Phoenixz@lemmy.ca · 10 months ago

Yeah, Microsoft isn’t releasing this until we can use it responsibly.

We’ll never be able to guarantee that. There will always be people abusing this.
Though right now it’s in the hands of Microsoft and likely requires a shit tonne of hardware to run (I’d imagine a collection of specialized servers), this tech WILL come out eventually, and eventually, everyone will be able to run it.
I give it 5-10 years tops before anyone can just do this with anyone. Want to make a movie of Trump or Hillary fucking a donkey? Done. Want to make a video of your 5 year old daughter in a gangbang? Done. The future is very bleak.

I’m honestly unsure if the internet was a good idea and I’m even less sure if humanity was a good idea.