• 0 Posts
  • 49 Comments
Joined 2 years ago
Cake day: June 12th, 2023

  • Not entirely true. You don’t need your own personal data centre; you can use GPU cloud instances for a lot of that stuff. It’s expensive, but not so expensive that it would be impossible without being a huge tech company (thousands of dollars, not billions). This can be done by anyone with a credit card and some cash to burn. Also, you don’t need to train a model from scratch; you can build on existing models that others have published to cut down on training.

    However, to impersonate someone’s voice you don’t need any of that. You only need about 5-10 seconds of audio for a zero-shot impersonation with a pre-trained model. A minute or so for few-shot. This runs on consumer hardware and in some cases even in real time.

    Even to build your own model from scratch for high quality voice audio, there doesn’t need to be a huge amount of initial training data. Something like XTTS was trained with about 10-15K hours of English audio, which is actually pretty easy to come by in the public domain. There are a lot of open and public research datasets specifically for this kind of thing, no copyright infringement necessary. If a big tech company wants more audio data than what’s publicly available, they just pay people to record it. There’s no need to steal it or risk copyright claims and breaking surveillance laws when they have the budget to exploit people to record whatever they want.

    This tech wasn’t invented by some evil giant tech company stealing everybody’s data, it was mostly geeky computer scientists presenting things at computer speech synthesis conferences. That’s not to say there aren’t a bunch of huge evil tech companies profiting from this or contributing to this kind of tech, but in the context of audio deepfakes being accessible to scammers, it’s not on them and I don’t think that some kind of extra copyright regulation on data centres would do anything about it.

    The current industry leader in this space, in terms of companies trying to monetize speech synthesis, is ElevenLabs, which is a private start-up with only a few dozen employees.

    The current tech is not perfect but it’s definitely good enough to fool someone who isn’t thinking too hard over a noisy phone call, and a scammer doesn’t need server time or access to a data centre to do it.


  • One thing you gotta remember when dealing with that kind of situation is that Claude, ChatGPT, etc. are often misaligned with what your goals are.

    They aren’t really chat bots, they’re just pretending to be. LLMs are fundamentally completion engines. So it’s not really a chat with an ai that can help solve your problem. Instead, the LLM is given the equivalent of “here is a chat log between a helpful ai assistant and a user. What do you think the assistant would say next?”

    That means that context is everything and if you tell the ai that it’s wrong, it might correct itself the first couple of times but, after a few mistakes, the most likely response will be another wrong answer that needs another correction. Not because the ai doesn’t know the correct answer or how to write good code, but because it’s completing a chat log between a user and a foolish ai that makes mistakes.

    It’s easy to get into a degenerate state where the code gets progressively dumber as the conversation goes on. The best solution is to rewrite the assistant’s answers directly but chat doesn’t let you do that for safety reasons. It’s too easy to jailbreak if you can control the full context.

    The next best thing is to kill the context and ask about the same thing again in a fresh one. When the ai gets it right, praise it and tell it that it’s an excellent professional programmer that is doing a great job. It’ll then be more likely to give correct answers because now it’s completing a conversation with a pro.

    There’s a kind of weird art to prompt engineering because OpenAI and the like have sunk billions of dollars into trying to make their models act as much like a “helpful ai assistant” as they can. So sometimes you have to sorta lean into that to get the best results.

    It’s really easy to get tricked into treating it like a normal conversation with a person when it’s actually really… not normal.
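
A rough sketch of the "completion engine" framing, in case it helps. The preamble text, the role labels and both function names are made up for illustration and are not any vendor's real template. It only shows how a chat gets flattened into one string for the model to continue, and why a long thread full of corrections is a different prompt from a fresh one.

```python
# Hypothetical sketch: a chat flattened into a single completion prompt.
# PREAMBLE and the "User:"/"Assistant:" labels are illustrative assumptions.

PREAMBLE = "Here is a chat log between a helpful AI assistant and a user.\n\n"


def build_prompt(messages):
    """Flatten (role, text) pairs into one string for a completion model."""
    lines = [f"{role}: {text}" for role, text in messages]
    # The trailing "Assistant:" is the cue the model is asked to continue from.
    return PREAMBLE + "\n".join(lines) + "\nAssistant:"


def corrections_in_context(messages):
    """Count user turns that say the assistant was wrong (crude keyword check)."""
    return sum(
        1 for role, text in messages
        if role == "User" and "wrong" in text.lower()
    )


long_chat = [
    ("User", "Write a function that reverses a list."),
    ("Assistant", "def rev(xs): return xs.sort()"),
    ("User", "That's wrong, it sorts instead of reversing."),
    ("Assistant", "def rev(xs): return xs[1:]"),
    ("User", "Still wrong, that drops an element."),
]
fresh_chat = [("User", "Write a function that reverses a list.")]

# The long chat's prompt carries every earlier mistake; the fresh one carries none.
print(build_prompt(long_chat))
print(corrections_in_context(long_chat), corrections_in_context(fresh_chat))
```

The model only ever sees the flattened string, so starting a fresh context is the one way to get the earlier mistakes out of what it is completing.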







  • Not American, or really knowledgeable about it, but from the outside I think this looks like ordinary politicking.

    IVF is a proxy war for abortion. Dems want the talking point that abortion bans hurt/block IVF. Republicans/Trump want to remove that talking point by saying they love IVF “we want more babies right?” and will support laws to protect it as a separate and unrelated issue to abortion.

    Dems put forward a bill that not only protects it but makes insurance companies pay for it. Trump is fine with that because it benefits him, but Republicans in Congress get big money from insurance lobbyists and so they can’t vote for it. They also have fears that they’ll piss off their homophobic supporters by making them pay for something the gays might use (“insurance costs will go up to help someone who isn’t me!”).

    Republicans put forward another bill that protects IVF without hurting their insurance company buddies but the Dems block it. Republicans then have to vote against the IVF bill and the Dems can now say “see! They really don’t care about reproductive rights at all!”

    Feels a bit like nobody involved actually cares about IVF at all and just wants votes and lobbyist money.

    In case this take comes across too centrist: Republicans and Trump are really quite shit.






  • Yeah, that’s fair enough, though I’m not sure it’s very different from malicious instances creating normal user accounts?

    You can see when users from an instance are all suspiciously voting the same way at the same time regardless of whether they are usernames or IDs.

    There are lots of legitimate users who only vote but never post, so doing it based on that doesn’t seem very effective?

    The second problem is solved using public key cryptography, the same way that you can’t impersonate someone else’s username to post comments. Votes and comments are digitally signed (there would need to be a different public key for voting to maintain pseudonymity, though).


  • How about pseudonymous as a compromise? Votes could be publicly federated but tied to some UUID instead of the username. That way you still have the same anti-spam ability (you can see that a user upvoted these things from this instance at this time) but can’t tie it directly to comments or actual user accounts without some extra OSINT.

    It might be theoretically possible to correlate the UUIDs with an account’s activity and dox the user in some cases, especially with some instances having a single user, but it would be very difficult or impossible to do on larger instances and would add an extra layer. Single user instances would be kind of impossible to make totally private anyway because they can be identified by instance.
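
Something like this is what I have in mind, as a minimal sketch only. The `Instance` class, its methods and the vote fields are all hypothetical and not how Lemmy or ActivityPub actually represent votes. The home instance keeps the username-to-UUID mapping private and only federates the UUID.

```python
import uuid


class Instance:
    """Hypothetical home instance; the username -> voter UUID map stays local."""

    def __init__(self, domain):
        self.domain = domain
        self._voter_ids = {}  # private mapping, never federated

    def voter_id(self, username):
        # One stable random UUID per account, created on first use.
        if username not in self._voter_ids:
            self._voter_ids[username] = str(uuid.uuid4())
        return self._voter_ids[username]

    def federated_vote(self, username, post_id, score):
        # What other instances would see: instance, pseudonym, target, score.
        return {
            "instance": self.domain,
            "voter": self.voter_id(username),
            "post": post_id,
            "score": score,
        }


home = Instance("example.social")
v1 = home.federated_vote("alice", "post-1", +1)
v2 = home.federated_vote("alice", "post-2", +1)
v3 = home.federated_vote("bob", "post-1", -1)

# Same pseudonym across alice's votes, so voting patterns stay visible,
# but the username itself never appears in the federated payload.
print(v1["voter"] == v2["voter"], v1["voter"] != v3["voter"])
```

Remote admins can still spot a cluster of pseudonyms from one instance voting in lockstep, which is the anti-spam part. Linking a pseudonym back to a named account takes outside correlation.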





  • It’s not that it’s in the 172.16.0.0/12 range. That’s totally normal and used for all kinds of stuff.

    It’s that it’s in 172.16.42.0/24, which is the default DHCP range for a WiFi Pineapple. It’s the /24 mask given on the .42 that’s a little suspicious because that’s not a common range for anything else.

    Being assigned one of those specific 253 hosts with that subnet mask would definitely make me think twice.
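
If you want to check a lease yourself, here is a quick sketch using Python's standard `ipaddress` module. The `classify` function and its labels are my own invention, and a match is only a hint that something might be off, not proof of a rogue access point.

```python
import ipaddress

PRIVATE_RANGE = ipaddress.ip_network("172.16.0.0/12")       # ordinary RFC 1918 block
PINEAPPLE_DEFAULT = ipaddress.ip_network("172.16.42.0/24")  # range named above


def classify(address, netmask):
    """Label a DHCP lease by address and subnet mask (hypothetical helper)."""
    iface = ipaddress.ip_interface(f"{address}/{netmask}")
    # Both the .42 prefix and the /24 mask must match to be flagged.
    if iface.network == PINEAPPLE_DEFAULT:
        return "suspicious"
    if iface.ip in PRIVATE_RANGE:
        return "ordinary private"
    return "other"


print(classify("172.16.42.37", "255.255.255.0"))
print(classify("172.16.5.10", "255.255.0.0"))
print(classify("192.168.1.5", "255.255.255.0"))

# A /24 has 256 addresses; dropping the network and broadcast addresses
# leaves 254 usable hosts. One of those goes to the access point itself.
usable = PINEAPPLE_DEFAULT.num_addresses - 2
print(usable)
```

The second call shows why the mask matters: an address in 172.16.0.0/12 with some other mask is just a normal private network.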