If a recording of someone’s very rare voice is representable by mp4 or whatever, could monkeys typing out code randomly reproduce their exact timbre+tone+overall sound?

I don’t get how we can get rocks to think + exactly transcribe reality in the ways they do!

Edit: I don’t get how audio can be fossilized/reified into plaintext

  • Linsensuppe@feddit.org
    6 months ago

    Yes, monkeys could type out the zeros and ones. In fact, we (not the monkeys) kind of did. There is a Library of Babel for audio, named the Sound Library of Babel, which contains every 15 seconds audio recording you can imagine. Every single one. Almost all of them are white noise, but there are still recordings of every human saying any words that fit in 15 seconds.

    • Nibodhika@lemmy.world
      6 months ago

      I call bullshit on that. Every second there are 44100 samples of 8 bits each, so every second of sound is 44100 bytes, or 44 kB. Generating all the possibilities is impossible even for 1 second of audio.
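
      As a rough check of that arithmetic, here is a minimal sketch that counts the bits in a clip and measures how large the number of distinct clips is. It assumes the comment’s figures (44,100 samples per second, 8 bits per sample, one channel); the function names are made up for illustration.

```python
import math

SAMPLE_RATE = 44_100   # samples per second, as assumed in the comment
BITS_PER_SAMPLE = 8    # the comment's assumption; CD audio uses 16


def clip_bits(seconds: float) -> int:
    """Number of bits in an uncompressed mono clip of the given length."""
    return int(seconds * SAMPLE_RATE * BITS_PER_SAMPLE)


def decimal_digits_of_count(bits: int) -> int:
    """Decimal digits in 2**bits, the number of distinct clips of that size."""
    return math.floor(bits * math.log10(2)) + 1


one_second = clip_bits(1)
print("bits in one second:", one_second)
print("digits in the number of distinct clips:", decimal_digits_of_count(one_second))
```

      For one second that is 352,800 bits, so the number of distinct clips is 2^352800, a number with more than a hundred thousand decimal digits.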

      To put this in perspective, there’s something called a Universally Unique Identifier (UUID for short); one of them is 128 bits, or 16 bytes. Let’s imagine these were 1 bit long: the second id you generate has a 50% chance of repeating the first, and by the third one a repeat is guaranteed. If we extend this to 1 byte (i.e. 256 possibilities), the second id has a 1/256 chance of repeating an earlier one, the third 2/256 if the first two differ, and so on. The chance of having no repeat after n ids is therefore 255/256 × 254/256 × … × (256 - n + 1)/256, which means that by the 20th id you generate you have more than a 50% chance of having already generated a repeated one. Why did I do those examples? Because the same thing happens with a UUID, just with far bigger numbers: a random (version 4) UUID has 122 random bits, so if you generated a billion UUIDs per second, it would take you about 86 years to have a 50% chance of having generated a repeated one, and by that time you would need about 43 EB of storage (that’s not a typo, it’s Exabytes, each roughly a thousand PB (which is also not a typo, that’s Petabytes, each roughly a thousand TB, or Terabytes, which is the first measure people are likely to be familiar with)).

      Let me again try to put this in perspective: if Google, Amazon, Microsoft and Facebook emptied all of their storage just for this, they would have around 2 Exabytes, so you would need a company over 20x larger than that conglomerate to have enough space to store the unique ids generated from random 16-byte data (until you have a 50% chance of generating a repeated one).
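
      The repeat odds can be checked with a short sketch. It uses the standard birthday-problem approximation for large spaces and multiplies out the exact odds for small ones. The helper names are hypothetical, and the 122-bit case assumes a random (version 4) UUID, where 6 of the 128 bits are fixed.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def ids_for_collision(space_bits: int, p: float = 0.5) -> float:
    """Approximate number of random ids needed to reach collision probability p.

    Birthday approximation: n ~ sqrt(2 * N * ln(1 / (1 - p))), with N = 2**space_bits.
    """
    n_values = 2.0 ** space_bits
    return math.sqrt(2.0 * n_values * math.log(1.0 / (1.0 - p)))


def exact_ids_for_collision(n_values: int, p: float = 0.5) -> int:
    """Exact count for a small space, multiplying out the no-repeat odds."""
    no_collision = 1.0
    drawn = 0
    while 1.0 - no_collision < p:
        # The next id must miss all `drawn` ids generated so far.
        no_collision *= (n_values - drawn) / n_values
        drawn += 1
    return drawn


print("1-byte ids needed for a 50% repeat chance:", exact_ids_for_collision(256))

# 122 random bits in a version 4 UUID; 128 if every bit were random.
for bits in (122, 128):
    n = ids_for_collision(bits)
    years = n / 1e9 / SECONDS_PER_YEAR   # at one billion ids per second
    exabytes = n * 16 / 1e18             # 16 bytes stored per id
    print(f"{bits} bits: {n:.3g} ids, {years:.0f} years, {exabytes:.0f} EB")
```

      With 122 random bits this gives about 86 years and about 43 EB; treating all 128 bits as random pushes both figures up by a factor of 8.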

      Another way of thinking about this is that 1 bit has 2 possible values, 2 bits have 4, 3 bits have 8; it goes on exponentially, so that n bits have 2^n, and storing all of them takes n × 2^n bits. For the UUID that is 3.4E38 values of 16 bytes each, or about 5.4E15 YB (again, not a typo, that’s Yottabytes, each roughly a thousand Zettabytes), i.e. 5400000000000000 YB (I could go up a few more orders of magnitude, but I think I made my point). And this is for 128 bits; every extra bit doubles the number of values.
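
      To put numbers on that growth, here is a small sketch. It assumes each n-bit value is written out in full, uncompressed, and uses decimal units (1 YB = 10^24 bytes); `enumeration_bits` is a name invented here.

```python
def enumeration_bits(bits: int) -> int:
    """Bits needed to write out every possible value of the given width."""
    return (2 ** bits) * bits  # 2**bits values, `bits` bits each


for width in (1, 2, 3, 8):
    print(width, "bit(s):", 2 ** width, "values,", enumeration_bits(width), "bits to list them all")

uuid_bytes = enumeration_bits(128) // 8
print(f"all 128-bit values: {uuid_bytes / 1e24:.2g} YB")
```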

      So again, I call bullshit that they have all possible sounds for even 1 second, which is 44100 bytes against the UUID’s 16, almost 3000x as many bits.

      • maengooen@lemmy.world
        6 months ago

        I appreciate the interest in doing all the math, and I am also not specifically familiar with audio or the audio library, but I believe you could use a similar argument against the OG Library of Babel, and I happen to know (confidently believe?) that they don’t actually have a stored copy of every individual text file “in the library”; rather, each page is algorithmically generated, and they have proven that the algorithm will generate every possible text.

        I’d wager it’s the same thing here, they have just written the code to generate a random audio file from a unique input, and proven that for all possible audio files (within some defined constraints, like exactly 15 seconds long), there exists an input to the algorithm which will produce said audio file.
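
        A toy version of that idea, as a sketch only: this is not the actual site’s algorithm, just one hypothetical way to build a mapping where every clip provably has exactly one input. It multiplies the index by an odd constant modulo a power of two, which is invertible, so consecutive inputs do not produce consecutive byte strings. `CLIP_BYTES`, `MULTIPLIER` and both function names are assumptions chosen for the example.

```python
CLIP_BYTES = 2                       # tiny stand-in for a real clip, so it can be checked exhaustively
MODULUS = 256 ** CLIP_BYTES          # number of distinct clips of that size
MULTIPLIER = 2_654_435_761           # arbitrary odd constant; odd means invertible modulo a power of two
INVERSE = pow(MULTIPLIER, -1, MODULUS)  # modular inverse (Python 3.8+)


def clip_from_index(index: int) -> bytes:
    """Generate the clip for a given input; nothing is stored anywhere."""
    if not 0 <= index < MODULUS:
        raise ValueError("index out of range")
    scrambled = (index * MULTIPLIER) % MODULUS
    return scrambled.to_bytes(CLIP_BYTES, "big")


def index_of_clip(clip: bytes) -> int:
    """Recover the unique input that generates a given clip."""
    if len(clip) != CLIP_BYTES:
        raise ValueError("wrong clip length")
    return (int.from_bytes(clip, "big") * INVERSE) % MODULUS


target = b"\x12\x34"
index = index_of_clip(target)
print(index, clip_from_index(index) == target)
```

        Because `index_of_clip` undoes `clip_from_index`, every clip of the fixed length is reachable from exactly one input, which is the kind of proof described above, without any clip being stored.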

        Determining whether or not an algorithm with infrastructure backing it counts as a library is an exercise left to the reader, I suppose.

        • Nibodhika@lemmy.world
          6 months ago

          The claim was that it “contains every 15 seconds audio recording you can imagine. Every single one.” That is bullshit; it’s like saying this program contains every single literary work:

          import sys
          
          print(sys.argv[1])
          

          It’s just adding a layer of encoding on top so it feels less bullshity, something like:

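          import string

          def decode(number: int) -> str:
            # Read the number as base-len(string.printable) digits, lowest digit first,
            # and map each digit to the printable character at that index.
            out = ""
            while number:
              number, letter_index = divmod(number, len(string.printable))
              out += string.printable[letter_index]
            return out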
          

          That also does not contain every possible (ASCII) book; it can decode any number into a text, and some numbers happen to decode into texts that are readable.
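
          To make that concrete, here is a sketch that pairs `decode` with a hypothetical inverse, `encode`, which is not part of the comment above. It shows that the "book" has to be supplied up front to find its number, so all the information lives in the input. One caveat of this particular encoding: a text ending in "0" (index 0 in `string.printable`) loses its trailing zeros, because leading zero digits vanish from a number.

```python
import string

BASE = len(string.printable)  # 100 printable ASCII characters


def decode(number: int) -> str:
    """Same logic as the snippet above: base-100 digits, lowest digit first."""
    out = ""
    while number:
        number, letter_index = divmod(number, BASE)
        out += string.printable[letter_index]
    return out


def encode(text: str) -> int:
    """Hypothetical inverse: find the number that decodes to `text`."""
    number = 0
    for char in reversed(text):
        # Raises ValueError for characters outside string.printable.
        number = number * BASE + string.printable.index(char)
    return number


book = "Call me Ishmael."
number = encode(book)
print(number)
print(decode(number) == book)
```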