How can they be lossless? Isn’t a neural network inherently lossy?
I suppose the compression process looks like this: the same model runs on both ends, and instead of storing each token itself you store its rank in the model's predictions.
If the model is good at predicting the next token, I suppose you need only 2 bits to encode each one, as long as the true token is always among the top 4 predictions.
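For concreteness, here's a toy sketch of that idea. The "model" below is just a made-up frequency ranker standing in for a real LLM, and the fixed 2-bit ranks only hold up while the true token stays inside the top 4 predictions:

```python
# Toy sketch of rank-based compression with a shared predictive "model".
# The model here just ranks tokens by global frequency; a real LLM would
# rank them by predicted probability given the preceding context.
from collections import Counter

def rank_tokens(context, counts):
    # Most likely tokens first (toy: ignores the context entirely).
    return [tok for tok, _ in counts.most_common()]

def encode(tokens, counts):
    # Each token becomes the 2-bit index of its rank in the prediction list.
    # This only works while every token falls inside the top 4 predictions.
    bits = []
    for i, tok in enumerate(tokens):
        rank = rank_tokens(tokens[:i], counts).index(tok)
        assert rank < 4, "token outside the top 4; 2 bits are not enough"
        bits.append(format(rank, "02b"))
    return "".join(bits)

def decode(bitstring, n_tokens, counts):
    # The decoder replays the same model and picks the token at each rank.
    out = []
    for i in range(n_tokens):
        rank = int(bitstring[2 * i : 2 * i + 2], 2)
        out.append(rank_tokens(out, counts)[rank])
    return out

if __name__ == "__main__":
    text = "the cat sat on the cat sat on the cat".split()
    counts = Counter(text)                      # shared on both ends
    encoded = encode(text, counts)
    assert decode(encoded, len(text), counts) == text
    print(len(text), "tokens ->", len(encoded), "bits")
```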
If you're encoding the rankings as raw bits, how do you know where one ranking ends and the next begins? Zip compression solves this with a Huffman tree: you keep reading bits until you reach a leaf, so codeword boundaries are unambiguous. But if there's no shared data structure to tell you this, how do you know whether to read 4 bits ahead or 5?
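To make the zip part concrete, here's a rough sketch of the prefix-code idea with a made-up code table: the decoder keeps reading bits and walking the tree until it lands on a leaf, so the code itself marks where each symbol ends.

```python
# Sketch of why a prefix code needs no separators: decoding walks a binary
# tree and a codeword ends exactly when a leaf is reached. The code table
# is invented for illustration; DEFLATE derives its Huffman codes from
# symbol frequencies in the data.
CODE = {"e": "0", "t": "10", "a": "110", "o": "111"}   # prefix-free

def build_tree(code):
    # Nested dicts keyed by "0"/"1"; leaves are the decoded symbols.
    root = {}
    for sym, bits in code.items():
        node = root
        for b in bits[:-1]:
            node = node.setdefault(b, {})
        node[bits[-1]] = sym
    return root

def decode(bitstring, tree):
    out, node = [], tree
    for b in bitstring:
        node = node[b]
        if isinstance(node, str):      # hit a leaf: one symbol is complete
            out.append(node)
            node = tree
    return "".join(out)

if __name__ == "__main__":
    encoded = "".join(CODE[c] for c in "toe")
    print(encoded)                            # 101110, no delimiters anywhere
    print(decode(encoded, build_tree(CODE)))  # -> "toe"
```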
Depends on how you use it. If you just use it in place of finding repetition, it just means our current way ain't the mathematically best and AI can find better lol.
If you tried to "compress" a book into chatgpt tho yeah it'd probably be pretty lossy