Make note, folks who have been demanding only “ethical” AI training. You’re demanding a world in which only Getty Images and other such existing incumbent copyright-holding corporations have decent AIs.
Getty is benefitting from having historically paid creators for the rights to their creations. The horror.
VCs have burned oodles of cash on startups. They could do the same to fund artists and photographers to create training images. A company could earn the good will of the community by starting with public domain and CC images. People who support AI image generation could sign over their own photos.
There are options that aren’t as easy and carry more risk than unethically scraping the web. But companies are willing to be unethical until the law catches up, in hopes of cementing their foothold. See Uber and Airbnb for examples.
Getty Images is infamous for adding public domain images to their archives and then sending threatening demands for payment to anyone they subsequently spot using them. They’re a giant corporation like any other; all they’re interested in is cash flow.
Note that I put “ethical” in quotes because ethics are a subjective matter that can’t be proven one way or another. “Scraping the web” is IMO no different from regular old reading the web, which is what it’s for. If you don’t want your images to be seen then don’t put them online in the first place.
If you don’t want your images to be seen then don’t put them online in the first place.
I don’t think anyone is objecting to the things they put online being seen. They’re objecting to companies creating derivatives for commercial purposes.
So the free open-source AIs are fine? I’ve seen plenty of objections to those as well.
I think there is a wording issue going on here. People object to their posts being used in ways they weren’t expecting; in this case, people post things for others to see, not for use in AI datasets.
Whether the AI is open source or not doesn’t affect anything about the training data being used with or without permission.
If explicit permission for specifically AI training is required then AI is basically impossible, because nobody gave that permission.
I don’t think such permission should be required, though, either legally or ethically. When you put something up for public viewing you don’t get to retroactively go “but not like that” when something you didn’t expect looks at it. The permission you gave inherently involves flexibility.
Make note about what? This is a good thing. They went to the effort of acquiring the rights to the training data, and people might even have been paid for the original work.
It’s not like this stops someone from paying artists or photographers to make art or photos for their training data, or creating some sort of group where contributors actively give the rights to their own artwork or photos for a model, like some sort of open source project kind of thing, people love that kind of stuff! You’re just acting like this is some awful thing, when it’s completely fine, and the way it should be.
To me the obvious answer would be to pay people a small amount per photo for pictures of various things and then use that as training data.
That’s expensive, and companies would rather not pay while the law is unclear on using copyrighted images in a training set.
The thing is that for medium to large companies it’s probably less expensive to pay people a nominal fee for pictures than it would be to risk being sued by, say, Disney, Nintendo, WWE or Games Workshop (to use some famously litigious companies).
I hope that’s the direction we head, so that artists are appropriately compensated for their work. We’ll see entire libraries/brokers pop up that grant LLM makers access to work for a fee.
Weird to think that Getty Images is making an AI that will turn their copyrighted images into non-copyrightable images. They’re usually pretty strict about enforcing their copyrights.