With the OSI publishing their abysmal (and explicitly not open source) “Open Source AI” definition, I thought I’d post my argument for why it is bad and why “Open Source AI” probably does not currently exist.

  • bitofhope@awful.systems
    1 day ago

    The stretching is just so blatant. People who train neural networks do not write a bunch of tokens and weights. They take a corpus of training data and run a training program to generate the weights. That’s why it is the training program and the corpus that should be considered the source form of the program. If either of these can’t be made available in a way that allows redistribution of verbatim and modified versions, it can’t be open source. Even if I have a powerful server farm and a list of data sources for Llama 3, I can’t replicate the model myself without committing copyright infringement (neither could Facebook for that matter, and that’s not an entirely separate issue).

    There are large collections of freely licensed and public domain media that could theoretically be used to train a model, but that model surely wouldn’t be as big as the proprietary ones. In some sense truly open source AI does exist and has for a long time, but that’s not the exciting thing OSI is lusting after, is it?

    • BlueMonday1984@awful.systems
      1 day ago

      I’ve already talked about the indirect damage AI’s causing to open source in this thread, but this hyper-stretched definition’s probably doing some direct damage as well.

      Considering that this “Open Source AI” definition is (almost certainly by design) going to openwash the shit out of blatant large-scale theft, I expect it’ll heavily tar the public image of open-source, especially when these “Open Source AIs” start getting sued for copyright infringement.

  • BlueMonday1984@awful.systems
    1 day ago

    A pretty solid piece on how AI is closed-source by nature, and a solid takedown of the OSI’s FOMO-fuelled dumpster fire of an Open Source AI definition.

    I’ve also thought a bit about AI’s relationship with open source. To expand on my views: I see AI as hostile to open source, stealing whatever it wants and damaging open-source projects when it quote-unquote “gives back”, and I suspect we will see a severe decline in the FOSS ecosystem because of it.

    With AI bros treating “publicly available” as meaning “theirs to steal” (sometimes openly saying it, oftentimes suggesting it with their actions) and more-or-less getting away with it for the past two years, people have been given plenty of reason to view FOSS licenses (Creative Commons, GPL, etcetera) as not worth the .txt files they’re written in, and to view contributing to FOSS as asking to have their code stolen.

    The recently released Stallman Report (which you mentioned) definitely isn’t helping FOSS either. All the diversity initiatives and codes of conduct in the world can’t protect against a PR nightmare of the magnitude of “your movement’s unofficial face becomes the Jeffrey Epstein of coding”.

    Baldur Bjarnason has also talked about open source’s rocky financial future. I’d recommend checking it out.

  • jaschop@awful.systems
    1 day ago

    Hi @tante@awful.systems 👋

    Nice writeup. Been reading some of your stuff for a few years. Seems like the gravitational attraction of awful.systems got us both.