Codeberg was asking about this. The toot linked by a commenter points to:

SEqlite

These are CC-BY-SA 4.0 remixes of the Stack Exchange Creative Commons Data Dumps. 100% Unendorsed by Stack Exchange, Inc.

They are minimal. They provide the data you probably care about and the data you need to comply with the original license in SQLite format.

  • delirious_owl
    8 months ago

    Why does that matter? The content is licensed CC BY-SA. The point here is to prevent AI answers.

    • DaseinPickle@leminal.space
      8 months ago

      It seems to matter to the users at Stack Overflow. And why should anybody give anything for free to the crooks in Silicon Valley? All they do is create technology designed to extract value from people and give as little as possible back.

      • delirious_owl
        8 months ago

        Because that’s the nature of FOSS. The good news is, if they trained on your data that’s licensed CC BY-SA (as all SO content is), then you can request their source code, and they legally must provide it.

        This is a good thing.