Semver violations are common, better tooling is the answer

snaggen@programming.dev · 1 year ago

Semver violations are common, better tooling is the answer

BB_C@programming.dev · 1 year ago

Good work.

“Just don’t write bugs” ( or “Just don’t write semver violations” in this case) is now rightfully recognized as a joke proposition by many (although derivative ability/experience arguments are sometimes still used, UNIRONICALLY). But it’s the “better education” (or its sister magic pill “better docs”) that still has many believers. So it is still valuable to explicitly make the argument for reliable automated tooling as the only real logical solution. But I digress.

if our Example enum above was #[doc(hidden)], adding a new variant would not have violated semver.

Violations in #[doc(hidden)] items should definitely trigger errors by default IMHO. To give what was a kludge in the first place more powers is not something I would call wise. Not to mention, module source code is just one click away from html docs, and it’s also one click/key-combination away from a crate user’s editor/IDE with LSP (rust-analyzer).

So

how #[doc(hidden)] items sometimes have semver obligations after all,

I would argue it’s always the case, unless the user of the tool explicitly decides it’s not.

notriddle@programming.dev · 1 year ago

What do you think #[doc(hidden)] is for, other than declaring something “private” that the language unfortunately doesn’t let you declare as truly private right now?

I’ve mostly seen it used as a way to expose tools to macro APIs. For example, these internal parts of the quote! macro, or these internal parts of the vec! macro. Changing these things shouldn’t be considered a semver violation, because they’re not really part of the API, even though the quote! macro can’t enforce it.

The only other cases I can think of where I’ve seen #[doc(hidden)] used are even bigger kludges, and the hidden items definitely aren’t part of the platonic API, like pre-#[non_exhaustive] crates that wanted to reserve the right to add new variants to their enums.

SorteKanin@feddit.dk · 1 year ago

It would be nice if we had something like pub(macro), which would make something public but only inside macros from the same crate. So they are usable in macros but not part of the public API.

BB_C@programming.dev · edit-2 1 year ago

I’m arguing (humbly of course) that intended vs. unintended use of what is at the end of the day a part of the public interface shouldn’t be taken into consideration by default. Otherwise, other cases can be argued as non-breaking too!

Foo was never meant to be sent to other threads, So, losing Send is not a semver- breaking change!

Exhaustive enum Bar is only meant to be matched exhaustively internally. We say so in the docs. So adding a variant to it is not a semver-breaking change!

And giving more powers to a (kludge) doc attribute just doesn’t seem in my eyes to be a generally wise move.

A: cargo-semver is still complaining about this item which I already have cfg-ed under an exp_api crate feature (which I don’t want to rename). CI is failing.

B: PRO-TIP: Just slap a #[doc(hidden)] on it and CI will pass!

A: What if I still want to see the docs?

B: We are pushing for --document-hidden-items to stabilize soon. So you can just simply use that!

notriddle@programming.dev · 1 year ago

That’s a good point.

cargo-semver-check should definitely provide a way to mark syntactically-public items as “de-facto private,” because some projects just need to do bad things and leaving them out in the cold is not helpful. But you’ve convinced me that doc(hidden) is a poor way to do it.

Anders429@lemmy.world · 1 year ago

Probably a spicy take, but I think any API being used by a macro should be made public. A macro shouldn’t be the only way to do something; it should just be a way to remove the boilerplate required to do it.

BitSound@lemmy.world · 1 year ago

semver is a nice way to communicate your intent to other humans, but it’s never been a great way of programmatically communicating those changes. If you want it to communicate something meaningful programmatically, you’d have to have the version generated automatically, and not rely on a human to do so.

SorteKanin@feddit.dk · 1 year ago

Semver was literally made to communicate programmatically. But people keep using it for “human” communication instead, like the whole “1.0 means stable” thing.

BitSound@lemmy.world · 1 year ago

It was made to do so, but failed from the start exactly because humans got involved. semver’s ideals can only happen when tooling generates the version number, not humans.

sugar_in_your_tea@sh.itjust.works · 1 year ago

How can a machine decide if something is a patch, minor, or major release? I guess a major release could be defined by a comprehensive unit test suite breaking, but the others are very much something humans should decide.

Imo, the solution is that people need to be less afraid of major release bumps. Do it frequently and it’s not likely to cause issues downstream.

SorteKanin@feddit.dk · 1 year ago

It is quite simple conceptually to decide. Simply examine all the functions of the API between two versions. If the signatures are equivalent, increment the patch version. If there are new signatures, but the existing ones are the same, it’s a minor version. If any function signatures change, it’s a major version.

Then you also need to examine trait impls and such but the basic idea is the same.

sugar_in_your_tea@sh.itjust.works · 1 year ago

That oversimplifies and misses a lot of edge cases, such as:

change the meaning of an existing parameter - e.g. an Integer timeout changes from seconds to milliseconds
dependency changes, and this package exposes exposes types from that dependency
internal refactor changes the order of execution of existing uses (say, a scheduler change in an async library, or event order in a GUI library)

Each of those would be, imo, a breaking change, but an automated semver tool would probably mark them as patch releases. I could come up with more examples, but hopefully the point is clear.

Maybe it’s correct 90% of the time, but the last 10% of the time can be really impactful. I think there should be less sigma against major releases. If in doubt, mark it as a major release.

SorteKanin@feddit.dk · 1 year ago

change the meaning of an existing parameter - e.g. an Integer timeout changes from seconds to milliseconds

Ideally you should change the type if you do such a thing, which would cause it to become a breaking change. In this specific instance, you should take std::time::Duration obviously.

dependency changes, and this package exposes exposes types from that dependency

I think that should be automatically detectable?

But yes you’re right, in general it’s not possible to detect all forms of semver changes. But perhaps at least detecting violations when they weren’t meant to be there would be good.

And yea 100% agree with you, people should use the power that is 2.0.0 way more than they currently do.

sugar_in_your_tea@sh.itjust.works · 1 year ago

Ideally you should change the type if you do such a thing

But that’s not always possible or desirable. For example, maybe you’re largely just passing it through to another library, and that library made a breaking change.

So I don’t think we should have semver always handled automatically, it should instead prompt the developer with its best guess, and the developer would ideally only move from patch -> minor, patch -> major, or minor -> major, and never the other direction. But the developer should always be involved in picking the version. So don’t just throw it into CI and call it a day, but instead have a CLI tool that suggests it and requires developer approval before making the PR for the version bump.

webbureaucrat@floss.social · 1 year ago

@sugar_in_your_tea If you’re interested, I recommend looking at how Elm does it. Elm has automatic semver enforcement in its package system.

The long and the short is
missing stuff: major change
new stuff: minor change
patch: internal implementation change.

sugar_in_your_tea@sh.itjust.works · 1 year ago

I gave a more thorough response with examples to the other user that replied (link on my instance here, but basically there are cases where you could break someone’s code with a patch release.

I’m completely fine with using tools to help decide what versions to assign, but in general developers should not hesitate to increment the major version if there’s any doubt.

asdfasdfasdf@lemmy.world · 1 year ago

Is that even possible though? Sometimes you need a human to understand if something is a breaking change.

Imagine an API like fn third_planet_from_sun() -> String, and an update is made where the output changes the value to be lowercase instead of capitalized. That should normally be considered a breaking change.

However, imagine fn current_version() -> String. That is by its definition meant to change outputs between versions, so it isn't a breaking change since that's part of its human, documentation based API contract.

Also, what if somw function which returns a String changes, but only one code path that is very hard to hit changes the output? How would a machine find that?

I guess the first example with Earth / version could use some attribute macro so devs can say the output is expected to change across versions, but then there is no way for a program to know what is a breaking change vs expected vs a bug.

BitSound@lemmy.world · 1 year ago

To do it 100% probably isn't possible, something something halting problem. However, you'd catch a lot of basic mistakes with proper typing. In your example, the first function should be typed like this: fn third_planet_from_sun() -> Planet, where Planet is an enum. De/serializing it still has the same problem of interpreting an arbitrary string, but at least for deserializing it, you can be loose in what you allow and just lowercase it before matching it to the enum.

Alonely0 🦀@mastodon.social · 1 year ago

@snaggen the actual answer is having opt-out semver checks on cargo publish just like with git.

Anders429@lemmy.world · 1 year ago

There are also plenty of purposeful semver violations. For example, serde makes basically no attempt to follow semver, and any pleas to do otherwise fall on deaf ears.

kornel@programming.dev · 1 year ago

With the justification being “I can’t be bothered to decide what is breaking/feature/patch”, so hey, here’s a tool to tell you.

manpacket@lemmyrs.org · 1 year ago

Not quite. Suppose instead of a single version of serde there’s now 46 versions like in https://crates.io/crates/parquet to be able to use instances derived in some other crate X you have to use the same version of serde. Now, how should a crate decide which versions of serde to support?

All 46 and all optional? Supporting that would be painful. Just the last one? crates.io is a cemetery full of dead crates, with this support strategy any handful of crates picked at random are not going to be serde-compatible with each other.

A better solution would be a better support for compile time reflection so serde doesn’t have to exist in its current state, but that’s got delayed (by big macro conspiracy :)

Anders429@lemmy.world · 1 year ago

What’s more, there was a recent discussion about why the derive feature is recommended in serde, and one of the points brought up was that the versions for both crates basically have to be equal. I couldn’t help but wonder, would this be a problem if the releases actually followed semver? Theoretically, it shouldn’t matter what versions you use, as long as they’re above a certain minor version and the major versions match. But since everything is a patch, we have to pin the two crates together somehow.