update 2: The Linux community has suggested that I back up with a tar archive instead, as tar preserves symlinks. With that, the home directory backup now takes up just 290-ish GiB, as it should. Now I will be distro hopping, wish me luck!
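
For anyone finding this later, a minimal sketch of the tar approach (the archive name is my own choice; the destination path is the same one as below):

tar -cf "/run/media/sbird/My Passport/sbird-home.tar" -C /home sbird

tar records symlinks as symlinks inside the archive, so nothing gets expanded even though the archive file itself sits on exFAT.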

update: I was able to copy it! There are still some folders that are really big (as many have said, it is probably because symlinks aren’t supported in exFAT. When I transfer these files back over to btrfs, will the symlinks come back or are they permanently gone?), but after uninstalling Steam and Kdenlive (each taking a ridiculous amount of storage), removing a couple of games I don’t really play, and deleting old folders that lingered around from already-uninstalled programs, I now have enough space to fit my home folder on the SSD (about 23 GiB left, so the lack of symlinks still hurts, but still, it fits!)

When running

rsync -Paz /home/sbird "/run/media/sbird/My Passport/sbird"

as someone suggested, I run into an out-of-storage error midway. Why is this? My home folder’s disk usage is about 385 GiB, and there is around 780 GiB of free space on the external SSD (which already holds stuff like photos and documents). Does rsync make duplicate copies of everything or something? That would be kind of silly. Or is it some other issue?

Note that the SSD is from a reputable brand (Western Digital) so it is unlikely that it is reporting a fake amount of storage.

EDIT: Wait, is it because my laptop SSD is btrfs and the external SSD is exFAT? Could that be the issue? That would be kind of weird: why would files become so much bigger on the external SSD?

Thanks everyone for your help to troubleshoot! It was super helpful! Now I need to go to bed, since I’ve been up so late it’s already tomorrow!

  • drkt@scribe.disroot.org · 5 days ago

    rsync does not delete files at the target by default; it keeps files at the target even after they have been deleted from the source.

    You must specify --delete for it to also delete files at the target location when they are deleted at the source.

    If you want to be extra safe, you can use --delete-before to run the deletion pass before transferring files, ensuring that you always have space at the target.
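
    For example, a sketch combining this with the command from your post (paths assumed from there):

      rsync -Paz --delete-before /home/sbird "/run/media/sbird/My Passport/sbird"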

    • sbird@sopuli.xyzOP · 5 days ago

      The directory “sbird” in the SSD did not exist beforehand though?

      • drkt@scribe.disroot.org · 5 days ago

        Are you saying this is your first run?

        Run ‘ncdu /run/media/sbird’ to find out why there’s no space on it.

  • [object Object]@lemmy.world · 5 days ago

    The simplest explanation for the size difference could be if you have a symlink in your home folder pointing outside it. Idk if rsync traverses symlinks and filesystems by default, i.e. goes into linked folders instead of just copying the link, but you might want to check that. Note also that exFAT doesn’t support symlinks, dunno what rsync does in that case.

    It would be useful to run ls -R >file.txt in both the source and target directories and diff the files to see if the directory structure changed. (The -l option would report many changes, since exFAT doesn’t support Unix permissions either.) Apps like Double Commander can diff the directories visually (be sure to uncheck ‘compare by content’).
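
    For instance, a sketch of that comparison (temp file names are arbitrary):

      (cd /home/sbird && ls -R > /tmp/src.txt)
      (cd "/run/media/sbird/My Passport/sbird" && ls -R > /tmp/dst.txt)
      diff /tmp/src.txt /tmp/dst.txt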

    As others mentioned, if you have hardlinks in the source, they could be copied multiple times to the target, particularly since exFAT, again, doesn’t have hardlinks. But the primary source of hardlinks in normal usage would probably be git, which employs them to compact its structures, and I doubt that you have >300 GB of git repositories.

    • Wildmimic@anarchist.nexus · 5 days ago

      A second possibility is the copy-on-write/deduplication feature of btrfs. If he made copies of files on his SSD, they only take up extra space once something is changed. That’s how I keep 5 differently modded Cyberpunk 2077 installations on my drive while using only a fraction of the space that would otherwise be needed; I wouldn’t be able to copy this drive 1:1 onto a different filesystem.
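
      For reference, a sketch of how such a cheap copy can be made explicitly (directory names made up):

        cp -a --reflink=always Cyberpunk-vanilla Cyberpunk-modded

      On btrfs the copy shares all of its blocks with the original until one side is modified.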

      • [object Object]@lemmy.world · 5 days ago

        Ah, I knew the mention of btrfs heebied my jeebies a little, but forgot about the CoW thing.

        I’m guessing some btrfs-specific utils are necessary to figure out how much it cow’ed.

    • bleistift2@sopuli.xyz · 5 days ago

      Idk if rsync traverses symlinks and filesystems by default,

      From the man page:

      Beginning with rsync 3.0.0, rsync always sends these implied directories as real directories in the file list, even if a path element is really a symlink on the sending side. This prevents some really unexpected behaviors when copying the full path of a file that you didn’t realize had a symlink in its path.

      That means, if you’re transferring the file ~/foo/bar/file.txt, where ~/foo/bar is a symlink to ~/foo/baz, the baz directory will essentially be duplicated, ending up as both the real directory /SSD/foo/bar and /SSD/foo/baz.
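
      If you want to check whether any such symlinks exist in what you’re copying, plain GNU find can list them (path assumed from your post):

        find /home/sbird -type l -ls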

  • Riskable@programming.dev · 5 days ago

    Simple: exFAT does not support symbolic links. So every file that’s just a symbolic link on your btrfs filesystem is getting copied in full (the link is being resolved) to your exFAT drive.

    Solution: Don’t use exFAT. For backups from btrfs, I recommend using btrfs with compression enabled.

    Also don’t forget to rebalance your btrfs partitions regularly to reclaim lost space! Also, delete old snapshots!
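
    For reference, a sketch of both maintenance tasks (the snapshot path is a placeholder; yours depends on how your snapshots are set up):

      sudo btrfs balance start -dusage=50 /
      sudo btrfs subvolume delete /path/to/old-snapshot

    The -dusage=50 filter only rewrites data block groups that are at most half full, which keeps the balance quick.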

    • sbird@sopuli.xyzOP · 5 days ago

      That makes a lot of sense. I can’t reformat the external SSD though, since it has a bunch of other files on it and needs to be usable by my family (who are mostly Windows users).

  • confusedpuppy@lemmy.dbzer0.com · 5 days ago

    There’s a possibility that recursion is happening: a directory looping into itself and filling up your storage.

    I have some suggestions to make your rsync runs more consistent; they’re combined into a single sketch at the end of this comment.

    1: --dry-run (-n) is great for troubleshooting. It performs a fake transfer so you can sort out issues before moving any data. Remove this option once you are confident about making changes.

    2: --verbose --human-readable (-vh) will give you visual feedback so you can see what is happening. Combine this with --dry-run to get a full picture of what rsync will attempt to do before any changes are made.

    3: --compress (-z) might not be suitable for this job; as I understand it, it’s meant to compress data during transfers over a network. As your command currently stands, it just adds processing overhead, which isn’t useful for a directly connected device.

    4: If you are transferring directories/folders, I found more consistent behaviour from rsync by adding a trailing slash to the source path. For example, use “/home/username/folder_name/” and not “/home/username/folder_name”. (Note that the slash changes rsync’s behaviour: with it, the contents of the directory are copied into the destination; without it, the directory itself is created inside the destination.) I’ve run into recursion issues by not using a trailing slash.

    Don’t use a trailing slash if you are transferring a single file. That distinction helps me to understand what I’m transferring too.

    5: --delete will make sure your source folder and destination folder are a 1:1 match: any files deleted in the source folder will also be deleted in the destination folder. If you want to keep files in the destination that no longer exist in the source, leave this option out.

    --archive (-a) and --partial --progress (-P) are both good and don’t need to be changed or removed.

    If you do happen to be running into a recursion issue that’s filling up your storage, you may need to look into using the --exclude option to exclude the problem folder.
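
    Putting the suggestions above together, a sketch of the adjusted command (dry run included; drop -n once the preview looks right):

      rsync -avh -P -n --delete /home/sbird/ "/run/media/sbird/My Passport/sbird/"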

    • sbird@sopuli.xyzOP · 5 days ago

      How do I find which folder is causing problems? When using --verbose and --dry-run, the output goes way too fast and my terminal’s scrollback can’t hold all of it.

      • bleistift2@sopuli.xyz · 5 days ago

        You can store the output of rsync in a file by using rsync ALL_THE_OPTIONS_YOU_USED > rsync-output.txt. This creates a file called rsync-output.txt in your current directory, which you can inspect later.

        This, however, means that you won’t see the output right away. You can instead use rsync ALL_THE_OPTIONS_YOU_USED | tee rsync-output.txt, which will both create the file and display the output on your terminal while it is being produced.
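
        For instance, with the options from the original post plus verbosity and a dry run (paths assumed from there):

          rsync -Pazvn /home/sbird "/run/media/sbird/My Passport/sbird" | tee rsync-output.txt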

        • sbird@sopuli.xyzOP · 5 days ago

          Having a quick scroll through the output file (neat tip with the > to get a text file, thanks!), nothing immediately jumps out at me. There aren’t any repeated folders or anything like that at a glance. Anything I should look out for?

          • bleistift2@sopuli.xyz · 5 days ago

            You checked 385GiB of files by hand? Is that size made up by a few humongously large files?

            I suggest using uniq to check if you have duplicate files in there (uniq’s input must be sorted first). If you still have the output file from the previous step, and it’s called rsync-output.txt, do sort rsync-output.txt | uniq -dc. This will print the duplicates along with the number of their occurrences.

            • sbird@sopuli.xyzOP · 5 days ago

              When using uniq, nothing is printed (I’m assuming that means no duplicates?)

              • bleistift2@sopuli.xyz · 5 days ago

                I’m sorry. I was stupid. If you had duplicates due to a file system loop or symlinks, they would all be under different names. So you wouldn’t be able to find them with this method.

                • sbird@sopuli.xyzOP · 5 days ago

                  Running du with --count-links, as suggested by another user, returns 384G (so that isn’t the problem, it seems).

          • confusedpuppy@lemmy.dbzer0.com · 5 days ago

            If you don’t spot any recursion issues, I’d suggest looking for other causes and not spending too much time here. At least now you have some troubleshooting knowledge going forward. Best of luck figuring out the issue.

      • confusedpuppy@lemmy.dbzer0.com · 5 days ago

        Does your terminal have a scroll back limit? You may need to change that setting if there is a limit.

        That will depend on which terminal you are using, and the setting may have a different name, so I can’t really help more with this specific issue. You’ll have to search it up based on the terminal you are using.

  • olosta@lemmy.world · 5 days ago

    Maybe you have hard links or sparse files in your source directory. Try with -H for hard links first. You can try with --sparse but I think hard links are more likely.
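
    A sketch of the original command with both flags added (paths assumed from the post):

      rsync -PazH --sparse /home/sbird "/run/media/sbird/My Passport/sbird"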

    • sbird@sopuli.xyzOP · 5 days ago

      Using -H throws an error, as hard links aren’t supported in exFAT, it seems.

      • [object Object]@lemmy.world · 5 days ago

        By the way, do you have lots of torrents downloaded or large virtual machines installed? Both torrent clients and virtual machine managers use ‘sparse files’ to save space until you actually download the whole torrent or write a lot to the VM’s disk. Those files would be copied at full un-sparse size to exFAT.

        If you have folders with such content, you can use e.g. Double Commander to check the actual used size of those folders (with Ctrl-L in Doublecmd). Idk which terminal utils might give you those numbers in place, but the aforementioned ncdu can calculate them and present them as a tree.

        Edit: silly me, of course du is the util to use, typically as du -hsc dirname.
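
        If you want to hunt for sparse files directly, GNU find can print a sparseness ratio per file (allocated space divided by apparent size; values well below 1.0 mean sparse). A rough sketch:

          find /home/sbird -type f -printf '%S\t%p\n' | awk '$1 < 0.8' | head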

        • sbird@sopuli.xyzOP · 5 days ago

          Using du -hsc returns 384G for /home/sbird, and 150G on the external SSD (before any of the files transferred with rsync are on it).

          • [object Object]@lemmy.world · 5 days ago (edited)

            Well, that’s not what I meant. If you have directories with torrents or VMs, du might report different sizes for those directories on the source and target disks. If so, that might mean those are the culprits, depending on how much difference there is.

            With just the source disk, you can check du -hsc dirname versus du -hsc --apparent-size dirname to check if the disk space used is much smaller than the ‘apparent size’, which would mean there are sparse files in the directory, i.e. not fully written to disk. rsync would copy those files to full ‘apparent size’.

            As mentioned elsewhere, btrfs might also save space on the source disk by not writing duplicate files multiple times — but idk if du would report that, since this feature is specific to btrfs internals.

    • [object Object]@lemmy.world · 5 days ago

      For a typical user, hard links would be mostly employed by git for its internal structures, and it’s difficult to accumulate over 300 GB of git repos.

      Sparse files would actually be more believable, since they’re used by both torrent clients and virtual machines.

  • degenerate_neutron_matter@fedia.io · 5 days ago

    BTRFS supports compression and deduplication, so the actual disk space used might be less than the total size of your home directory. I’d run du -sh --apparent-size /home/sbird to check how large your home dir actually is. If it’s larger than 780 GiB, there’s your problem. Otherwise there might be hardlinks which rsync is copying multiple times; add the -H flag to copy hardlinks as hardlinks.

    • sbird@sopuli.xyzOP · 5 days ago (edited)

      382G for /home/sbird (definitely not more than 780G), so that is strange. Using -H doesn’t work since the external SSD is exFAT (which, from a quick search, doesn’t support hard links).

      • degenerate_neutron_matter@fedia.io · 5 days ago

        You can rerun the du command with --count-links to count hardlinks multiple times. If that shows >780GiB you have a lot of hardlinks somewhere, which you can narrow down by rerunning the command on each of the subdirectories in your home directory.

        Your options would be to delete the hardlinks to decrease your total file size, exclude them from the rsync with --exclude, or reformat the external SSD with a filesystem that supports hardlinks.
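
        A sketch of that narrowing-down step, one total per top-level directory:

          du -h --count-links --max-depth=1 /home/sbird | sort -h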

        • sbird@sopuli.xyzOP · 5 days ago

          With --count-links, it is just 384G so that is probably not the issue?

          • degenerate_neutron_matter@fedia.io · 5 days ago

            That’s odd, maybe it has to do with symlinks? Adding --dereference to the du command will count the file size of the files referenced by symlinks. If that doesn’t show anything abnormal, I’d compare the directory sizes between your home directory and the rsync backup and try to find where they differ significantly. If it does show a much larger size, narrow down the location of the relevant symlinks (may be a hidden directory) and either delete them or exclude them from the rsync.
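
            For reference, a sketch of that check:

              du -sh --dereference /home/sbird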

    • [object Object]@lemmy.world · 5 days ago (edited)

      Does du know about btrfs’ copy-on-write? I assumed some btrfs-specific utils would be necessary to see that.

      Edit: this page suggests that du is unaffected by btrfs, i.e. it shows the full uncompressed, non-deduplicated size.

  • bleistift2@sopuli.xyz · 5 days ago

    Could it be you have lots of tiny files and/or a rather large-ish block size on your SSD?

    You can check the block size with sudo blockdev --getbsz /dev/$THE_DEVICE.

    • sbird@sopuli.xyzOP · 5 days ago

      Using the command returns 512 for the external SSD and 4096 for the SSD in my laptop. What does that mean?

      • bleistift2@sopuli.xyz · 5 days ago

        What does that mean?

        Imagine your hard drive as a giant cupboard of drawers. Each drawer can only have one label, so you must only ever store one “thing” in one drawer; otherwise you wouldn’t be able to label the thing accurately and would end up not knowing what went where.

        If you have giant drawers (a large block size), but only tiny things (small files) to store, you end up wasting a lot of space in the drawer. It could fit a desktop computer, but you’re only putting in a phone. This problem is called “internal fragmentation” and causes files to take up way more space than it would seem they need.

        –––––

        However, in your case, the target block size is actually smaller, so this is not the issue you’re facing.

      • [object Object]@lemmy.world · 5 days ago

        It means a file that’s one byte in size will take up at minimum 512 bytes on the external disk, but 4 KiB on the internal one. If it were the other way around, that would partially explain the difference in space used.

        In any case, I doubt that the block sizes would make so much difference in typical usage.

  • bleistift2@sopuli.xyz · 5 days ago

    Personally, I have no more tips than those that have already been presented in this comment section. What I would do now to find out what’s going on is the age-old divide-and-conquer debugging technique:

    Using rsync or a file manager (yours is Dolphin), copy only a few top-level directories at a time to your external drive. Note the directories you are about to move before each transfer. After each transfer, check whether the sizes of the directories on your internal drive (roughly) match those on your external drive (they will probably differ a little). You can use your file manager for that, too.

    If all went fine for the first batch, proceed to the next until you find one where the sizes differ significantly. Then delete that offending batch from the external drive and divide it into smaller batches (select fewer directories if you transferred several, or descend into a single directory and copy its subdirectories piecewise like before).

    In the end you should have a single directory or file which you have identified as problematic. That can then be investigated further.
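
    If you’d rather script the size comparison than eyeball it in a file manager, a rough sketch along the same lines (paths assumed from the post; directories missing on the target will just produce an error message):

      for d in /home/sbird/*/; do
        du -sh "$d" "/run/media/sbird/My Passport/sbird/$(basename "$d")"
      done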

    • sbird@sopuli.xyzOP · 5 days ago

      YIKES, I found that .local is around 30GB on my system SSD but over 50GB on the external SSD. Much of that is due to Steam and Kdenlive. I can try uninstalling Steam…

    • sbird@sopuli.xyzOP · 5 days ago

      Something interesting that I found: according to Dolphin, many folders have many GB extra (e.g. 52GB vs 66GB for the Documents folder, which is kind of crazy), while Filelight records 52GB vs 112GB for the same folder, which, if true, is kind of insane. Using du -sh records 53G vs 136G (they’re the same when using --apparent-size, weird). Specifically for the Godot directory, it’s 3.8GB vs 41GB!!! Files like videos and games seem to be about the same size, while Godot projects with git are much bigger. Weird.

      • bleistift2@sopuli.xyz · 5 days ago

        These differences really are insane. Maybe someone more knowledgeable can comment on why different tools differ so wildly in the total size they report.

        I have never used BTRFS, so I must resort to forwarding googled results like this one.

        Could you try compsize ~? If the Perc column is much lower than 100%, or the Disk Usage column is much lower than the Uncompressed column, then you have some btrfs-specific file-size reduction on your hands, which your external exFAT drive naturally can’t replicate.

        • sbird@sopuli.xyzOP · 5 days ago

          Percentage of total is 83% (292G vs 349G uncompressed, apparently)

      • bleistift2@sopuli.xyz · 5 days ago

        It’s good you found some pathological examples, but I’m at the end of my rope here.

        You can use these examples and the other information you gathered so far and ask specifically how these size discrepancies can be explained and maybe mitigated. I suggest more specialized communities for this such as !linux@lemmy.ml, !linux@programming.dev, !linux@lemmy.world, !linux4noobs@programming.dev, !linux4noobs@lemmy.world, !linuxquestions@lemmy.zip.

        • sbird@sopuli.xyzOP · 5 days ago

          I have cross posted to a Linux community. Thank you so much for all your help :DDDD

      • sbird@sopuli.xyzOP · 5 days ago

        I’m assuming that Filelight counts file sizes differently; I will trust the result from Dolphin more.

      • sbird@sopuli.xyzOP · 5 days ago

        Using -H with rsync doesn’t seem to do anything, unfortunately…

      • sbird@sopuli.xyzOP · 5 days ago

        With Dolphin, the Godot directory is 1GB vs 5GB. Why is there a difference between Filelight, Dolphin, and du -sh? So weird

        • sbird@sopuli.xyzOP · 5 days ago

          Something about never knowing the time when one has two clocks

        • sbird@sopuli.xyzOP · 5 days ago

          It looks like much of the extra Godot bulk is in the .git and .godot directories

    • sbird@sopuli.xyzOP · 5 days ago

      Oh that’s actually a good idea. Thanks person! I will report back soon

  • bleistift2@sopuli.xyz · 5 days ago

    Let’s back up and check your assumptions: How did you check that the disk usage of your home folder is 385GiB and that there are 780GiB of free disk space on your external drive?

  • whyNotSquirrel@sh.itjust.works · 5 days ago

    Is it pointing to the right folder? I’m never sure whether quotes are enough to escape spaces in paths (I actually avoid spaces to limit trouble).