update 2: The Linux community suggested using a tar file for the backup, since tar preserves symlinks. With that, the home directory now takes up just 290-ish GiB, as it should. Now I will be distro hopping, wish me luck!
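For anyone following along, a minimal sketch of the suggested tar approach (the archive name here is just an example):

```bash
# Hedged sketch: archive the home directory into a tar file on the external
# drive. tar stores symlinks as symlinks by default, so nothing gets
# dereferenced and duplicated. The archive name is hypothetical.
tar -cpf "/run/media/sbird/My Passport/sbird-home.tar" -C /home sbird
```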
update: I was able to copy it! There are still some folders that are really big (as many have said, probably because symlinks aren’t supported in exFAT; when I transfer these files over to btrfs, will the symlinks come back, or are they permanently gone?). But between uninstalling Steam and Kdenlive (each taking a ridiculous amount of storage), removing a couple of games I don’t really play, and deleting old folders left over from already-uninstalled programs, I now have enough space to fit my home folder on the SSD (about 23 GiB left over, so the lack of symlinks still hurts, but it fits!)
When running
`rsync -Paz /home/sbird "/run/media/sbird/My Passport/sbird"`
as suggested by someone, I run into an out-of-storage error midway. Why is this? My home folder’s disk usage is about 385 GiB, and there is around 780 GiB of free space on the external SSD (which already holds things like photos and documents). Does rsync make double copies of everything or something? That would be kind of silly. Or is it some other issue?
Note that the SSD is from a reputable brand (Western Digital), so it is unlikely to be reporting a fake amount of storage.
EDIT: Wait, is it because my laptop SSD is BTRFS and the external SSD is exFAT? Could that be the issue? That would be kind of weird; why would files become so much bigger on the external SSD?
Thanks everyone for your help to troubleshoot! It was super helpful! Now I need to go to bed, since I’ve been up so late it’s already tomorrow!
rsync does not delete files at the target by default; it keeps everything at the target even after the originals have been deleted from the source.
You must specify --delete for it to also delete files at the target location when they are deleted at the source.
If you want to be extra safe, you can use `--delete-before` to run the deletion pass before transferring files, ensuring that you always have space at the target.
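A hedged example of such a mirroring run, reusing the paths from the original command:

```bash
# Mirror source to target; --delete-before removes files at the target that
# no longer exist at the source, before the transfer, freeing space first.
rsync -Pa --delete-before /home/sbird "/run/media/sbird/My Passport/sbird"
```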
The directory “sbird” on the SSD did not exist beforehand, though?
Are you saying this is your first run?
Run `ncdu /run/media/sbird` to find out why there’s no space on it.
The simplest explanation for the size difference could be if you have a symlink in your home folder pointing outside it. Idk if rsync traverses symlinks and filesystems by default, i.e. goes into linked folders instead of just copying the link, but you might want to check that. Note also that exFAT doesn’t support symlinks, dunno what rsync does in that case.
It would be useful to run `ls -R > file.txt` in both the source and target directories and diff the files to see if the directory structure changed. (The `-l` option would report many changes, since exFAT doesn’t support Unix permissions either.) Apps like Double Commander can diff the directories visually (be sure to uncheck ‘compare by content’).
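For example, a hedged sketch using the paths from the OP’s command:

```bash
# Dump both directory trees (names only) and diff them. Subshells keep the
# current working directory unchanged.
(cd /home/sbird && ls -R) > /tmp/source-listing.txt
(cd "/run/media/sbird/My Passport/sbird" && ls -R) > /tmp/target-listing.txt
diff /tmp/source-listing.txt /tmp/target-listing.txt | less
```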
As others mentioned, if you have hardlinks in the source, they could be copied multiple times to the target, particularly since exFAT, again, doesn’t have hardlinks. But the primary source of hardlinks in normal usage would probably be git, which employs them to compact its structures, and I doubt that you have >300 GB of git repositories.
A second possibility is the deduplication feature of BTRFS. If he made copies of files on his SSD, the copies only take up extra space once something changes. That’s how I keep 5 differently modded Cyberpunk 2077 installations on my drive while using only a fraction of the space that would otherwise be needed. I wouldn’t be able to copy this drive 1:1 onto a different filesystem.
Ah, I knew the mention of btrfs heebied my jeebies a little, but forgot about the CoW thing.
I’m guessing some btrfs-specific utils are necessary to figure out how much it cow’ed.
> Idk if rsync traverses symlinks and filesystems by default,
From the man page:
> Beginning with rsync 3.0.0, rsync always sends these implied directories as real directories in the file list, even if a path element is really a symlink on the sending side. This prevents some really unexpected behaviors when copying the full path of a file that you didn’t realize had a symlink in its path.
That means, if you’re transferring the file `~/foo/bar/file.txt`, where `~/foo/bar/` is a symlink to `~/foo/baz`, the `baz` directory will essentially be duplicated and end up as the real directories `/SSD/foo/bar` and `/SSD/foo/baz`.
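A hypothetical shell sketch of that scenario (directory names taken from the example above, `/SSD/` standing in for the mount point):

```bash
# Hypothetical reproduction of the behaviour quoted from the man page:
mkdir -p ~/foo/baz && ln -s baz ~/foo/bar   # bar is a symlink to baz
touch ~/foo/baz/file.txt
cd ~ && rsync -aR foo/bar/file.txt /SSD/
# With --relative (-R), rsync >= 3.0.0 sends the implied path element
# foo/bar as a real directory, so the target stores baz's contents twice
# once foo/baz itself is also copied.
```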
Yeah, that would do it. If OP has such symlinks, they probably need to add an exception for rsync.
Simple: exFAT does not support symbolic links. So every file that’s just a symbolic link on your btrfs filesystem is getting copied in full (the link is being resolved) to your exFAT drive.
Solution: don’t use exFAT. For backups from btrfs, I recommend using btrfs with compression enabled.
Also don’t forget to rebalance your btrfs partitions regularly to reclaim lost space, and delete old snapshots!
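Hedged examples of that maintenance (the mount point and snapshot path are hypothetical):

```bash
# Rebalance, only rewriting data chunks that are at most 50% full:
sudo btrfs balance start -dusage=50 /mnt/backup
# Delete an old snapshot (snapshots are subvolumes):
sudo btrfs subvolume delete /mnt/backup/snapshots/2023-01-01
```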
That makes a lot of sense. I can’t reformat the external SSD though, since it has a bunch of other files and needs to be used by my family (who are mostly Windows users)
It’s possible that recursion is happening: a directory looping into itself and filling up your storage.
I have some suggestions for your command to help make a more consistent experience with rsync.
1: `--dry-run` (`-n`) is great for troubleshooting issues. It performs a fake transfer so you can sort out issues before moving any data. Remove this option when you are confident about making changes.

2: `--verbose --human-readable` (`-vh`) will give you visual feedback so you can see what is happening. Combine this with `--dry-run` so you get a full picture of what rsync will attempt to do before any changes are made.

3: `--compress` (`-z`) might not be suitable for this specific job; as I understand it, it’s meant to compress data during transfers over a network. In your command’s current state, it just adds extra processing, which might not be useful for a directly connected device.

4: If you are transferring directories/folders, I found more consistent behaviour from rsync by adding a trailing slash at the end of a path. For example, use “/home/username/folder_name/” and not “/home/username/folder_name”. I’ve run into recursion issues by not using a trailing slash. Don’t use a trailing slash if you are transferring a single file; that distinction helps me understand what I’m transferring, too.
5: `--delete` will make sure your source folder and destination folder are a 1:1 match. Any files deleted in the source folder will be deleted in the destination folder. If you want to keep any and all added files in your destination folder, this option can be ignored.

`--archive` (`-a`) and `--partial --progress` (`-P`) are both good and don’t need to be changed or removed. If you do happen to be running into a recursion issue that’s filling up your storage, you may need to look into using the `--exclude` option to exclude the problem folder.
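Putting all of that together, a hedged version of the OP’s command might look like this (keep `--dry-run` until the output looks right, then drop it):

```bash
rsync --dry-run --archive --partial --progress --verbose --human-readable \
    --delete /home/sbird/ "/run/media/sbird/My Passport/sbird/"
```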
How do I find which folder is causing problems? When using --verbose and --dry-run, it goes way too fast and I can’t see all of the history in the terminal.
You can store the output of `rsync` in a file by using `rsync ALL_THE_OPTIONS_YOU_USED > rsync-output.txt`. This creates a file called rsync-output.txt in your current directory which you can inspect later. This, however, means that you won’t see the output right away. You can also use `rsync ALL_THE_OPTIONS_YOU_USED | tee rsync-output.txt`, which will both create the file and display the output on your terminal while it is being produced.
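For instance, a hedged version of the OP’s command with dry-run added and the output captured:

```bash
# -n (dry run) means nothing is transferred while inspecting the output.
rsync -Pazn -vh /home/sbird "/run/media/sbird/My Passport/sbird" | tee rsync-output.txt
```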
Having a quick scroll through the output file (neat tip with the `>` to get a text file, thanks!), nothing immediately jumps out at me. There aren’t any repeated folders or anything like that at a glance. Anything I should look out for?
You checked 385 GiB of files by hand? Is that size made up of a few humongously large files?
I suggest using `uniq` to check if you have duplicate files in there (uniq’s input must be sorted first). If you still have the output file from the previous step, and it’s called rsync-output.txt, do `sort rsync-output.txt | uniq -dc`. This will print the duplicates and the number of their occurrences.

When using uniq, nothing is printed (I’m assuming that means no duplicates?)
I’m sorry, I was stupid: if you had duplicates due to a filesystem loop or symlinks, they would all be under different names, so you wouldn’t be able to find them with this method.
Running `du` with --count-links as suggested by another user returns 384G (so it seems that isn’t the problem).
Ok then, that makes sense
If you don’t spot any recursion issues, I’d suggest looking for other issues and not spending too much time here. At least now you have some troubleshooting knowledge going forward. Best of luck figuring out the issue.
Does your terminal have a scroll back limit? You may need to change that setting if there is a limit.
That will depend on which terminal you are using and it may have a different name so I can’t really help more with this specific issue. You’ll have to search that up based on the terminal you are using.
Maybe you have hard links or sparse files in your source directory. Try with `-H` for hard links first. You can try `--sparse`, but I think hard links are more likely.
Using -H throws an error; it seems hard links aren’t supported in exFAT.
By the way, do you have lots of torrents downloaded or large virtual machines installed? Both torrent clients and virtual machine managers use ‘sparse files’ to save space until you actually download the whole torrent or write a lot to the VM’s disk. Those files would be copied at full un-sparse size to exFAT.
If you have folders with such content, you can use e.g. Double Commander to check the actual used size of those folders (with Ctrl+L in Doublecmd). Idk which terminal utils might give you those numbers in place, but the aforementioned `ncdu` can calculate them and present them as a tree.

Edit: silly me, of course `du` is the util to use, typically as `du -hsc dirname`.

Using `du -hsc` returns 384G for /home/sbird, and 150G inside the external SSD (when it does not have any of the files transferred with rsync).
Well, that’s not what I meant. If you have directories with torrents or VMs, `du` might report different sizes for those directories on the source and target disks. That would mean those are the culprits, depending on how much difference there is.

With just the source disk, you can compare `du -hsc dirname` against `du -hsc --apparent-size dirname` to check whether the disk space used is much smaller than the ‘apparent size’, which would mean there are sparse files in the directory, i.e. files not fully written to disk. rsync would copy those files at their full ‘apparent size’.

As mentioned elsewhere, btrfs might also save space on the source disk by not writing duplicate files multiple times, but idk if `du` would report that, since this feature is specific to btrfs internals.
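One hedged way to look for sparse files directly, using GNU find’s `%S` sparseness specifier (the ratio of allocated to apparent size):

```bash
# Values well below 1.0 indicate sparse files; this lists the 20 sparsest
# regular files under the home directory.
find /home/sbird -type f -printf '%S\t%s\t%p\n' 2>/dev/null \
    | sort -n | awk '$1 < 0.9' | head -n 20
```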
For a typical user, hard links would be mostly employed by git for its internal structures, and it’s difficult to accumulate over 300 GB of git repos.
Sparse files would actually be more believable, since they’re used by both torrent clients and virtual machines.
BTRFS supports compression and deduplication, so the actual disk space used might be less than the total size of your home directory. I’d run `du -sh --apparent-size /home/sbird` to check how large your home dir actually is. If it’s larger than 780 GiB, there’s your problem. Otherwise there might be hardlinks which rsync is copying multiple times; add the `-H` flag to copy hardlinks as hardlinks.

382G for /home/sbird (definitely not more than 780G), so that is strange. Using -H doesn’t work since the external SSD is exFAT (which, from a quick search, doesn’t support hard links).
You can rerun the `du` command with `--count-links` to count hardlinked files multiple times. If that shows >780 GiB, you have a lot of hardlinks somewhere, which you can narrow down by rerunning the command on each of the subdirectories in your home directory. Your options would then be to delete the hardlinks to decrease your total file size, exclude them from the rsync with `--exclude`, or repartition your SSD to a filesystem that supports hardlinks.
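For example (a hedged one-liner; note the glob skips hidden directories, which would need a separate pass):

```bash
# Per-directory totals, counting hardlinked files every time they appear.
du -sh --count-links /home/sbird/*/ | sort -h
```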
With --count-links it is just 384G, so that is probably not the issue?
That’s odd; maybe it has to do with symlinks? Adding `--dereference` to the `du` command will count the size of the files referenced by symlinks. If that doesn’t show anything abnormal, I’d compare the directory sizes between your home directory and the rsync backup and try to find where they differ significantly. If it does show a much larger size, narrow down the location of the relevant symlinks (maybe a hidden directory) and either delete them or exclude them from the rsync.
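A hedged pair of checks along those lines, using the paths from the thread:

```bash
# Compare symlink-aware vs dereferenced totals for the whole home dir:
du -sh /home/sbird
du -sh --dereference /home/sbird
# List every symlink and its target, to spot links pointing outside home:
find /home/sbird -type l -printf '%p -> %l\n'
```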
Does `du` know about btrfs’ copy-on-write? I assumed some btrfs-specific utils would be necessary to see that.

Edit: this page suggests that `du` is unaffected by btrfs, i.e. it shows the full uncompressed and non-deduplicated size.
Could it be you have lots of tiny files and/or a rather large-ish block size on your SSD?
You can check the block size with `sudo blockdev --getbsz /dev/$THE_DEVICE`.

Using the command returns 512 for the external SSD and 4096 for the SSD in my laptop. What does that mean?
> What does that mean?
Imagine your hard drive like a giant cupboard of drawers. Each drawer can only have one label, so you must only ever store one “thing” in one drawer, otherwise you wouldn’t be able to label the thing accurately and end up not knowing what went where.
If you have giant drawers (a large block size), but only tiny things (small files) to store, you end up wasting a lot of space in the drawer. It could fit a desktop computer, but you’re only putting in a phone. This problem is called “internal fragmentation” and causes files to take up way more space than it would seem they need.
–––––
However, in your case, the target block size is actually smaller, so this is not the issue you’re facing.
It means a file that’s one byte in size will take at minimum 512 bytes on the external disk, but 4 KB on the internal one. If it were the other way around, that would partially explain the difference in space used.
In any case, I doubt that the block sizes would make so much of a difference in typical usage.
Personally, I have no more tips than those that have already been presented in this comment section. What I would do now to find out what’s going on is the age-old divide-and-conquer debugging technique:
Using rsync or a file manager (yours is Dolphin), only copy a few top-level directories at a time to your external drive. Note which directories you are about to move before each transfer. After each transfer, check whether the sizes of the directories on your internal drive (roughly) match those on your external drive (they will probably differ a little bit). You can also use your file manager for that.
If all went fine for the first batch, proceed to the next until you find one where the sizes differ significantly. Then delete that offending batch from the external drive. Divide the offending batch into smaller batches (select fewer directories if you tried transferring multiple; or descend into a single directory and copy its subdirectories piecewise like you did before).
In the end you should have a single directory or file which you have identified as problematic. That can then be investigated further.
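A rough helper script for that first pass might look like this (a sketch; the backup path is assumed from the OP’s rsync command):

```bash
#!/bin/sh
# Compare each top-level source directory against its copy on the external
# drive. Sizes are in 1K blocks as reported by du; 0 means not copied yet.
SRC=/home/sbird
DST="/run/media/sbird/My Passport/sbird"
for d in "$SRC"/*/; do
    name=$(basename "$d")
    src_size=$(du -s "$d" 2>/dev/null | cut -f1)
    dst_size=$(du -s "$DST/$name" 2>/dev/null | cut -f1)
    printf '%s\tsource: %sK\ttarget: %sK\n' "$name" "$src_size" "${dst_size:-0}"
done
```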
YIKES, I found that .local is around 30 GB on my system SSD, but over 50 GB on the external SSD. Much of that is due to Steam and Kdenlive. I can try uninstalling Steam…
Something interesting that I found: according to Dolphin, many folders have many GB extra (e.g. 52 GB vs 66 GB for the documents folder, which is kind of crazy), while Filelight reports 52 GB vs 112 GB for the documents folder, which, if true, is kind of insane. Using `du -sh` reports 53G vs 136G (they’re the same when using --apparent-size, weird; specifically for the Godot directory, it’s 3.8 GB vs 41 GB!!!). Files like videos and games seem to be about the same size, while Godot projects with git are much bigger. Weird.
These differences really are insane. Maybe someone more knowledgeable can comment on why different tools differ so wildly in the total size they report.
I have never used BTRFS, so I must resort to forwarding googled results like this one.
Could you try `compsize ~`? If the `Perc` column is much lower than 100%, or the `Disk Usage` column is much lower than the `Uncompressed` column, then you have some BTRFS-specific file-size reduction on your hands, which your external exFAT naturally can’t replicate.

Percentage of total is 83% (292G vs uncompressed 349G, apparently).
It’s good you found some pathological examples, but I’m at the end of my rope here.
You can use these examples and the other information you gathered so far and ask specifically how these size discrepancies can be explained and maybe mitigated. I suggest more specialized communities for this such as !linux@lemmy.ml, !linux@programming.dev, !linux@lemmy.world, !linux4noobs@programming.dev, !linux4noobs@lemmy.world, !linuxquestions@lemmy.zip.
I have cross posted to a Linux community. Thank you so much for all your help :DDDD
I’m assuming that Filelight counts file sizes differently, and I will be trusting the result from Dolphin more.
Using -H with rsync doesn’t seem to do anything, unfortunately…
With Dolphin, the Godot directory is 1 GB vs 5 GB. Why is there a difference between Filelight, Dolphin, and `du -sh`? So weird.
Something about never knowing the time when one has two clocks
It looks like much of the extra Godot bulk is in the .git and .Godot directories.
Oh that’s actually a good idea. Thanks person! I will report back soon
Let’s back up and check your assumptions: how did you check that the disk usage of your home folder is 385 GiB and that there are 780 GiB of free disk space on your external drive?
Checking “properties” using Dolphin. Could that be incorrect?
Does that include hidden folders?
I’d say you can trust that.
Using `du -sh --apparent-size /home/sbird` returns 382G; for the external SSD, it’s 129G with no files from rsync (as expected).
Is your external SSD partitioned into multiple drives? Are you saying that your SSD is reported to be 129 GB by the OS?
129G used (as in the files that were there before rsync)
Is it pointing to the right folder? I’m never sure if quotes are enough to escape “space” characters (I avoid them to limit trouble, actually).
Quotes are enough to handle spaces. That file path is valid.