
I am running Ubuntu 22.04 on my main laptop, with a 4 TB TEAMGROUP MP34 NVMe as the main drive. The file system is ext4.

Yesterday (Nov 16), while I was downloading some large files (about 300 files, 600 GB total), my laptop suddenly started acting strangely. Everything became very slow and the system crashed. I was able to repair it with a bootable USB and fsck. However, the laptop was still very slow and the NVMe SSD was getting very hot, about 75 degrees Celsius (usually it stays below 35 degrees). The disk was only about 35% full. I ran a benchmark on the disk and the speeds were inconsistent and very slow. After several minutes of work the disk went into read-only mode.
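
For reference, the repair was the usual offline check from the live USB session, roughly like this (with the root filesystem unmounted; the partition name matches my layout below):

# run from the live session, with /dev/nvme0n1p2 NOT mounted
sudo fsck -f /dev/nvme0n1p2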

Initially, I thought there was some hardware problem. I opened the laptop and cleaned the contacts with isopropyl alcohol. I swapped the NVMe for another one and the laptop worked normally. I reinstalled my original NVMe and the laptop was very slow again. At some point I decided to run sudo fstrim -av; it took about 5-6 minutes (and trimmed about 2.9 TB), and after that the laptop started working like new. I have been using it without any problems for more than 5 days now. I have done some stress tests and benchmarks, and everything works normally.

The output of the manual sudo fstrim -av I ran on Nov 16:

/boot/efi: 504.9 MiB (529436672 bytes) trimmed on /dev/nvme0n1p1
/: 2.9 TiB (3138692276224 bytes) trimmed on /dev/nvme0n1p2

It looks like fstrim.service was working fine:

cat /var/log/syslog | grep -a fstrim

Nov 13 01:43:37 dev fstrim[98095]: /boot/efi: 504.9 MiB (529436672 bytes) trimmed on /dev/nvme0n1p1
Nov 13 01:43:37 dev fstrim[98095]: /: 2.9 TiB (3140636598272 bytes) trimmed on /dev/nvme0n1p2
Nov 13 01:43:37 dev systemd[1]: fstrim.service: Deactivated successfully.

The last TRIM looks more normal:

cat /var/log/syslog | grep -a fstrim
Nov 20 01:26:54 dev fstrim[109477]: /boot/efi: 504.9 MiB (529436672 bytes) trimmed on /dev/nvme0n1p1
Nov 20 01:26:54 dev fstrim[109477]: /: 31.5 GiB (33783455744 bytes) trimmed on /dev/nvme0n1p2
Nov 20 01:26:54 dev systemd[1]: fstrim.service: Deactivated successfully.

The NVMe is pretty new and in good condition:

sudo smartctl -a /dev/nvme0

smartctl 7.2 2020-12-30 r5155 [x86_64-linux-6.2.0-36-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       TEAM TM8FP4004T
Serial Number:                      xxxxxxxxxxxxxxxxxxxxx
Firmware Version:                   VB421D65
PCI Vendor/Subsystem ID:            0x10ec
IEEE OUI Identifier:                0x00e04c
Controller ID:                      1
NVMe Version:                       1.3
Number of Namespaces:               1
Namespace 1 Size/Capacity:          4,096,805,658,624 [4.09 TB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Fri Nov 17 12:57:17 2023 EET
Firmware Updates (0x02):            1 Slot
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x0014):     DS_Mngmt Sav/Sel_Feat
Log Page Attributes (0x02):         Cmd_Eff_Lg
Maximum Data Transfer Size:         32 Pages
Warning  Comp. Temp. Threshold:     100 Celsius
Critical Comp. Temp. Threshold:     110 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     8.00W       -        -    0  0  0  0   230000   50000
 1 +     4.00W       -        -    1  1  1  1     4000   50000
 2 +     3.00W       -        -    2  2  2  2     4000  250000
 3 -     0.50W       -        -    3  3  3  3     4000    8000
 4 -   0.0090W       -        -    4  4  4  4     8000   30000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        35 Celsius
Available Spare:                    100%
Available Spare Threshold:          32%
Percentage Used:                    0%
Data Units Read:                    4,447,105 [2.27 TB]
Data Units Written:                 8,885,998 [4.54 TB]
Host Read Commands:                 48,182,921
Host Write Commands:                112,476,615
Controller Busy Time:               0
Power Cycles:                       34
Power On Hours:                     2,423
Unsafe Shutdowns:                   11
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0

Error Information (NVMe Log 0x01, 8 of 8 entries)
No Errors Logged

Output of journalctl | grep "fstrim.*/:":

Jul 03 00:21:43 dev fstrim[27756]: /: 3.6 TiB (4009258434560 bytes) trimmed on /dev/nvme0n1p2
Jul 10 00:54:49 dev fstrim[1244594]: /: 3.6 TiB (4001406066688 bytes) trimmed on /dev/nvme0n1p2
Jul 17 00:32:58 dev fstrim[4040993]: /: 54.6 GiB (58677125120 bytes) trimmed on /dev/nvme0n1p2
Jul 24 00:29:14 dev fstrim[1600660]: /: 138.8 GiB (149000179712 bytes) trimmed on /dev/nvme0n1p2
Jul 31 00:35:13 dev fstrim[620323]: /: 135.8 GiB (145785393152 bytes) trimmed on /dev/nvme0n1p2
Aug 07 00:13:04 dev fstrim[35853]: /: 2.9 TiB (3226885373952 bytes) trimmed on /dev/nvme0n1p2
Aug 14 00:29:27 dev fstrim[125210]: /: 2.9 TiB (3230223196160 bytes) trimmed on /dev/nvme0n1p2
Aug 21 01:32:45 dev fstrim[332311]: /: 56.8 GiB (61013270528 bytes) trimmed on /dev/nvme0n1p2
Aug 28 00:11:05 dev fstrim[586592]: /: 90.3 GiB (96974286848 bytes) trimmed on /dev/nvme0n1p2
Sep 04 01:28:47 dev fstrim[16608]: /: 3 TiB (3257704198144 bytes) trimmed on /dev/nvme0n1p2
Sep 11 00:22:26 dev fstrim[21637]: /: 2.9 TiB (3238865485824 bytes) trimmed on /dev/nvme0n1p2
Sep 18 01:14:48 dev fstrim[126317]: /: 2.9 TiB (3240947859456 bytes) trimmed on /dev/nvme0n1p2
Sep 25 00:22:54 dev fstrim[410142]: /: 36.2 GiB (38895230976 bytes) trimmed on /dev/nvme0n1p2
Oct 02 00:31:31 dev fstrim[90432]: /: 3 TiB (3249296408576 bytes) trimmed on /dev/nvme0n1p2
Oct 09 00:48:51 dev fstrim[319128]: /: 54.2 GiB (58184278016 bytes) trimmed on /dev/nvme0n1p2
Oct 16 01:11:15 dev fstrim[29502]: /: 2.8 TiB (3103039946752 bytes) trimmed on /dev/nvme0n1p2
Oct 23 00:31:40 dev fstrim[85578]: /: 2.9 TiB (3152333541376 bytes) trimmed on /dev/nvme0n1p2
Oct 30 01:16:53 dev fstrim[212523]: /: 2.9 TiB (3140076969984 bytes) trimmed on /dev/nvme0n1p2
Nov 06 01:11:08 dev fstrim[38462]: /: 2.9 TiB (3138336178176 bytes) trimmed on /dev/nvme0n1p2
Nov 13 01:43:37 dev fstrim[98095]: /: 2.9 TiB (3140636598272 bytes) trimmed on /dev/nvme0n1p2
Nov 20 01:26:54 dev fstrim[109477]: /: 31.5 GiB (33783455744 bytes) trimmed on /dev/nvme0n1p2

Although it's an old question, this one is related to the numbers above: Large amount of data trimmed after running fstrim. I don't restart my laptop very often, and it's normal for me to have a few weeks of uptime.

I have been using SSDs for years and this is the first time I have experienced a problem like this. It is also the first time I have had to run fstrim manually, so I am a bit puzzled. What could have caused this behavior? Is it normal? Is there a way to know if my NVMe SSD needs TRIM?

sotirov
  • "2.9 TiB (3140636598272 bytes) trimmed"? Do you have something that regularly writes multiple TBs to your disk and deletes them? If so, wouldn't it be possible and that you had written and deleted a large fraction (maybe too large) of the space available in between runs of the trim service (which runs, once a week, I think)? – muru Nov 21 '23 at 07:51
  • Not really; as you can see from the smartctl output, I have only written about 4.5TB to the disk. I have been using this NVMe since June, so it's about 5 months old. I am pretty sure I didn't write 2.9TB between November 6th and November 13th. Also, the problem I am describing happened about 3 days after this big TRIM. On November 16th I ran a manual sudo fstrim -av, and it trimmed another 2.9TB. BTW, I just updated my question with my last log from yesterday; it looks more normal. I am open to any ideas about what could have caused this. – sotirov Nov 21 '23 at 09:04
  • @muru Just updated my question with the output of the manual TRIM that fixed my problem. – sotirov Nov 21 '23 at 11:14
  • I think maybe you gave the answer yourself: if fstrim.service suddenly trims an extremely large chunk of data, this could be an indication that something is amiss. I've never experienced exactly what you describe here - but I assume it must be some used blocks that haven't been properly cleaned up. Maybe an idea would be to make an alias that runs a manual trim, which you can run after downloading files of a certain size (say 25% of the disk). – Artur Meinild Nov 21 '23 at 11:19
  • Also, I read somewhere that if your SSD has low write speed, it could be time for a trim. You can test write speed with: sudo dd if=/dev/zero of=benchmark.img bs=1G count=5 status=progress. I'd say if it's less than half of the disk's specs, or lower than 250 MB/s, then do a trim. – Artur Meinild Nov 21 '23 at 11:25
  • For comparison, on my PC with a 1TB SSD on which TRIM hadn't been run in at least a month (the fstrim timer isn't enabled, I don't manually run it, and the only case in which it might have been run is if I booted into Windows, which hasn't happened for over a month), when I ran sudo fstrim -av just now, only 73 GB was trimmed. – muru Nov 21 '23 at 11:38
  • @ArturMeinild It is the first time for me too, and I am trying to understand why this happened. I read every post in AskUbuntu's trim tag for the past 5 years. The current read/write speeds are very close to what the manufacturer says they should be: 3500 MB/s read and 2900 MB/s write. But when I had the problem, both read and write speeds were very inconsistent, sometimes dropping to 1/10 of what they should be. – sotirov Nov 21 '23 at 11:55
  • Yes that's also what's puzzling me - why? I've had some inexplicable ZFS errors once in a while on SSD, but nothing that amounts to several TB of data. – Artur Meinild Nov 21 '23 at 11:57
  • @ArturMeinild I am using ext4; not sure how relevant this is, but I'm adding it to the question. – sotirov Nov 21 '23 at 12:01
  • I tried to form my comments into an answer - the best bet so far. – Artur Meinild Nov 21 '23 at 12:21
  • The fstrim command output looks like nonsense to me. The fstrim service runs once per week on my system and *reports almost the same value each time*: ~250 GB for /home, ~200 GB for /, and 500 MB for /boot/efi. Obviously my / filesystem doesn't change that much (and it is almost 34GB full = 16%), my /home filesystem is also not that busy (and it is at most 240 MB full = 50%), and the EFI filesystem is almost static (and contains no more than 7MB of data = 2% full!) – FedKad Nov 21 '23 at 16:05
  • I created another question https://askubuntu.com/questions/1493449 because of my above comment. – FedKad Nov 21 '23 at 17:07

1 Answer


"How to know if my NVMe SSD needs TRIM"

Since I can't explain the phenomenon you experienced, I also can't say for sure what the reason is, or exactly which criteria you should monitor.

However, this will be more a collection of indicators you can monitor, so you can decide for yourself whether to preemptively take action (run an extra manual trim with sudo fstrim -av) based on them.

So here are my suggestions:

  1. Monitor the output of fstrim.service. If it trims an excessive amount (like over 1 TB), maybe take action.
  2. Monitor how many GB of data you have downloaded since the last trim. If this exceeds a threshold relative to total disk size (25-50%), consider taking action.
  3. Monitor the SSD write speed. If it's less than half the stated value (or under 250 MB/s - not relevant in your case though), take action.

There may be more viable indicators to add to this list.

Testing fstrim.service performance

I tested on my own machine, and I can now confirm that the fstrim.service for me performs exactly as stated by @sotirov and @FedKad in the comments and in this Q&A.

This is my output of journalctl -t fstrim (lines are shortened):

Oct 23 00:04:55 xb fstrim[662497]: /boot/efi: 504.9 MiB (529436672 bytes) trimmed on /dev/disk/by-uuid/A49B-17AD
Oct 23 00:04:55 xb fstrim[662497]: /: 442 GiB (474638336000 bytes) trimmed on /dev/disk/by-uuid/f9c4d8ff-bfd6-404b-944e-4c753d>
-- Boot 34c888b0968f458084fa1cf674269326 --
Oct 30 00:04:53 xb fstrim[1303597]: /boot/efi: 504.9 MiB (529436672 bytes) trimmed on /dev/disk/by-uuid/A49B-17AD
Oct 30 00:04:53 xb fstrim[1303597]: /: 442.1 GiB (474652139520 bytes) trimmed on /dev/disk/by-uuid/f9c4d8ff-bfd6-404b-944e-4c7>
-- Boot 04117f235c354c1fb3c4f082bae4f563 --
Nov 06 00:16:25 xb fstrim[612946]: /boot/efi: 504.9 MiB (529436672 bytes) trimmed on /dev/disk/by-uuid/A49B-17AD
Nov 06 00:16:25 xb fstrim[612946]: /: 442 GiB (474547269632 bytes) trimmed on /dev/disk/by-uuid/f9c4d8ff-bfd6-404b-944e-4c753d>
Nov 13 00:19:03 xb fstrim[3960792]: /boot/efi: 504.9 MiB (529436672 bytes) trimmed on /dev/disk/by-uuid/A49B-17AD
Nov 13 00:19:03 xb fstrim[3960792]: /: 253.8 GiB (272512958464 bytes) trimmed on /dev/disk/by-uuid/f9c4d8ff-bfd6-404b-944e-4c7>
Nov 20 00:02:50 xb fstrim[2878811]: /boot/efi: 504.9 MiB (529436672 bytes) trimmed on /dev/disk/by-uuid/A49B-17AD
Nov 20 00:02:50 xb fstrim[2878811]: /: 258.4 GiB (277492928512 bytes) trimmed on /dev/disk/by-uuid/f9c4d8ff-bfd6-404b-944e-4c7>

It's evident here that:

  1. fstrim.service trims the entire disk on the first run after a reboot.
  2. fstrim.service then trims a rather large amount (253.8 GiB and 258.4 GiB) on subsequent runs.

Then replicating @sotirov's post, I tried running fstrim manually, which resulted in another large amount:

/: 274.1 GiB (294319964160 bytes) trimmed

And then, when running fstrim manually for the second time, the number was vastly different:

/: 84.3 MiB (88375296 bytes) trimmed

This confirms the behavior of fstrim. Maybe this behavior is buggy, or maybe I just don't understand the huge difference.

What I can tell is that the number of blocks trimmed gets reduced drastically after running fstrim manually. Also, I didn't notice any performance difference whatsoever, so in my case it seemed it didn't really matter.

Technical details:
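
To see when the trim service last ran and when it will fire next (on Ubuntu, fstrim.timer is scheduled weekly by default), you can ask systemd:

systemctl list-timers fstrim.timer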

Example of how to measure the data trimmed by fstrim.service (as per bullet 1):

#!/bin/bash

# Set threshold for SSD trim (value in GiB, as reported by fstrim)
threshold=500

# Get the latest trim value (field 7 of the last fstrim line in the journal)
ssdvalue=$(journalctl -t fstrim | tail -n 1 | awk '{ print $7 }')

# If the value is smaller than the threshold, then OK - else do something.
# The logic should probably be reworked here when dealing with terabytes of
# data - probably by using the numfmt command or something similar.
if [[ "${ssdvalue%.*}" -lt "$threshold" ]]; then
    echo "Everything OK"
else
    echo "Do something (run fstrim)"
fi

Example of how to measure SSD write speed (as per bullet 3 - run this script as root):

#!/bin/bash

# Set path to the SSD (where the benchmark file will be written)
ssdpath=/path/to/ssd/

# Set threshold for write speed (in MB/s)
threshold=1000

# Remove the benchmark file if it exists
[[ -f "$ssdpath/ssdwrite" ]] && rm "$ssdpath/ssdwrite"

# Run dd to test write speed; conv=fdatasync ensures buffers are flushed
# to disk before dd reports the final speed
dd if=/dev/zero of="$ssdpath/ssdwrite" conv=fdatasync bs=1G count=5 status=progress 2> /dev/shm/ssdspeed

# Isolate the MB/s value from the last line of dd's output
# (note: on fast disks dd may report GB/s, which would skew this comparison)
ssdvalue=$(tail -n 1 /dev/shm/ssdspeed | awk '{ print $10 }')

# If the value is larger than the threshold, then OK - else do something
# (the decimal part is stripped for the integer comparison)
if [[ "${ssdvalue%.*}" -gt "$threshold" ]]; then
    echo "Everything OK"
else
    echo "Do something (run fstrim)"
fi

Artur Meinild
  • Could you add a pointer to how you would do point three? – Bruni Nov 21 '23 at 15:50
  • Number 1 may be related to https://askubuntu.com/q/729279/1157209 : After an OS reboot, the first run of fstrim will send all empty blocks known to the OS to the SSD for recovery. – FedKad Nov 21 '23 at 18:18
  • The "SSD write speed" test script does not take into consideration the time that is needed to flush the buffer cache to SSD. So, the number (GB/s) calculated may be substantially higher than the actual SSD write speed. – FedKad Nov 22 '23 at 13:01
  • I added conv=fdatasync to the script, which should ensure that buffers are flushed. – Artur Meinild Nov 22 '23 at 13:41