ThinkPad UEFI and the suspend problem
I have a ThinkPad X220 Tablet, which I used with Ubuntu from July 2012 onwards. There were never any issues with the hardware. On 2015-10-25 I switched to Fedora and installed it properly using UEFI boot.
Waking up
On Saturday, 2016-03-19, I had a mysterious problem: the ThinkPad would not wake up from suspend anymore. There is a short video and a longer video that show how the ThinkPad tries to wake up and fails.
How long has the Fedora system been installed? That can be checked quickly with the awesome datetime module in Python:
import datetime

diff = datetime.date(2016, 3, 19) - datetime.date(2015, 10, 25)
print(diff.days)
That outputs 146, so this is a little less than half a year. It was really strange, as I had not done anything special with the machine, and I was a bit reluctant to believe that Fedora had just broken my laptop that way.
Other kernels
The kernel that I was running on that Saturday was 4.4.5-300.fc23.x86_64. I have automatic updates enabled and had read in the email on Friday that a new kernel had been installed. That is nothing special, as it happens every week or so. Only when the system did not wake up did I suspect the new kernel of causing that behavior. And indeed, it had been installed in the batch on Friday:
[root@martin-friese mu]# env LC_ALL=C dnf history info 251
Transaction ID : 251
Begin time     : Fri Mar 18 14:50:13 2016
Begin rpmdb    : 4555:f742ed24f025e31a2cf17b39e7cbbaae3adede0e
End time       :            14:51:35 2016 (82 seconds)
End rpmdb      : 4554:3afdbed35e1c373fe7e000bc5d1e5258054c1c33
User           : System <unset>
Return-Code    : Success
Transaction performed with:
    Installed     dnf-1.1.7-2.fc23.noarch             @updates
    Installed     rpm-4.13.0-0.rc1.12.fc23.x86_64     @updates
Packages Altered:
    Erase     kernel-4.4.2-301.fc23.x86_64                                @updates
    Install   kernel-4.4.5-300.fc23.x86_64                                @updates
    Erase     kernel-core-4.4.2-301.fc23.x86_64                           @updates
    Install   kernel-core-4.4.5-300.fc23.x86_64                           @updates
    Erase     kernel-devel-4.4.2-301.fc23.x86_64                          @updates
    Install   kernel-devel-4.4.5-300.fc23.x86_64                          @updates
    Upgraded  kernel-headers-4.4.4-301.fc23.x86_64                        @updates
    Upgrade                  4.4.5-300.fc23.x86_64                        @updates
    Erase     kernel-modules-4.4.2-301.fc23.x86_64                        @updates
    Install   kernel-modules-4.4.5-300.fc23.x86_64                        @updates
    Erase     kernel-modules-extra-4.4.2-301.fc23.x86_64                  @updates
    Install   kernel-modules-extra-4.4.5-300.fc23.x86_64                  @updates
    Erase     kmod-VirtualBox-4.4.2-301.fc23.x86_64-5.0.14-1.fc23.x86_64  @@commandline
    Upgraded  libinput-1.2.1-4.fc23.x86_64                                @updates
    Upgrade            1.2.2-1.fc23.x86_64                                @updates
    Upgraded  python-pygments-2.0.2-3.fc23.noarch                         @@commandline
    Upgrade                   2.1.3-1.fc23.noarch                         @updates
    Upgraded  python3-pygments-2.0.2-3.fc23.noarch                        @@commandline
    Upgrade                    2.1.3-1.fc23.noarch                        @updates
    Upgrade   qt-1:4.8.7-12.fc23.x86_64                                   @updates
    Upgraded  qt-1:4.8.7-5.fc23.x86_64                                    @updates
    Upgrade   qt-common-1:4.8.7-12.fc23.noarch                            @updates
    Upgraded  qt-common-1:4.8.7-5.fc23.noarch                             @updates
    Upgrade   qt-devel-1:4.8.7-12.fc23.x86_64                             @updates
    Upgraded  qt-devel-1:4.8.7-5.fc23.x86_64                              @updates
    Upgrade   qt-mysql-1:4.8.7-12.fc23.x86_64                             @updates
    Upgraded  qt-mysql-1:4.8.7-5.fc23.x86_64                              @updates
    Upgrade   qt-x11-1:4.8.7-12.fc23.x86_64                               @updates
    Upgraded  qt-x11-1:4.8.7-5.fc23.x86_64                                @updates
I would not have run the new kernel on Friday, since a newly installed kernel is only used after a reboot. Only on Saturday did I start the laptop again and boot into the 4.4.5 kernel. This made me suspect the kernel.
A nice thing about Fedora is that it keeps three kernels installed by default. In the GRUB boot menu you can select one of the older kernels, which is very handy if a new kernel causes trouble. So I booted the 4.4.4 kernel and expected the problem to go away. But it did not! I also tried kernel 4.4.3, and the problem persisted.
At this point I was getting worried. The other packages that got upgraded in that batch should not affect suspend in any way. So if it was not the kernel, it must have been something else.
I put an Ubuntu installation on a USB thumbdrive. Ubuntu had been running fine on that laptop before. Although I did not want to go back to Ubuntu at this point, I needed to see whether it was a software or a hardware issue. Ubuntu 15.10 booted just fine. I entered systemctl suspend and let the system go to sleep.
But it would not wake up either! Trying to keep calm, I downloaded CentOS and put it onto the thumbdrive. With its really old kernel everything should be fine, I hoped. The same problem occurred there.
At this point I had to admit that it was a hardware issue. But what could that possibly be?
Firmware setup
The most direct way I know to interact with the hardware is the firmware setup. I started the ThinkPad, pressed F1 and entered the setup. I have set a password there, so I had to enter that first. Nothing out of the ordinary could be seen. A little unsatisfied, I tried to reboot the machine and got the following:
And trying to change something else got me this:
Now that is new! I had never had that problem before.
Firmware upgrade
The story of the firmware upgrade has already been told on my blog, although in German. So if you know that part of the story, you can safely skip this section.
Since Simon has the exact same laptop, I talked about it with him. We concluded that a UEFI upgrade might fix the problem. Perhaps it could make things worse, but a laptop that does not suspend is not that useful anymore. If my laptop were some mobile workstation, that would be fine. But mine is one that I carry around all the time, and having to cold-boot it every time was certainly a bad thing.
Upgrading firmware is one of the things I really dislike. Upgrading software on Linux is a breeze; the distribution does it for you. The UEFI on my newish tower is upgraded by putting the new firmware onto a USB drive and attaching it to a special USB port on the back. Then you can just perform the upgrade in the graphical UEFI setup, independent of the installed operating system. Lenovo offers a Windows executable that upgrades the firmware. I did not want to use that, as there has never been a Windows on my machine and I would like to keep it that way.
The other way offered by Lenovo is an ISO image that you can boot from. Nice, I thought, and downloaded that one. The download was over HTTP only. I cringed right there: the new firmware for my laptop, the device I do all my stuff with, is downloaded over an insecure connection. If someone had wanted to plant malware deep in my system, that would have been the chance. Although I tend to be rather paranoid and picky about such things, I also wanted my laptop to suspend again. So I took the plunge and went ahead with the upgrade.
First I tried to dd that ISO image onto a thumbdrive and boot from it. That did not work. I then removed the password from the UEFI setup and booted again. That did not work either. I tried to change the boot mode from UEFI only to legacy only, as I suspected that the USB drive would not be booted in UEFI mode. The current firmware did not let me save that setting; I just got an error like the ones in the images above. Next I tried unetbootin to put the ISO onto the thumbdrive. That did not pan out either.
The last resort that Simon suggested was to burn the ISO onto a DVD and boot from that using his DVD drive. Our ThinkPads do not have a built-in optical drive, but he has the UltraBase docking station with a DVD drive in it. So I went to his place and we tried that. I wrote the DVD with K3B and tried to boot from it. That did not work. Then I tried again and checked the burned disc afterwards. That did not work either. We used a different DVD, with no luck. Then he burned it using his Windows 7 machine. That did not work either. I also tried burning it on the command line, again without success. We were a little unsettled.
Digging through the Arch Linux wiki and the ThinkPad wiki, we found that one can extract the firmware upgrader from the ISO, put it into the /boot folder and add a GRUB boot entry for it. This way one can start the upgrade via GRUB. Sounded good. We tried it and just got the following error:
linux16 not found
We first thought that it had to do with my encrypted / drive and that the linux16 command was not available there. A little more digging told us that the firmware upgrade that we wanted to boot actually is a 16-bit DOS environment, and the UEFI in my ThinkPad cannot boot 16-bit operating systems. I mean, why should it ever need to do that for a normal user?
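What "extracting the firmware upgrader from the ISO" involves can be sketched as follows. This is a hypothetical illustration, not the tool we used: it assumes the ISO carries a standard El Torito boot entry, as bootable CD images generally do, and pulls out the default boot image. For hard disk emulation it takes the image size from the first MBR partition, a heuristic similar to what tools like geteltorito do. The function name extract_boot_image is made up.

```python
import struct

SECTOR = 2048         # ISO 9660 logical sector size in bytes
VIRTUAL_SECTOR = 512  # El Torito counts emulated sectors in 512-byte units
# Image sizes for the three floppy emulation media types.
FLOPPY_BYTES = {1: 1200 * 1024, 2: 1440 * 1024, 3: 2880 * 1024}


def extract_boot_image(iso: bytes) -> bytes:
    """Return the initial/default El Torito boot image stored in `iso`."""
    # The Boot Record Volume Descriptor sits in sector 17.
    brvd = iso[17 * SECTOR:18 * SECTOR]
    if brvd[0:6] != b"\x00CD001" or brvd[7:30] != b"EL TORITO SPECIFICATION":
        raise ValueError("no El Torito boot record found")
    # Bytes 0x47..0x4A hold the sector number of the boot catalog.
    (catalog_lba,) = struct.unpack_from("<I", brvd, 0x47)

    catalog = iso[catalog_lba * SECTOR:catalog_lba * SECTOR + 64]
    # Validation entry: header ID 1 and key bytes 0x55 0xAA (checksum not verified).
    if len(catalog) < 64 or catalog[0] != 1 or catalog[30:32] != b"\x55\xaa":
        raise ValueError("invalid boot catalog")

    # The initial/default entry follows the 32-byte validation entry.
    _boot, media, _segment, _system, _unused, count, rba = struct.unpack_from(
        "<BBHBBHI", catalog, 32
    )
    media &= 0x0F
    start = rba * SECTOR

    if media == 0:  # no emulation: length is given in 512-byte sectors
        size = count * VIRTUAL_SECTOR
    elif media in FLOPPY_BYTES:  # floppy emulation: fixed image size
        size = FLOPPY_BYTES[media]
    elif media == 4:  # hard disk emulation: end of the first MBR partition
        first_lba, sectors = struct.unpack_from("<II", iso, start + 0x1C6)
        size = (first_lba + sectors) * VIRTUAL_SECTOR
    else:
        raise ValueError(f"unknown boot media type {media}")
    return iso[start:start + size]
```

Even with such an image in hand, starting it still needs legacy (BIOS) booting, which is exactly what my firmware could not be switched to.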
Normally you would just switch the boot mode to legacy first or legacy only. As the firmware was already so crippled that it could not save that change, this was not an option. There seemed to be no way to upgrade the UEFI with that ISO. Simon had a Windows installation on an HDD. By putting that into the drive caddy of the docking station, I was able to boot his Windows 7 installation. Downloading the firmware upgrade via HTTP from the Lenovo website, through an unpatched Windows 7 with some old version of Google Chrome, I was certain that I was opening up my machine to everyone who wanted to get in.
Either way, the Windows program ran through and I had to restart my machine. The fan spun up to full speed and the system installed the UEFI upgrade. Sadly, it was not the latest version but an intermediate one. Still, I hoped that it would make the problem with saving settings go away.
After rebooting into Fedora and testing suspend, it quickly became clear that nothing had changed. I could not save settings in the UEFI setup either. The laptop was still somewhat broken. We called it a night and concluded that I would just have to live with a broken suspend.
Intermediate workaround
In order to prevent the system from going to sleep, I put the following into the /etc/systemd/logind.conf configuration file:
[Login]
HandleLidSwitch=lock
This will just lock the screen when I close the lid. This is not great and one should not put the laptop in the backpack. But one can at least carry it from one room to the next in a physically closed state.
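To double-check which lid action the main file actually sets, a small reader is enough. A minimal sketch, assuming the file uses the plain INI layout shown above; lid_switch_action is a made-up name, and drop-in files under /etc/systemd/logind.conf.d/ are not considered. If the key is absent (in the stock file it is only present as a comment), logind falls back to its documented default of suspend.

```python
import configparser

# Documented default of logind when HandleLidSwitch= is not set.
DEFAULT_LID_ACTION = "suspend"


def lid_switch_action(conf_text: str) -> str:
    """Return the HandleLidSwitch value from the [Login] section.

    Lines starting with '#' (as in the stock logind.conf) are comments
    and ignored. Drop-in files are NOT taken into account here.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    # fallback covers both a missing key and a missing [Login] section
    return parser.get("Login", "HandleLidSwitch", fallback=DEFAULT_LID_ACTION)
```

For example, `lid_switch_action(open("/etc/systemd/logind.conf").read())` should return "lock" with the configuration above.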
Feedback from Fedora developer list
I then posted the question on Unix & Linux Stack Exchange. Unfortunately, I did not get an answer that solved the problem.
Then I asked on the Fedora developer list. I tried to phrase it as carefully as possible, as I did not want to imply that the distribution had broken waking up. Perhaps it was an accidental hardware failure that just happened while the laptop was running Fedora.
Somebody got back to me with the hunch that the NVRAM, where the UEFI stores its data, was somehow corrupted. He asked me to run efibootmgr -v and ls -l /sys/firmware/efi/efivars/ to see whether there was something fishy there. The first command's output began like this:
BootCurrent: 0129
Timeout: 0 seconds
BootOrder: 0129,000A,0009,0006,0007,0008,000B,000C,000D,000E,000F,0010,0011,0012,0013
Boot0000 Setup FvFile(721c8b66-426c-4e86-8e99-3457c46ab0b9)
Boot0001 Boot Menu FvFile(126a762d-5758-4fca-8531-201a7f57f850)
Boot0002 Diagnostic Splash Screen FvFile(a7d8d9a6-6ab0-4aeb-ad9d-163e59a7a380)
Boot0003 Startup Interrupt Menu FvFile(f46ee6f4-4785-43a3-923d-7f786c3c8479)
Boot0004 ME Configuration Menu FvFile(82988420-7467-4490-9059-feb448dd1963)
Boot0005 Rescue and Recovery FvFile(665d3f60-ad3e-4cad-8e26-db46eee9f1b5)
Boot0006* USB CD VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,86701296aa5a7848b66cd49dd3ba6a55)
Boot0007* USB FDD VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,6ff015a28830b543a8b8641009461e49)
Boot0008* ATAPI CD0 VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,aea2090adfde214e8b3a5e471856a35401)
Boot0009* ATA HDD2 VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,91af625956449f41a7b91f4f892ab0f602)
Boot000A* ATA HDD0 VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,91af625956449f41a7b91f4f892ab0f600)
Boot000B* ATA HDD1 VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,91af625956449f41a7b91f4f892ab0f601)
Boot000C* USB HDD VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,33e821aaaf33bc4789bd419f88c50803)
Boot000D* PCI LAN VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,78a84aaf2b2afc4ea79cf5cc8f3d3803)
Boot000E* ATAPI CD1 VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,aea2090adfde214e8b3a5e471856a35403)
Boot000F* ATAPI CD2 VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,aea2090adfde214e8b3a5e471856a35404)
Boot0010 Other CD VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,aea2090adfde214e8b3a5e471856a35406)
Boot0011* ATA HDD3 VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,91af625956449f41a7b91f4f892ab0f603)
Boot0012* ATA HDD4 VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,91af625956449f41a7b91f4f892ab0f604)
Boot0013 Other HDD VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,91af625956449f41a7b91f4f892ab0f606)
Boot0014* IDER BOOT CDROM PciRoot(0x0)/Pci(0x16,0x2)/Ata(0,1,0)
Boot0015* IDER BOOT Floppy PciRoot(0x0)/Pci(0x16,0x2)/Ata(0,0,0)
Boot0016* ATA HDD VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,91af625956449f41a7b91f4f892ab0f6)
Boot0017* ATAPI CD: VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,aea2090adfde214e8b3a5e471856a354)
Boot0018* PCI LAN VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,78a84aaf2b2afc4ea79cf5cc8f3d3803)
Boot0019* kubuntu HD(1,GPT,9feb65d0-1f99-4f66-8cc1-1e2aa3c71354,0x800,0xf3800)/File(\EFI\kubuntu\shimx64.efi)
Boot001A* Fedora HD(1,GPT,18add345-44b0-44dd-9aa9-feae671b7d2e,0x800,0x64000)/File(\EFI\fedora\shim.efi)
Boot001B* Fedora HD(1,GPT,18add345-44b0-44dd-9aa9-feae671b7d2e,0x800,0x64000)/File(\EFI\fedora\shim.efi)
Boot001C* Fedora HD(1,GPT,18add345-44b0-44dd-9aa9-feae671b7d2e,0x800,0x64000)/File(\EFI\fedora\shim.efi)
Boot001D* Fedora HD(1,GPT,18add345-44b0-44dd-9aa9-feae671b7d2e,0x800,0x64000)/File(\EFI\fedora\shim.efi)
Boot001E* Fedora HD(1,GPT,18add345-44b0-44dd-9aa9-feae671b7d2e,0x800,0x64000)/File(\EFI\fedora\shim.efi)
Boot001F* Fedora HD(1,GPT,18add345-44b0-44dd-9aa9-feae671b7d2e,0x800,0x64000)/File(\EFI\fedora\shim.efi)
Boot0020* Fedora HD(1,GPT,18add345-44b0-44dd-9aa9-feae671b7d2e,0x800,0x64000)/File(\EFI\fedora\shim.efi)
Boot0021* Fedora HD(1,GPT,18add345-44b0-44dd-9aa9-feae671b7d2e,0x800,0x64000)/File(\EFI\fedora\shim.efi)
Boot0022* Fedora HD(1,GPT,18add345-44b0-44dd-9aa9-feae671b7d2e,0x800,0x64000)/File(\EFI\fedora\shim.efi)
Boot0023* Fedora HD(1,GPT,18add345-44b0-44dd-9aa9-feae671b7d2e,0x800,0x64000)/File(\EFI\fedora\shim.efi)
Boot0024* Fedora HD(1,GPT,18add345-44b0-44dd-9aa9-feae671b7d2e,0x800,0x64000)/File(\EFI\fedora\shim.efi)
The entries up to Boot0019 look sensible. The repeated Fedora entries look fishy. A quick grep -c Fedora efibootmgr.txt gave 272 entries. That is quite a lot of boot entries for a single operating system installation.
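The grep only counts lines. To see which entries are exact duplicates of each other, the output can be grouped by description (label plus device path). A minimal sketch; the function name duplicate_entries is made up, and treating identical descriptions as duplicates is an assumption that happens to fit the output above.

```python
import re

# Entry lines look like "Boot001A* Fedora HD(...)/File(\EFI\fedora\shim.efi)".
# The optional '*' marks an active entry. "BootCurrent:" and "BootOrder:"
# do not match because they lack the four hex digits.
ENTRY = re.compile(r"^Boot([0-9A-Fa-f]{4})\*?\s+(.+?)\s*$")


def duplicate_entries(output: str, threshold: int = 2) -> dict[str, list[str]]:
    """Group Boot#### numbers by identical description (label + device path)."""
    groups: dict[str, list[str]] = {}
    for line in output.splitlines():
        m = ENTRY.match(line)
        if m:
            # Normalize tabs and runs of spaces so formatting does not matter.
            description = " ".join(m.group(2).split())
            groups.setdefault(description, []).append(m.group(1).upper())
    return {desc: nums for desc, nums in groups.items() if len(nums) >= threshold}
```

The text can come straight from `subprocess.run(["efibootmgr", "-v"], capture_output=True, text=True, check=True).stdout`.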
And the first few lines of the second command gave:
total 0
-rw-r--r--. 1 root root 12 Apr  3 11:55 AcpiGlobalVariable-af9ffd67-ec10-488a-9dfc-6cbf5ee22c2e
-rw-r--r--. 1 root root  6 Apr  3 11:55 BiosSetup-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r--. 1 root root 46 Apr  3 11:55 Boot0000-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r--. 1 root root 54 Apr  3 11:55 Boot0001-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r--. 1 root root 84 Apr  3 11:55 Boot0002-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r--. 1 root root 80 Apr  3 11:55 Boot0003-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r--. 1 root root 78 Apr  3 11:55 Boot0004-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r--. 1 root root 74 Apr  3 11:55 Boot0005-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r--. 1 root root 64 Apr  3 11:55 Boot0006-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r--. 1 root root 66 Apr  3 11:55 Boot0007-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r--. 1 root root 71 Apr  3 11:55 Boot0008-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r--. 1 root root 69 Apr  3 11:55 Boot0009-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r--. 1 root root 69 Apr  3 11:55 Boot000A-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r--. 1 root root 69 Apr  3 11:55 Boot000B-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r--. 1 root root 66 Apr  3 11:55 Boot000C-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r--. 1 root root 66 Apr  3 11:55 Boot000D-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r--. 1 root root 71 Apr  3 11:55 Boot000E-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r--. 1 root root 71 Apr  3 11:55 Boot000F-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r--. 1 root root 69 Apr  3 11:55 Boot0010-8be4df61-93ca-11d2-aa0d-00e098032b8c
This was certainly not good. It would also explain why saving anything in the setup fails: with hundreds of boot entries, the variable storage presumably has no room left.
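To put a number on it, one can count the Boot#### variables and add up the sizes of all variables. A minimal sketch, assuming efivarfs is mounted at /sys/firmware/efi/efivars; the function name summarize_efivars is made up. The result is only a lower bound on what the firmware stores, since its NVRAM also holds headers and possibly stale data the operating system cannot see.

```python
import os
import re

# Files are named "<Name>-<VendorGUID>", e.g.
# "Boot0001-8be4df61-93ca-11d2-aa0d-00e098032b8c".
NAME = re.compile(
    r"^(.+)-([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})$"
)
BOOT_VAR = re.compile(r"^Boot[0-9A-Fa-f]{4}$")


def summarize_efivars(path: str = "/sys/firmware/efi/efivars") -> dict[str, int]:
    """Count Boot#### variables and sum the payload sizes of all variables."""
    boot_vars = 0
    payload_bytes = 0
    for entry in os.scandir(path):
        m = NAME.match(entry.name)
        if not m:
            continue  # not an "<Name>-<GUID>" variable file
        if BOOT_VAR.match(m.group(1)):
            boot_vars += 1
        # efivarfs includes the 4-byte attribute header in the file size.
        payload_bytes += max(entry.stat().st_size - 4, 0)
    return {"boot_vars": boot_vars, "payload_bytes": payload_bytes}
```

Running it twice, a few boots apart, would show whether the Boot#### count keeps growing.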
Upstream bug report
Per the suggestion on the developer list, I filed an upstream bug. They asked me to try a couple of things to narrow down the issue.
The output of efibootmgr -v before and after a reboot looks roughly like this:
@@ -1,7 +1,6 @@
 BootCurrent: 000A
 Timeout: 0 seconds
-BootOrder: 004B,000A,0009,0006,0007,0008,000B,000C,000D,000E,000F,0010,0011,0012,0013
+BootOrder: 004C,000A,0009,0006,0007,0008,000B,000C,000D,000E,000F,0010,0011,0012,0013
 Boot0000 Setup FvFile(721c8b66-426c-4e86-8e99-3457c46ab0b9)
 Boot0001 Boot Menu FvFile(126a762d-5758-4fca-8531-201a7f57f850)
 Boot0002 Diagnostic Splash Screen FvFile(a7d8d9a6-6ab0-4aeb-ad9d-163e59a7a380)
@@ -78,4 +77,5 @@
 Boot0049* Fedora HD(1,GPT,18add345-44b0-44dd-9aa9-feae671b7d2e,0x800,0x64000)/File(\EFI\fedora\shim.efi)
 Boot004A* Fedora HD(1,GPT,18add345-44b0-44dd-9aa9-feae671b7d2e,0x800,0x64000)/File(\EFI\fedora\shim.efi)
 Boot004B* Fedora HD(1,GPT,18add345-44b0-44dd-9aa9-feae671b7d2e,0x800,0x64000)/File(\EFI\fedora\shim.efi)
+Boot004C* Fedora HD(1,GPT,18add345-44b0-44dd-9aa9-feae671b7d2e,0x800,0x64000)/File(\EFI\fedora\shim.efi)
 Boot0050* setup HD(1,GPT,18add345-44b0-44dd-9aa9-feae671b7d2e,0x800,0x64000)/File(\EFI\BOOT\BOOTX64.EFI)
A new boot entry is created and set as the first element of BootOrder. This repeats on every boot. At the time of writing, upstream is not exactly sure what causes this issue.
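The before/after comparison can also be done without a diff tool. A minimal sketch, assuming two saved outputs of efibootmgr -v; the function names parse and compare are made up for illustration.

```python
import re

ENTRY = re.compile(r"^Boot([0-9A-Fa-f]{4})\*?\s")
ORDER = re.compile(r"^BootOrder:\s*(\S+)")


def parse(output: str) -> tuple[list[str], set[str]]:
    """Return (BootOrder as a list, set of Boot#### numbers) from efibootmgr -v text."""
    order: list[str] = []
    entries: set[str] = set()
    for line in output.splitlines():
        if (m := ORDER.match(line)):
            order = [n.upper() for n in m.group(1).split(",")]
        elif (m := ENTRY.match(line)):
            entries.add(m.group(1).upper())
    return order, entries


def compare(before: str, after: str) -> dict[str, object]:
    """Report added/removed entries and the head of BootOrder on both sides."""
    order_before, entries_before = parse(before)
    order_after, entries_after = parse(after)
    return {
        "new_entries": sorted(entries_after - entries_before),
        "removed_entries": sorted(entries_before - entries_after),
        "first_before": order_before[0] if order_before else None,
        "first_after": order_after[0] if order_after else None,
    }
```

For the two outputs above, this would report 004C as the only new entry and show the head of BootOrder moving from 004B to 004C.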
The repeated creation of boot entries also puts into context a notice that I saw only once, before GRUB came up:
System BootOrder not found. Initializing defaults.
Creating boot entry "Boot0129" with label "Fedora" for the file "\EFI\fedora\shim.efi".
Could not create variable: 9
Perhaps this is related to the error code 0x9 that one can see in the previous screenshots?
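If that 9 (and the 0x9 in the screenshots) is a UEFI status code with the error bit stripped, which is plausible but an assumption, it can be looked up in the status code table of the UEFI specification. There, 9 is EFI_OUT_OF_RESOURCES, which would fit a variable store that has run full. A small lookup sketch; efi_status_name is a made-up name and only the first fifteen error codes are included.

```python
# Error status codes from the UEFI specification (Appendix D, "Status Codes").
EFI_ERRORS = {
    1: "EFI_LOAD_ERROR",
    2: "EFI_INVALID_PARAMETER",
    3: "EFI_UNSUPPORTED",
    4: "EFI_BAD_BUFFER_SIZE",
    5: "EFI_BUFFER_TOO_SMALL",
    6: "EFI_NOT_READY",
    7: "EFI_DEVICE_ERROR",
    8: "EFI_WRITE_PROTECTED",
    9: "EFI_OUT_OF_RESOURCES",
    10: "EFI_VOLUME_CORRUPTED",
    11: "EFI_VOLUME_FULL",
    12: "EFI_NO_MEDIA",
    13: "EFI_MEDIA_CHANGED",
    14: "EFI_NOT_FOUND",
    15: "EFI_ACCESS_DENIED",
}

# On x86_64, error statuses have the top bit of the 64-bit value set.
ERROR_BIT = 1 << 63


def efi_status_name(status: int) -> str:
    """Map a raw (error bit set) or stripped EFI status code to its name."""
    if status == 0:
        return "EFI_SUCCESS"
    code = status & ~ERROR_BIT  # clear the error bit, keep the code
    return EFI_ERRORS.get(code, f"unknown status {status:#x}")
```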
Automatic cleanup
Since I do not want my machine to run into the same wake-up problem again, I now clean up the boot entries regularly. For that I wrote a simple script that parses the output of efibootmgr -v and then uses efibootmgr -b XXXX -B to delete surplus entries. It only does that once 50 entries have accumulated. The script resides in /etc/cron.daily/unused_boot_entries.py and is therefore executed approximately daily.
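The script itself is not reproduced here. A hypothetical reimplementation of the same idea could look like the following sketch. The selection rule (delete entries labelled Fedora that are neither BootCurrent nor listed in BootOrder) is an assumption based on the outputs above, as are the names surplus_entries, delete_entries and main. Since deleting firmware variables carries some risk, it defaults to a dry run and needs root for the real thing.

```python
import re
import subprocess

# Hypothetical settings: cleanup only starts once 50 entries have
# accumulated; the label of the duplicated entries is an assumption.
THRESHOLD = 50
LABEL = "Fedora"

ENTRY = re.compile(r"^Boot([0-9A-Fa-f]{4})\*?\s+(.*)$")


def surplus_entries(output: str, label: str = LABEL, threshold: int = THRESHOLD) -> list[str]:
    """Select Boot#### numbers from efibootmgr -v output that look unused.

    An entry qualifies if its description starts with `label` and it is
    neither BootCurrent nor listed in BootOrder. Nothing is selected until
    at least `threshold` entries with that label exist.
    """
    current = None
    order: set[str] = set()
    labelled: list[str] = []
    for line in output.splitlines():
        if line.startswith("BootCurrent:"):
            current = line.split(":", 1)[1].strip().upper()
        elif line.startswith("BootOrder:"):
            order = {n.strip().upper() for n in line.split(":", 1)[1].split(",")}
        elif (m := ENTRY.match(line)) and m.group(2).startswith(label):
            labelled.append(m.group(1).upper())
    if len(labelled) < threshold:
        return []
    return [n for n in labelled if n != current and n not in order]


def delete_entries(numbers: list[str], dry_run: bool = True) -> None:
    """Delete boot entries via `efibootmgr -b XXXX -B` (requires root)."""
    for num in numbers:
        cmd = ["efibootmgr", "-b", num, "-B"]
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            # Discard efibootmgr's normal output; errors still raise.
            subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)


def main(dry_run: bool = True) -> None:
    output = subprocess.run(
        ["efibootmgr", "-v"], capture_output=True, text=True, check=True
    ).stdout
    victims = surplus_entries(output)
    if victims:
        # Any output here ends up in the mail that cron sends.
        print(f"{len(victims)} surplus boot entries: {', '.join(victims)}")
        delete_entries(victims, dry_run=dry_run)
```

Called as main(dry_run=False) from a cron.daily script, it would only print (and thus mail) something when it actually finds entries to remove.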
So far I receive an email once in a while saying that the script has deleted some entries. My machine runs just fine now.
Conclusion (so far)
There is one article about the sad state of EFI variable storage. I have also read other articles about laptops being completely bricked by deleting variables from the UEFI. All in all, it is sad that software is able to interfere with hardware features in such a way.
My problem is successfully worked around for now. I hope that at some point the system will stop creating new boot entries. And I hope that over time UEFI implementations will become more robust and less broken.