ThinkPad UEFI and the suspend problem

I have a ThinkPad X220 Tablet which I have used with Ubuntu since July 2012. There were never any issues with the hardware. On 2015-10-25 I switched to Fedora and installed it properly using UEFI boot.

Waking up

On Saturday, 2016-03-19, I had a mysterious problem: The ThinkPad would not wake up from suspend any more. There is a short video and a longer video which show how the ThinkPad tries to wake up and fails.

How long has the Fedora system been installed? That can be quickly checked with the awesome datetime module in Python:

import datetime
diff = datetime.date(2016, 3, 19) - datetime.date(2015, 10, 25)
print(diff.days)

That outputs 146. So this is a little less than half a year. It was really strange as I had not done anything special with the machine. And I was a bit reluctant to believe that Fedora just broke my laptop that way.

Other kernels

The kernel that I was running on that Saturday was 4.4.5-300.fc23.x86_64. I have automatic updates enabled and read in the notification email on Friday that a new kernel had been installed. That is nothing fancy as this happens every week or so. Only when the system did not wake up did I suspect the new kernel. And indeed, it was installed in the batch on Friday:

[root@martin-friese mu]# env LC_ALL=C dnf history info 251
Transaction ID : 251
Begin time     : Fri Mar 18 14:50:13 2016
Begin rpmdb    : 4555:f742ed24f025e31a2cf17b39e7cbbaae3adede0e
End time       :            14:51:35 2016 (82 seconds)
End rpmdb      : 4554:3afdbed35e1c373fe7e000bc5d1e5258054c1c33
User           : System <unset>
Return-Code    : Success
Transaction performed with:
    Installed     dnf-1.1.7-2.fc23.noarch         @updates
    Installed     rpm-4.13.0-0.rc1.12.fc23.x86_64 @updates
Packages Altered:
    Erase    kernel-4.4.2-301.fc23.x86_64                               @updates
    Install  kernel-4.4.5-300.fc23.x86_64                               @updates
    Erase    kernel-core-4.4.2-301.fc23.x86_64                          @updates
    Install  kernel-core-4.4.5-300.fc23.x86_64                          @updates
    Erase    kernel-devel-4.4.2-301.fc23.x86_64                         @updates
    Install  kernel-devel-4.4.5-300.fc23.x86_64                         @updates
    Upgraded kernel-headers-4.4.4-301.fc23.x86_64                       @updates
    Upgrade                 4.4.5-300.fc23.x86_64                       @updates
    Erase    kernel-modules-4.4.2-301.fc23.x86_64                       @updates
    Install  kernel-modules-4.4.5-300.fc23.x86_64                       @updates
    Erase    kernel-modules-extra-4.4.2-301.fc23.x86_64                 @updates
    Install  kernel-modules-extra-4.4.5-300.fc23.x86_64                 @updates
    Erase    kmod-VirtualBox-4.4.2-301.fc23.x86_64-5.0.14-1.fc23.x86_64 @@commandline
    Upgraded libinput-1.2.1-4.fc23.x86_64                               @updates
    Upgrade           1.2.2-1.fc23.x86_64                               @updates
    Upgraded python-pygments-2.0.2-3.fc23.noarch                        @@commandline
    Upgrade                  2.1.3-1.fc23.noarch                        @updates
    Upgraded python3-pygments-2.0.2-3.fc23.noarch                       @@commandline
    Upgrade                   2.1.3-1.fc23.noarch                       @updates
    Upgrade  qt-1:4.8.7-12.fc23.x86_64                                  @updates
    Upgraded qt-1:4.8.7-5.fc23.x86_64                                   @updates
    Upgrade  qt-common-1:4.8.7-12.fc23.noarch                           @updates
    Upgraded qt-common-1:4.8.7-5.fc23.noarch                            @updates
    Upgrade  qt-devel-1:4.8.7-12.fc23.x86_64                            @updates
    Upgraded qt-devel-1:4.8.7-5.fc23.x86_64                             @updates
    Upgrade  qt-mysql-1:4.8.7-12.fc23.x86_64                            @updates
    Upgraded qt-mysql-1:4.8.7-5.fc23.x86_64                             @updates
    Upgrade  qt-x11-1:4.8.7-12.fc23.x86_64                              @updates
    Upgraded qt-x11-1:4.8.7-5.fc23.x86_64                               @updates

I would not have noticed the new kernel on Friday, because a new kernel only becomes active after a reboot. Only on Saturday did I start the laptop again and boot into the 4.4.5 kernel. This made me suspect the kernel.

A nice thing about Fedora is that it keeps three kernels installed by default. In the GRUB boot menu you can select one of the older kernels. This is very nice if you experience a crash. So I just booted the 4.4.4 kernel and expected the problem to go away. But it did not! I also tried kernel 4.4.3 and the problem persisted.

At this point I was getting worried. The other packages that got upgraded in that batch should not affect the suspend in any way. So if it was not the kernel, it must have been something else.

I put an Ubuntu installation on a USB thumbdrive. Ubuntu had been running fine on that laptop. Although I did not want to go back to Ubuntu at this point, I needed to see if it was a software or hardware issue. Ubuntu 15.10 booted just fine. I entered systemctl suspend and let the system fall into suspend. But it would not wake up either! Trying to keep calm, I downloaded CentOS and put that onto the thumbdrive. A really old kernel should be fine, I hoped. The same problem occurred there.

At this point I had to admit that it was a hardware issue. But what could that possibly be?

Firmware setup

The most direct way I know to interact with the hardware is the firmware setup. I started the ThinkPad, pressed F1 and entered the setup. I have a password set there, so I had to enter that. Nothing out of the ordinary could be seen there. A little unsatisfied, I tried to reboot the machine and got the following:

../../_images/storage-security.jpg

Security Failed to save storage: LenovoSecurityConfig. Status 0x9. Press ENTER to continue

And trying to change something else got me this:

../../_images/storage-setup.jpg

Security Failed to save storage: Setup. Status 0x9. Press ENTER to continue

Now that is new! I never had that problem before.

Firmware upgrade

The story of the firmware upgrade has already been told on my blog, although in German. So if you know that part of the story, you can safely skip this section.

Since Simon has the exact same laptop, I talked about it with him. We concluded that a UEFI upgrade could fix the thing. Perhaps it could get worse, but a laptop that does not suspend is not that useful any more. If my laptop was some mobile workstation, it would be fine. But mine is rather one that you carry around all the time. Having to cold-boot it every time was certainly a bad thing.

Upgrading firmware is one of the things that I really dislike. Upgrading software on Linux is just a blast, the distribution does that for you. The UEFI on my newish tower is upgraded by putting the new firmware onto a USB drive and attaching it to a special USB port on the back. Then in the graphical UEFI setup you can just perform the upgrade, independent of the operating system installed. For the ThinkPad, Lenovo has a Windows executable that upgrades the firmware. I did not want to use that as there has never been a Windows on my machine and I would like to keep it that way.

The other way offered by Lenovo is an ISO image that you can boot from. Nice, I thought, and downloaded that one. The download was over HTTP only. I cringed right there. I mean the new firmware for my laptop, the device I do all my stuff with, is downloaded over an insecure connection. If somebody wanted to plant some malware deep into my system, that would have been the chance for it. Although I tend to be rather paranoid and picky about that, I also wanted my laptop to suspend again. So I just took the plunge and decided to perform the upgrade.

First I tried to dd that ISO image onto a thumbdrive and boot from that. It did not work. I then tried to remove the password from the UEFI and booted again. Did not work either. I tried to change the boot mode from UEFI only to legacy only as I suspected that the USB drive would not be booted in UEFI mode or so. The current firmware did not let me save that setting, I just got an error like in the images above. Next I tried unetbootin to put the ISO onto the thumbdrive. That did not pan out either.

The last resort that Simon suggested was to burn the ISO onto a DVD and boot from that. Our ThinkPads do not have an optical drive built in, but he has the UltraBase docking station with a DVD drive in it. So I went to his place and we tried that. I wrote the DVD with K3B and tried to boot from it. That did not work. Then I burned it again and checked the result this time. Did not work either. We used a different DVD, did not work either. Then he burned it using his Windows 7 machine. That did not work either. I also tried to burn it from the command line, again without success. We were a little unsettled.

Digging through the Arch Linux wiki and the ThinkPad wiki we found that one can just extract the firmware upgrader from the ISO and put it into the /boot folder and add a boot entry to GRUB for it. This way one could just start the upgrade via GRUB. Sounded good. We tried and just got the following error:

linux16 not found

We first thought that it had to do with my encrypted / drive and that the linux16 command resided there. A little more digging told us that the firmware upgrade that we wanted to boot is actually a 16-bit DOS environment. And the UEFI that I have in the ThinkPad cannot boot 16-bit operating systems in UEFI mode. I mean, why should it ever do that for a normal user?

Normally you would just switch the boot mode to legacy first or legacy only. As the firmware was already so crippled that it could not save that change, there was no way to do this. There seemed to be no way to upgrade the UEFI with that ISO. Simon had a Windows installation on an HDD. After putting that into the drive caddy of the docking station, I was able to boot his Windows 7 installation. Downloading the firmware upgrade through an unpatched Windows 7 with some old version of Google Chrome via HTTP from the Lenovo website, I was certain that I would open up my machine to every person who wanted to get in.

Either way, the Windows program ran through and I had to restart my machine. The fan went on full and the system installed the UEFI upgrade. Sadly it was not the latest version but something in between. Still I hoped that it would make this option saving problem go away.

After a reboot into my Fedora and testing the suspend, it quickly became clear that nothing had changed. In the UEFI setup I still could not save settings either. The laptop was still a bit broken. We called it a night and concluded that I just had to live with a broken suspend for now.

Intermediate workaround

In order to prevent the system from going to sleep, I put the following into the /etc/systemd/logind.conf configuration file:

[Login]
HandleLidSwitch=lock

This will just lock the screen when I close the lid. This is not great and one should not put the laptop in the backpack. But one can at least carry it from one room to the next in a physically closed state.

Feedback from Fedora developer list

I then posted the question on the Unix part of Stack Exchange. Unfortunately I did not get an answer that resolved the problem.

Then I asked on the Fedora developer list. I tried to be as careful as possible as I did not want to imply that the distribution had killed the waking up. Perhaps it was an accidental hardware failure while the laptop was running Fedora.

Somebody got back to me and had the hunch that the NVRAM where the UEFI stores its data was somehow corrupted. He asked me to run efibootmgr -v and ls -l /sys/firmware/efi/efivars/ to see whether there was something fishy there. The first couple of lines of the first command gave the following output:

BootCurrent: 0129
Timeout: 0 seconds
BootOrder: 0129,000A,0009,0006,0007,0008,000B,000C,000D,000E,000F,0010,0011,0012,0013
Boot0000  Setup	FvFile(721c8b66-426c-4e86-8e99-3457c46ab0b9)
Boot0001  Boot Menu	FvFile(126a762d-5758-4fca-8531-201a7f57f850)
Boot0002  Diagnostic Splash Screen	FvFile(a7d8d9a6-6ab0-4aeb-ad9d-163e59a7a380)
Boot0003  Startup Interrupt Menu	FvFile(f46ee6f4-4785-43a3-923d-7f786c3c8479)
Boot0004  ME Configuration Menu	FvFile(82988420-7467-4490-9059-feb448dd1963)
Boot0005  Rescue and Recovery	FvFile(665d3f60-ad3e-4cad-8e26-db46eee9f1b5)
Boot0006* USB CD	VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,86701296aa5a7848b66cd49dd3ba6a55)
Boot0007* USB FDD	VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,6ff015a28830b543a8b8641009461e49)
Boot0008* ATAPI CD0	VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,aea2090adfde214e8b3a5e471856a35401)
Boot0009* ATA HDD2	VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,91af625956449f41a7b91f4f892ab0f602)
Boot000A* ATA HDD0	VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,91af625956449f41a7b91f4f892ab0f600)
Boot000B* ATA HDD1	VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,91af625956449f41a7b91f4f892ab0f601)
Boot000C* USB HDD	VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,33e821aaaf33bc4789bd419f88c50803)
Boot000D* PCI LAN	VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,78a84aaf2b2afc4ea79cf5cc8f3d3803)
Boot000E* ATAPI CD1	VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,aea2090adfde214e8b3a5e471856a35403)
Boot000F* ATAPI CD2	VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,aea2090adfde214e8b3a5e471856a35404)
Boot0010  Other CD	VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,aea2090adfde214e8b3a5e471856a35406)
Boot0011* ATA HDD3	VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,91af625956449f41a7b91f4f892ab0f603)
Boot0012* ATA HDD4	VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,91af625956449f41a7b91f4f892ab0f604)
Boot0013  Other HDD	VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,91af625956449f41a7b91f4f892ab0f606)
Boot0014* IDER BOOT CDROM	PciRoot(0x0)/Pci(0x16,0x2)/Ata(0,1,0)
Boot0015* IDER BOOT Floppy	PciRoot(0x0)/Pci(0x16,0x2)/Ata(0,0,0)
Boot0016* ATA HDD	VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,91af625956449f41a7b91f4f892ab0f6)
Boot0017* ATAPI CD:	VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,aea2090adfde214e8b3a5e471856a354)
Boot0018* PCI LAN	VenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,78a84aaf2b2afc4ea79cf5cc8f3d3803)
Boot0019* kubuntu	HD(1,GPT,9feb65d0-1f99-4f66-8cc1-1e2aa3c71354,0x800,0xf3800)/File(\EFI\kubuntu\shimx64.efi)
Boot001A* Fedora	HD(1,GPT,18add345-44b0-44dd-9aa9-feae671b7d2e,0x800,0x64000)/File(\EFI\fedora\shim.efi)
Boot001B* Fedora	HD(1,GPT,18add345-44b0-44dd-9aa9-feae671b7d2e,0x800,0x64000)/File(\EFI\fedora\shim.efi)
Boot001C* Fedora	HD(1,GPT,18add345-44b0-44dd-9aa9-feae671b7d2e,0x800,0x64000)/File(\EFI\fedora\shim.efi)
Boot001D* Fedora	HD(1,GPT,18add345-44b0-44dd-9aa9-feae671b7d2e,0x800,0x64000)/File(\EFI\fedora\shim.efi)
Boot001E* Fedora	HD(1,GPT,18add345-44b0-44dd-9aa9-feae671b7d2e,0x800,0x64000)/File(\EFI\fedora\shim.efi)
Boot001F* Fedora	HD(1,GPT,18add345-44b0-44dd-9aa9-feae671b7d2e,0x800,0x64000)/File(\EFI\fedora\shim.efi)
Boot0020* Fedora	HD(1,GPT,18add345-44b0-44dd-9aa9-feae671b7d2e,0x800,0x64000)/File(\EFI\fedora\shim.efi)
Boot0021* Fedora	HD(1,GPT,18add345-44b0-44dd-9aa9-feae671b7d2e,0x800,0x64000)/File(\EFI\fedora\shim.efi)
Boot0022* Fedora	HD(1,GPT,18add345-44b0-44dd-9aa9-feae671b7d2e,0x800,0x64000)/File(\EFI\fedora\shim.efi)
Boot0023* Fedora	HD(1,GPT,18add345-44b0-44dd-9aa9-feae671b7d2e,0x800,0x64000)/File(\EFI\fedora\shim.efi)
Boot0024* Fedora	HD(1,GPT,18add345-44b0-44dd-9aa9-feae671b7d2e,0x800,0x64000)/File(\EFI\fedora\shim.efi)

The first 26 entries (Boot0000 through Boot0019) look sensible. The repeated Fedora entries look fishy. A quick grep -c Fedora efibootmgr.txt gave 272 entries. That is quite a lot of boot entries for a single operating system installation.
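The grep count can also be broken down per label. The following is only a minimal sketch: it assumes that each entry line has the shape shown above (Boot, four hex digits, an asterisk or a space, one space, the label, then a tab before the device path). The names count_labels and SAMPLE are made up for this illustration, and the sample contains just a few lines copied from the output above.

```python
import re
from collections import Counter

# Assumed line shape: "Boot001A* Fedora<TAB>HD(...)". The label may contain
# spaces ("Diagnostic Splash Screen"), so it is cut off at the first tab.
ENTRY_RE = re.compile(r"^Boot([0-9A-Fa-f]{4})[* ] ([^\t]+)\t")


def count_labels(efibootmgr_output):
    """Count how often each boot entry label occurs in `efibootmgr -v` output."""
    counts = Counter()
    for line in efibootmgr_output.splitlines():
        match = ENTRY_RE.match(line)
        if match:
            counts[match.group(2).strip()] += 1
    return counts


# A few lines taken from the output above, joined with real tab characters.
FEDORA_PATH = r"HD(1,GPT,18add345-44b0-44dd-9aa9-feae671b7d2e,0x800,0x64000)/File(\EFI\fedora\shim.efi)"
SAMPLE = "\n".join([
    "BootCurrent: 0129",
    "Timeout: 0 seconds",
    "Boot0000  Setup\tFvFile(721c8b66-426c-4e86-8e99-3457c46ab0b9)",
    "Boot0002  Diagnostic Splash Screen\tFvFile(a7d8d9a6-6ab0-4aeb-ad9d-163e59a7a380)",
    "Boot001A* Fedora\t" + FEDORA_PATH,
    "Boot001B* Fedora\t" + FEDORA_PATH,
    "Boot001C* Fedora\t" + FEDORA_PATH,
])

print(count_labels(SAMPLE).most_common(1))  # [('Fedora', 3)]
```

Run against the full efibootmgr.txt, a label with a count far above one would stand out immediately.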

And the first few lines of the second command gave:

insgesamt 0
-rw-r--r--. 1 root root   12  3. Apr 11:55 AcpiGlobalVariable-af9ffd67-ec10-488a-9dfc-6cbf5ee22c2e
-rw-r--r--. 1 root root    6  3. Apr 11:55 BiosSetup-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r--. 1 root root   46  3. Apr 11:55 Boot0000-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r--. 1 root root   54  3. Apr 11:55 Boot0001-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r--. 1 root root   84  3. Apr 11:55 Boot0002-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r--. 1 root root   80  3. Apr 11:55 Boot0003-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r--. 1 root root   78  3. Apr 11:55 Boot0004-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r--. 1 root root   74  3. Apr 11:55 Boot0005-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r--. 1 root root   64  3. Apr 11:55 Boot0006-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r--. 1 root root   66  3. Apr 11:55 Boot0007-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r--. 1 root root   71  3. Apr 11:55 Boot0008-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r--. 1 root root   69  3. Apr 11:55 Boot0009-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r--. 1 root root   69  3. Apr 11:55 Boot000A-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r--. 1 root root   69  3. Apr 11:55 Boot000B-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r--. 1 root root   66  3. Apr 11:55 Boot000C-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r--. 1 root root   66  3. Apr 11:55 Boot000D-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r--. 1 root root   71  3. Apr 11:55 Boot000E-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r--. 1 root root   71  3. Apr 11:55 Boot000F-8be4df61-93ca-11d2-aa0d-00e098032b8c
-rw-r--r--. 1 root root   69  3. Apr 11:55 Boot0010-8be4df61-93ca-11d2-aa0d-00e098032b8c

You can download the full output here: efibootmgr.txt and efivars.txt

This was certainly not good. And it also serves as an explanation why saving anything in the setup just fails.
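To get a rough idea of how much of the variable storage the boot entries take up, the sizes from such a listing can be summed per kind of variable. This is a sketch under assumptions: the function names usage_by_family and scan_efivars are invented here, all BootXXXX variables are lumped together under the made-up key "Boot####", and the file sizes reported by efivarfs include a four-byte attribute header, so the numbers are only approximate.

```python
import os
import re
from collections import defaultdict

# efivarfs file names have the form "<VariableName>-<vendor GUID>".
NAME_RE = re.compile(
    r"^(.+)-[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$"
)
BOOT_RE = re.compile(r"^Boot[0-9A-Fa-f]{4}$")


def usage_by_family(names_and_sizes):
    """Sum sizes per variable name, grouping all BootXXXX entries together."""
    totals = defaultdict(int)
    for filename, size in names_and_sizes:
        match = NAME_RE.match(filename)
        name = match.group(1) if match else filename
        family = "Boot####" if BOOT_RE.match(name) else name
        totals[family] += size
    return dict(totals)


def scan_efivars(efivars_dir="/sys/firmware/efi/efivars"):
    """Return (file name, size in bytes) pairs for every EFI variable file."""
    return [(entry.name, entry.stat().st_size) for entry in os.scandir(efivars_dir)]


# Three rows from the listing above: name and size column.
listing = [
    ("BiosSetup-8be4df61-93ca-11d2-aa0d-00e098032b8c", 6),
    ("Boot0000-8be4df61-93ca-11d2-aa0d-00e098032b8c", 46),
    ("Boot0001-8be4df61-93ca-11d2-aa0d-00e098032b8c", 54),
]
print(usage_by_family(listing))
```

On a live system, usage_by_family(scan_efivars()) would give the same breakdown for the real variables.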

Upstream bug report

Per the suggestion on the developer list, I filed an upstream bug. They told me to try a couple of things to nail down the issue.

The difference in the output of efibootmgr -v before and after a reboot looks about like this:

@@ -1,7 +1,6 @@
 BootCurrent: 000A
 Timeout: 0 seconds
-BootOrder: 004B,000A,0009,0006,0007,0008,000B,000C,000D,000E,000F,0010,0011,0012,0013
+BootOrder: 004C,000A,0009,0006,0007,0008,000B,000C,000D,000E,000F,0010,0011,0012,0013
 Boot0000  Setup        FvFile(721c8b66-426c-4e86-8e99-3457c46ab0b9)
 Boot0001  Boot Menu    FvFile(126a762d-5758-4fca-8531-201a7f57f850)
 Boot0002  Diagnostic Splash Screen     FvFile(a7d8d9a6-6ab0-4aeb-ad9d-163e59a7a380)
@@ -78,4 +77,5 @@
 Boot0049* Fedora       HD(1,GPT,18add345-44b0-44dd-9aa9-feae671b7d2e,0x800,0x64000)/File(\EFI\fedora\shim.efi)
 Boot004A* Fedora       HD(1,GPT,18add345-44b0-44dd-9aa9-feae671b7d2e,0x800,0x64000)/File(\EFI\fedora\shim.efi)
 Boot004B* Fedora       HD(1,GPT,18add345-44b0-44dd-9aa9-feae671b7d2e,0x800,0x64000)/File(\EFI\fedora\shim.efi)
+Boot004C* Fedora       HD(1,GPT,18add345-44b0-44dd-9aa9-feae671b7d2e,0x800,0x64000)/File(\EFI\fedora\shim.efi)
 Boot0050* setup        HD(1,GPT,18add345-44b0-44dd-9aa9-feae671b7d2e,0x800,0x64000)/File(\EFI\BOOT\BOOTX64.EFI)

A new boot entry is created and set as the first element of the BootOrder. This repeats itself on every boot. At the time of this writing, the developers are not exactly sure what causes this issue.
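The same comparison can be done programmatically instead of reading a diff. The sketch below is illustrative only: parse and new_entries are hypothetical helper names, the regular expression accepts either a tab or a run of spaces after the label (both appear in the outputs in this post), and the two samples are reduced to a handful of lines taken from the diff above.

```python
import re

# Label ends at a tab or at a run of two or more whitespace characters.
ENTRY_RE = re.compile(r"^Boot([0-9A-Fa-f]{4})[* ] (.+?)(?:\t|\s{2,})")


def parse(output):
    """Return (BootOrder as a list, {entry number: label})."""
    order, entries = [], {}
    for line in output.splitlines():
        if line.startswith("BootOrder:"):
            order = line.split(":", 1)[1].strip().split(",")
            continue
        match = ENTRY_RE.match(line)
        if match:
            entries[match.group(1)] = match.group(2)
    return order, entries


def new_entries(before, after):
    """List (number, label, is_first_in_boot_order) for entries added in `after`."""
    _, old = parse(before)
    order, new = parse(after)
    added = sorted(set(new) - set(old))
    return [(num, new[num], order[:1] == [num]) for num in added]


FEDORA = r"HD(1,GPT,18add345-44b0-44dd-9aa9-feae671b7d2e,0x800,0x64000)/File(\EFI\fedora\shim.efi)"
TAIL = ",000A,0009,0006,0007,0008,000B,000C,000D,000E,000F,0010,0011,0012,0013"
BEFORE = "\n".join([
    "BootCurrent: 000A",
    "BootOrder: 004B" + TAIL,
    "Boot004A* Fedora       " + FEDORA,
    "Boot004B* Fedora       " + FEDORA,
])
AFTER = "\n".join([
    "BootCurrent: 000A",
    "BootOrder: 004C" + TAIL,
    "Boot004A* Fedora       " + FEDORA,
    "Boot004B* Fedora       " + FEDORA,
    "Boot004C* Fedora       " + FEDORA,
])

print(new_entries(BEFORE, AFTER))  # [('004C', 'Fedora', True)]
```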

This also puts a notice into context that I saw only once, right before GRUB came up:

../../_images/shim.jpg

System BootOrder not found. Initializing defaults. Creating boot entry “Boot0129” with label “Fedora” for the file “\EFI\fedora\shim.efi”. Could not create variable: 9

Perhaps this is related to the error code of 0x9 that one can see on the previous screenshots?
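One way to look at that question: the UEFI specification defines numeric status codes, and 9 is EFI_OUT_OF_RESOURCES there. Whether the "Status 0x9" in the setup messages and the "9" printed by shim really are such status codes with the error bit stripped is an assumption on my part, neither message says so. Under that assumption, a small lookup table (status_name is a name made up for this sketch) gives:

```python
# Error codes from the UEFI specification (status code appendix). In a real
# EFI_STATUS the highest bit is set for errors; the messages in this post
# appear to print the value without it (assumption).
EFI_ERRORS = {
    1: "EFI_LOAD_ERROR",
    2: "EFI_INVALID_PARAMETER",
    3: "EFI_UNSUPPORTED",
    4: "EFI_BAD_BUFFER_SIZE",
    5: "EFI_BUFFER_TOO_SMALL",
    6: "EFI_NOT_READY",
    7: "EFI_DEVICE_ERROR",
    8: "EFI_WRITE_PROTECTED",
    9: "EFI_OUT_OF_RESOURCES",
    10: "EFI_VOLUME_CORRUPTED",
    11: "EFI_VOLUME_FULL",
}

ERROR_BIT_64 = 1 << 63  # high bit of a 64-bit EFI_STATUS


def status_name(code):
    """Translate a numeric status into its symbolic name, if it is in the table."""
    return EFI_ERRORS.get(code & ~ERROR_BIT_64, "unknown (%#x)" % code)


print(status_name(0x9))  # EFI_OUT_OF_RESOURCES
```

An out-of-resources error would at least be consistent with a variable store that has been filled up by hundreds of boot entries.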

Automatic cleanup

Since I do not want my machine to exhibit the same waking up problem as before, I now regularly clean up the boot entries. For that I wrote a simple script that parses the output of efibootmgr -v and then uses efibootmgr -b XXXX -B to delete that entry. It only does that when 50 entries have accumulated. That script resides in /etc/cron.daily/unused_boot_entries.py and is executed approximately daily.

Download the script: unused_boot_entries.py

So far I receive an email once in a while that it has deleted some entries. My machine runs just fine now.
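The downloadable script is not reproduced here. The following is not that script but a minimal sketch of the same idea, with several assumptions of my own: only entries with the label "Fedora" are considered, the entry named in BootCurrent and the first entry of BootOrder are kept, and the function deletion_commands (a hypothetical name) only returns the efibootmgr -b XXXX -B commands instead of running them.

```python
import re

ENTRY_RE = re.compile(r"^Boot([0-9A-Fa-f]{4})[* ] (.+?)(?:\t|\s{2,})")


def deletion_commands(output, label="Fedora", threshold=50):
    """Return efibootmgr commands that would delete surplus entries (dry run)."""
    current, order, candidates = None, [], []
    for line in output.splitlines():
        if line.startswith("BootCurrent:"):
            current = line.split(":", 1)[1].strip()
        elif line.startswith("BootOrder:"):
            order = line.split(":", 1)[1].strip().split(",")
        else:
            match = ENTRY_RE.match(line)
            if match and match.group(2) == label:
                candidates.append(match.group(1))
    # Do nothing until enough entries have piled up.
    if len(candidates) < threshold:
        return []
    # Assumption: never touch the running entry or the preferred one.
    protected = {current} | set(order[:1])
    return [["efibootmgr", "-b", num, "-B"] for num in candidates if num not in protected]


FEDORA = r"HD(1,GPT,18add345-44b0-44dd-9aa9-feae671b7d2e,0x800,0x64000)/File(\EFI\fedora\shim.efi)"
SAMPLE = "\n".join([
    "BootCurrent: 000A",
    "BootOrder: 004C,000A,0009",
    "Boot000A* ATA HDD0\tVenMsg(bc7838d2-0f82-4d60-8316-c068ee79d25b,91af625956449f41a7b91f4f892ab0f600)",
    "Boot004A* Fedora\t" + FEDORA,
    "Boot004B* Fedora\t" + FEDORA,
    "Boot004C* Fedora\t" + FEDORA,
])

# Threshold lowered to 3 so that the tiny sample triggers a cleanup.
for command in deletion_commands(SAMPLE, threshold=3):
    print(" ".join(command))
```

On a real machine the commands would have to be executed as root, for example with subprocess.run, after feeding the function the actual output of efibootmgr -v. Keeping it a dry run by default seems prudent given how fragile the variable storage has turned out to be.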

Conclusion (so far)

There is one article about the sad state of EFI variable storage. I have also read other articles about laptops being completely bricked by deleting variables from the UEFI. All in all it is sad that the software is able to interfere with hardware features in such a way.

My problem here is successfully worked around for now. I hope that at some point the system will stop creating new boot entries on every boot. And I hope that over time the UEFI implementations will become more robust and less broken.