===============
== seedy.xyz ==
===============
Sierra's little corner of the internet

The Curse of the Intel Wireless AC Chipset

bugs intel iwlwifi linux networking wireless

Intel wireless drivers on Linux have been dodgy for years. I’m currently hitting a bug involving beacon timings and dropped connections on 802.11ac networks (I refuse to call it Wi-Fi 5). It has the most interesting and depressing Linux kernel Bugzilla thread I have ever seen. In fact, it’s so fascinating to me that I’ll go through it again and write up my thoughts based on the issues I’ve faced and the solutions I’ve tried (all of which have failed).

As of writing this, I’m running Linux Mint 20.1 Ulyssa on kernel 5.8.0-53-generic, and the particular Intel wireless chipset I’m running is Intel Corporation Wireless-AC 9261 (rev 29).

The Issue

The thread starts out as such:

Bug 203709 - iwlwifi: 8260: frequently disconnects since Linux 5.1 “No beacon heard and the time event is over already” - WIFI-25906

Reported: 2019-05-25 23:06 UTC by Denis Lisov
Modified: 2021-05-19 01:17 UTC

Kernel Version: 5.1.5
Tree: Mainline

Hardware: Thinkpad P50 with Intel Corporation Wireless 8260 [8086:24f3]
Software: Gentoo Linux with vanilla kernel, NetworkManager using wpa_supplicant

In Linux 5.0 the WiFi connection to my home network is rather stable. In Linux 5.1 (5.1.5 at the moment) it’s unstable with the following symptoms.

…and then proceeds to post various debug outputs, with the main events being a very low ping interspersed with many dropped packets, and then this interesting output from dmesg that loops again and again.

[  420.822273] wlp4s0: Connection to AP ec:43:f6:07:90:84 lost
[  423.420415] wlp4s0: authenticate with ec:43:f6:07:90:84
[  423.431798] wlp4s0: send auth to ec:43:f6:07:90:84 (try 1/3)
[  423.436016] wlp4s0: authenticated
[  423.438864] wlp4s0: associate with ec:43:f6:07:90:84 (try 1/3)
[  423.542285] wlp4s0: associate with ec:43:f6:07:90:84 (try 2/3)
[  423.550083] wlp4s0: RX AssocResp from ec:43:f6:07:90:84 (capab=0xc11 status=0 aid=3)
[  423.552862] wlp4s0: associated
[  424.045586] iwlwifi 0000:04:00.0: No beacon heard and the time event is over already...
[  424.045622] wlp4s0: Connection to AP ec:43:f6:07:90:84 lost
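
To see how often this loop repeats on your own machine, a log excerpt like the one above can be tallied with a few lines of Python. This is only a rough sketch: the function name summarize and the SAMPLE constant are made up for illustration, and it assumes the kernel messages keep the exact wording shown in the excerpt.

```python
import re

# Matches dmesg-style lines such as "[  424.045586] iwlwifi ...: No beacon heard ..."
LINE_RE = re.compile(r"^\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<msg>.*)$")

# Three lines taken from the excerpt above.
SAMPLE = """\
[  420.822273] wlp4s0: Connection to AP ec:43:f6:07:90:84 lost
[  424.045586] iwlwifi 0000:04:00.0: No beacon heard and the time event is over already...
[  424.045622] wlp4s0: Connection to AP ec:43:f6:07:90:84 lost
"""


def summarize(log_text):
    """Count beacon timeouts and lost connections in dmesg-style text."""
    beacon_timeouts = 0
    lost_at = []  # seconds since boot for each "Connection to AP ... lost" line
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that is not a timestamped kernel line
        msg = match.group("msg")
        if "No beacon heard" in msg:
            beacon_timeouts += 1
        elif "Connection to AP" in msg and msg.endswith("lost"):
            lost_at.append(float(match.group("ts")))
    # Gaps between consecutive losses show how quickly the loop repeats.
    gaps = [round(later - earlier, 3) for earlier, later in zip(lost_at, lost_at[1:])]
    return {
        "beacon_timeouts": beacon_timeouts,
        "connections_lost": len(lost_at),
        "seconds_between_losses": gaps,
    }


print(summarize(SAMPLE))
```

Feeding it the saved output of dmesg instead of SAMPLE gives a quick count of how bad a given session was.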

This is the start of a thread of dead ends, similar but conflicting firmware issues, and lots of “it’s fixed with this patch! oh wait after a few days I guess it wasn’t” that continues to this day. This bug thread is approaching its two-year anniversary, and it hasn’t been necrobumped at all. It’s been continuous discussion: new tests based on scripts sent into the thread, new users finding this problem and reporting their data, and not one proposed fix that has worked.

Sit back, relax, and prepare for a fascinating tale of what happens when Intel pushes bad firmware into the Linux kernel and nobody can study it to find out what went wrong.

It’s a bigger bug than we thought

The first indication that this isn’t just a regression in Linux 5.1 comes from kurmikon, who posted on 2019-08-12 20:29:36 UTC, 79 days after the initial post:

It’s happening also on my Intel 3165, even on Linux 5.0.

This sends the thread after a red herring: a similar problem with an AMD chip that was patched in Linux 5.2. After compiling a kernel with that patch, however, nothing changed. In fact, that patch had already been backported to Arch’s mainline kernel. This outcome repeats again and again: different distributions ship different packages with different patches and versions, which conflict with each other and make things much more confusing. Such is the nature of the decentralized landscape of Linux distributions.

On 2019-08-23, 90 days after the initial post, Denis Lisov (the original poster) tested the then-current Linux 5.2.9 build with the earlier patch and still got the same dropped packets as before, further evidence that the AMD patch does not affect the problem.

More people post about different Intel chips, distros, and system configurations, and by this point it’s clear that the bug affects Intel wireless chipsets with 802.11ac support across the board.

Release after release, no change

On 2019-09-18, 116 days after the initial post, Denis posts his findings testing out the new Linux 5.3.0 release. Still no change in outcome.

On 2019-11-11, 170 days after the initial post, another user posts their trace and has the same issue on Linux 5.3.10.

On 2019-11-26, 185 days after the initial post, Denis posts his trace of the new Linux 5.4 release and confirms no change.

On 2019-12-27, 216 days after the initial post, another user says there’s no change in the Linux 5.5-rc3 release candidate.

On 2020-01-21, 241 days after the initial post, user mall is the first to confirm that the bug exists on Intel 802.11ax (Wi-Fi 6) chipsets, this one being the AX200. This post saved me from buying a third wireless card, as I was going to try out this chipset to see if the issue still persisted on a newer card. Thanks for saving me 80 bucks.
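
The day counts in this timeline are plain date subtraction from the "Reported" date in the bug header. If you want to check them or extend the timeline yourself, here is a small sketch; the helper name days_since_report is my own.

```python
from datetime import date

# "Reported: 2019-05-25" from the bug header quoted earlier.
REPORTED = date(2019, 5, 25)


def days_since_report(day):
    """Return whole days elapsed between the initial report and the given date."""
    return (day - REPORTED).days


print(days_since_report(date(2019, 9, 18)))  # the Linux 5.3.0 test: 116
print(days_since_report(date(2020, 1, 21)))  # the AX200 confirmation: 241
```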

Things get much messier

On 2020-01-29, 249 days after the initial post, Denis randomly stops being able to consistently reproduce the errors on the previous kernel versions that he traced and verified over the past year, which starts the next big red herring: going down the rabbit hole of misconfigured access points.

I tried what the next chunk of responders tried: messing with beacon timings, setting a fixed channel, adjusting transmission power, everything. Like most of the people in the thread, I ended up with a placebo effect that lasted a few weeks before the issues came back. The next few months consist of people posting more traces of the same issue, trying reduced beacon intervals to no avail, and wondering whether patches are affecting the issue or not.

On 2020-02-19, 270 days after the initial post, lukas.redlinger finds in his wpa_supplicant logs that there were frequent channel switches going on. This starts some investigation into whether having automatic channel switching enabled on the access point is causing the issue, but ultimately it isn’t. Some start disabling IPv6, which I tried today; I am still getting the issue as I write this.

Finally: some commonality

On 2020-03-10, 290 days after the initial post, user WGH posted this comment:

My 7260 on ThinkPad T440 also started to go into disconnecting rage once in a while ever since I upgraded from 4.19.x to 5.5.x kernel.

This post solidified the third known fact, even if still somewhat disputed by people confusing the issue with similar firmware bugs. If you’re keeping score at home, so far we know that:

  1. Intel AC and AX chipsets routinely disconnect from 5GHz networks and spit out beacon timing errors.
  2. These same chipsets work on Windows without issue.
  3. This bug didn’t exist originally, and the last known working kernel version is sometime in the 4.19 series.

To this day, there hasn’t been a kernel version newer than 4.19 without this bug. I can’t roll back to that kernel because it’s over 2 years old at this point and was superseded before the Ubuntu LTS release that my distribution is based on even came out.

This is very sad

This is where the thread gets depressing. A year after the thread was posted, people who had contributed earlier are starting to get annoyed. More and more people are posting about how they can’t stay connected for more than 10 minutes at a time, that the bug still hasn’t been identified after an entire year, that the kernel version that the bug started on has long since lost its support. They’re getting desperate, trying the fixes that didn’t work before again and again with each passing kernel update, hoping that their connections won’t drop. I know, I still do this from time to time. The only sure-fire way to get your connection back is to disable and re-enable networking when things stall.
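
For what it’s worth, the disable/re-enable dance can be scripted. The sketch below assumes NetworkManager is managing the connection and that its nmcli tool is on the PATH; the function names and the two-second pause are arbitrary choices of mine, not anything from the thread. It only works around a stall and does nothing about the underlying bug.

```python
import subprocess
import time


def restart_commands():
    """The two nmcli invocations that toggle NetworkManager networking off and on."""
    return [
        ["nmcli", "networking", "off"],
        ["nmcli", "networking", "on"],
    ]


def restart_networking(run=subprocess.run, pause=2.0, sleep=time.sleep):
    """Disable networking, wait briefly, then re-enable it.

    run and sleep are parameters so the function can be dry-run
    without actually touching the network.
    """
    off_cmd, on_cmd = restart_commands()
    run(off_cmd, check=True)
    sleep(pause)  # arbitrary pause so the interface can settle before coming back
    run(on_cmd, check=True)
```

Calling restart_networking() when things stall does the same thing as toggling networking by hand in the tray applet.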

On 2020-07-28, 1 year 64 days after the initial post, user dagthree7 notes that the beacon timing errors happen mostly when there’s high network traffic. I’ve noticed this in my own testing. It happens when I have multiple things transmitting data at once, whether on this machine alone or on multiple devices on the network. When the beacon errors get particularly bad, nearby devices lose their network connection across the board. It’s particularly frustrating when it happens and I try to look something up on my phone, but it locks up trying to find a network. Whatever garbled signal the wireless adapter is sending out causes close-range interference that is literally jamming radio signals on both the 2.4 and 5 GHz bands.

On 2020-07-31, 1 year 67 days after the initial post, user Naruto Windy posted my second favorite comment in this entire thread:

Can we just take a break and admire that this bug is here for a more than a year.
Thank you Intel.

To this day, the posts keep coming up with similar solutions that don’t work for very long. Patches, router configurations, antenna setups, disabling power save modes, changing encryption protocols, disabling IPv6: none of it worked.

The Solution

On 2020-08-10, 1 year 77 days after the initial post, kurmikon comes back and writes my favorite comment in the entire thread:

I resolved this issue blacklisting iwlwifi and buying a Chinese USB wifi adapter.

Thank you Intel.

The most recent post, the 239th in this thread, was made yesterday, 2021-05-19, marking 1 year 359 days of this bug being alive. The current longterm kernel is 5.10, nine releases after the 5.1 kernel the original post was filed against. The best fix we have so far is to use a shitty USB wireless adapter.