Skip to content
  • Johannes Berg's avatar
    iwlwifi: mvm: fix time event command handling race · e3722822
    Johannes Berg authored
    
    
    Occasionally, we would run into this warning:
    
      iwlwifi 0000:02:00.0: U iwl_mvm_protect_session extend 0x2601: only 200 ms left
      iwlwifi 0000:02:00.0: U iwl_mvm_remove_time_event Removing TE 0x2601
      iwlwifi 0000:02:00.0: I iwl_pcie_enqueue_hcmd Sending command TIME_EVENT_CMD (#29), seq: 0x0925, 60 bytes at 37[5]:9
      iwlwifi 0000:02:00.0: U iwl_pcie_send_hcmd_sync Attempting to send sync command TIME_EVENT_CMD
      iwlwifi 0000:02:00.0: U iwl_pcie_send_hcmd_sync Setting HCMD_ACTIVE for command TIME_EVENT_CMD
      iwlwifi 0000:02:00.0: I iwl_pcie_enqueue_hcmd Sending command TIME_EVENT_CMD (#29), seq: 0x0926, 60 bytes at 38[6]:9
      iwlwifi 0000:02:00.0: U iwl_mvm_time_event_response TIME_EVENT_CMD response - UID = 0x2601
      iwlwifi 0000:02:00.0: I iwl_pcie_hcmd_complete Clearing HCMD_ACTIVE for command TIME_EVENT_CMD
      iwlwifi 0000:02:00.0: U iwl_mvm_rx_time_event_notif Time event notification - UID = 0x2701 action 1
      wlan0: associate with 00:0a:b8:55:a8:30 (try 2/3)
      ------------[ cut here ]------------
      WARNING: at drivers/net/wireless/iwlwifi/mvm/time-event.c:269 iwl_mvm_time_event_send_add+0x163/0x1a0 [iwlmvm]()
      Modules linked in: [...]
      Call Trace:
       [<c1046e42>] warn_slowpath_common+0x72/0xa0
       [<c1046e92>] warn_slowpath_null+0x22/0x30
       [<f8cad913>] iwl_mvm_time_event_send_add+0x163/0x1a0 [iwlmvm]
       [<f8cadead>] iwl_mvm_protect_session+0xcd/0x1c0 [iwlmvm]
       [<f8ca2087>] iwl_mvm_mac_mgd_prepare_tx+0x67/0xa0 [iwlmvm]
       [<f882a130>] ieee80211_sta_work+0x8f0/0x1070 [mac80211]
    
    The reason is a problem with asynchronous vs. synchronous
    commands, what happens here is the following:
     * TE 0x2601 is removed, the TIME_EVENT_CMD for that is async
     * a new TE (will be 0x2701) is created, the TIME_EVENT_CMD
       for that is sync and also uses a notification wait for the
       response (to avoid another race condition)
     * the response for the TE 0x2601 removal comes from the
       firmware, and is handled by the notification wait handler
       that's really waiting for the second response, but can't
       tell the difference, we therefore see the message
       "TIME_EVENT_CMD response - UID = 0x2601" instead of
       "TIME_EVENT_CMD response - UID = 0x2701".
    
    Fix this issue by making the TE removal synchronous as well,
    this means that we wait for the response to that command
    first, before there's any chance of sending a new one.
    
    Also, to detect such issues more easily in the future, add
    a warning to the notification handler that detects them.
    
    Reviewed-by: default avatarEmmanuel Grumbach <emmanuel.grumbach@intel.com>
    Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
    e3722822