1. 21 Aug, 2016 1 commit
  2. 07 Aug, 2016 1 commit
    • EDAC, sb_edac: Fix channel reporting on Knights Landing · c5b48fa7
      Lukasz Odzioba authored
      On the Intel Xeon Phi Knights Landing processor family, the channels of
      the memory controller have an atypical arrangement: MC0 is mapped to
      CH3,4,5 and MC1 is mapped to CH0,1,2. This causes the EDAC driver to
      report the channel name incorrectly.
      
      We missed this change earlier, so the code already contains a similar
      comment, but the translation function is incorrect.
      
      Without this patch:
        errors in DIMM_A and DIMM_D were reported in DIMM_D
        errors in DIMM_B and DIMM_E were reported in DIMM_E
        errors in DIMM_C and DIMM_F were reported in DIMM_F
      
      Correct this.
      
      Hubert Chrzaniuk:
       - rebased to 4.8
       - comments and code cleanup
      
      Fixes: d0cdf900 ("sb_edac: Add Knights Landing (Xeon Phi gen 2) support")
      Reviewed-by: Tony Luck <tony.luck@intel.com>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Cc: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Cc: lukasz.anaczkowski@intel.com
      Cc: lukasz.odzioba@intel.com
      Cc: mchehab@kernel.org
      Cc: <stable@vger.kernel.org> # v4.5..
      Link: http://lkml.kernel.org/r/1469231089-22837-1-git-send-email-lukasz.odzioba@intel.com
      Signed-off-by: Lukasz Odzioba <lukasz.odzioba@intel.com>
      [ Boris: Simplify a bit by removing char mc. ]
      Signed-off-by: Borislav Petkov <bp@suse.de>
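The channel translation described in this commit can be sketched in user-space C as follows. This is a minimal illustration, not the driver's code: the helper names `knl_logical_channel` and `knl_dimm_letter` are hypothetical, and the assumption that logical channels 0..5 correspond to DIMM_A..DIMM_F is inferred from the "Without this patch" list above.

```c
#include <assert.h>
#include <stdbool.h>

/* Each KNL memory controller serves three channels (per the commit text). */
#define KNL_CHANNELS_PER_MC 3

/*
 * Hypothetical helper: translate a per-controller channel (0..2) into a
 * logical channel number. MC0 owns CH3,4,5 and MC1 owns CH0,1,2.
 */
static int knl_logical_channel(bool is_mc1, int chan)
{
    assert(chan >= 0 && chan < KNL_CHANNELS_PER_MC);
    return is_mc1 ? chan : chan + KNL_CHANNELS_PER_MC;
}

/* Assumption: logical channels 0..5 are labelled DIMM_A..DIMM_F. */
static char knl_dimm_letter(bool is_mc1, int chan)
{
    return (char)('A' + knl_logical_channel(is_mc1, chan));
}
```

With this mapping, channel 0 on MC1 resolves to DIMM_A and channel 0 on MC0 resolves to DIMM_D, so the two are no longer reported under the same label.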
  3. 15 Jul, 2016 1 commit
  4. 25 Jun, 2016 1 commit
  5. 24 Jun, 2016 5 commits
  6. 16 Jun, 2016 1 commit
    • EDAC: Correct channel count limit · bba14295
      Borislav Petkov authored
      c44696ff ("EDAC: Remove arbitrary limit on number of channels")
      lifted the arbitrary limit on memory controller channels in EDAC.
      However, the dynamic channel attribute arrays dynamic_csrow_dimm_attr
      and dynamic_csrow_ce_count_attr still covered only 6 channels.
      
      This wasn't a problem except channels 6 and 7 weren't visible in sysfs
      on machines with more than 6 channels after the conversion to static
      attr groups with
      
        2c1946b6 ("EDAC: Use static attribute groups for managing sysfs entries")
      
       [ without that, we're exploding in edac_create_sysfs_mci_device()
         because we're dereferencing out of the bounds of the
         dynamic_csrow_dimm_attr array. ]
      
      Add attributes for channels 6 and 7 along with a guard for the
      future, should more channels be required and/or to sanity check for
      misconfigured machines.
      
      We still need to check against the number of channels present on the MC
      first, as Thor reported.
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reported-by: Hironobu Ishii <ishii.hironobu@jp.fujitsu.com>
      Tested-by: Thor Thayer <tthayer@opensource.altera.com>
      Cc: <stable@vger.kernel.org> # 4.2
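The two checks this commit describes (channels present on the MC first, then the size of the attribute table) might look roughly like the sketch below. It is a user-space illustration under assumptions: `visible_channel_attr` and `NR_CH_ATTRS` are hypothetical names, and the table holds only attribute name strings rather than real sysfs attribute objects.

```c
#include <stddef.h>

/* Per-channel DIMM label attribute names, now covering channels 0..7. */
static const char *const ch_dimm_label_names[] = {
    "ch0_dimm_label", "ch1_dimm_label", "ch2_dimm_label", "ch3_dimm_label",
    "ch4_dimm_label", "ch5_dimm_label", "ch6_dimm_label", "ch7_dimm_label",
};

#define NR_CH_ATTRS \
    (sizeof(ch_dimm_label_names) / sizeof(ch_dimm_label_names[0]))

/*
 * Hypothetical visibility check: return the attribute name for channel idx,
 * or NULL if the attribute must stay hidden.
 */
static const char *visible_channel_attr(size_t idx, size_t nr_channels)
{
    /* First: is this channel present on the memory controller at all? */
    if (idx >= nr_channels)
        return NULL;

    /* Then: guard against indexing past the attribute table. */
    if (idx >= NR_CH_ATTRS)
        return NULL;

    return ch_dimm_label_names[idx];
}
```

Checking the channel count first keeps attributes hidden on controllers with fewer channels, while the second check protects against out-of-bounds access on machines that report more channels than the table covers.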
  7. 15 Jun, 2016 1 commit
  8. 08 Jun, 2016 2 commits
  9. 03 Jun, 2016 3 commits
    • EDAC, sb_edac: Readd accidentally dropped Broadwell-D support · 665f05e0
      Tony Luck authored
      In commit
      
        2c1ea4c7 ("EDAC, sb_edac: Use cpu family/model in driver detection")
      
      we switched from using PCI ids to determine which platform we are
      running on to using CPU model instead.
      
      I forgot that Broadwell-DE has its own distinct model number different
      from Broadwell-EP or -EX.
      
      Fixing this isn't just adding a line to the array of cpuids - the
      existing code assumed a 1:1 mapping between entries in that array and
      the "enum type" values. Add the type to the pci_id_table structure to
      remove this dependency and allow two Broadwell CPU models.
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Cc: Aristeu Rozanski <arozansk@redhat.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Fixes: 2c1ea4c7 ("EDAC, sb_edac: Use cpu family/model in driver detection")
      Link: http://lkml.kernel.org/r/b3cffe40dec6dfe0235a5d52a504f0ba86a07ce7.1464902605.git.tony.luck@intel.com
      Signed-off-by: Borislav Petkov <bp@suse.de>
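The idea of storing the platform type in each table entry, instead of deriving it from the entry's index, can be sketched as below. This is an illustration only: `struct cpu_match`, `cpu_table`, and `find_cpu` are hypothetical names, and the family-6 model numbers are assumed here rather than taken from the commit message.

```c
#include <stddef.h>

enum cpu_type { SANDY_BRIDGE, IVY_BRIDGE, HASWELL, BROADWELL, KNIGHTS_LANDING };

/* Hypothetical entry: the type is stored explicitly, not implied by index. */
struct cpu_match {
    unsigned int model;     /* family 6 model number (assumed values) */
    enum cpu_type type;
};

static const struct cpu_match cpu_table[] = {
    { 0x2d, SANDY_BRIDGE },
    { 0x3e, IVY_BRIDGE },
    { 0x3f, HASWELL },
    { 0x4f, BROADWELL },        /* Broadwell-EP/-EX */
    { 0x56, BROADWELL },        /* Broadwell-DE: second model, same type */
    { 0x57, KNIGHTS_LANDING },
};

/* Look up a model number; returns NULL when the CPU is not supported. */
static const struct cpu_match *find_cpu(unsigned int model)
{
    for (size_t i = 0; i < sizeof(cpu_table) / sizeof(cpu_table[0]); i++)
        if (cpu_table[i].model == model)
            return &cpu_table[i];
    return NULL;
}
```

Because the table has six entries but only five enum values, an index-based mapping would mislabel everything after the second Broadwell row; the explicit field avoids that.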
    • EDAC: Fix workqueues poll period resetting · fbedcaf4
      Nicholas Krause authored
      After the workqueue cleanup, we're registering workqueues based on
      the presence of an ->edac_check function. When that is the case,
      we're setting OP_RUNNING_POLL. But we forgot to check that in
      edac_mc_reset_delay_period(), leading to:
      
        BUG: unable to handle kernel paging request at 0000000000015d10
        IP: [ .. ] queued_spin_lock_slowpath
        PGD 3ffcc8067 PUD 3ffc56067 PMD 0
        Oops: 0002 [#1] SMP
        Modules linked in: ...
        CPU: 1 PID: 2792 Comm: edactest Not tainted 4.6.0-dirty #1
        Hardware name: HP ProLiant MicroServer, BIOS O41     10/01/2013
        Stack:
        Call Trace:
          ? _raw_spin_lock_irqsave
          ? lock_timer_base.isra.34
          ? del_timer
          ? try_to_grab_pending
          ? mod_delayed_work_on
          ? edac_mc_reset_delay_period
          ? edac_set_poll_msec
          ? param_attr_store
          ? module_attr_store
          ? kernfs_fop_write
          ? __vfs_write
          ? __vfs_read
          ? __alloc_fd
          ? vfs_write
          ? SyS_write
          ? entry_SYSCALL_64_fastpath
        Code:
        RIP  [ .. ] queued_spin_lock_slowpath
         RSP <>
        CR2: 0000000000015d10
        ---[ end trace 3f286bc71cca15d1 ]---
        Kernel panic - not syncing: Fatal exception
      
      Fix it.
      Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
      Cc: <stable@vger.kernel.org> # 4.5
      Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Link: http://lkml.kernel.org/r/1463697958-13406-1-git-send-email-xerofoify@gmail.com
      [ Rewrite commit message. ]
      Signed-off-by: Borislav Petkov <bp@suse.de>
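The missing check can be modelled in user space as follows. This is a simplified sketch under assumptions: `struct mc_sketch` and `reset_delay_period` are hypothetical stand-ins, the state enum is reduced to three values, and a plain `poll_msec` field takes the place of the delayed work item the driver reschedules.

```c
#include <stddef.h>

/* Reduced set of operating states for illustration. */
enum op_state_sketch { ST_RUNNING_POLL, ST_RUNNING_INTERRUPT, ST_OFFLINE };

/* Hypothetical stand-in for a memory controller instance. */
struct mc_sketch {
    enum op_state_sketch op_state;
    unsigned long poll_msec;    /* stands in for the delayed work period */
};

/*
 * Apply a new poll period, but only to controllers that are actually
 * polling. Controllers in other states never had polling work set up,
 * so touching it is what triggered the oops above.
 * Returns the number of controllers updated.
 */
static size_t reset_delay_period(struct mc_sketch *mcs, size_t n,
                                 unsigned long msec)
{
    size_t updated = 0;

    for (size_t i = 0; i < n; i++) {
        if (mcs[i].op_state != ST_RUNNING_POLL)
            continue;
        mcs[i].poll_msec = msec;
        updated++;
    }
    return updated;
}
```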
    • EDAC, sb_edac: Fix rank lookup on Broadwell · c7103f65
      Tony Luck authored
      Broadwell made a small change to the rank target register moving the
      target rank ID field up from bits 16:19 to bits 20:23.
      
      Also found that the offset field grew by one bit in the IVY_BRIDGE to
      HASWELL transition, so fix the RIR_OFFSET() macro too.
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Cc: stable@vger.kernel.org # v3.19+
      Cc: Aristeu Rozanski <arozansk@redhat.com>
      Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Link: http://lkml.kernel.org/r/2943fb819b1f7e396681165db9c12bb3df0e0b16.1464735623.git.tony.luck@intel.com
      Signed-off-by: Borislav Petkov <bp@suse.de>
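The generation-dependent field extraction can be illustrated with the sketch below. The rank target bit ranges (16:19 and 20:23) come from the commit message. The offset field positions (bits 2..14 before Haswell, 2..15 from Haswell on) are an assumption for illustration, since the message only says the field grew by one bit. All names are hypothetical.

```c
#include <stdint.h>

enum gen_sketch { GEN_SANDY_BRIDGE, GEN_IVY_BRIDGE, GEN_HASWELL, GEN_BROADWELL };

/* Extract bits lo..hi (inclusive) from a 32-bit register value. */
static uint32_t get_bitfield(uint32_t reg, unsigned int lo, unsigned int hi)
{
    unsigned int width = hi - lo + 1;
    uint32_t mask = (width >= 32) ? 0xffffffffu : ((1u << width) - 1u);

    return (reg >> lo) & mask;
}

/* Rank target: bits 20:23 on Broadwell, bits 16:19 on earlier parts. */
static uint32_t rir_rank_target(enum gen_sketch gen, uint32_t reg)
{
    return gen == GEN_BROADWELL ? get_bitfield(reg, 20, 23)
                                : get_bitfield(reg, 16, 19);
}

/* Offset: one bit wider from Haswell on (bit positions are assumed). */
static uint32_t rir_offset(enum gen_sketch gen, uint32_t reg)
{
    return (gen == GEN_HASWELL || gen == GEN_BROADWELL)
               ? get_bitfield(reg, 2, 15)
               : get_bitfield(reg, 2, 14);
}
```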
  10. 12 May, 2016 1 commit
  11. 09 May, 2016 1 commit
    • EDAC, amd64_edac: Drop pci_register_driver() use · 3f37a36b
      Borislav Petkov authored
      - Remove homegrown instance counting.
      - Take the F3 PCI device from the amd_nb cache instead of F2, which was
        used with the PCI core.
      
      With those changes, the driver doesn't need to register a PCI driver and
      relies on the northbridges caching which we do anyway on AMD.
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Yazen Ghannam <yazen.ghannam@amd.com>
  12. 06 May, 2016 1 commit
  13. 02 May, 2016 1 commit
    • EDAC, sb_edac: Use cpu family/model in driver detection · 2c1ea4c7
      Tony Luck authored
      Instead of picking a random PCI ID from the dozen or so we need to
      access, just use x86_match_cpu() to pick based on CPU model number. The
      choosing of PCI devices has been problematic in the past, see
      
        11249e73 ("sb_edac: Fix detection on SNB machines")
      
      which fixed problems introduced by
      
        d0585cd8 ("sb_edac: Claim a different PCI device").
      
      This is especially ugly if future hardware might not even have
      EDAC-relevant registers in PCI config space and we would still be
      required to choose some "random" PCI devices to scan for just so our
      driver loads.
      
      Is this cleaner/clearer? It deletes much more code than it adds. Only
      tested on Broadwell. The driver loads/unloads and loads again. Still
      decodes errors too.
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Suggested-by: Borislav Petkov <bp@alien8.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
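Matching on CPU family and model through a sentinel-terminated table, in the spirit of x86_match_cpu(), can be modelled in user space as follows. This is a simplified sketch, not the kernel API: `struct cpu_id_sketch`, `sbridge_ids_sketch`, and `match_cpu_sketch` are hypothetical, and the listed model numbers are assumed rather than taken from the commit.

```c
#include <stddef.h>

/* Simplified stand-in for a kernel CPU match table entry. */
struct cpu_id_sketch {
    unsigned int family;
    unsigned int model;
    const char *name;       /* hypothetical per-platform payload */
};

/* The table ends with an all-zero sentinel entry. */
static const struct cpu_id_sketch sbridge_ids_sketch[] = {
    { 6, 0x2d, "Sandy Bridge" },
    { 6, 0x3e, "Ivy Bridge" },
    { 6, 0x3f, "Haswell" },
    { 6, 0x4f, "Broadwell" },
    { 6, 0x57, "Knights Landing" },
    { 0, 0, NULL },
};

/* Return the first entry matching family and model, or NULL if none. */
static const struct cpu_id_sketch *match_cpu_sketch(unsigned int family,
                                                    unsigned int model)
{
    const struct cpu_id_sketch *m;

    for (m = sbridge_ids_sketch; m->family != 0; m++)
        if (m->family == family && m->model == model)
            return m;
    return NULL;
}
```

Driver load then depends only on what the CPU reports about itself, not on which PCI device happens to be probed first.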
  14. 29 Apr, 2016 2 commits
  15. 27 Apr, 2016 1 commit
  16. 23 Apr, 2016 5 commits
  17. 22 Apr, 2016 2 commits
  18. 18 Apr, 2016 1 commit
  19. 07 Apr, 2016 1 commit
  20. 02 Apr, 2016 3 commits
  21. 29 Mar, 2016 5 commits