1. 30 Sep, 2014 5 commits
    • Dwight Engen's avatar
      vio: fix reuse of vio_dring slot · d0aedcd4
      Dwight Engen authored
      
      
      vio_dring_avail() will allow use of every dring entry, but when the last
      entry is allocated then dr->prod == dr->cons which is indistinguishable from
      the ring empty condition. This causes the next allocation to reuse an entry.
      When this happens in sunvdc, the server side vds driver begins nack'ing the
      messages and ends up resetting the ldc channel. This problem does not effect
      sunvnet since it checks for < 2.
      
      The fix here is to just never allocate the very last dring slot so that full
      and empty are not the same condition. The request start path was changed to
      check for the ring being full a bit earlier, and to stop the blk_queue if
      there is no space left. The blk_queue will be restarted once the ring is
      only half full again. The number of ring entries was increased to 512 which
      matches the sunvnet and Solaris vdc drivers, and greatly reduces the
      frequency of hitting the ring full condition and the associated blk_queue
      stop/starting. The checks in sunvent were adjusted to account for
      vio_dring_avail() returning 1 less.
      
      Orabug: 19441666
      OraBZ: 14983
      Signed-off-by: default avatarDwight Engen <dwight.engen@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d0aedcd4
    • Dwight Engen's avatar
      sunvdc: limit each sg segment to a page · 5eed69ff
      Dwight Engen authored
      
      
      ldc_map_sg() could fail its check that the number of pages referred to
      by the sg scatterlist was <= the number of cookies.
      
      This fixes the issue by doing a similar thing to the xen-blkfront driver,
      ensuring that the scatterlist will only ever contain a segment count <=
      port->ring_cookies, and each segment will be page aligned, and <= page
      size. This ensures that the scatterlist is always mappable.
      
      Orabug: 19347817
      OraBZ: 15945
      Signed-off-by: default avatarDwight Engen <dwight.engen@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5eed69ff
    • Allen Pais's avatar
      sunvdc: compute vdisk geometry from capacity · de5b73f0
      Allen Pais authored
      
      
      The LDom diskserver doesn't return reliable geometry data. In addition,
      the types for all fields in the vio_disk_geom are u16, which were being
      truncated in the cast into the u8's of the Linux struct hd_geometry.
      
      Modify vdc_getgeo() to compute the geometry from the disk's capacity in a
      manner consistent with xen-blkfront::blkif_getgeo().
      Signed-off-by: default avatarDwight Engen <dwight.engen@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      de5b73f0
    • Allen Pais's avatar
      sunvdc: add cdrom and v1.1 protocol support · 9bce2182
      Allen Pais authored
      
      
      Interpret the media type from v1.1 protocol to support CDROM/DVD.
      
      For v1.0 protocol, a disk's size continues to be calculated from the
      geometry returned by the vdisk server. The geometry returned by the server
      can be less than the actual number of sectors available in the backing
      image/device due to the rounding in the division used to compute the
      geometry in the vdisk server.
      
      In v1.1 protocol a disk's actual size in sectors is returned during the
      handshake. Use this size when v1.1 protocol is negotiated. Since this size
      will always be larger than the former geometry computed size, disks created
      under v1.0 will be forwards compatible to v1.1, but not vice versa.
      Signed-off-by: default avatarDwight Engen <dwight.engen@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9bce2182
    • David L Stevens's avatar
      sparc: VIO protocol version 1.6 · 163a4e74
      David L Stevens authored
      
      
      Add VIO protocol version 1.6 interfaces.
      Signed-off-by: default avatarDavid L Stevens <david.stevens@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      163a4e74
  2. 26 Sep, 2014 1 commit
  3. 16 Sep, 2014 5 commits
    • Sowmini Varadhan's avatar
      sparc64: Move request_irq() from ldc_bind() to ldc_alloc() · c21c4ab0
      Sowmini Varadhan authored
      
      
      The request_irq() needs to be done from ldc_alloc()
      to avoid the following (caught by lockdep)
      
       [00000000004a0738] __might_sleep+0xf8/0x120
       [000000000058bea4] kmem_cache_alloc_trace+0x184/0x2c0
       [00000000004faf80] request_threaded_irq+0x80/0x160
       [000000000044f71c] ldc_bind+0x7c/0x220
       [0000000000452454] vio_port_up+0x54/0xe0
       [00000000101f6778] probe_disk+0x38/0x220 [sunvdc]
       [00000000101f6b8c] vdc_port_probe+0x22c/0x300 [sunvdc]
       [0000000000451a88] vio_device_probe+0x48/0x60
       [000000000074c56c] really_probe+0x6c/0x300
       [000000000074c83c] driver_probe_device+0x3c/0xa0
       [000000000074c92c] __driver_attach+0x8c/0xa0
       [000000000074a6ec] bus_for_each_dev+0x6c/0xa0
       [000000000074c1dc] driver_attach+0x1c/0x40
       [000000000074b0fc] bus_add_driver+0xbc/0x280
      Signed-off-by: default avatarSowmini Varadhan <sowmini.varadhan@oracle.com>
      Acked-by: default avatarDwight Engen <dwight.engen@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c21c4ab0
    • bob picco's avatar
      sparc64: T5 PMU · 05aa1651
      bob picco authored
      
      
      The T5 (niagara5) has different PCR related HV fast trap values and a new
      HV API Group. This patch utilizes these and shares when possible with niagara4.
      
      We use the same sparc_pmu niagara4_pmu. Should there be new effort to
      obtain the MCU perf statistics then this would have to be changed.
      
      Cc: sparclinux@vger.kernel.org
      Signed-off-by: default avatarBob Picco <bob.picco@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      05aa1651
    • bob picco's avatar
      sparc64: mem boot option correction · 7c21d533
      bob picco authored
      
      
      The "mem" boot option can result in many unexpected consequences. This patch
      attempts to prevent boot hangs which have been experienced on T4-4 and T5-8.
      Basically the boot loader allocates vmlinuz and initrd higher in available
      OBP physical memory. For example, on a 2Tb T5-8 it isn't possible to boot
      with mem=20G.
      
      The patch utilizes memblock to avoid reserved regions and trim memory which
      is only free. Other improvements are possible for a multi-node machine.
      
      This is a snippet of the boot log with mem=20G on T5-8 with the patch applied:
      MEMBLOCK configuration:	<- before memory reduction
       memory size = 0x1ffad6ce000 reserved size = 0xa1adf44
       memory.cnt  = 0xb
       memory[0x0]    [0x00000030400000-0x00003fdde47fff], 0x3fada48000 bytes
       memory[0x1]    [0x00003fdde4e000-0x00003fdde4ffff], 0x2000 bytes
       memory[0x2]    [0x00080000000000-0x00083fffffffff], 0x4000000000 bytes
       memory[0x3]    [0x00100000000000-0x00103fffffffff], 0x4000000000 bytes
       memory[0x4]    [0x00180000000000-0x00183fffffffff], 0x4000000000 bytes
       memory[0x5]    [0x00200000000000-0x00203fffffffff], 0x4000000000 bytes
       memory[0x6]    [0x00280000000000-0x00283fffffffff], 0x4000000000 bytes
       memory[0x7]    [0x00300000000000-0x00303fffffffff], 0x4000000000 bytes
       memory[0x8]    [0x00380000000000-0x00383fffc71fff], 0x3fffc72000 bytes
       memory[0x9]    [0x00383fffc92000-0x00383fffca1fff], 0x10000 bytes
       memory[0xa]    [0x00383fffcb4000-0x00383fffcb5fff], 0x2000 bytes
       reserved.cnt  = 0x2
       reserved[0x0]  [0x00380000000000-0x0038000117e7f8], 0x117e7f9 bytes
       reserved[0x1]  [0x00380004000000-0x0038000d02f74a], 0x902f74b bytes
      ...
      MEMBLOCK configuration:	<- after reduction of memory
       memory size = 0x50a1adf44 reserved size = 0xa1adf44
       memory.cnt  = 0x4
       memory[0x0]    [0x00380000000000-0x0038000117e7f8], 0x117e7f9 bytes
       memory[0x1]    [0x00380004000000-0x0038050d01d74a], 0x50901d74b bytes
       memory[0x2]    [0x00383fffc92000-0x00383fffca1fff], 0x10000 bytes
       memory[0x3]    [0x00383fffcb4000-0x00383fffcb5fff], 0x2000 bytes
       reserved.cnt  = 0x2
       reserved[0x0]  [0x00380000000000-0x0038000117e7f8], 0x117e7f9 bytes
       reserved[0x1]  [0x00380004000000-0x0038000d02f74a], 0x902f74b bytes
      ...
      Early memory node ranges
        node   7: [mem 0x380000000000-0x38000117dfff]
        node   7: [mem 0x380004000000-0x380f0d01bfff]
        node   7: [mem 0x383fffc92000-0x383fffca1fff]
        node   7: [mem 0x383fffcb4000-0x383fffcb5fff]
      Could not find start_pfn for node 0
      Could not find start_pfn for node 1
      Could not find start_pfn for node 2
      Could not find start_pfn for node 3
      Could not find start_pfn for node 4
      Could not find start_pfn for node 5
      Could not find start_pfn for node 6
      .
      
      The patch was tested on T4-1, T5-8 and Jalap?no.
      
      Cc: sparclinux@vger.kernel.org
      Signed-off-by: default avatarBob Picco <bob.picco@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7c21d533
    • bob picco's avatar
      sparc64: find_node adjustment · 3dee9df5
      bob picco authored
      
      
      We have seen an issue with guest boot into LDOM that causes early boot failures
      because of no matching rules for node identitity of the memory. I analyzed this
      on my T4 and concluded there might not be a solution. I saw the issue in
      mainline too when booting into the control/primary domain - with guests
      configured.  Note, this could be a firmware bug on some older machines.
      
      I'll provide a full explanation of the issues below. Should we not find a
      matching BEST latency group for a real address (RA) then we will assume node 0.
      On the T4-2 here with the information provided I can't see an alternative.
      
      Technically the LDOM shown below should match the MBLOCK to the
      favorable latency group. However other factors must be considered too. Were
      the memory controllers configured "fine" grained interleave or "coarse"
      grain interleaved -  T4. Also should a "group" MD node be considered a NUMA
      node?
      
      There has to be at least one Machine Description (MD) "group" and hence one
      NUMA node. The group can have one or more latency groups (lg) - more than one
      memory controller. The current code chooses the smallest latency as the most
      favorable per group. The latency and lg information is in MLGROUP below.
      MBLOCK is the base and size of the RAs for the machine as fetched from OBP
      /memory "available" property. My machine has one MBLOCK but more would be
      possible - with holes?
      
      For a T4-2 the following information has been gathered:
      with LDOM guest
      MEMBLOCK configuration:
       memory size = 0x27f870000
       memory.cnt  = 0x3
       memory[0x0]    [0x00000020400000-0x0000029fc67fff], 0x27f868000 bytes
       memory[0x1]    [0x0000029fd8a000-0x0000029fd8bfff], 0x2000 bytes
       memory[0x2]    [0x0000029fd92000-0x0000029fd97fff], 0x6000 bytes
       reserved.cnt  = 0x2
       reserved[0x0]  [0x00000020800000-0x000000216c15c0], 0xec15c1 bytes
       reserved[0x1]  [0x00000024800000-0x0000002c180c1e], 0x7980c1f bytes
      MBLOCK[0]: base[20000000] size[280000000] offset[0]
      (note: "base" and "size" reported in "MBLOCK" encompass the "memory[X]" values)
      (note: (RA + offset) & mask = val is the formula to detect a match for the
      memory controller. should there be no match for find_node node, a return
      value of -1 resulted for the node - BAD)
      
      There is one group. It has these forward links
      MLGROUP[1]: node[545] latency[1f7e8] match[200000000] mask[200000000]
      MLGROUP[2]: node[54d] latency[2de60] match[0] mask[200000000]
      NUMA NODE[0]: node[545] mask[200000000] val[200000000] (latency[1f7e8])
      (note: "val" is the best lg's (smallest latency) "match")
      
      no LDOM guest - bare metal
      MEMBLOCK configuration:
       memory size = 0xfdf2d0000
       memory.cnt  = 0x3
       memory[0x0]    [0x00000020400000-0x00000fff6adfff], 0xfdf2ae000 bytes
       memory[0x1]    [0x00000fff6d2000-0x00000fff6e7fff], 0x16000 bytes
       memory[0x2]    [0x00000fff766000-0x00000fff771fff], 0xc000 bytes
       reserved.cnt  = 0x2
       reserved[0x0]  [0x00000020800000-0x00000021a04580], 0x1204581 bytes
       reserved[0x1]  [0x00000024800000-0x0000002c7d29fc], 0x7fd29fd bytes
      MBLOCK[0]: base[20000000] size[fe0000000] offset[0]
      
      there are two groups
      group node[16d5]
      MLGROUP[0]: node[1765] latency[1f7e8] match[0] mask[200000000]
      MLGROUP[3]: node[177d] latency[2de60] match[200000000] mask[200000000]
      NUMA NODE[0]: node[1765] mask[200000000] val[0] (latency[1f7e8])
      group node[171d]
      MLGROUP[2]: node[1775] latency[2de60] match[0] mask[200000000]
      MLGROUP[1]: node[176d] latency[1f7e8] match[200000000] mask[200000000]
      NUMA NODE[1]: node[176d] mask[200000000] val[200000000] (latency[1f7e8])
      (note: for this two "group" bare metal machine, 1/2 memory is in group one's
      lg and 1/2 memory is in group two's lg).
      
      Cc: sparclinux@vger.kernel.org
      Signed-off-by: default avatarBob Picco <bob.picco@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3dee9df5
    • bob picco's avatar
      sparc64: sun4v TLB error power off events · 4ccb9272
      bob picco authored
      
      
      We've witnessed a few TLB events causing the machine to power off because
      of prom_halt. In one case it was some nfs related area during rmmod. Another
      was an mmapper of /dev/mem. A more recent one is an ITLB issue with
      a bad pagesize which could be a hardware bug. Bugs happen but we should
      attempt to not power off the machine and/or hang it when possible.
      
      This is a DTLB error from an mmapper of /dev/mem:
      [root@sparcie ~]# SUN4V-DTLB: Error at TPC[fffff80100903e6c], tl 1
      SUN4V-DTLB: TPC<0xfffff80100903e6c>
      SUN4V-DTLB: O7[fffff801081979d0]
      SUN4V-DTLB: O7<0xfffff801081979d0>
      SUN4V-DTLB: vaddr[fffff80100000000] ctx[1250] pte[98000000000f0610] error[2]
      .
      
      This is recent mainline for ITLB:
      [ 3708.179864] SUN4V-ITLB: TPC<0xfffffc010071cefc>
      [ 3708.188866] SUN4V-ITLB: O7[fffffc010071cee8]
      [ 3708.197377] SUN4V-ITLB: O7<0xfffffc010071cee8>
      [ 3708.206539] SUN4V-ITLB: vaddr[e0003] ctx[1a3c] pte[2900000dcc800eeb] error[4]
      .
      
      Normally sun4v_itlb_error_report() and sun4v_dtlb_error_report() would call
      prom_halt() and drop us to OF command prompt "ok". This isn't the case for
      LDOMs and the machine powers off.
      
      For the HV reported error of HV_ENORADDR for HV HV_MMU_MAP_ADDR_TRAP we cause
      a SIGBUS error by qualifying it within do_sparc64_fault() for fault code mask
      of FAULT_CODE_BAD_RA. This is done when trap level (%tl) is less or equal
      one("1"). Otherwise, for %tl > 1,  we proceed eventually to die_if_kernel().
      
      The logic of this patch was partially inspired by David Miller's feedback.
      
      Power off of large sparc64 machines is painful. Plus die_if_kernel provides
      more context. A reset sequence isn't a brief period on large sparc64 but
      better than power-off/power-on sequence.
      
      Cc: sparclinux@vger.kernel.org
      Signed-off-by: default avatarBob Picco <bob.picco@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4ccb9272
  4. 10 Sep, 2014 1 commit
  5. 09 Sep, 2014 12 commits
  6. 08 Sep, 2014 5 commits
    • Linus Torvalds's avatar
      Merge branch 'for_linus_urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · 8c68face
      Linus Torvalds authored
      Pull ext4 bugfix from Ted Ts'o.
      
      [ Hmm.  It's possible we should make kfree() aware of error pointers,
        and use IS_ERR_OR_NULL rather than a NULL check.  But in the meantime
        this is obviously the right fix.  - Linus ]
      
      * 'for_linus_urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
        ext4: avoid trying to kfree an ERR_PTR pointer
      8c68face
    • Linus Torvalds's avatar
      Merge branch 'for-3.17' of git://linux-nfs.org/~bfields/linux · 861b7102
      Linus Torvalds authored
      Pull nfsd bugfixes from Bruce Fields:
       "A couple minor nfsd bugfixes"
      
      * 'for-3.17' of git://linux-nfs.org/~bfields/linux:
        lockd: fix rpcbind crash on lockd startup failure
        nfsd4: fix rd_dircount enforcement
      861b7102
    • J. Bruce Fields's avatar
      lockd: fix rpcbind crash on lockd startup failure · 7c17705e
      J. Bruce Fields authored
      
      
      Nikita Yuschenko reported that booting a kernel with init=/bin/sh and
      then nfs mounting without portmap or rpcbind running using a busybox
      mount resulted in:
      
        # mount -t nfs 10.30.130.21:/opt /mnt
        svc: failed to register lockdv1 RPC service (errno 111).
        lockd_up: makesock failed, error=-111
        Unable to handle kernel paging request for data at address 0x00000030
        Faulting instruction address: 0xc055e65c
        Oops: Kernel access of bad area, sig: 11 [#1]
        MPC85xx CDS
        Modules linked in:
        CPU: 0 PID: 1338 Comm: mount Not tainted 3.10.44.cge #117
        task: cf29cea0 ti: cf35c000 task.ti: cf35c000
        NIP: c055e65c LR: c0566490 CTR: c055e648
        REGS: cf35dad0 TRAP: 0300   Not tainted  (3.10.44.cge)
        MSR: 00029000 <CE,EE,ME>  CR: 22442488  XER: 20000000
        DEAR: 00000030, ESR: 00000000
      
        GPR00: c05606f4 cf35db80 cf29cea0 cf0ded80 cf0dedb8 00000001 1dec3086
        00000000
        GPR08: 00000000 c07b1640 00000007 1dec3086 22442482 100b9758 00000000
        10090ae8
        GPR16: 00000000 000186a5 00000000 00000000 100c3018 bfa46edc 100b0000
        bfa46ef0
        GPR24: cf386ae0 c07834f0 00000000 c0565f88 00000001 cf0dedb8 00000000
        cf0ded80
        NIP [c055e65c] call_start+0x14/0x34
        LR [c0566490] __rpc_execute+0x70/0x250
        Call Trace:
        [cf35db80] [00000080] 0x80 (unreliable)
        [cf35dbb0] [c05606f4] rpc_run_task+0x9c/0xc4
        [cf35dbc0] [c0560840] rpc_call_sync+0x50/0xb8
        [cf35dbf0] [c056ee90] rpcb_register_call+0x54/0x84
        [cf35dc10] [c056f24c] rpcb_register+0xf8/0x10c
        [cf35dc70] [c0569e18] svc_unregister.isra.23+0x100/0x108
        [cf35dc90] [c0569e38] svc_rpcb_cleanup+0x18/0x30
        [cf35dca0] [c0198c5c] lockd_up+0x1dc/0x2e0
        [cf35dcd0] [c0195348] nlmclnt_init+0x2c/0xc8
        [cf35dcf0] [c015bb5c] nfs_start_lockd+0x98/0xec
        [cf35dd20] [c015ce6c] nfs_create_server+0x1e8/0x3f4
        [cf35dd90] [c0171590] nfs3_create_server+0x10/0x44
        [cf35dda0] [c016528c] nfs_try_mount+0x158/0x1e4
        [cf35de20] [c01670d0] nfs_fs_mount+0x434/0x8c8
        [cf35de70] [c00cd3bc] mount_fs+0x20/0xbc
        [cf35de90] [c00e4f88] vfs_kern_mount+0x50/0x104
        [cf35dec0] [c00e6e0c] do_mount+0x1d0/0x8e0
        [cf35df10] [c00e75ac] SyS_mount+0x90/0xd0
        [cf35df40] [c000ccf4] ret_from_syscall+0x0/0x3c
      
      The addition of svc_shutdown_net() resulted in two calls to
      svc_rpcb_cleanup(); the second is no longer necessary and crashes when
      it calls rpcb_register_call with clnt=NULL.
      Reported-by: default avatarNikita Yushchenko <nyushchenko@dev.rtsoft.ru>
      Fixes: 679b033d
      
       "lockd: ensure we tear down any live sockets when socket creation fails during lockd_up"
      Cc: stable@vger.kernel.org
      Acked-by: default avatarJeff Layton <jlayton@primarydata.com>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      7c17705e
    • J. Bruce Fields's avatar
      nfsd4: fix rd_dircount enforcement · aee37764
      J. Bruce Fields authored
      Commit 3b299709 "nfsd4: enforce rd_dircount" totally misunderstood
      rd_dircount; it refers to total non-attribute bytes returned, not number
      of directory entries returned.
      
      Bring the code into agreement with RFC 3530 section 14.2.24.
      
      Cc: stable@vger.kernel.org
      Fixes: 3b299709
      
       "nfsd4: enforce rd_dircount"
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      aee37764
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · 35af2561
      Linus Torvalds authored
      Pull s390 fixes from Martin Schwidefsky:
       "A bug fix for the vdso code, the loadparm for booting from SCSI is
        added and the access permissions for the dasd module parameters are
        corrected"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390/vdso: remove NULL pointer check from clock_gettime
        s390/ipl: Add missing SCSI loadparm attributes to /sys/firmware
        s390/dasd: Make module parameter visible in sysfs
      35af2561
  7. 07 Sep, 2014 11 commits