1. 22 Nov, 2011 3 commits
  2. 21 Nov, 2011 2 commits
  3. 19 Nov, 2011 1 commit
    • David Rientjes's avatar
      hugetlb: remove dummy definitions of HPAGE_MASK and HPAGE_SIZE · a5c86e98
      David Rientjes authored
      Dummy, non-zero definitions for HPAGE_MASK and HPAGE_SIZE were added in
      51c6f666 ("mm: ZAP_BLOCK causes redundant work") to avoid a divide
      by zero in generic kernel code.
      That code has since been removed, but probably should never have been
      added in the first place: we don't want HPAGE_SIZE to act like PAGE_SIZE
      for code that is working with hugepages, for example, when the
      dependency on CONFIG_HUGETLB_PAGE has not been fulfilled.
      Because hugepage size can differ from architecture to architecture, each
      is required to have their own definitions for both HPAGE_MASK and
      HPAGE_SIZE.  This is always done in arch/*/include/asm/page.h.
      So, just remove the dummy and dangerous definitions since they are no
      longer needed and reveals the correct dependencies.  Tested on
      architectures using the definitions with allyesconfig: x86 (even with
      thp), hppa, mips, powerpc, s390, sh3, sh4, sparc, and sparc64, and with
      defconfig on ia64.
      Signed-off-by: default avatarDavid Rientjes <rientjes@google.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  4. 18 Nov, 2011 2 commits
  5. 17 Nov, 2011 3 commits
    • Eric Dumazet's avatar
      net: use jump_label to shortcut RPS if not setup · adc9300e
      Eric Dumazet authored
      Most machines dont use RPS/RFS, and pay a fair amount of instructions in
      netif_receive_skb() / netif_rx() / get_rps_cpu() just to discover
      RPS/RFS is not setup.
      Add a jump_label named rps_needed.
      If no device rps_map or global rps_sock_flow_table is setup,
      netif_receive_skb() / netif_rx() do a single instruction instead of many
      ones, including conditional jumps.
      jmp +0    (if CONFIG_JUMP_LABEL=y)
      Signed-off-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      CC: Tom Herbert <therbert@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Rafael J. Wysocki's avatar
      PM Sleep: Do not extend wakeup paths to devices with ignore_children set · 8b258cc8
      Rafael J. Wysocki authored
      Commit 4ca46ff3 (PM / Sleep: Mark
      devices involved in wakeup signaling during suspend) introduced
      the power.wakeup_path field in struct dev_pm_info to mark devices
      whose children are enabled to wake up the system from sleep states,
      so that power domains containing the parents that provide their
      children with wakeup power and/or relay their wakeup signals are not
      turned off.  Unfortunately, that introduced a PM regression on SH7372
      whose power consumption in the system "memory sleep" state increased
      as a result of it, because it prevented the power domain containing
      the I2C controller from being turned off when some children of that
      controller were enabled to wake up the system, although the
      controller was not necessary for them to signal wakeup.
      To fix this issue use the observation that devices whose
      power.ignore_children flag is set for runtime PM should be treated
      analogously during system suspend.  Namely, they shouldn't be
      included in wakeup paths going through their children.  Since the
      SH7372 I2C controller's power.ignore_children flag is set, doing so
      will restore the previous behavior of that SOC.
      Signed-off-by: default avatarRafael J. Wysocki <rjw@sisk.pl>
      Acked-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
    • Alexander Graf's avatar
      Revert "KVM: PPC: Add support for explicit HIOR setting" · bb75c627
      Alexander Graf authored
      This reverts commit a15bd354.
      It exceeded the padding on the SREGS struct, rendering the ABI
      Signed-off-by: default avatarAvi Kivity <avi@redhat.com>
  6. 16 Nov, 2011 17 commits
  7. 15 Nov, 2011 2 commits
  8. 14 Nov, 2011 1 commit
    • Eric Dumazet's avatar
      net: introduce build_skb() · b2b5ce9d
      Eric Dumazet authored
      One of the thing we discussed during netdev 2011 conference was the idea
      to change some network drivers to allocate/populate their skb at RX
      completion time, right before feeding the skb to network stack.
      In old days, we allocated skbs when populating the RX ring.
      This means bringing into cpu cache sk_buff and skb_shared_info cache
      lines (since we clear/initialize them), then 'queue' skb->data to NIC.
      By the time NIC fills a frame in skb->data buffer and host can process
      it, cpu probably threw away the cache lines from its caches, because lot
      of things happened between the allocation and final use.
      So the deal would be to allocate only the data buffer for the NIC to
      populate its RX ring buffer. And use build_skb() at RX completion to
      attach a data buffer (now filled with an ethernet frame) to a new skb,
      initialize the skb_shared_info portion, and give the hot skb to network
      build_skb() is the function to allocate an skb, caller providing the
      data buffer that should be attached to it. Drivers are expected to call
      skb_reserve() right after build_skb() to adjust skb->data to the
      Ethernet frame (usually skipping NET_SKB_PAD and NET_IP_ALIGN, but some
      drivers might add a hardware provided alignment)
      Data provided to build_skb() MUST have been allocated by a prior
      kmalloc() call, with enough room to add SKB_DATA_ALIGN(sizeof(struct
      skb_shared_info)) bytes at the end of the data without corrupting
      incoming frame.
      data = kmalloc(NET_SKB_PAD + NET_IP_ALIGN + 1536 +
                     SKB_DATA_ALIGN(sizeof(struct skb_shared_info)),
      skb = build_skb(data);
      if (!skb) {
      } else {
      	skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
      Signed-off-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      CC: Eilon Greenstein <eilong@broadcom.com>
      CC: Ben Hutchings <bhutchings@solarflare.com>
      CC: Tom Herbert <therbert@google.com>
      CC: Jamal Hadi Salim <hadi@mojatatu.com>
      CC: Stephen Hemminger <shemminger@vyatta.com>
      CC: Thomas Graf <tgraf@infradead.org>
      CC: Herbert Xu <herbert@gondor.apana.org.au>
      CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  9. 13 Nov, 2011 5 commits
    • Maciej Żenczykowski's avatar
      net-netlink: Add a new attribute to expose TCLASS values via netlink · 06236ac3
      Maciej Żenczykowski authored
      commit 3ceca749 added a TOS attribute.
      Unfortunately TOS and TCLASS are both present in a dual-stack v6 socket,
      furthermore they can have different values.  As such one cannot in a
      sane way expose both through a single attribute.
      Signed-off-by: default avatarMaciej Żenczyowski <maze@google.com>
      CC: Murali Raja <muralira@google.com>
      CC: Stephen Hemminger <shemminger@vyatta.com>
      CC: Eric Dumazet <eric.dumazet@gmail.com>
      CC: David S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Eric Dumazet's avatar
      neigh: new unresolved queue limits · 8b5c171b
      Eric Dumazet authored
      Le mercredi 09 novembre 2011 à 16:21 -0500, David Miller a écrit :
      > From: David Miller <davem@davemloft.net>
      > Date: Wed, 09 Nov 2011 16:16:44 -0500 (EST)
      > > From: Eric Dumazet <eric.dumazet@gmail.com>
      > > Date: Wed, 09 Nov 2011 12:14:09 +0100
      > >
      > >> unres_qlen is the number of frames we are able to queue per unresolved
      > >> neighbour. Its default value (3) was never changed and is responsible
      > >> for strange drops, especially if IP fragments are used, or multiple
      > >> sessions start in parallel. Even a single tcp flow can hit this limit.
      > >  ...
      > >
      > > Ok, I've applied this, let's see what happens :-)
      > Early answer, build fails.
      > Please test build this patch with DECNET enabled and resubmit.  The
      > decnet neigh layer still refers to the removed ->queue_len member.
      > Thanks.
      Ouch, this was fixed on one machine yesterday, but not the other one I
      used this morning, sorry.
      [PATCH V5 net-next] neigh: new unresolved queue limits
      unres_qlen is the number of frames we are able to queue per unresolved
      neighbour. Its default value (3) was never changed and is responsible
      for strange drops, especially if IP fragments are used, or multiple
      sessions start in parallel. Even a single tcp flow can hit this limit.
      $ arp -d ; ping -c 2 -s 8000
      PING ( 8000(8028) bytes of data.
      8008 bytes from icmp_seq=2 ttl=64 time=0.322 ms
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • alex.bluesman.smirnov@gmail.com's avatar
      6LoWPAN: add fragmentation support · 719269af
      alex.bluesman.smirnov@gmail.com authored
      This patch adds support for frame fragmentation.
      Signed-off-by: default avatarAlexander Smirnov <alex.bluesman.smirnov@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Eric Dumazet's avatar
      ipv6: reduce percpu needs for icmpv6msg mibs · 2a24444f
      Eric Dumazet authored
      Reading /proc/net/snmp6 on a machine with a lot of cpus is very
      expensive (can be ~88000 us).
      This is because ICMPV6MSG MIB uses 4096 bytes per cpu, and folding
      values for all possible cpus can read 16 Mbytes of memory (32MBytes on
      non x86 arches)
      ICMP messages are not considered as fast path on a typical server, and
      eventually few cpus handle them anyway. We can afford an atomic
      operation instead of using percpu data.
      This saves 4096 bytes per cpu and per network namespace.
      Signed-off-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Jiri Pirko's avatar
      net: introduce ethernet teaming device · 3d249d4c
      Jiri Pirko authored
      This patch introduces new network device called team. It supposes to be
      very fast, simple, userspace-driven alternative to existing bonding
      Userspace library called libteam with couple of demo apps is available
      Note it's still in its dipers atm.
      team<->libteam use generic netlink for communication. That and rtnl
      suppose to be the only way to configure team device, no sysfs etc.
      Python binding of libteam was recently introduced.
      Daemon providing arpmon/miimon active-backup functionality will be
      introduced shortly. All what's necessary is already implemented in
      kernel team driver.
      	- check ndo_ndo_vlan_rx_[add/kill]_vid functions before calling
      	- use dev_kfree_skb_any() instead of dev_kfree_skb()
      	- transmit and receive functions are not checked in hot paths.
      	  That also resolves memory leak on transmit when no port is
      	- changed couple of _rcu calls to non _rcu ones in non-readers
      	- team_change_mtu() uses team->lock while travesing though port
      	- mac address changes are moved completely to jurisdiction of
      	  userspace daemon. This way the daemon can do FOM1, FOM2 and
      	  possibly other weird things with mac addresses.
      	  Only round-robin mode sets up all ports to bond's address then
      	- Extended Kconfig text
      	- remove redundant synchronize_rcu from __team_change_mode()
      	- revert "set and clear of mode_ops happens per pointer, not per
      	- extend comment of function __team_change_mode()
      	- team_change_mtu() uses rcu version of list traversal to unwind
      	- set and clear of mode_ops happens per pointer, not per byte
      	- port hashlist changed to be embedded into team structure
      	- error branch in team_port_enter() does cleanup now
      	- fixed rtln->rtnl
      	- modes are made as modules. Makes team more modular and
      	- several commenters' nitpicks found on v1 were fixed
      	- several other bugs were fixed.
      	- note I ignored Eric's comment about roundrobin port selector
      	  as Eric's way may be easily implemented as another mode (mode
      	  "random") in future.
      Signed-off-by: default avatarJiri Pirko <jpirko@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  10. 11 Nov, 2011 4 commits