1. 29 Jan, 2013 22 commits
    • YOSHIFUJI Hideaki / 吉藤英明's avatar
    • YOSHIFUJI Hideaki / 吉藤英明's avatar
    • Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge · b53c47dd
      David S. Miller authored


      Included changes:
      - fix recently introduced output behaviour
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · f1e7b73a
      David S. Miller authored


      Bring in the 'net' tree so that we can get some ipv4/ipv6 bug
      fixes that some net-next work will build upon.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: add anti-spoofing checks for 6to4 and 6rd · 218774dc
      Hannes Frederic Sowa authored
      
      
      This patch adds anti-spoofing checks in sit.c as specified in RFC3964
      section 5.2 for 6to4 and RFC5969 section 12 for 6rd. I left out the
      checks which could easily be implemented with netfilter.
      
      Specifically, this patch adds the following logic (based loosely on the
      pseudocode in RFC3964 section 5.2):

      if prefix (inner_src_v6) == rd6_prefix (2002::/16 is the default)
              and outer_src_v4 != embedded_ipv4 (inner_src_v6)
                      drop
      if prefix (inner_dst_v6) == rd6_prefix (2002::/16 is the default)
              and outer_dst_v4 != embedded_ipv4 (inner_dst_v6)
                      drop
      accept

      To accomplish the security checks proposed by the above RFCs, it is
      still necessary to employ uRPF filters with netfilter. These new
      checks only kick in if the employed addresses are within 2002::/16 or
      another range specified by the 6rd-prefix (which defaults to 2002::/16).
      
      Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
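The pseudocode in the commit above can be sketched in Python roughly as follows. This is only an illustration, not the code in sit.c: the function names `embedded_ipv4` and `accept_packet` are hypothetical, and the sketch assumes that the full 32-bit IPv4 address sits directly after the 6rd prefix (which is the case for the default 2002::/16 6to4 prefix).

```python
import ipaddress

DEFAULT_RD6_PREFIX = ipaddress.IPv6Network("2002::/16")


def embedded_ipv4(addr6, prefix):
    """Return the 32 bits that follow the prefix, read as an IPv4 address.

    Assumption: the whole IPv4 address is embedded right after the prefix.
    """
    shift = 128 - prefix.prefixlen - 32
    bits = (int(addr6) >> shift) & 0xFFFFFFFF
    return ipaddress.IPv4Address(bits)


def accept_packet(outer_src_v4, outer_dst_v4, inner_src_v6, inner_dst_v6,
                  rd6_prefix=DEFAULT_RD6_PREFIX):
    """Return False (drop) when an inner address inside the 6rd prefix
    does not embed the matching outer IPv4 address, True (accept) otherwise."""
    pairs = ((inner_src_v6, outer_src_v4), (inner_dst_v6, outer_dst_v4))
    for inner, outer in pairs:
        inner = ipaddress.IPv6Address(inner)
        outer = ipaddress.IPv4Address(outer)
        if inner in rd6_prefix and embedded_ipv4(inner, rd6_prefix) != outer:
            return False
    return True
```

Addresses outside the 6rd prefix pass through unchecked here, which matches the commit's note that uRPF filtering with netfilter is still needed for the remaining cases.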
    • gianfar: Pack struct gfar_priv_grp into three cachelines · ee873fda
      Claudiu Manoil authored
      
      
      * remove unused members(!): imask, ievent
      * move space consuming interrupt name strings (int_name_* members) to
      external structures, unessential for the driver's hot path
      * keep high priority hot path data within the first 2 cache lines
      
      This reduces struct gfar_priv_grp from 6 to 3 cache lines.
      (Also fixed checkpatch warnings for the old code, in the process.)
      Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gianfar: Cleanup gfar_parse_group() code · 5fedcc14
      Claudiu Manoil authored


      Factor out redundant code (improve readability, source code size).
      Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gianfar: Optimize struct gfar_priv_tx_q for two cache lines · 0cd3fdea
      Claudiu Manoil authored


      Resize and regroup structure members to eliminate memory holes and
      to pack the structure into 2 cache lines (from 3).
      tx_ring_size was resized from 4 to 2 bytes and a few members were
      regrouped in order to eliminate byte holes and achieve compactness.
      Where possible, members were grouped according to their usage and access
      order (i.e. start_xmit vs. clean_tx_ring members); less important members
      were pushed to the end.
      Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
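The effect of resizing and regrouping members can be illustrated with a small Python `ctypes` sketch. The two structures below are hypothetical and much smaller than struct gfar_priv_tx_q; the field names are made up for illustration. They only show how alignment padding ("byte holes") disappears when a member is narrowed and members of similar size are placed together, assuming the usual natural alignment of C integer types.

```python
import ctypes


class Scattered(ctypes.Structure):
    # Small members interleaved with a 4-byte member: the compiler's
    # alignment rules insert padding after 'flag' and after 'state'.
    _fields_ = [
        ("flag", ctypes.c_uint8),
        ("ring_size", ctypes.c_uint32),
        ("state", ctypes.c_uint8),
        ("index", ctypes.c_uint16),
    ]


class Regrouped(ctypes.Structure):
    # 'ring_size' narrowed to 2 bytes and members ordered by size,
    # so no padding is needed between them.
    _fields_ = [
        ("ring_size", ctypes.c_uint16),
        ("index", ctypes.c_uint16),
        ("flag", ctypes.c_uint8),
        ("state", ctypes.c_uint8),
    ]


def cache_lines(size, line_size=64):
    """Number of cache lines a structure of 'size' bytes spans,
    assuming it starts on a cache line boundary."""
    return -(-size // line_size)


print(ctypes.sizeof(Scattered), ctypes.sizeof(Regrouped))
```

The 64-byte line size in `cache_lines` is an assumption for the sketch; the commit does not state the cache line size of the target hardware.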
    • ipv6: Fix inet6_csk_bind_conflict so it builds with user namespaces enabled · 243bb4c6
      Eric W. Biederman authored
      
      
      When attempting to build linux-next with user namespaces enabled I ran
      into this fun build error.
      
        CC      net/ipv6/inet6_connection_sock.o
      .../net/ipv6/inet6_connection_sock.c: In function ‘inet6_csk_bind_conflict’:
      .../net/ipv6/inet6_connection_sock.c:37:12: error: incompatible types when initializing type ‘int’ using
       type ‘kuid_t’
      .../net/ipv6/inet6_connection_sock.c:54:30: error: incompatible type for argument 1 of ‘uid_eq’
      .../include/linux/uidgid.h:48:20: note: expected ‘kuid_t’ but argument is of type ‘int’
      make[3]: *** [net/ipv6/inet6_connection_sock.o] Error 1
      make[2]: *** [net/ipv6] Error 2
      make[2]: *** Waiting for unfinished jobs....
      
      Using kuid_t instead of int to hold the uid fixes this.
      
      Cc: Tom Herbert <therbert@google.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • pktgen: support net namespace · 4e58a027
      Cong Wang authored


      v3: make pktgen_threads list per-namespace
      v2: remove a useless check

      This patch adds net namespace support to pktgen, so that
      we can use pktgen in different namespaces.

      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: Cong Wang <amwang@redhat.com>
      Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: fec: add napi support to improve proformance · dc975382
      Frank Li authored
      
      
      Add napi support
      
      Before this patch
      
       iperf -s -i 1
       ------------------------------------------------------------
       Server listening on TCP port 5001
       TCP window size: 85.3 KByte (default)
       ------------------------------------------------------------
       [  4] local 10.192.242.153 port 5001 connected with 10.192.242.138 port 50004
       [ ID] Interval       Transfer     Bandwidth
       [  4]  0.0- 1.0 sec  41.2 MBytes   345 Mbits/sec
       [  4]  1.0- 2.0 sec  43.7 MBytes   367 Mbits/sec
       [  4]  2.0- 3.0 sec  42.8 MBytes   359 Mbits/sec
       [  4]  3.0- 4.0 sec  43.7 MBytes   367 Mbits/sec
       [  4]  4.0- 5.0 sec  42.7 MBytes   359 Mbits/sec
       [  4]  5.0- 6.0 sec  43.8 MBytes   367 Mbits/sec
       [  4]  6.0- 7.0 sec  43.0 MBytes   361 Mbits/sec
      
      After this patch
       [  4]  2.0- 3.0 sec  51.6 MBytes   433 Mbits/sec
       [  4]  3.0- 4.0 sec  51.8 MBytes   435 Mbits/sec
       [  4]  4.0- 5.0 sec  52.2 MBytes   438 Mbits/sec
       [  4]  5.0- 6.0 sec  52.1 MBytes   437 Mbits/sec
       [  4]  6.0- 7.0 sec  52.1 MBytes   437 Mbits/sec
       [  4]  7.0- 8.0 sec  52.3 MBytes   439 Mbits/sec
      Signed-off-by: Frank Li <Frank.Li@freescale.com>
      Signed-off-by: Fugang Duan <B38611@freescale.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
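To put a single number on the iperf figures quoted in the commit above, the per-second bandwidth samples can be averaged. The values below are copied from the commit message. Note that the two runs cover different interval windows (0 to 7 s before, 2 to 8 s after), so this is a rough comparison rather than a controlled measurement.

```python
from statistics import mean

# Bandwidth samples in Mbit/s, as listed in the commit message.
before = [345, 367, 359, 367, 359, 367, 361]
after = [433, 435, 438, 437, 437, 439]

# Relative gain of the mean throughput with NAPI enabled.
gain = mean(after) / mean(before) - 1
print(f"{mean(before):.1f} -> {mean(after):.1f} Mbit/s ({gain:.1%})")
```

On these samples the mean rises from about 360.7 to 436.5 Mbit/s, roughly a 21% improvement.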
    • ethoc: Cleanup driver format · 72aa8e1b
      Barry Grussling authored


      Cleanup the format of ethoc.c to meet network driver style as
      per checkpatch.pl.
      Signed-off-by: Barry Grussling <barry@grussling.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip_gre: When TOS is inherited, use configured TOS value for non-IP packets · 040468a0
      David Ward authored
      
      
      A GRE tunnel can be configured so that outgoing tunnel packets inherit
      the value of the TOS field from the inner IP header. In doing so, when
      a non-IP packet is transmitted through the tunnel, the TOS field will
      always be set to 0.
      
      Instead, the user should be able to configure a different TOS value as
      the fallback to use for non-IP packets. This is helpful when the non-IP
      packets are all control packets and should be handled by routers outside
      the tunnel as having Internet Control precedence. One example of this is
      the NHRP packets that control a DMVPN-compatible mGRE tunnel; they are
      encapsulated directly by GRE and do not contain an inner IP header.
      
      Under the existing behavior, the IFLA_GRE_TOS parameter must be set to
      '1' for the TOS value to be inherited. Now, only the least significant
      bit of this parameter must be set to '1', and when a non-IP packet is
      sent through the tunnel, the upper 6 bits of this same parameter will be
      copied into the TOS field. (The ECN bits get masked off as before.)
      
      This behavior is backwards-compatible with existing configurations and
      iproute2 versions.
      Signed-off-by: David Ward <david.ward@ll.mit.edu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
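The TOS selection described in the commit above might be sketched as follows. This is an illustrative model, not the ip_gre code: the function name `outer_tos` and the constants are hypothetical, and ECN handling is simplified to masking off the two low bits in every case.

```python
ECN_MASK = 0x03      # two least significant bits of the TOS byte
INHERIT_FLAG = 0x01  # least significant bit of the configured value


def outer_tos(configured_tos, inner_ip_tos=None):
    """Pick the TOS byte for the outer header.

    inner_ip_tos is None for non-IP payloads (e.g. NHRP carried directly in GRE).
    """
    if configured_tos & INHERIT_FLAG and inner_ip_tos is not None:
        # Inherit mode with an inner IP header: copy its TOS.
        return inner_ip_tos & ~ECN_MASK & 0xFF
    # Fixed TOS, or inherit mode with a non-IP payload:
    # fall back to the upper 6 bits of the configured value.
    return configured_tos & ~ECN_MASK & 0xFF
```

With a configured value of 0xC1, an IP payload inherits its inner TOS while a non-IP payload is sent with 0xC0 (Internet Control precedence). A configured value of exactly 1 still yields 0 for non-IP packets, which is how the change stays backwards-compatible.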
    • ipv4: introduce address lifetime · 5c766d64
      Jiri Pirko authored
      
      
      There are some use cases where a lifetime for ipv4 addresses might be helpful.
      For example:
      1) initramfs networkmanager uses a DHCP daemon to learn network
      configuration parameters
      2) initramfs networkmanager sets up addresses, routes and DNS configuration
      3) initramfs networkmanager is requested to stop
      4) initramfs networkmanager stops all daemons including dhclient
      5) there are addresses and routes configured but no daemon running. If
      the system doesn't start networkmanager for some reason, addresses and
      routes will be used forever, which violates RFC 2131.

      This patch is essentially a backport of the ipv6 address lifetime
      mechanism for ipv4 addresses.

      The current "ip" tool supports this without any patch (since it does not
      distinguish between ipv4 and ipv6 addresses in this respect).

      Also, this should be backward-compatible with all current netlink users.
      Reported-by: Pavel Šimerda <psimerda@redhat.com>
      Signed-off-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'ipfrags' · 5a1dc317
      David S. Miller authored
      Jesper Dangaard Brouer says:
      
      ====================
      This patchset is V2, with some trivial code fixes, which were noticed
      by DaveM. It is still a partial respin of my fragmentation optimization
      patches: http://thread.gmane.org/gmane.linux.network/250914

      This is not the complete patchset from the gmane link above. In this
      patchset, I primarily focus on adjusting cachelines for better SMP/NUMA
      performance.

      Once this patchset has been agreed upon, I will continue and respin
      the rest of my patches.

      This time around, I have created a frag DoS generator, via the tool
      trafgen (http://netsniff-ng.org/), to create a stable DoS scenario
      (no longer relying on frame dropping due to disabled flow-control).

      Two 10G interfaces are under test, and use Ethernet flow-control.  A
      third interface is used for generating the DoS attack (this interface
      is also 10G, but it does not need to be, as 500Kpps DoS is enough).
      
      Test types summary (netperf):
       Test-20G64K     == 2x10G with 65K fragments
       Test-20G3F      == 2x10G with 3x fragments (3*1472 bytes)
       Test-20G64K+DoS == Same as 20G64K with frag DoS
       Test-20G3F+DoS  == Same as 20G3F  with frag DoS
      
      Patch list:
       Patch-01 - net: cacheline adjust struct netns_frags for better frag performance
       Patch-02 - net: cacheline adjust struct inet_frags for better frag performance
       Patch-03 - net: cacheline adjust struct inet_frag_queue
       Patch-04 - net: frag helper functions for mem limit tracking
       Patch-05 - net: use lib/percpu_counter API for fragmentation mem accounting
       Patch-06 - net: frag, move LRU list maintenance outside of rwlock
      
      Performance table summary:
      
       Test-type:  Test-20G64K    Test-20G3F  20G64K+DoS   20G3F+DoS
       ----------  -----------    ----------  ----------   ---------
        net-next:  15114.5 Mbit/s   8954.21     2444.28     3918.01 Mbit/s
        Patch-01:  16075.8 Mbit/s   8976.18     2621.49     4072.79 Mbit/s
        Patch-02:  17806.9 Mbit/s   9280.32     2478.62     4274.59 Mbit/s
        Patch-03:  17317.4 Mbit/s   9308.62     2546.05     4336.59 Mbit/s
        Patch-04:  17635.9 Mbit/s   9256.16     2535.25     4327.63 Mbit/s
        Patch-05:  18027.0 Mbit/s   9918.99     2492.62     3621.68 Mbit/s
        Patch-06:  18486.7 Mbit/s  10723.20     3657.85     4560.64 Mbit/s
      
       I cannot explain the under-DoS regression that patch-05/percpu_counter
       introduces.  But patch-06/LRU-lock corrects the situation again.
      
      Below is a testlab setup description, with links to the trafgen DoS
      packet config used.
      
      Testlab
      =======
      
      Server setup
      ------------
      The machine acting as a server:
       - 2x CPU (E5-2630)
       - Thus a NUMA arch/machine
       - 4x 10Gbit/s ports
       - NICs 2x Intel Dual port 82599 based (driver ixgbe)
      
      Setup:
       - Interfaces use Ethernet flow control
       - Flush all iptables
       - Remove all iptables related modules.
       - Kill irqbalance
       - Pin each 10G NIC port to a *single* CPU each
      
      Pinning can easily be done by command hacks::
      
       for x in /proc/irq/*/eth8*/../smp_affinity_list ; do echo 1 > $x; done
       for x in /proc/irq/*/eth9*/../smp_affinity_list ; do echo 3 > $x; done
       for x in /proc/irq/*/eth31*/../smp_affinity_list; do echo 6 > $x; done
       for x in /proc/irq/*/eth32*/../smp_affinity_list; do echo 8 > $x; done
      
      Notice NUMA setting: The CPU to NIC tying is carefully chosen
      according to the NUMA node setup.  Thus, NICs connected to a PCI-e
      slot that is connected to a physical CPU socket are tied together.

      Choosing only a single CPU per NIC (port) is just to ease provoking
      and debugging this performance issue. (In real setups, you can choose
      more CPUs, just remember the NUMA node in the equation).
      
      Tools
      -----
      
      Netperf is used, with option -T to ensure CPU binding.
      The netserver processes, are NAPI pinned::
      
       numactl -m0 -c0 netserver
       numactl -m1 -c 1 netserver -p 1337
      
      I now have a frag DoS generator, created via the tool:
        trafgen (see: http://netsniff-ng.org/)
      
      Trafgen packet config file:
       http://people.netfilter.org/hawk/frag_work/trafgen/frag_packet03_small_frag.txf
      
      Notice, I'm using features of trafgen, recently developed by Daniel
      Borkmann, thus you need the latest git tree to use my trafgen packet
      config.
      
       git://github.com/borkmann/netsniff-ng.git
      
      
      
      Command line:
       trafgen --dev eth51 --conf frag_packet03_small_frag.txf -V -k 100 --cpus 2
      
      Test types
      ----------
      
      Test(20G64K) UDP-64K 2x 10Gbit/s with no DoS traffic:
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
       export SIZE=$((65507)); export TIME=$((20)); export LOG=/tmp/netperf.log ;\
       netperf -p 1337 -H 192.168.31.2 -T7,7 -t UDP_STREAM -l $TIME -- -m $SIZE >> ${LOG}.31 &\
       netperf         -H 192.168.81.2 -T2,2 -t UDP_STREAM -l $TIME -- -m $SIZE >> ${LOG}.81 && \
       wait $! && tail -n3 ${LOG}.* && \
       tail -n3 ${LOG}.{31,81} | awk 'BEGIN{sum=0;} /212992        / {sum+=$4; print " +"$4} /==/ {print " file:"$2} END{print "sum:"sum" Mbit/s"}'
      
      Test(20G3F) UDP-3xfrags 2x 10Gbit/s with no DoS traffic:
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
       export SIZE=$((3*1472)); export TIME=$((20)); export LOG=/tmp/netperf.log ;\
       netperf -p 1337 -H 192.168.31.2 -T7,7 -t UDP_STREAM -l $TIME -- -m $SIZE >> ${LOG}.31 &\
       netperf         -H 192.168.81.2 -T2,2 -t UDP_STREAM -l $TIME -- -m $SIZE >> ${LOG}.81 && \
       wait $! && tail -n3 ${LOG}.* && \
       tail -n3 ${LOG}.{31,81} | awk 'BEGIN{sum=0;} /212992        / {sum+=$4; print " +"$4} /==/ {print " file:"$2} END{print "sum:"sum" Mbit/s"}'
      
      Awk script for summing results:
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      tail -n3 ${LOG}.{31,81} | awk 'BEGIN{sum=0;} /212992        / {sum+=$4; print " +"$4} /==/ {print " file:"$2} END{print "sum:"sum" Mbit/s"}'
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: frag, move LRU list maintenance outside of rwlock · 3ef0eb0d
      Jesper Dangaard Brouer authored


      Updating the fragmentation queues' LRU (Least-Recently-Used) list
      required taking the hash writer lock.  However, the LRU list isn't
      tied to the hash at all, so we can use a separate lock for it.
      Original-idea-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
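The idea of giving the LRU list its own lock can be sketched in Python as below. This is a toy model with hypothetical names (`FragQueues`, `touch`, `oldest`), not the kernel's inet_frag code. Python's standard library has no reader/writer lock, so a plain `threading.Lock` stands in for the hash rwlock.

```python
import threading
from collections import OrderedDict


class FragQueues:
    def __init__(self):
        self.hash_lock = threading.Lock()  # stands in for the hash rwlock
        self.lru_lock = threading.Lock()   # separate lock, LRU order only
        self.table = {}                    # key -> fragment queue
        self.lru = OrderedDict()           # oldest entry first

    def insert(self, key, queue):
        with self.hash_lock:
            self.table[key] = queue
        with self.lru_lock:
            self.lru[key] = None

    def touch(self, key):
        """Mark a queue as recently used without taking the hash lock."""
        with self.lru_lock:
            self.lru.move_to_end(key)

    def oldest(self):
        """Return the least recently used key, or None if empty."""
        with self.lru_lock:
            return next(iter(self.lru), None)
```

Because `touch` only takes `lru_lock`, frequent LRU updates no longer contend with lookups and inserts that need the hash lock, which is the point the commit makes.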
    • net: use lib/percpu_counter API for fragmentation mem accounting · 6d7b857d
      Jesper Dangaard Brouer authored
      
      
      Replace the per network namespace shared atomic "mem" accounting
      variable, in the fragmentation code, with a lib/percpu_counter.
      
      Getting percpu_counter to scale to the fragmentation code usage
      requires some tweaks.
      
      At first glance, percpu_counter looks superfast, but it does not
      scale on multi-CPU/NUMA machines, because the default batch size
      is too small for frag code usage.  Thus, I have adjusted the
      batch size by using __percpu_counter_add() directly, instead of
      percpu_counter_sub() and percpu_counter_add().

      The batch size is increased to 130,000, based on the largest 64K
      fragment memory usage.  This does introduce some imprecise
      memory accounting, but it does not need to be strict for this
      use-case.

      It is also essential that the percpu_counter does not share a
      cacheline with other writers, to make this scale.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
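The trade-off between batch size and accounting precision can be modelled with a small Python sketch. The class below is a simplified, single-threaded stand-in for a batched per-CPU counter. `BatchedCounter` and its method names are hypothetical, and the locking that a real shared counter needs is omitted.

```python
class BatchedCounter:
    def __init__(self, ncpus, batch):
        self.batch = batch
        self.global_count = 0      # shared value, expensive to update
        self.local = [0] * ncpus   # per-CPU deltas, cheap to update

    def add(self, cpu, amount):
        """Accumulate locally; fold into the shared value only when the
        local delta reaches the batch size in either direction."""
        count = self.local[cpu] + amount
        if count >= self.batch or count <= -self.batch:
            self.global_count += count
            self.local[cpu] = 0
        else:
            self.local[cpu] = count

    def read(self):
        """Fast but imprecise: ignores deltas still held per CPU."""
        return self.global_count

    def exact_sum(self):
        """Precise but slower: walks every per-CPU delta."""
        return self.global_count + sum(self.local)
```

With a batch of 130,000, a CPU that adds 65,536 bytes leaves `read()` at 0 until a later addition pushes its local delta past the batch. This is the imprecision the commit accepts in exchange for fewer writes to the shared value.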
    • net: frag helper functions for mem limit tracking · d433673e
      Jesper Dangaard Brouer authored


      This change is primarily a preparation to ease the extension of memory
      limit tracking.

      The change does reduce the number of atomic operations during freeing of
      a frag queue.  This does introduce some performance improvement, as
      these atomic operations are at the core of the performance problems
      seen on NUMA systems.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: cacheline adjust struct inet_frag_queue · 6e34a8b3
      Jesper Dangaard Brouer authored


      Fragmentation code cacheline adjusting of struct inet_frag_queue.

      Take advantage of the size of struct timer_list, and move all but
      spinlock_t lock below the timer struct.  On 64-bit, 'lru_list',
      'list' and 'refcnt' fit exactly into the next cacheline, and a
      new cacheline starts at 'fragments'.

      The netns_frags *net pointer is moved to the end of the struct,
      because it is used in a compare with "next/close-by" elements of
      the structs in which this struct is embedded.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: cacheline adjust struct inet_frags for better frag performance · 5f8e1e8b
      Jesper Dangaard Brouer authored


      The globally shared rwlock of struct inet_frags shares a
      cacheline with the 'rnd' number, which is used by the hash
      calculations.  Fix this, as it is obviously a bad idea:
      unnecessary cache-misses will occur when accessing the 'rnd'
      number.

      Also a small note: the function ptr (*match) is moved up in the
      struct to keep it from landing on the next cacheline (on 64-bit).
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: cacheline adjust struct netns_frags for better frag performance · cd39a789
      Jesper Dangaard Brouer authored


      This small cacheline adjustment of struct netns_frags improves
      performance significantly for the fragmentation code.

      Struct members 'lru_list' and 'mem' are both hot elements, and it
      hurts performance, due to cacheline bouncing at every call point,
      when they share a cacheline.  Also notice how 'mem' is placed
      together with 'high_thresh' and 'low_thresh', as they are used in
      the compare operations together.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ks8851: convert to threaded IRQ · 656a05c8
      Felipe Balbi authored


      just as it should have been. It also helps
      remove the now unnecessary workqueue.
      Signed-off-by: Felipe Balbi <balbi@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 28 Jan, 2013 18 commits
    • net neigh: Optimize neighbor entry size calculation. · 08433eff
      YOSHIFUJI Hideaki / 吉藤英明 authored
      
      
      When allocating memory for neighbour cache entry, if
      tbl->entry_size is not set, we always calculate
      sizeof(struct neighbour) + tbl->key_len, which is common
      in the same table.
      
      With this change, set tbl->entry_size during the table
      initialization phase, if it was not set, and use it in
      neigh_alloc() and neighbour_priv().
      
      This change also allows us to have both protocol private
      data and device private data at the same time.

      Note that the only user of protocol private is DECnet
      and the only user of device private is ATM CLIP.
      Since those are exclusive, we have not been facing issues
      here.
      Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: avoid to hang up on sending due to sysctl configuration overflow. · cdda8891
      bingtian.ly@taobao.com authored
      
      
      I found that if we write a value larger than 4GB to some sysctl
      variables, the sending syscall will hang forever, because these
      variables are 32 bits wide and such large values make them overflow
      to 0 or a negative value.

      This patch tries to fix the overflow, or to prevent a zero value from
      being set, for the sysctl variables below:
      
      net.core.wmem_default
      net.core.rmem_default
      
      net.core.rmem_max
      net.core.wmem_max
      
      net.ipv4.udp_rmem_min
      net.ipv4.udp_wmem_min
      
      net.ipv4.tcp_wmem
      net.ipv4.tcp_rmem
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Li Yu <raise.sail@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
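The overflow described in the commit above can be reproduced in isolation with Python's `ctypes`, which truncates values to the width of the C type without raising an error. The helper name `as_int32` is hypothetical, and the sketch only models storing a large value in a signed 32-bit variable, not the kernel's sysctl parsing path.

```python
import ctypes


def as_int32(value):
    """Return 'value' as it would appear in a signed 32-bit variable."""
    return ctypes.c_int32(value).value


GB = 2 ** 30
for requested in (1 * GB, 2 * GB, 3 * GB, 4 * GB):
    print(f"{requested:>11} -> {as_int32(requested)}")
```

A request of exactly 4GB wraps to 0, while 2GB and 3GB wrap to negative values. These are the "0 or negative" cases the commit message refers to.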
    • Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc · fc16e884
      Linus Torvalds authored
      Pull powerpc fixes from Benjamin Herrenschmidt:
       "Whenever you have a chance between two dives, you might want to
        consider pulling my merge branch to pickup a few fixes for 3.8 that
        have been accumulating for the last couple of weeks (I was myself
        travelling then on vacation).
      
        Nothing major, just a handful of powerpc bug fixes that I consider
        worth getting in before 3.8 goes final."
      
      And I'll have everybody know that I'm not diving for several days yet.
      Snif.
      
      * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
        powerpc: Max next_tb to prevent from replaying timer interrupt
        powerpc: kernel/kgdb.c: Fix memory leakage
        powerpc/book3e: Disable interrupt after preempt_schedule_irq
        powerpc/oprofile: Fix error in oprofile power7_marked_instr_event() function
        powerpc/pasemi: Fix crash on reboot
        powerpc: Fix MAX_STACK_TRACE_ENTRIES too low warning for ppc32
    • via-rhine: add 64bit statistics. · f7b5d1b9
      Jamie Gloudon authored


      Switch to use ndo_get_stats64 to get 64bit statistics.
      Signed-off-by: Jamie Gloudon <jamie.gloudon@gmail.com>
      Tested-by: Jamie Gloudon <jamie.gloudon@gmail.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • drivers/net/phy/micrel_phy: Add support for new PHYs · 7ab59dc1
      David J. Choi authored


      Summary of changes:
      Newly added phys:
              - KSZ8081/KSZ8091, which has some phy ids.
              - KSZ8061
              - KSZ9031, which is a Gigabit phy.
              - KSZ886X, which has a switch function.
              - KSZ8031, which has the same phy id as KSZ8021.
      Signed-off-by: David J. Choi <david.choi@micrel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: realtek: add rtl8211e driver · ef3d9049
      Giuseppe CAVALLARO authored


      This patch adds the minimal driver to manage the
      Realtek RTL8211E 10/100/1000 Transceivers.
      Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netpoll: use the net namespace of current process instead of init_net · 556e6256
      Cong Wang authored


      This will allow us to set up netconsole in a different namespace
      rather than where init_net is.

      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: Cong Wang <amwang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netpoll: use ipv6_addr_equal() to compare ipv6 addr · faeed828
      Cong Wang authored


      ipv6_addr_equal() is faster.

      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: Cong Wang <amwang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netpoll: add RCU annotation to npinfo field · 5fbee843
      Cong Wang authored
      
      
      dev->npinfo is protected by RCU.
      
      This fixes the following sparse warnings:
      
      net/core/netpoll.c:177:48: error: incompatible types in comparison expression (different address spaces)
      net/core/netpoll.c:200:35: error: incompatible types in comparison expression (different address spaces)
      net/core/netpoll.c:221:35: error: incompatible types in comparison expression (different address spaces)
      net/core/netpoll.c:327:18: error: incompatible types in comparison expression (different address spaces)
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: Cong Wang <amwang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next · ce4a600e
      David S. Miller authored
      
      
      John W. Linville says:
      
      ====================
      Included is an NFC pull.  Samuel says:
      
      "It brings the following goodies:
      
      - LLCP socket timestamping (To be used e.g with the recently released nfctool
        application for a more efficient skb timestamping when sniffing).
      - A pretty big pn533 rework from Waldemar, preparing the driver to support
        more flavours of pn533 based devices.
      - HCI changes from Eric in preparation for the microread driver support.
      - Some LLCP memory leak fixes, cleanups and slight improvements.
      - pn544 and nfcwilink move to the devm_kzalloc API.
      - An initial Secure Element (SE) API.
      - An nfc.h license change from the original author, allowing non GPL
        application code to safely include it."
      
      Also included are a pair of mac80211 pulls.  Johannes says:
      
      "We found two bugs in the previous code, so I'm sending you a pull
      request again this soon.
      
      This contains two regulatory bug fixes, some of Thomas's hwsim beacon
      timer work and a documentation fix from Bob."
      
      "Another pull request for mac80211-next. This time, I have a number of
      things, the patches are mostly self-explanatory. There are a few fixes
      from Felix and myself, and random cleanups & improvements. The biggest
      thing is the partial patchset from Marco preparing for mesh powersave."
      
      Additionally, there are a pair of iwlwifi pulls.  Johannes says:
      
      "For iwlwifi-next, I have a few cleanups/improvements as well as a few
      not very important fixes and more preparations for new devices."
      
      "Please pull a few updates for iwlwifi. These are just some cleanups and
      a debug improvement."
      
      On top of that, there is a slew of driver updates.  This includes
      brcmfmac, mwifiex, ath9k, carl9170, and mwl8k as well as a handful
      of others.  The bcma and ssb busses get some attention as well.
      Still, I don't see any big headliners here.
      
      Also included is a pull of the wireless tree, in order to resolve
      some merge conflicts.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next · 8a67b05d
      David S. Miller authored
      
      
      Jeff Kirsher says:
      
      ====================
      This series contains updates to e1000e, ixgbevf, igb and igbvf.
      The majority of the patches are code cleanups of e1000e where code
      is removed (Yeah!).  The other two e1000e patches are fixes.  The
      first is to fix the maximum frame size for 82579 devices.  The second
      fix is to resolve an issue with devices other than 82579 that suffer
      from dropped transactions on platforms with deep C-states when
      jumbo frames are enabled.
      
      The ixgbevf patch is to ensure that the driver fetches the correct,
      refreshed value for link status and speed when the values have changed.
      
      The igb and igbvf patches are a solution to an issue Stefan Assmann
      reported, where when the PF is up and igbvf is loaded, the MAC address
      is not generated using eth_hw_addr_random().
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8a67b05d
    • Tiejun Chen's avatar
      powerpc: Max next_tb to prevent from replaying timer interrupt · 689dfa89
      Tiejun Chen authored
      
      
      With lazy interrupts, we always call __check_irq_replay with
      decrementers_next_tb to check whether we need to replay the timer
      interrupt. So in the hotplug case we also need to set
      decrementers_next_tb to MAX to make sure __check_irq_replay doesn't
      replay the timer interrupt on return, as we expect; otherwise we'll
      trap here infinitely.
      Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      689dfa89
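      The effect of maxing out the next-timebase value described above can be
      sketched in plain C. This is a user-space illustration under stated
      assumptions, not the kernel code: should_replay_timer() and
      park_next_tb() are hypothetical names standing in for the comparison
      made against decrementers_next_tb, and UINT64_MAX is assumed to be the
      "MAX" the message refers to.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the replay check: a timer interrupt is
 * replayed once the current timebase has reached the next deadline. */
static bool should_replay_timer(uint64_t now_tb, uint64_t next_tb)
{
	return now_tb >= next_tb;
}

/* Hypothetical helper: park the deadline at the maximum value so the
 * check above cannot fire for any timebase value below UINT64_MAX. */
static void park_next_tb(uint64_t *next_tb)
{
	*next_tb = UINT64_MAX;
}
```

      With the deadline parked, a CPU going offline no longer sees a pending
      timer replay each time the check runs.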
    • Cong Ding's avatar
      powerpc: kernel/kgdb.c: Fix memory leakage · fefd9e6f
      Cong Ding authored
      
      
      The variable backup_current_thread_info isn't freed before exiting the
      function.
      Signed-off-by: Cong Ding <dinggnu@gmail.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      fefd9e6f
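      The class of bug fixed above can be illustrated with a small user-space
      sketch. It assumes a handler that saves a copy of some state into a
      heap buffer, works on the original, restores it, and must release the
      buffer before returning. struct thread_state, handle_step() and
      live_allocations are hypothetical names, and malloc()/free() stand in
      for the kernel allocator.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical state that the handler backs up and later restores. */
struct thread_state {
	int cpu;
	int flags;
};

/* Counts buffers that are currently allocated; used only to show
 * that the handler releases everything it allocates. */
static int live_allocations;

static int handle_step(struct thread_state *current_state)
{
	struct thread_state *backup = malloc(sizeof(*backup));

	if (!backup)
		return -1;
	live_allocations++;

	/* Save the state, then modify it temporarily. */
	memcpy(backup, current_state, sizeof(*backup));
	current_state->flags |= 1;

	/* Restore the saved state. */
	memcpy(current_state, backup, sizeof(*backup));

	/* The step the leak was missing: free the backup before returning. */
	free(backup);
	live_allocations--;
	return 0;
}
```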
    • Tiejun Chen's avatar
      powerpc/book3e: Disable interrupt after preempt_schedule_irq · 572177d7
      Tiejun Chen authored
      
      
      In the preempt case, the current arch_local_irq_restore() from
      preempt_schedule_irq() may enable hard interrupts, but we really
      should disable interrupts when we return from the interrupt,
      so that we don't get interrupted after loading SRR0/1.
      Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
      CC: <stable@vger.kernel.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      572177d7
    • Carl E. Love's avatar
      powerpc/oprofile: Fix error in oprofile power7_marked_instr_event() function · 46ed7a76
      Carl E. Love authored
      
      
      The calculation for the left shift of the mask OPROFILE_PM_PMCSEL_MSK has an
      error.  The calculation should be to shift left by (max_cntrs - cntr) times
      the width of the pmsel field.  However, the #define OPROFILE_MAX_PMC_NUM
      was used instead of OPROFILE_PMSEL_FIELD_WIDTH.  This patch fixes the
      calculation.
      Signed-off-by: Carl Love <cel@us.ibm.com>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      46ed7a76
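      The shift described above can be worked through in a short sketch. The
      constant values here (an 8-bit field, a mask of 0xff, a maximum counter
      index of 3) are assumptions chosen for illustration and are not taken
      from the kernel source; the macro and function names are likewise
      hypothetical.

```c
#include <stdint.h>

/* Assumed values for illustration only. */
#define PMCSEL_MASK        0xffULL  /* mask covering one selector field */
#define PMSEL_FIELD_WIDTH  8        /* width of one selector field, in bits */
#define MAX_PMC_NUM        3        /* index of the highest counter */

/* Intended calculation: shift by (max - cntr) fields, each
 * PMSEL_FIELD_WIDTH bits wide. */
static uint64_t pmcsel_mask_for(unsigned int cntr)
{
	return PMCSEL_MASK << ((MAX_PMC_NUM - cntr) * PMSEL_FIELD_WIDTH);
}

/* The mistake described in the message: multiplying by the counter
 * count instead of the field width, so the mask straddles fields. */
static uint64_t pmcsel_mask_for_buggy(unsigned int cntr)
{
	return PMCSEL_MASK << ((MAX_PMC_NUM - cntr) * MAX_PMC_NUM);
}
```

      Under these assumed values, the correct mask for counter 0 lands on a
      byte boundary (0xff000000), while the buggy version yields 0x1fe00,
      which overlaps two neighbouring fields.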
    • Steven Rostedt's avatar
      powerpc/pasemi: Fix crash on reboot · 72640d88
      Steven Rostedt authored
      Commit f96972f2 ("kernel/sys.c: call disable_nonboot_cpus() in
      kernel_restart()") added a call to disable_nonboot_cpus() in
      kernel_restart(), which tries to shut down all the CPUs except the
      first one. The issue with the PA Semi is that it does not support
      CPU hotplug.
      
      When the call is made to __cpu_down(), it calls the notifiers
      CPU_DOWN_PREPARE, and then tries to take the CPU down.
      
      One of the notifiers to the CPU hotplug code is cpufreq. The
      DOWN_PREPARE will call __cpufreq_remove_dev(), which calls
      cpufreq_driver->exit. The PA Semi exit handler unmaps regions of I/O
      that are used by an interrupt that goes off constantly
      (system_reset_common, but it goes off during normal system operations
      too). I'm not sure exactly what this interrupt does.
      
      Running a simple function trace, you can see it goes off quite a bit:
      
      # tracer: function
      #
      #           TASK-PID    CPU#    TIMESTAMP  FUNCTION
      #              | |       |          |         |
                <idle>-0     [001]  1558.859363: .pasemi_system_reset_exception <-.system_reset_exception
                <idle>-0     [000]  1558.860112: .pasemi_system_reset_exception <-.system_reset_exception
                <idle>-0     [000]  1558.861109: .pasemi_system_reset_exception <-.system_reset_exception
                <idle>-0     [001]  1558.861361: .pasemi_system_reset_exception <-.system_reset_exception
                <idle>-0     [000]  1558.861437: .pasemi_system_reset_exception <-.system_reset_exception
      
      When the region is unmapped, the system crashes with:
      
      Disabling non-boot CPUs ...
      Error taking CPU1 down: -38
      Unable to handle kernel paging request for data at address 0xd0000800903a0100
      Faulting instruction address: 0xc000000000055fcc
      Oops: Kernel access of bad area, sig: 11 [#1]
      PREEMPT SMP NR_CPUS=64 NUMA PA Semi PWRficient
      Modules linked in: shpchp
      NIP: c000000000055fcc LR: c000000000055fb4 CTR: c0000000000df1fc
      REGS: c0000000012175d0 TRAP: 0300   Not tainted  (3.8.0-rc4-test-dirty)
      MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI>  CR: 24000088  XER: 00000000
      SOFTE: 0
      DAR: d0000800903a0100, DSISR: 42000000
      TASK = c0000000010e9008[0] 'swapper/0' THREAD: c000000001214000 CPU: 0
      GPR00: d0000800903a0000 c000000001217850 c0000000012167e0 0000000000000000
      GPR04: 0000000000000000 0000000000000724 0000000000000724 0000000000000000
      GPR08: 0000000000000000 0000000000000000 0000000000000001 0000000000a70000
      GPR12: 0000000024000080 c00000000fff0000 ffffffffffffffff 000000003ffffae0
      GPR16: ffffffffffffffff 0000000000a21198 0000000000000060 0000000000000000
      GPR20: 00000000008fdd35 0000000000a21258 000000003ffffaf0 0000000000000417
      GPR24: 0000000000a226d0 c000000000000000 0000000000000000 0000000000000000
      GPR28: c00000000138b358 0000000000000000 c000000001144818 d0000800903a0100
      NIP [c000000000055fcc] .set_astate+0x5c/0xa4
      LR [c000000000055fb4] .set_astate+0x44/0xa4
      Call Trace:
      [c000000001217850] [c000000000055fb4] .set_astate+0x44/0xa4 (unreliable)
      [c0000000012178f0] [c00000000005647c] .restore_astate+0x2c/0x34
      [c000000001217980] [c000000000054668] .pasemi_system_reset_exception+0x6c/0x88
      [c000000001217a00] [c000000000019ef0] .system_reset_exception+0x48/0x84
      [c000000001217a80] [c000000000001e40] system_reset_common+0x140/0x180
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      72640d88
    • Oliver Hartkopp's avatar
      can: rework skb reserved data handling · 2bf3440d
      Oliver Hartkopp authored
      
      
      Added accessor and skb_reserve helpers for struct can_skb_priv.
      Removed pointless skb_headroom() check.
      Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
      CC: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2bf3440d
    • Linus Torvalds's avatar
      Merge tag 'md-3.8-fixes' of git://neil.brown.name/md · f94d4fe0
      Linus Torvalds authored
      Pull dmraid fix from NeilBrown:
       "Just one fix for md in 3.8
      
        dmraid assesses redundancy and replacements slightly inaccurately, which
        could lead to some degraded arrays failing to assemble."
      
      * tag 'md-3.8-fixes' of git://neil.brown.name/md:
        DM-RAID: Fix RAID10's check for sufficient redundancy
      f94d4fe0