1. 09 Jul, 2013 1 commit
    • Daniel Borkmann's avatar
      net: sctp: confirm route during forward progress · 8c2f414a
      Daniel Borkmann authored
      This fix has been proposed originally by Vlad Yasevich. He says:
        When SCTP makes forward progress (receives a SACK that acks new chunks,
        renegs, or answeres 0-window probes) or when HB-ACK arrives, mark
        the route as confirmed so we don't unnecessarily send NUD probes.
      Having a simple SCTP client/server that exchange data chunks every 1sec,
      without this patch ARP requests are sent periodically every 40-60sec.
      With this fix applied, an ARP request is only done once right at the
      "session" beginning. Also, when clearing the related ARP cache entry
      manually during the session, a new request is correctly done. I have
      only "backported" this to net-next and tested that it works, so full
      credit goes to Vlad.
      Signed-off-by: default avatarVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  2. 02 Jul, 2013 2 commits
    • Daniel Borkmann's avatar
      net: sctp: get rid of SCTP_DBG_TSNS entirely · e02010ad
      Daniel Borkmann authored
      After having reworked the debugging framework, Neil and Vlad agreed to
      get rid of the leftover SCTP_DBG_TSNS code for a couple of reasons:
      We can use systemtap scripts to investigate these things, we now have
      pr_debug() helpers that make life easier, and if we really need anything
      else besides those tools, we will be forced to come up with something
      better than we have there. Therefore, get rid of this ifdef debugging
      code entirely for now.
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      CC: Vlad Yasevich <vyasevich@gmail.com>
      CC: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Acked-by: default avatarVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Daniel Borkmann's avatar
      net: sctp: rework debugging framework to use pr_debug and friends · bb33381d
      Daniel Borkmann authored
      We should get rid of all own SCTP debug printk macros and use the ones
      that the kernel offers anyway instead. This makes the code more readable
      and conform to the kernel code, and offers all the features of dynamic
      debbuging that pr_debug() et al has, such as only turning on/off portions
      of debug messages at runtime through debugfs. The runtime cost of having
      CONFIG_DYNAMIC_DEBUG enabled, but none of the debug statements printing,
      is negligible [1]. If kernel debugging is completly turned off, then these
      statements will also compile into "empty" functions.
      While we're at it, we also need to change the Kconfig option as it /now/
      only refers to the ifdef'ed code portions in outqueue.c that enable further
      debugging/tracing of SCTP transaction fields. Also, since SCTP_ASSERT code
      was enabled with this Kconfig option and has now been removed, we
      transform those code parts into WARNs resp. where appropriate BUG_ONs so
      that those bugs can be more easily detected as probably not many people
      have SCTP debugging permanently turned on.
      To turn on all SCTP debugging, the following steps are needed:
       # mount -t debugfs none /sys/kernel/debug
       # echo -n 'module sctp +p' > /sys/kernel/debug/dynamic_debug/control
      This can be done more fine-grained on a per file, per line basis and others
      as described in [2].
       [1] https://www.kernel.org/doc/ols/2009/ols2009-pages-39-46.pdf
       [2] Documentation/dynamic-debug-howto.txt
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  3. 13 Jun, 2013 1 commit
    • Neil Horman's avatar
      sctp: fully initialize sctp_outq in sctp_outq_init · c5c7774d
      Neil Horman authored
      In commit 2f94aabd
      (refactor sctp_outq_teardown to insure proper re-initalization)
      we modified sctp_outq_teardown to use sctp_outq_init to fully re-initalize the
      outq structure.  Steve West recently asked me why I removed the q->error = 0
      initalization from sctp_outq_teardown.  I did so because I was operating under
      the impression that sctp_outq_init would properly initalize that value for us,
      but it doesn't.  sctp_outq_init operates under the assumption that the outq
      struct is all 0's (as it is when called from sctp_association_init), but using
      it in __sctp_outq_teardown violates that assumption. We should do a memset in
      sctp_outq_init to ensure that the entire structure is in a known state there
      Signed-off-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Reported-by: default avatar"West, Steve (NSN - US/Fort Worth)" <steve.west@nsn.com>
      CC: Vlad Yasevich <vyasevich@gmail.com>
      CC: netdev@vger.kernel.org
      CC: davem@davemloft.net
      Acked-by: default avatarVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  4. 17 Apr, 2013 2 commits
  5. 04 Feb, 2013 1 commit
  6. 17 Jan, 2013 1 commit
  7. 03 Dec, 2012 1 commit
    • Michele Baldessari's avatar
      sctp: Add support to per-association statistics via a new SCTP_GET_ASSOC_STATS call · 196d6759
      Michele Baldessari authored
      The current SCTP stack is lacking a mechanism to have per association
      statistics. This is an implementation modeled after OpenSolaris'
      Userspace part will follow on lksctp if/when there is a general ACK on
      - Move ipackets++ before q->immediate.func() for consistency reasons
      - Move sctp_max_rto() at the end of sctp_transport_update_rto() to avoid
        returning bogus RTO values
      - return asoc->rto_min when max_obs_rto value has not changed
      - Increase ictrlchunks in sctp_assoc_bh_rcv() as well
      - Move ipackets++ to sctp_inq_push()
      - return 0 when no rto updates took place since the last call
      - Implement partial retrieval of stat struct to cope for future expansion
      - Kill the rtxpackets counter as it cannot be precise anyway
      - Rename outseqtsns to outofseqtsns to make it clearer that these are out
        of sequence unexpected TSNs
      - Move asoc->ipackets++ under a lock to avoid potential miscounts
      - Fold asoc->opackets++ into the already existing asoc check
      - Kill unneeded (q->asoc) test when increasing rtxchunks
      - Do not count octrlchunks if sending failed (SCTP_XMIT_OK != 0)
      - Don't count SHUTDOWNs as SACKs
      - Move SCTP_GET_ASSOC_STATS to the private space API
      - Adjust the len check in sctp_getsockopt_assoc_stats() to allow for
        future struct growth
      - Move association statistics in their own struct
      - Update idupchunks when we send a SACK with dup TSNs
      - return min_rto in max_rto when RTO has not changed. Also return the
        transport when max_rto last changed.
      Signed-off: Michele Baldessari <michele@acksyn.org>
      Acked-by: default avatarVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  8. 04 Oct, 2012 1 commit
  9. 04 Sep, 2012 1 commit
  10. 15 Aug, 2012 1 commit
  11. 22 Jul, 2012 1 commit
    • Neil Horman's avatar
      sctp: Implement quick failover draft from tsvwg · 5aa93bcf
      Neil Horman authored
      I've seen several attempts recently made to do quick failover of sctp transports
      by reducing various retransmit timers and counters.  While its possible to
      implement a faster failover on multihomed sctp associations, its not
      particularly robust, in that it can lead to unneeded retransmits, as well as
      false connection failures due to intermittent latency on a network.
      Instead, lets implement the new ietf quick failover draft found here:
      This will let the sctp stack identify transports that have had a small number of
      errors, and avoid using them quickly until their reliability can be
      re-established.  I've tested this out on two virt guests connected via multiple
      isolated virt networks and believe its in compliance with the above draft and
      works well.
      Signed-off-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      CC: Vlad Yasevich <vyasevich@gmail.com>
      CC: Sridhar Samudrala <sri@us.ibm.com>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: linux-sctp@vger.kernel.org
      CC: joe@perches.com
      Acked-by: default avatarVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  12. 15 Apr, 2012 1 commit
  13. 20 Dec, 2011 1 commit
    • Thomas Graf's avatar
      sctp: Do not account for sizeof(struct sk_buff) in estimated rwnd · a76c0adf
      Thomas Graf authored
      When checking whether a DATA chunk fits into the estimated rwnd a
      full sizeof(struct sk_buff) is added to the needed chunk size. This
      quickly exhausts the available rwnd space and leads to packets being
      sent which are much below the PMTU limit. This can lead to much worse
      The reason for this behaviour was to avoid putting too much memory
      pressure on the receiver. The concept is not completely irational
      because a Linux receiver does in fact clone an skb for each DATA chunk
      delivered. However, Linux also reserves half the available socket
      buffer space for data structures therefore usage of it is already
      accounted for.
      When proposing to change this the last time it was noted that this
      behaviour was introduced to solve a performance issue caused by rwnd
      overusage in combination with small DATA chunks.
      Trying to reproduce this I found that with the sk_buff overhead removed,
      the performance would improve significantly unless socket buffer limits
      are increased.
      The following numbers have been gathered using a patched iperf
      supporting SCTP over a live 1 Gbit ethernet network. The -l option
      was used to limit DATA chunk sizes. The numbers listed are based on
      the average of 3 test runs each. Default values have been used for
      Size    Unpatched     No Overhead
         4    15.2 Kbit [!]   12.2 Mbit [!]
         8    35.8 Kbit [!]   26.0 Mbit [!]
        16    95.5 Kbit [!]   54.4 Mbit [!]
        32   106.7 Mbit      102.3 Mbit
        64   189.2 Mbit      188.3 Mbit
       128   331.2 Mbit      334.8 Mbit
       256   537.7 Mbit      536.0 Mbit
       512   766.9 Mbit      766.6 Mbit
      1024   810.1 Mbit      808.6 Mbit
      Signed-off-by: default avatarThomas Graf <tgraf@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  14. 24 Aug, 2011 1 commit
  15. 07 Jul, 2011 1 commit
    • Thomas Graf's avatar
      sctp: Enforce retransmission limit during shutdown · f8d96052
      Thomas Graf authored
      When initiating a graceful shutdown while having data chunks
      on the retransmission queue with a peer which is in zero
      window mode the shutdown is never completed because the
      retransmission error count is reset periodically by the
      following two rules:
       - Do not timeout association while doing zero window probe.
       - Reset overall error count when a heartbeat request has
         been acknowledged.
      The graceful shutdown will wait for all outstanding TSN to
      be acknowledged before sending the SHUTDOWN request. This
      never happens due to the peer's zero window not acknowledging
      the continuously retransmitted data chunks. Although the
      error counter is incremented for each failed retransmission,
      the receiving of the SACK announcing the zero window clears
      the error count again immediately. Also heartbeat requests
      continue to be sent periodically. The peer acknowledges these
      requests causing the error counter to be reset as well.
      This patch changes behaviour to only reset the overall error
      counter for the above rules while not in shutdown. After
      reaching the maximum number of retransmission attempts, the
      T5 shutdown guard timer is scheduled to give the receiver
      some additional time to recover. The timer is stopped as soon
      as the receiver acknowledges any data.
      The issue can be easily reproduced by establishing a sctp
      association over the loopback device, constantly queueing
      data at the sender while not reading any at the receiver.
      Wait for the window to reach zero, then initiate a shutdown
      by killing both processes simultaneously. The association
      will never be freed and the chunks on the retransmission
      queue will be retransmitted indefinitely.
      Signed-off-by: default avatarThomas Graf <tgraf@infradead.org>
      Acked-by: default avatarVlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  16. 02 Jun, 2011 1 commit
  17. 20 Apr, 2011 2 commits
  18. 19 Apr, 2011 1 commit
  19. 31 Mar, 2011 1 commit
  20. 07 Mar, 2011 1 commit
  21. 26 Aug, 2010 1 commit
  22. 18 May, 2010 1 commit
  23. 30 Apr, 2010 7 commits
  24. 30 Mar, 2010 1 commit
    • Tejun Heo's avatar
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo authored
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      The script does the followings.
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
      The conversion was done in the following steps.
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
      6. percpu.h was updated not to include slab.h.
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Guess-its-ok-by: default avatarChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
  25. 29 Nov, 2009 2 commits
    • Joe Perches's avatar
      net: Move && and || to end of previous line · f64f9e71
      Joe Perches authored
      Not including net/atm/
      Compiled tested x86 allyesconfig only
      Added a > 80 column line or two, which I ignored.
      Existing checkpatch plaints willfully, cheerfully ignored.
      Signed-off-by: default avatarJoe Perches <joe@perches.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Andrei Pelinescu-Onciul's avatar
      sctp: on T3_RTX retransmit all the in-flight chunks · 5fdd4bae
      Andrei Pelinescu-Onciul authored
      When retransmitting due to T3 timeout, retransmit all the
      in-flight chunks for the corresponding  transport/path, including
      chunks sent less then 1 rto ago.
      This is the correct behaviour according to rfc4960 section 6.3.3
      E3 and
      "Note: Any DATA chunks that were sent to the address for which the
       T3-rtx timer expired but did not fit in one MTU (rule E3 above)
       should be marked for retransmission and sent as soon as cwnd
       allows (normally, when a SACK arrives). ".
      This fixes problems when more then one path is present and the T3
      retransmission of the first chunk that timeouts stops the T3 timer
      for the initial active path, leaving all the other in-flight
      chunks waiting forever or until a new chunk is transmitted on the
      same path and timeouts (and this will happen only if the cwnd
      allows sending new chunks, but since cwnd was dropped to MTU by
      the timeout => it will wait until the first heartbeat).
      Example: 10 packets in flight, sent at 0.1 s intervals on the
      primary path. The primary path is down and the first packet
      timeouts. The first packet is retransmitted on another path, the
      T3 timer for the primary path is stopped and cwnd is set to MTU.
      All the other 9 in-flight packets will not be retransmitted
      (unless more new packets are sent on the primary path which depend
      on cwnd allowing it, and even in this case the 9 packets will be
      retransmitted only after a new packet timeouts which even in the
      best case would be more then RTO).
      This commit reverts d0ce9291 and
      also removes the now unused transport->last_rto, introduced in
      p.s  The problem is not only when multiple paths are there.  It
      can happen in a single homed environment.  If the application
      stops sending data, it possible to have a hung association.
      Signed-off-by: default avatarAndrei Pelinescu-Onciul <andrei@iptel.org>
      Signed-off-by: default avatarVlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  26. 23 Nov, 2009 2 commits
  27. 04 Sep, 2009 1 commit
  28. 13 Mar, 2009 1 commit
  29. 02 Mar, 2009 1 commit