  1. Oct 22, 2011
      bonding: Add a forgotten sysfs_attr_init on class_attr_bonding_masters · 01718e36
      Eric W. Biederman authored
      
      When I made class_attr_bonding_masters per network namespace and dynamically
      allocated, I overlooked the need to call sysfs_attr_init.  Oops.
      
      This fixes the following lockdep splat:
      
      [    5.749651] bonding: Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
      [    5.749655] bonding: MII link monitoring set to 100 ms
      [    5.749676] BUG: key f49a831c not in .data!
      [    5.749677] ------------[ cut here ]------------
      [    5.749752] WARNING: at kernel/lockdep.c:2897 lockdep_init_map+0x1c3/0x460()
      [    5.749809] Hardware name: ProLiant BL460c G1
      [    5.749862] Modules linked in: bonding(+)
      [    5.749978] Pid: 3177, comm: modprobe Not tainted 3.1.0-rc9-02177-gf2d1a4e-dirty #1157
      [    5.750066] Call Trace:
      [    5.750120]  [<c1352c2f>] ? printk+0x18/0x21
      [    5.750176]  [<c103112d>] warn_slowpath_common+0x6d/0xa0
      [    5.750231]  [<c1060133>] ? lockdep_init_map+0x1c3/0x460
      [    5.750287]  [<c1060133>] ? lockdep_init_map+0x1c3/0x460
      [    5.750342]  [<c103117d>] warn_slowpath_null+0x1d/0x20
      [    5.750398]  [<c1060133>] lockdep_init_map+0x1c3/0x460
      [    5.750453]  [<c1355ddd>] ? _raw_spin_unlock+0x1d/0x20
      [    5.750510]  [<c11255c8>] ? sysfs_new_dirent+0x68/0x110
      [    5.750565]  [<c1124d4b>] sysfs_add_file_mode+0x8b/0xe0
      [    5.750621]  [<c1124db3>] sysfs_add_file+0x13/0x20
      [    5.750675]  [<c1124e7c>] sysfs_create_file+0x1c/0x20
      [    5.750737]  [<c1208f09>] class_create_file+0x19/0x20
      [    5.750794]  [<c12c186f>] netdev_class_create_file+0xf/0x20
      [    5.750853]  [<f85deaf4>] bond_create_sysfs+0x44/0x90 [bonding]
      [    5.750911]  [<f8410947>] ? bond_create_proc_dir+0x1e/0x3e [bonding]
      [    5.750970]  [<f841007e>] bond_net_init+0x7e/0x87 [bonding]
      [    5.751026]  [<f8410000>] ? 0xf840ffff
      [    5.751080]  [<c12abc7a>] ops_init.clone.4+0xba/0x100
      [    5.751135]  [<c12abdb2>] ? register_pernet_subsys+0x12/0x30
      [    5.751191]  [<c12abd03>] register_pernet_operations.clone.3+0x43/0x80
      [    5.751249]  [<c12abdb9>] register_pernet_subsys+0x19/0x30
      [    5.751306]  [<f84108b9>] bonding_init+0x832/0x8a2 [bonding]
      [    5.751363]  [<c10011f0>] do_one_initcall+0x30/0x160
      [    5.751420]  [<f8410087>] ? bond_net_init+0x87/0x87 [bonding]
      [    5.751477]  [<c106d5cf>] sys_init_module+0xef/0x1890
      [    5.751533]  [<c1356490>] sysenter_do_call+0x12/0x36
      [    5.751588] ---[ end trace 89f492d83a7f5006 ]---
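
The failure mode can be modelled in user space. The sketch below is a simplified illustration, not kernel code: `struct attr_model`, `attr_model_init`, `effective_key` and the other names are hypothetical. It assumes an attribute either carries a pointer to a key with static storage or falls back to a key embedded in the attribute itself. Once the attribute is copied into dynamically allocated memory, the embedded fallback is no longer a static object, which is the condition the splat above reports.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a lock class key. */
struct lock_key_model { char dummy; };

/* Hypothetical stand-in for a sysfs attribute. */
struct attr_model {
	const char *name;
	struct lock_key_model *key;	/* explicit key, NULL if never initialised */
	struct lock_key_model skey;	/* embedded fallback key */
};

/*
 * Modelled on the idea behind sysfs_attr_init(): hand the attribute a key
 * that has static storage duration, one key per call site.
 */
#define attr_model_init(attr)					\
	do {							\
		static struct lock_key_model key_storage;	\
		(attr)->key = &key_storage;			\
	} while (0)

/* The key a (modelled) lock validator would be given. */
static struct lock_key_model *effective_key(struct attr_model *attr)
{
	return attr->key ? attr->key : &attr->skey;
}

/* Nonzero when the key lives inside the attribute object itself. */
static int key_is_embedded(struct attr_model *attr)
{
	return effective_key(attr) == &attr->skey;
}

/* Copy a template into heap memory, as a per-namespace copy would be made. */
static struct attr_model *attr_model_clone(const struct attr_model *tmpl)
{
	struct attr_model *copy;

	assert(tmpl != NULL);
	copy = malloc(sizeof(*copy));
	if (copy)
		memcpy(copy, tmpl, sizeof(*copy));
	return copy;
}
```

In this model a freshly cloned attribute reports an embedded (heap) key, and calling `attr_model_init()` on the clone switches it to a static one.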
      
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
      Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. Oct 19, 2011
  3. Oct 18, 2011
  4. Oct 03, 2011
      bonding: properly stop queuing work when requested · a0db2dad
      Andy Gospodarek authored
      
      During a test where a pair of bonding interfaces using ARP monitoring
      were both brought up and torn down (with an rmmod) repeatedly, a panic
      in the timer code was noticed.  I tracked this down and determined that
      any of the bonding functions that ran as workqueue handlers and requeued
      more work might not properly exit when the module was removed.
      
      There was a flag protected by the bond lock called kill_timers that is
      set when the interface goes down or the module is removed, but many of
      the functions that monitor link status now unlock the bond lock to take
      rtnl first.  There is a chance that another CPU running the rmmod could
      get the lock and set kill_timers after the first check has passed.
      
      This patch lets a function requeue its own work only when kill_timers
      is not set.  I also noticed while doing this work that
      bond_resend_igmp_join_requests did not have a check for kill_timers,
      so I added the needed check there as well.
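
The race and the re-check can be sketched in a single-threaded user-space model. This is an illustration under assumptions, not the driver code: `struct bond_model`, `monitor_pass` and the `race_hook` callback are hypothetical, and the callback stands in for whatever another CPU might do while the real handler has dropped the bond lock to take rtnl.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced model of the bond state. */
struct bond_model {
	bool kill_timers;	/* set on interface down or module removal */
	unsigned int queued;	/* how many times work was requeued */
};

/* Runs in the window where the real handler would not hold the bond lock. */
typedef void (*race_hook)(struct bond_model *);

/* Models teardown running on another CPU. */
static void teardown(struct bond_model *b)
{
	b->kill_timers = true;
}

/*
 * One pass of a monitor handler.  Returns true if it requeued itself.
 * The second kill_timers test is the point of the fix: the flag is
 * checked again immediately before work is queued.
 */
static bool monitor_pass(struct bond_model *b, race_hook between)
{
	if (b->kill_timers)
		return false;		/* check on entry */

	if (between)
		between(b);		/* lock dropped: teardown may run here */

	if (b->kill_timers)
		return false;		/* re-check before requeueing */

	b->queued++;
	return true;
}
```

Calling `monitor_pass(&b, teardown)` models the problematic interleaving: the entry check passes, teardown sets the flag, and the second check stops the requeue.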
      
      Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
      Reported-by: Liang Zheng <lzheng@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. Sep 15, 2011
      net: consolidate and fix ethtool_ops->get_settings calling · 4bc71cb9
      Jiri Pirko authored
      
      This patch does several things:
      - introduces __ethtool_get_settings which is called from ethtool code and
        from drivers as well. Put ASSERT_RTNL there.
      - dev_ethtool_get_settings() is replaced by __ethtool_get_settings()
      - changes calling in drivers so rtnl locking is respected. Previously,
        iboe_get_rate() called ->get_settings() unlocked; this fixes it.
        prb_calc_retire_blk_tmo() in af_packet.c had the same problem, which
        is fixed by calling __dev_get_by_index() instead of
        dev_get_by_index() and holding rtnl_lock for both calls.
      - introduces rtnl_lock in bnx2fc_vport_create() and fcoe_vport_create()
        so bnx2fc_if_create() and fcoe_if_create() are called locked as they
        are from other places.
      - use __ethtool_get_settings() in bonding code
      
      Signed-off-by: Jiri Pirko <jpirko@redhat.com>
      
      v2->v3:
      	-removed dev_ethtool_get_settings()
      	-added ASSERT_RTNL into __ethtool_get_settings()
      	-prb_calc_retire_blk_tmo - use __dev_get_by_index() and lock
      	 around it and __ethtool_get_settings() call
      v1->v2:
              add missing export_symbol
      Reviewed-by: Ben Hutchings <bhutchings@solarflare.com> [except FCoE bits]
      Acked-by: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. Aug 17, 2011
  7. Aug 11, 2011
  8. Jul 27, 2011
      bonding: reduce noise during init · b2730f4f
      Andy Gospodarek authored
      
      On Tue, Jul 26, 2011 at 05:40:27PM -0700, Joe Perches wrote:
      > On Tue, 2011-07-26 at 17:37 -0700, Jay Vosburgh wrote:
      > > Joe Perches <joe@perches.com> wrote:
      > > >I'd prefer you don't separate the format string
      > > >into multiple pieces.
      > > Why not?  To me, it looks easier to read split into sections
      > > that don't wrap lines.
      >
      > Harder to grep for a dmesg and the
      > defect rate of these split formats is
      > typically higher than single strings
      > because of bad spacing between string
      > segments.
      >
      
      I noticed that you took some time back in late 2009 to 'consolidate' the
      split format-strings present in the bonding driver at the time and I've
      decided I'm fine to leave them the way they are.  The main point of my
      patch was to change the output and I would like to get that included.
      Here is my updated patch...
      
      Subject: [PATCH net-next-2.6 v2] bonding: reduce noise during init
      
      Many are using sysfs to configure bonding rather than module options, so
      there is no need for bonding to throw this warning in normal cases.
      
      Keep the message around when debugging is enabled as it might be useful
      for someone desperate enough to enable debugging, but eliminate it
      otherwise.
      
      Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bonding: fix string comparison errors · f4bb2e9c
      Andy Gospodarek authored
      
      When a bond contains devices where one name is a prefix of another
      (eth1 and eth10, for example), one cannot properly set the primary
      device or the currently active device.
      
      This was reported and based on work by Takuma Umeya.  I also verified
      the problem and tested that this fix resolves it.
      
      V2: A few did not like the current code or my changes, so I
      refactored bonding_store_primary and bonding_store_active_slave to be a
      bit cleaner, dropped the use of strnicmp since we did not really need
      the comparison to be case insensitive, and formatted the input string
      from sysfs so a comparison to IFNAMSIZ could be used.
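
A small user-space sketch shows why a length-limited prefix comparison confuses eth1 with eth10, and how normalising the input and comparing up to IFNAMSIZ avoids it. The helper names `prefix_match`, `exact_match` and `parse_ifname` are hypothetical, and `IFNAMSIZ` is restated locally so the example is self-contained; the driver's own code differs.

```c
#include <stdio.h>
#include <string.h>

#ifndef IFNAMSIZ
#define IFNAMSIZ 16	/* restated here for the sketch; matches the Linux value */
#endif

/* Problematic pattern: compare only as many bytes as the slave name has. */
static int prefix_match(const char *slave_name, const char *input)
{
	return strncmp(slave_name, input, strlen(slave_name)) == 0;
}

/* Stricter pattern: compare whole names, bounded by IFNAMSIZ. */
static int exact_match(const char *slave_name, const char *input)
{
	return strncmp(slave_name, input, IFNAMSIZ) == 0;
}

/*
 * Copy the first whitespace-delimited token of a sysfs-style write
 * (which usually ends in a newline) into ifname.  Returns 1 on success.
 * The "%15s" width leaves room for the terminating NUL in 16 bytes.
 */
static int parse_ifname(const char *buf, char ifname[IFNAMSIZ])
{
	ifname[0] = '\0';
	return sscanf(buf, "%15s", ifname) == 1;
}
```

With these helpers, `prefix_match("eth1", "eth10")` reports a match while `exact_match("eth1", "eth10")` does not, because the latter also compares the terminating NUL of the shorter name.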
      
      I also discovered an error in bonding_store_active_slave that would
      modify bond->primary_slave rather than bond->curr_active_slave before
      forcing the bonding driver to choose a new active slave.
      
      V3: Actually sending the proper patch....
      
      Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
      Reported-by: Takuma Umeya <tumeya@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      net: Audit drivers to identify those needing IFF_TX_SKB_SHARING cleared · 550fd08c
      Neil Horman authored
      
      After the last patch, we are left in a state in which only drivers calling
      ether_setup have IFF_TX_SKB_SHARING set (we assume that drivers touching real
      hardware call ether_setup for their net_devices and don't hold any state in
      their skbs).  There are a handful of drivers that violate this assumption, of
      course, and need to be fixed up.  This patch identifies those drivers and marks
      them as not being able to support the safe transmission of skbs by clearing the
      IFF_TX_SKB_SHARING flag in priv_flags.
      
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      CC: Karsten Keil <isdn@linux-pingi.de>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Jay Vosburgh <fubar@us.ibm.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      CC: Patrick McHardy <kaber@trash.net>
      CC: Krzysztof Halasa <khc@pm.waw.pl>
      CC: "John W. Linville" <linville@tuxdriver.com>
      CC: Greg Kroah-Hartman <gregkh@suse.de>
      CC: Marcel Holtmann <marcel@holtmann.org>
      CC: Johannes Berg <johannes@sipsolutions.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. Jul 21, 2011
  10. Jul 14, 2011
      net: remove NETIF_F_ALL_TX_OFFLOADS · 62f2a3a4
      Michał Mirosław authored
      
      There is no software fallback implemented for SCTP or FCoE checksumming,
      and so it should not be passed on by software devices like bridge or bonding.
      
      For VLAN devices, this is different. First, the driver for underlying device
      should be prepared to get offloaded packets even when the feature is disabled
      (especially if it advertises it in vlan_features). Second, devices under
      VLANs do not get replaced without tearing down the VLAN first.
      
      This fixes a mess I accidentally introduced while converting bonding to
      ndo_fix_features.
      
      NETIF_F_SOFT_FEATURES are removed from BOND_VLAN_FEATURES because they
      are unused as of commit 712ae51a.
      
      Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. Jun 23, 2011
  12. Jun 21, 2011
  13. Jun 19, 2011
  14. Jun 13, 2011
  15. Jun 11, 2011
  16. Jun 09, 2011
  17. Jun 05, 2011
      bonding: reset queue mapping prior to transmission to physical device (v5) · 374eeb5a
      Neil Horman authored
      
      The bonding driver is multiqueue enabled: each queue represents a slave, which
      enables optional steering of output frames to given slaves against the default
      output policy.  However, it needs to reset the skb->queue_mapping prior to
      queuing to the physical device, or the physical slave (if it is multiqueue) could
      wind up transmitting on an unintended tx queue.
      
      Change Notes:
      v2) Based on first pass review, updated the patch to restore the original queue
      mapping that was found in bond_select_queue, rather than simply resetting to
      zero.  This preserves the value of queue_mapping when it was set on receive in
      the forwarding case, which is desirable.
      
      v3) Fixed spelling and casting error in skb->cb
      
      v4) fixed to store raw queue_mapping to avoid double decrement
      
      v5) Eric D requested that ->cb access be wrapped in a macro.
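
The save-and-restore idea described in the change notes can be sketched in user space as follows. This is a model under assumptions, not the driver code: `struct skb_model` is a hypothetical stand-in for the socket buffer, and the helper functions play the role of the requested wrapper around `->cb` access (the patch itself uses a macro; inline helpers with `memcpy` are used here to keep the sketch strictly portable).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced stand-in for a socket buffer. */
struct skb_model {
	uint16_t queue_mapping;		/* queue the stack selected */
	unsigned char cb[48];		/* per-layer scratch space */
};

/* Stash the raw queue_mapping value in the scratch area. */
static void bond_save_queue_mapping(struct skb_model *skb)
{
	memcpy(skb->cb, &skb->queue_mapping, sizeof(skb->queue_mapping));
}

/* Put the stashed raw value back. */
static void bond_restore_queue_mapping(struct skb_model *skb)
{
	memcpy(&skb->queue_mapping, skb->cb, sizeof(skb->queue_mapping));
}

/*
 * Model of queue selection on the bond device: remember the original
 * mapping, then overwrite it with the bond-level queue (which encodes
 * the slave to steer to).
 */
static uint16_t select_queue_model(struct skb_model *skb, uint16_t bond_txq)
{
	bond_save_queue_mapping(skb);
	skb->queue_mapping = bond_txq;
	return bond_txq;
}

/* Model of handing the frame to the physical slave. */
static void xmit_to_slave_model(struct skb_model *skb)
{
	bond_restore_queue_mapping(skb);
}
```

Storing the raw value rather than a decoded one corresponds to the v4 note: nothing is adjusted on the way in, so nothing needs to be adjusted back on the way out.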
      
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      CC: Jay Vosburgh <fubar@us.ibm.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      CC: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  18. Jun 02, 2011
  19. May 26, 2011
  20. May 25, 2011
      bonding: documentation and code cleanup for resend_igmp · 94265cf5
      Flavio Leitner authored
      
      Improve the documentation on how the IGMP resend parameter
      works, and fix two missing checks and coding style issues.
      
      Signed-off-by: Flavio Leitner <fbl@redhat.com>
      Acked-by: Rick Jones <rick.jones2@hp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bonding: prevent deadlock on slave store with alb mode (v3) · 9fe0617d
      Neil Horman authored
      
      This soft lockup was recently reported:
      
      [root@dell-per715-01 ~]# echo +bond5 > /sys/class/net/bonding_masters
      [root@dell-per715-01 ~]# echo +eth1 > /sys/class/net/bond5/bonding/slaves
      bonding: bond5: doing slave updates when interface is down.
      bonding bond5: master_dev is not up in bond_enslave
      [root@dell-per715-01 ~]# echo -eth1 > /sys/class/net/bond5/bonding/slaves
      bonding: bond5: doing slave updates when interface is down.
      
      BUG: soft lockup - CPU#12 stuck for 60s! [bash:6444]
      CPU 12:
      Modules linked in: bonding autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc
      be2d
      Pid: 6444, comm: bash Not tainted 2.6.18-262.el5 #1
      RIP: 0010:[<ffffffff80064bf0>]  [<ffffffff80064bf0>]
      .text.lock.spinlock+0x26/00
      RSP: 0018:ffff810113167da8  EFLAGS: 00000286
      RAX: ffff810113167fd8 RBX: ffff810123a47800 RCX: 0000000000ff1025
      RDX: 0000000000000000 RSI: ffff810123a47800 RDI: ffff81021b57f6f8
      RBP: ffff81021b57f500 R08: 0000000000000000 R09: 000000000000000c
      R10: 00000000ffffffff R11: ffff81011d41c000 R12: ffff81021b57f000
      R13: 0000000000000000 R14: 0000000000000282 R15: 0000000000000282
      FS:  00002b3b41ef3f50(0000) GS:ffff810123b27940(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 00002b3b456dd000 CR3: 000000031fc60000 CR4: 00000000000006e0
      
      Call Trace:
       [<ffffffff80064af9>] _spin_lock_bh+0x9/0x14
       [<ffffffff886937d7>] :bonding:tlb_clear_slave+0x22/0xa1
       [<ffffffff8869423c>] :bonding:bond_alb_deinit_slave+0xba/0xf0
       [<ffffffff8868dda6>] :bonding:bond_release+0x1b4/0x450
       [<ffffffff8006457b>] __down_write_nested+0x12/0x92
       [<ffffffff88696ae4>] :bonding:bonding_store_slaves+0x25c/0x2f7
       [<ffffffff801106f7>] sysfs_write_file+0xb9/0xe8
       [<ffffffff80016b87>] vfs_write+0xce/0x174
       [<ffffffff80017450>] sys_write+0x45/0x6e
       [<ffffffff8005d28d>] tracesys+0xd5/0xe0
      
      It occurs because we are able to change the slave configuration of a bond while
      the bond interface is down.  The bonding driver initializes some data structures
      only after its ndo_open routine is called.  Among them is the initialization of
      the alb tx and rx hash locks.  So if we add or remove a slave without first
      opening the bond master device, we run the risk of trying to lock/unlock a
      spinlock that has garbage for data in it, which results in the soft lockup above.
      
      Note that sometimes this works, because in many cases an unlocked spinlock has
      the raw_lock parameter initialized to zero (meaning that the kzalloc of the
      net_device private data is equivalent to calling spin_lock_init), but that's not
      true in all cases, and we aren't guaranteed that condition, so we need to pass
      the relevant spinlocks through the spin_lock_init function.
      
      Fix it by moving the spin_lock_init calls for the tx and rx hashtable locks to
      the ndo_init path, so they are ready for use by the bond_store_slaves path.
      
      Change notes:
      v2) Based on conversation with Jay and Nicolas it seems that the ability to
      enslave devices while the bond master is down should be safe to do.  As such
      this is an outlier bug, and so instead we'll just initialize the errant spinlocks
      in the init path rather than the open path, solving the problem.  We'll also
      remove the warnings about the bond being down during enslave operations, since
      it should be safe.
      
      v3) Fix spelling error
      
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      Reported-by: <jtluka@redhat.com>
      CC: Jay Vosburgh <fubar@us.ibm.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      CC: nicolas.2p.debian@gmail.com
      CC: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  21. May 22, 2011
  22. May 15, 2011
  23. May 13, 2011
  24. May 12, 2011
  25. May 09, 2011
  26. May 05, 2011
  27. Apr 29, 2011
      ipv4, ipv6, bonding: Restore control over number of peer notifications · ad246c99
      Ben Hutchings authored
      
      For backward compatibility, we should retain the module parameters and
      sysfs attributes to control the number of peer notifications
      (gratuitous ARPs and unsolicited NAs) sent after bonding failover.
      Also, it is possible for failover to take place even though the new
      active slave does not have link up, and in that case the peer
      notification should be deferred until it does.
      
      Change ipv4 and ipv6 so they do not automatically send peer
      notifications on bonding failover.
      
      Change the bonding driver to send separate NETDEV_NOTIFY_PEERS
      notifications when the link is up, as many times as requested.  Since
      it does not directly control which protocols send notifications, make
      num_grat_arp and num_unsol_na aliases for a single parameter.  Bump
      the bonding version number and update its documentation.
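
The deferral described above can be sketched as a small user-space state machine. It is only an illustration under assumptions: `struct bond_notif_model`, `failover` and `notify_tick` are hypothetical names, and a single counter stands in for the one parameter that both num_grat_arp and num_unsol_na are said to alias.

```c
#include <stdbool.h>

/* Hypothetical, reduced model of the notification state. */
struct bond_notif_model {
	unsigned int num_peer_notif;	/* configured count (the single parameter) */
	unsigned int send_peer_notif;	/* notifications still owed to peers */
	bool active_link_up;		/* link state of the new active slave */
	unsigned int sent;		/* notifications emitted so far */
};

/* A failover arms the counter; nothing is sent yet. */
static void failover(struct bond_notif_model *b)
{
	b->send_peer_notif = b->num_peer_notif;
}

/*
 * Called periodically.  Emits at most one notification per call and
 * only while the active slave has link, so pending notifications are
 * deferred rather than dropped when link is down.
 */
static bool notify_tick(struct bond_notif_model *b)
{
	if (b->send_peer_notif == 0 || !b->active_link_up)
		return false;

	b->sent++;
	b->send_peer_notif--;
	return true;
}
```

In this model a failover onto a slave without link leaves the counter armed, and the notifications go out on later ticks once `active_link_up` becomes true.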
      
      Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
      Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
      Acked-by: Brian Haley <brian.haley@hp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  28. Apr 25, 2011
      bonding: move processing of recv handlers into handle_frame() · 3aba891d
      Jiri Pirko authored
      
      Now that bonding uses rx_handler, all traffic going into the bond
      device goes through bond_handle_frame. So there's no need to go back into
      bonding code later via ptype handlers. This patch converts
      original ptype handlers into "bonding receive probes". These functions
      are called from bond_handle_frame and they are registered per-mode.
      
      Note that vlan packets are also handled because they are always untagged
      thanks to vlan_untag().
      
      Note that this also allows arpmon for eth-bond-bridge-vlan topology.
      
      Signed-off-by: Jiri Pirko <jpirko@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  29. Apr 20, 2011
      bonding: 802.3ad - fix agg_device_up · 2430af8b
      Jiri Bohac authored
      
      The slave member of struct aggregator does not necessarily point
      to a slave which is part of the aggregator. It points to the
      slave structure containing the aggregator structure, while
      completely different slaves (or no slaves at all) may be part of
      the aggregator.
      
      The agg_device_up() function wrongly uses agg->slave to find the state
      of the aggregator.  Use agg->lag_ports->slave instead. The bug was
      introduced by commit 4cd6fe1c
      ("bonding: fix link down handling in 802.3ad mode").
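
The distinction between the slave that owns the aggregator structure and the slaves that are members of it can be sketched with a reduced user-space model. All type and function names ending in `_model`, as well as `agg_device_up_owner` and `agg_device_up_ports`, are hypothetical; the real structures carry far more state.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-ins for the driver structures. */
struct slave_model {
	bool running;
	bool carrier_ok;
};

struct port_model {
	struct slave_model *slave;		/* slave this port belongs to */
	struct port_model *next_in_aggregator;
};

struct aggregator_model {
	struct slave_model *slave;		/* slave whose structure holds this aggregator */
	struct port_model *lag_ports;		/* ports that are actually members */
};

static bool slave_up(const struct slave_model *s)
{
	return s->running && s->carrier_ok;
}

/* Problematic approach: looks at the owner, which may not be a member. */
static bool agg_device_up_owner(const struct aggregator_model *agg)
{
	return slave_up(agg->slave);
}

/* Approach described in the commit: look at a member port's slave. */
static bool agg_device_up_ports(const struct aggregator_model *agg)
{
	const struct port_model *port = agg->lag_ports;

	if (!port)
		return false;	/* an aggregator with no ports is not up */
	return slave_up(port->slave);
}
```

The two functions disagree exactly in the cases the commit message describes: when the owning slave is down but a member is up, and when the owning slave is up but the aggregator has no ports.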
      
      Signed-off-by: Jiri Bohac <jbohac@suse.cz>
      Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>