1. 26 May, 2017 1 commit
  2. 27 Apr, 2016 1 commit
  3. 07 Mar, 2016 1 commit
  4. 11 Feb, 2016 1 commit
  5. 05 Oct, 2015 1 commit
  6. 24 Sep, 2015 1 commit
    • Jiri Benc's avatar
      ipv4: send arp replies to the correct tunnel · 63d008a4
      Jiri Benc authored
      When using ip lwtunnels, the additional data for xmit (basically, the actual
      tunnel to use) are carried in ip_tunnel_info either in dst->lwtstate or in
      metadata dst. When replying to ARP requests, we need to send the reply to
      the same tunnel the request came from. This means we need to construct
      proper metadata dst for ARP replies.
      We could perform another route lookup to get a dst entry with the correct
      lwtstate. However, this won't always ensure that the outgoing tunnel is the
      same as the incoming one, and it won't work anyway for IPv4 duplicate
      address detection.
      The only thing to do is to "reverse" the ip_tunnel_info.
      Signed-off-by: default avatarJiri Benc <jbenc@redhat.com>
      Acked-by: default avatarThomas Graf <tgraf@suug.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  7. 17 Sep, 2015 3 commits
    • Eric W. Biederman's avatar
      netfilter: Pass net into okfn · 0c4b51f0
      Eric W. Biederman authored
      This is immediately motivated by the bridge code that chains functions that
      call into netfilter.  Without passing net into the okfns the bridge code would
      need to guess about the best expression for the network namespace to process
      packets in.
      As net is frequently one of the first things computed in continuation functions
      after netfilter has done it's job passing in the desired network namespace is in
      many cases a code simplification.
      To support this change the function dst_output_okfn is introduced to
      simplify passing dst_output as an okfn.  For the moment dst_output_okfn
      just silently drops the struct net.
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Eric W. Biederman's avatar
      netfilter: Pass struct net into the netfilter hooks · 29a26a56
      Eric W. Biederman authored
      Pass a network namespace parameter into the netfilter hooks.  At the
      call site of the netfilter hooks the path a packet is taking through
      the network stack is well known which allows the network namespace to
      be easily and reliabily.
      This allows the replacement of magic code like
      "dev_net(state->in?:state->out)" that appears at the start of most
      netfilter hooks with "state->net".
      In almost all cases the network namespace passed in is derived
      from the first network device passed in, guaranteeing those
      paths will not see any changes in practice.
      The exceptions are:
      xfrm/xfrm_output.c:xfrm_output_resume()         xs_net(skb_dst(skb)->xfrm)
      ipvs/ip_vs_xmit.c:ip_vs_nat_send_or_cont()      ip_vs_conn_net(cp)
      ipvs/ip_vs_xmit.c:ip_vs_send_or_cont()          ip_vs_conn_net(cp)
      ipv4/raw.c:raw_send_hdrinc()                    sock_net(sk)
      ipv6/ip6_output.c:ip6_xmit()			sock_net(sk)
      ipv6/ndisc.c:ndisc_send_skb()                   dev_net(skb->dev) not dev_net(dst->dev)
      ipv6/raw.c:raw6_send_hdrinc()                   sock_net(sk)
      br_netfilter_hooks.c:br_nf_pre_routing_finish() dev_net(skb->dev) before skb->dev is set to nf_bridge->physindev
      In all cases these exceptions seem to be a better expression for the
      network namespace the packet is being processed in then the historic
      "dev_net(in?in:out)".  I am documenting them in case something odd
      pops up and someone starts trying to track down what happened.
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    • Eric W. Biederman's avatar
      arp: Introduce arp_xmit_finish · f9e4306f
      Eric W. Biederman authored
      The function dev_queue_xmit_skb_sk is unncessary and very confusing.
      Introduce arp_xmit_finish to remove the need for dev_queue_xmit_skb_sk,
      and have arp_xmit_finish call dev_queue_xmit.
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  8. 13 Aug, 2015 1 commit
    • David Ahern's avatar
      net: Fix up inet_addr_type checks · 30bbaa19
      David Ahern authored
      Currently inet_addr_type and inet_dev_addr_type expect local addresses
      to be in the local table. With the VRF device local routes for devices
      associated with a VRF will be in the table associated with the VRF.
      Provide an alternate inet_addr lookup to use a specific table rather
      than defaulting to the local table.
      inet_addr_type_dev_table keeps the same semantics as inet_addr_type but
      if the passed in device is enslaved to a VRF then the table for that VRF
      is used for the lookup.
      Signed-off-by: default avatarDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  9. 29 Jul, 2015 1 commit
    • Eric Dumazet's avatar
      arp: filter NOARP neighbours for SIOCGARP · 11c91ef9
      Eric Dumazet authored
      When arp is off on a device, and ioctl(SIOCGARP) is queried,
      a buggy answer is given with MAC address of the device, instead
      of the mac address of the destination/gateway.
      We filter out NUD_NOARP neighbours for /proc/net/arp,
      we must do the same for SIOCGARP ioctl.
      lpaa23:~# ./arp
      MAC=00:01:e8:22:cb:1d      // correct answer
      lpaa23:~# ip link set dev eth0 arp off
      lpaa23:~# cat /proc/net/arp   # check arp table is now 'empty'
      IP address       HW type     Flags       HW address    Mask     Device
      lpaa23:~# ./arp
      MAC=00:1a:11:c3:0d:7f   // buggy answer before patch (this is eth0 mac)
      After patch :
      lpaa23:~# ip link set dev eth0 arp off
      lpaa23:~# ./arp
      ioctl(SIOCGARP) failed: No such device or address
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarVytautas Valancius <valas@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  10. 21 Jul, 2015 1 commit
  11. 07 Apr, 2015 1 commit
    • David Miller's avatar
      netfilter: Pass socket pointer down through okfn(). · 7026b1dd
      David Miller authored
      On the output paths in particular, we have to sometimes deal with two
      socket contexts.  First, and usually skb->sk, is the local socket that
      generated the frame.
      And second, is potentially the socket used to control a tunneling
      socket, such as one the encapsulates using UDP.
      We do not want to disassociate skb->sk when encapsulating in order
      to fix this, because that would break socket memory accounting.
      The most extreme case where this can cause huge problems is an
      AF_PACKET socket transmitting over a vxlan device.  We hit code
      paths doing checks that assume they are dealing with an ipv4
      socket, but are actually operating upon the AF_PACKET one.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  12. 03 Apr, 2015 2 commits
  13. 03 Mar, 2015 1 commit
    • Eric W. Biederman's avatar
      neigh: Factor out ___neigh_lookup_noref · 60395a20
      Eric W. Biederman authored
      While looking at the mpls code I found myself writing yet another
      version of neigh_lookup_noref.  We currently have __ipv4_lookup_noref
      and __ipv6_lookup_noref.
      So to make my work a little easier and to make it a smidge easier to
      verify/maintain the mpls code in the future I stopped and wrote
      ___neigh_lookup_noref.  Then I rewote __ipv4_lookup_noref and
      __ipv6_lookup_noref in terms of this new function.  I tested my new
      version by verifying that the same code is generated in
      ip_finish_output2 and ip6_finish_output2 where these functions are
      To get to ___neigh_lookup_noref I added a new neighbour cache table
      function key_eq.  So that the static size of the key would be
      I also added __neigh_lookup_noref for people who want to to lookup
      a neighbour table entry quickly but don't know which neibhgour table
      they are going to look up.
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  14. 02 Mar, 2015 3 commits
  15. 11 Nov, 2014 1 commit
  16. 28 Sep, 2014 1 commit
  17. 01 Jan, 2014 1 commit
  18. 28 Dec, 2013 1 commit
  19. 11 Dec, 2013 1 commit
  20. 09 Dec, 2013 2 commits
  21. 03 Sep, 2013 1 commit
    • Tim Gardner's avatar
      net: neighbour: Remove CONFIG_ARPD · 3e25c65e
      Tim Gardner authored
      This config option is superfluous in that it only guards a call
      to neigh_app_ns(). Enabling CONFIG_ARPD by default has no
      change in behavior. There will now be call to __neigh_notify()
      for each ARP resolution, which has no impact unless there is a
      user space daemon waiting to receive the notification, i.e.,
      the case for which CONFIG_ARPD was designed anyways.
      Suggested-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Cc: James Morris <jmorris@namei.org>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Gao feng <gaofeng@cn.fujitsu.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Veaceslav Falico <vfalico@redhat.com>
      Signed-off-by: default avatarTim Gardner <tim.gardner@canonical.com>
      Reviewed-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  22. 28 May, 2013 2 commits
  23. 26 Mar, 2013 1 commit
    • YOSHIFUJI Hideaki / 吉藤英明's avatar
      firewire net, ipv4 arp: Extend hardware address and remove driver-level packet inspection. · 6752c8db
      YOSHIFUJI Hideaki / 吉藤英明 authored
      Inspection of upper layer protocol is considered harmful, especially
      if it is about ARP or other stateful upper layer protocol; driver
      cannot (and should not) have full state of them.
      IPv4 over Firewire module used to inspect ARP (both in sending path
      and in receiving path), and record peer's GUID, max packet size, max
      speed and fifo address.  This patch removes such inspection by extending
      our "hardware address" definition to include other information as well:
      max packet size, max speed and fifo.  By doing this, The neighbour
      module in networking subsystem can cache them.
      Note: As we have started ignoring sspd and max_rec in ARP/NDP, those
            information will not be used in the driver when sending.
      When a packet is being sent, the IP layer fills our pseudo header with
      the extended "hardware address", including GUID and fifo.  The driver
      can look-up node-id (the real but rather volatile low-level address)
      by GUID, and then the module can send the packet to the wire using
      parameters provided in the extendedn hardware address.
      This approach is realistic because IP over IEEE1394 (RFC2734) and IPv6
      over IEEE1394 (RFC3146) share same "hardware address" format
      in their address resolution protocols.
      Here, extended "hardware address" is defined as follows:
      union fwnet_hwaddr {
      	u8 u[16];
      	struct {
      		__be64 uniq_id;		/* EUI-64			*/
      		u8 max_rec;		/* max packet size		*/
      		u8 sspd;		/* max speed			*/
      		__be16 fifo_hi;		/* hi 16bits of FIFO addr	*/
      		__be32 fifo_lo;		/* lo 32bits of FIFO addr	*/
      	} __packed uc;
      Note that Hardware address is declared as union, so that we can map full
      IP address into this, when implementing MCAP (Multicast Cannel Allocation
      Protocol) for IPv6, but IP and ARP subsystem do not need to know this
      format in detail.
      One difference between original ARP (RFC826) and 1394 ARP (RFC2734)
      is that 1394 ARP Request/Reply do not contain the target hardware address
      field (aka ar$tha).  This difference is handled in the ARP subsystem.
      CC: Stephan Gatzka <stephan.gatzka@gmail.com>
      Signed-off-by: default avatarYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  24. 18 Feb, 2013 2 commits
  25. 10 Feb, 2013 1 commit
  26. 24 Dec, 2012 1 commit
  27. 21 Dec, 2012 1 commit
    • Eric Dumazet's avatar
      ipv4: arp: fix a lockdep splat in arp_solicit() · 9650388b
      Eric Dumazet authored
      Yan Burman reported following lockdep warning :
      [ INFO: possible recursive locking detected ]
      3.7.0+ #24 Not tainted
      swapper/1/0 is trying to acquire lock:
        (&n->lock){++--..}, at: [<ffffffff8139f56e>] __neigh_event_send
      but task is already holding lock:
        (&n->lock){++--..}, at: [<ffffffff813f63f4>] arp_solicit+0x1d4/0x280
      other info that might help us debug this:
        Possible unsafe locking scenario:
        *** DEADLOCK ***
        May be due to missing lock nesting notation
      4 locks held by swapper/1/0:
        #0:  (((&n->timer))){+.-...}, at: [<ffffffff8104b350>]
        #1:  (&n->lock){++--..}, at: [<ffffffff813f63f4>] arp_solicit
        #2:  (rcu_read_lock_bh){.+....}, at: [<ffffffff81395400>]
        #3:  (rcu_read_lock_bh){.+....}, at: [<ffffffff813cb41e>]
      stack backtrace:
      Pid: 0, comm: swapper/1 Not tainted 3.7.0+ #24
      Call Trace:
        <IRQ>  [<ffffffff8108c7ac>] validate_chain+0xdcc/0x11f0
        [<ffffffff8108d570>] ? __lock_acquire+0x440/0xc30
        [<ffffffff81120565>] ? kmem_cache_free+0xe5/0x1c0
        [<ffffffff8108d570>] __lock_acquire+0x440/0xc30
        [<ffffffff813c3570>] ? inet_getpeer+0x40/0x600
        [<ffffffff8108d570>] ? __lock_acquire+0x440/0xc30
        [<ffffffff8139f56e>] ? __neigh_event_send+0x2e/0x2f0
        [<ffffffff8108ddf5>] lock_acquire+0x95/0x140
        [<ffffffff8139f56e>] ? __neigh_event_send+0x2e/0x2f0
        [<ffffffff8108d570>] ? __lock_acquire+0x440/0xc30
        [<ffffffff81448d4b>] _raw_write_lock_bh+0x3b/0x50
        [<ffffffff8139f56e>] ? __neigh_event_send+0x2e/0x2f0
        [<ffffffff8139f56e>] __neigh_event_send+0x2e/0x2f0
        [<ffffffff8139f99b>] neigh_resolve_output+0x16b/0x270
        [<ffffffff813cb62d>] ip_finish_output+0x34d/0x640
        [<ffffffff813cb41e>] ? ip_finish_output+0x13e/0x640
        [<ffffffffa046f146>] ? vxlan_xmit+0x556/0xbec [vxlan]
        [<ffffffff813cb9a0>] ip_output+0x80/0xf0
        [<ffffffff813ca368>] ip_local_out+0x28/0x80
        [<ffffffffa046f25a>] vxlan_xmit+0x66a/0xbec [vxlan]
        [<ffffffffa046f146>] ? vxlan_xmit+0x556/0xbec [vxlan]
        [<ffffffff81394a50>] ? skb_gso_segment+0x2b0/0x2b0
        [<ffffffff81449355>] ? _raw_spin_unlock_irqrestore+0x65/0x80
        [<ffffffff81394c57>] ? dev_queue_xmit_nit+0x207/0x270
        [<ffffffff813950c8>] dev_hard_start_xmit+0x298/0x5d0
        [<ffffffff813956f3>] dev_queue_xmit+0x2f3/0x5d0
        [<ffffffff81395400>] ? dev_hard_start_xmit+0x5d0/0x5d0
        [<ffffffff813f5788>] arp_xmit+0x58/0x60
        [<ffffffff813f59db>] arp_send+0x3b/0x40
        [<ffffffff813f6424>] arp_solicit+0x204/0x280
        [<ffffffff813a1a70>] ? neigh_add+0x310/0x310
        [<ffffffff8139f515>] neigh_probe+0x45/0x70
        [<ffffffff813a1c10>] neigh_timer_handler+0x1a0/0x2a0
        [<ffffffff8104b3cf>] call_timer_fn+0x7f/0x1c0
        [<ffffffff8104b350>] ? detach_if_pending+0x120/0x120
        [<ffffffff8104b748>] run_timer_softirq+0x238/0x2b0
        [<ffffffff813a1a70>] ? neigh_add+0x310/0x310
        [<ffffffff81043e51>] __do_softirq+0x101/0x280
        [<ffffffff814518cc>] call_softirq+0x1c/0x30
        [<ffffffff81003b65>] do_softirq+0x85/0xc0
        [<ffffffff81043a7e>] irq_exit+0x9e/0xc0
        [<ffffffff810264f8>] smp_apic_timer_interrupt+0x68/0xa0
        [<ffffffff8145122f>] apic_timer_interrupt+0x6f/0x80
        <EOI>  [<ffffffff8100a054>] ? mwait_idle+0xa4/0x1c0
        [<ffffffff8100a04b>] ? mwait_idle+0x9b/0x1c0
        [<ffffffff8100a6a9>] cpu_idle+0x89/0xe0
        [<ffffffff81441127>] start_secondary+0x1b2/0x1b6
      Bug is from arp_solicit(), releasing the neigh lock after arp_send()
      In case of vxlan, we eventually need to write lock a neigh lock later.
      Its a false positive, but we can get rid of it without lockdep
      We can instead use neigh_ha_snapshot() helper.
      Reported-by: default avatarYan Burman <yanb@mellanox.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarStephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  28. 18 Nov, 2012 1 commit
    • Eric W. Biederman's avatar
      net: Allow userns root to control ipv4 · 52e804c6
      Eric W. Biederman authored
      Allow an unpriviled user who has created a user namespace, and then
      created a network namespace to effectively use the new network
      namespace, by reducing capable(CAP_NET_ADMIN) and
      capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns,
      CAP_NET_ADMIN), or capable(net->user_ns, CAP_NET_RAW) calls.
      Settings that merely control a single network device are allowed.
      Either the network device is a logical network device where
      restrictions make no difference or the network device is hardware NIC
      that has been explicity moved from the initial network namespace.
      In general policy and network stack state changes are allowed
      while resource control is left unchanged.
      Allow creating raw sockets.
      Allow the SIOCSARP ioctl to control the arp cache.
      Allow the SIOCSIFFLAG ioctl to allow setting network device flags.
      Allow the SIOCSIFADDR ioctl to allow setting a netdevice ipv4 address.
      Allow the SIOCSIFBRDADDR ioctl to allow setting a netdevice ipv4 broadcast address.
      Allow the SIOCSIFDSTADDR ioctl to allow setting a netdevice ipv4 destination address.
      Allow the SIOCSIFNETMASK ioctl to allow setting a netdevice ipv4 netmask.
      Allow the SIOCADDRT and SIOCDELRT ioctls to allow adding and deleting ipv4 routes.
      adding, changing and deleting gre tunnels.
      adding, changing and deleting ipip tunnels.
      adding, changing and deleting ipsec virtual tunnel interfaces.
      MRT_DEL_MFC, MRT_ASSERT, MRT_PIM, MRT_TABLE socket options on multicast routing
      Allow setting and receiving IPOPT_CIPSO, IP_OPT_SEC, IP_OPT_SID and
      arbitrary ip options.
      Allow setting IP_SEC_POLICY/IP_XFRM_POLICY ipv4 socket option.
      Allow setting the IP_TRANSPARENT ipv4 socket option.
      Allow setting the TCP_REPAIR socket option.
      Allow setting the TCP_CONGESTION socket option.
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
  29. 18 Sep, 2012 1 commit
  30. 07 Sep, 2012 1 commit
  31. 26 Jul, 2012 1 commit
  32. 20 Jul, 2012 1 commit