1. 07 Oct, 2009 1 commit
  2. 05 Oct, 2009 6 commits
    • net: introduce NETDEV_POST_INIT notifier · 7ffbe3fd
      Johannes Berg authored
      For various purposes including a wireless extensions
      bugfix, we need to hook into the netdev creation
      before netdev_register_kobject(). This will also ease
      doing the dev type assignment that Marcel was working
      on for cfg80211 drivers w/o touching them all.
      Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
      Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • usbnet: Use wwan%d interface name for mobile broadband devices · e1e499ee
      Marcel Holtmann authored
      Add support for usbnet based devices like CDC-Ether to indicate that they
      are actually mobile broadband devices. In that case use wwan%d as default
      interface name.
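      The name selection described above can be sketched in user space as
      follows. This is only an illustration: the SKETCH_FLAG_* bits, the helper
      names, and the "eth%d"/"usb%d" fallbacks are assumptions made for the
      example and are not taken from the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical capability bits for this sketch (not the kernel's values). */
#define SKETCH_FLAG_ETHER 0x1u
#define SKETCH_FLAG_WWAN  0x2u

/* Choose the name template; a WWAN flag takes precedence over Ethernet. */
static const char *sketch_name_template(unsigned int flags)
{
    if (flags & SKETCH_FLAG_WWAN)
        return "wwan%d";
    if (flags & SKETCH_FLAG_ETHER)
        return "eth%d";
    return "usb%d"; /* assumed fallback for this sketch */
}

/* Expand the single "%d" in tmpl with index, as name allocation would. */
static int sketch_expand_name(char *buf, size_t len, const char *tmpl, int index)
{
    const char *pct = strstr(tmpl, "%d");

    assert(buf != NULL && len > 0);
    if (pct == NULL)
        return snprintf(buf, len, "%s", tmpl);
    return snprintf(buf, len, "%.*s%d%s", (int)(pct - tmpl), tmpl, index, pct + 2);
}
```

      A device flagged as both Ethernet-like and WWAN would get a wwan name in
      this sketch, because the WWAN check runs first.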
      Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tunnels: Optimize tx path · 0bfbedb1
      Eric Dumazet authored
      We currently dirty a cache line to update tunnel device stats
      (tx_packets/tx_bytes). We had better use the txq->tx_bytes/tx_packets
      counters that are already present in the cpu cache, in the cache
      line shared with txq->_xmit_lock.
      This patch extends IPTUNNEL_XMIT() macro to use txq pointer
      provided by the caller.
      Also &tunnel->dev->stats can be replaced by &dev->stats
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: fib table algorithm performance improvement · 16c6cf8b
      Stephen Hemminger authored
      The FIB algorithm for IPv4 is set at compile time, but the kernel goes through
      the overhead of function call indirection at runtime. Save some
      cycles by turning the indirect calls to direct calls to either
      hash or trie code.
      Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • af_packet: add interframe drop cmsg (v6) · 97775007
      Neil Horman authored
      Add ancillary data to better represent loss information.
      I've had a few requests recently to provide more detail regarding frame loss
      during an AF_PACKET packet capture session.  Specifically the requestors want to
      see where in a packet sequence frames were lost, i.e. they want to see that 40
      frames were lost between frames 302 and 303 in a packet capture file.  In order
      to do this we need:
      1) The kernel to export this data to user space
      2) The applications to make use of it
      This patch addresses item (1).  It does this by doing the following:
      A) Anytime we drop a frame for which we would increment po->stats.tp_drops, we
      also now increment a stat called po->stats.tp_gap.
      B) Every time we successfully enqueue a frame to sk_receive_queue, we record the
      value of po->stats.tp_gap in skb->mark.  skb->cb would nominally be the place to
      record this, but since all the space there is used up, we're overloading
      skb->mark.  It's safe to do since any enqueued packet is guaranteed to be
      unshared at this point, and skb->mark isn't used for anything else in the rx
      path to the application.  After we record tp_gap in the skb, we zero
      po->stats.tp_gap.  This allows us to keep a counter of the number of frames lost
      between any two enqueued packets
      C) When the application goes to dequeue a frame from the packet socket, we look
      at skb->mark for that frame.  If it is non-zero, we add a cmsg chunk to the
      msghdr of level SOL_PACKET and type PACKET_GAPDATA.  It's a 32-bit integer that
      represents the number of frames lost between this packet and the previous
      frame received.
      Note there is a chance that if there is frame loss after a receive, and then the
      socket is closed, some gap data might be lost.  This is covered by the use of
      the PACKET_AUXDATA socket option, which gives total loss data.  With a bit of
      math, the final gap can be determined that way.
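      Steps A to C can be modelled in user space roughly as below. This is a
      sketch under assumptions, not the patch itself: the fixed-size ring, the
      sketch_* names, and the use of a plain struct field in place of skb->mark
      are all invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_QUEUE_LEN 4 /* hypothetical receive queue capacity */

struct sketch_frame {
    int id;
    uint32_t mark; /* frames lost just before this one (plays the role of skb->mark) */
};

struct sketch_sock {
    struct sketch_frame queue[SKETCH_QUEUE_LEN];
    size_t head;
    size_t count;
    uint32_t tp_drops; /* total drops */
    uint32_t tp_gap;   /* drops since the last successful enqueue */
};

/* Steps A and B: count a drop, or enqueue and stamp the frame with the gap. */
static int sketch_receive(struct sketch_sock *sk, int id)
{
    size_t tail;

    assert(sk != NULL);
    if (sk->count == SKETCH_QUEUE_LEN) {
        sk->tp_drops++;
        sk->tp_gap++;
        return -1;
    }
    tail = (sk->head + sk->count) % SKETCH_QUEUE_LEN;
    sk->queue[tail].id = id;
    sk->queue[tail].mark = sk->tp_gap;
    sk->tp_gap = 0; /* the gap is now carried by the frame */
    sk->count++;
    return 0;
}

/* Step C: dequeue a frame; a non-zero out->mark is what would become the cmsg. */
static int sketch_dequeue(struct sketch_sock *sk, struct sketch_frame *out)
{
    assert(sk != NULL && out != NULL);
    if (sk->count == 0)
        return -1;
    *out = sk->queue[sk->head];
    sk->head = (sk->head + 1) % SKETCH_QUEUE_LEN;
    sk->count--;
    return 0;
}
```

      In this model, drops that happen after the last enqueue stay in tp_gap
      and are never attached to a frame, which mirrors the final-gap caveat in
      the note above; the total in tp_drops still accounts for them.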
      I've tested this patch myself, and it works well.
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
       include/linux/if_packet.h |    2 ++
       net/packet/af_packet.c    |   33 +++++++++++++++++++++++++++++++++
       2 files changed, 35 insertions(+)
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ethtool: Remove support for obsolete string query operations · a9828ec6
      Ben Hutchings authored
      The in-tree implementations have all been converted to
      Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 04 Oct, 2009 2 commits
    • Alexey Dobriyan's avatar
    • Revert "Seperate read and write statistics of in_flight requests" · 0f78ab98
      Jens Axboe authored
      This reverts commit a9327cac.
      Corrado Zoccolo <czoccolo@gmail.com> reports:
      "with 2.6.32-rc1 I started getting the following strange output from
      "iostat -kx 2":
      Linux 2.6.31bisect (et2) 	04/10/2009 	_i686_	(2 CPU)
      avg-cpu:  %user   %nice %system %iowait  %steal   %idle
                10,70    0,00    3,16   15,75    0,00   70,38
      Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await       svctm      %util
      sda              18,22     0,00    0,67    0,01    14,77     0,02    43,94     0,01   10,53 39043915,03 2629219,87
      sdb              60,89     9,68   50,79    3,04  1724,43    50,52    65,95     0,70   13,06   488437,47 2629219,87
      avg-cpu:  %user   %nice %system %iowait  %steal   %idle
                 2,72    0,00    0,74    0,00    0,00   96,53
      Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await       svctm      %util
      sda               0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00        0,00     100,00
      sdb               0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00        0,00     100,00
      avg-cpu:  %user   %nice %system %iowait  %steal   %idle
                 6,68    0,00    0,99    0,00    0,00   92,33
      Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await       svctm      %util
      sda               0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00        0,00     100,00
      sdb               0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00        0,00     100,00
      avg-cpu:  %user   %nice %system %iowait  %steal   %idle
                 4,40    0,00    0,73    1,47    0,00   93,40
      Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await       svctm      %util
      sda               0,00     0,00    0,00    0,00     0,00     0,00     0,00     0,00    0,00        0,00     100,00
      sdb               0,00     4,00    0,00    3,00     0,00    28,00    18,67     0,06   19,50      333,33     100,00
      Global values for service time and utilization are garbage. For
      interval values, utilization is always 100%, and service time is
      higher than normal.
      I bisected it down to:
      [a9327cac] Seperate read and write
      statistics of in_flight requests
      and verified that reverting just that commit indeed solves the issue
      on 2.6.32-rc1."
      So until this is debugged, revert the bad commit.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  4. 03 Oct, 2009 2 commits
  5. 02 Oct, 2009 3 commits
  6. 01 Oct, 2009 7 commits
    • memcg: some modification to softlimit under hierarchical memory reclaim. · 4e649152
      KAMEZAWA Hiroyuki authored
      This patch cleans up and fixes memcg's uncharge soft limit path.
        Currently, res_counter_charge()/uncharge() handle softlimit information at
        charge/uncharge, and the softlimit check is done when the per-memcg event
        counter goes over its limit. The per-memcg event counter is updated only
        when memory usage is over the soft limit. With hierarchical memcg
        management, ancestors should be taken care of as well.
        Currently, ancestors (hierarchy) are handled in charge() but not in
        uncharge(). This is not good.
        1. memcg's event counter is incremented only when the softlimit is hit. That's bad.
           It makes the event counter hard to reuse for other purposes.
        2. At uncharge, only the lowest-level res_counter is handled. This is a bug.
           Because the ancestors' event counters are not incremented, children should
           take care of them.
        3. res_counter_uncharge()'s 3rd argument is NULL in most cases.
           Operations under res_counter->lock should be small. No "if" statement is better.
        * Removed the soft_limit_xx pointer and checks in charge and uncharge.
          The do-check-only-when-necessary scheme works well enough without them.
        * Make the event counter of memcg increment at every charge/uncharge.
          (The per-cpu area will be accessed soon anyway.)
        * All ancestors are checked at soft-limit-check. This is necessary because
          an ancestor's event counter may never be modified. Then, they should be
          checked at the same time.
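      The resulting scheme can be sketched in user space as follows: every
      charge and uncharge bumps the event counter of the group it targets, and
      the periodic soft limit check walks all ancestors. The struct layout, the
      SKETCH_EVENT_THRESH value, and the sketch_* names are assumptions for
      illustration only.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_EVENT_THRESH 4 /* hypothetical: check soft limits every 4 events */

struct sketch_memcg {
    struct sketch_memcg *parent;
    unsigned long usage;
    unsigned long soft_limit;
    unsigned int events;
    int over_soft_limit;
};

/* Check this group and every ancestor, since ancestors may see no events. */
static void sketch_softlimit_check(struct sketch_memcg *cg)
{
    for (; cg != NULL; cg = cg->parent)
        cg->over_soft_limit = cg->usage > cg->soft_limit;
}

/* Count one event; run the soft limit check only when the threshold is hit. */
static void sketch_event(struct sketch_memcg *cg)
{
    if (++cg->events >= SKETCH_EVENT_THRESH) {
        cg->events = 0;
        sketch_softlimit_check(cg);
    }
}

/* Charging propagates usage up the hierarchy and always counts an event. */
static void sketch_charge(struct sketch_memcg *cg, unsigned long n)
{
    struct sketch_memcg *c;

    for (c = cg; c != NULL; c = c->parent)
        c->usage += n;
    sketch_event(cg);
}

/* Uncharging mirrors charging, including the unconditional event. */
static void sketch_uncharge(struct sketch_memcg *cg, unsigned long n)
{
    struct sketch_memcg *c;

    for (c = cg; c != NULL; c = c->parent) {
        assert(c->usage >= n);
        c->usage -= n;
    }
    sketch_event(cg);
}
```

      Note that neither the charge nor the uncharge path contains a soft limit
      branch here; the comparison happens only inside the periodic check.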
      Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • asm-generic/gpio.h: pull in linux/kernel.h for might_sleep() · b3db4a8a
      Mike Frysinger authored
      The asm-generic/gpio.h header uses the might_sleep() macro but doesn't
      include the header for it, so any source code that might include
      linux/gpio.h before linux/kernel.h can easily lead to a build failure.
      Signed-off-by: Mike Frysinger <vapier@gentoo.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • const: constify remaining file_operations · 828c0950
      Alexey Dobriyan authored
      [akpm@linux-foundation.org: fix KVM]
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Acked-by: Mike Frysinger <vapier@gentoo.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Add a tracepoint for block request remapping · b0da3f0d
      Jun'ichi Nomura authored
      Since 2.6.31 now has request-based device-mapper, it's useful to have
      a tracepoint for request-remapping as well as bio-remapping.
      This patch adds a tracepoint for request-remapping, trace_block_rq_remap().
      Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Cc: Alasdair G Kergon <agk@redhat.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • block: allow large discard requests · 67efc925
      Christoph Hellwig authored
      Currently we set the bio size to the byte equivalent of the blocks to
      be trimmed when submitting the initial DISCARD ioctl.  That means it
      is subject to the max_hw_sectors limitation of the HBA which is
      much lower than the size of a DISCARD request we can support.
      Add a separate max_discard_sectors tunable to limit the size for discard requests.
      We limit the max discard request size in bytes to 32bit as that is the
      limit for bio->bi_size.  This could be much larger if we had a way to pass
      that information through the block layer.
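      As a worked example of the 32-bit byte limit, the sketch below computes
      the largest sector count whose byte size still fits in a 32-bit field,
      and how many requests a given range needs under a per-request cap. It
      assumes 512-byte sectors; the sketch_* names are hypothetical and not
      part of the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SECTOR_SHIFT 9 /* assumption: 512-byte sectors */

/* Largest sector count whose size in bytes still fits in 32 bits. */
static uint32_t sketch_max_discard_sectors(void)
{
    return UINT32_MAX >> SKETCH_SECTOR_SHIFT;
}

/* Number of requests needed to cover nr_sects when each is capped at max_sects. */
static uint64_t sketch_discard_requests(uint64_t nr_sects, uint32_t max_sects)
{
    assert(max_sects > 0);
    return (nr_sects + max_sects - 1) / max_sects;
}
```

      With these assumptions the cap works out to 8388607 sectors, just under
      4 GiB per request, compared with the much smaller per-request sizes a
      typical max_hw_sectors value would allow.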
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • block: use normal I/O path for discard requests · c15227de
      Christoph Hellwig authored
      prepare_discard_fn() was being called in a place where memory allocation
      was effectively impossible.  This makes it inappropriate for all but
      the most trivial translations of Linux's DISCARD operation to the block
      command set.  Additionally adding a payload there makes the ownership
      of the bio backing unclear as it's now allocated by the device driver
      and not the submitter as usual.
      It is replaced with QUEUE_FLAG_DISCARD which is used to indicate whether
      the queue supports discard operations or not.  blkdev_issue_discard now
      allocates a one-page, sector-length payload which is the right thing
      for the common ATA and SCSI implementations.
      The mtd implementation of prepare_discard_fn() is replaced with simply
      checking for the request being a discard.
      Largely based on a previous patch from Matthew Wilcox <matthew@wil.cx>
      which did the prepare_discard_fn but not the different payload allocation
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • Add missing blk_trace_remove_sysfs to be in pair with blk_trace_init_sysfs · 48c0d4d4
      Zdenek Kabelac authored
      Add missing blk_trace_remove_sysfs to be in pair with blk_trace_init_sysfs
      introduced in commit 1d54ad6d.
      Release kobject also in case the request_fn is NULL.
      Problem was noticed via kmemleak backtrace when some sysfs entries were
      not properly destroyed during device removal:
      unreferenced object 0xffff88001aa76640 (size 80):
        comm "lvcreate", pid 2120, jiffies 4294885144
        hex dump (first 32 bytes):
          01 00 00 00 00 00 00 00 f0 65 a7 1a 00 88 ff ff  .........e......
          90 66 a7 1a 00 88 ff ff 86 1d 53 81 ff ff ff ff  .f........S.....
          [<ffffffff813f9cc6>] kmemleak_alloc+0x26/0x60
          [<ffffffff8111d693>] kmem_cache_alloc+0x133/0x1c0
          [<ffffffff81195891>] sysfs_new_dirent+0x41/0x120
          [<ffffffff81194b0c>] sysfs_add_file_mode+0x3c/0xb0
          [<ffffffff81197c81>] internal_create_group+0xc1/0x1a0
          [<ffffffff81197d93>] sysfs_create_group+0x13/0x20
          [<ffffffff810d8004>] blk_trace_init_sysfs+0x14/0x20
          [<ffffffff8123f45c>] blk_register_queue+0x3c/0xf0
          [<ffffffff812447e4>] add_disk+0x94/0x160
          [<ffffffffa00d8b08>] dm_create+0x598/0x6e0 [dm_mod]
          [<ffffffffa00de951>] dev_create+0x51/0x350 [dm_mod]
          [<ffffffffa00de823>] ctl_ioctl+0x1a3/0x240 [dm_mod]
          [<ffffffffa00de8f2>] dm_compat_ctl_ioctl+0x12/0x20 [dm_mod]
          [<ffffffff81177bfd>] compat_sys_ioctl+0xcd/0x4f0
          [<ffffffff81036ed8>] sysenter_dispatch+0x7/0x2c
          [<ffffffffffffffff>] 0xffffffffffffffff
      Signed-off-by: Zdenek Kabelac <zkabelac@redhat.com>
      Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  7. 30 Sep, 2009 2 commits
  8. 29 Sep, 2009 3 commits
  9. 28 Sep, 2009 2 commits
  10. 29 Sep, 2009 1 commit
    • ext4: Adjust ext4_da_writepages() to write out larger contiguous chunks · 55138e0b
      Theodore Ts'o authored
      Work around problems in the writeback code to force out writebacks in
      larger chunks than just 4mb, which is just too small.  This also works
      around limitations in the ext4 block allocator, which can't allocate
      more than 2048 blocks at a time.  So we need to defeat the round-robin
      characteristics of the writeback code and try to write out as many
      blocks in one inode before allowing the writeback code to move on to
      another inode.  We add a new per-filesystem tunable,
      max_writeback_mb_bump, which caps this to a default of 128mb per
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  11. 27 Sep, 2009 3 commits
    • drm/kms: make fb helper work for all drivers. · 74bf2ad5
      Dave Airlie authored
      This initialises the fb helper with the connector helper,
      so that the fb cmdline code works for intel as well.
      Signed-off-by: Dave Airlie <airlied@redhat.com>
    • tty: Fix regressions caused by commit b50989dc · f278a2f7
      Dave Young authored
      The following commit made console open fail while booting:
      	commit b50989dc
      	Author: Alan Cox <alan@linux.intel.com>
      	Date:   Sat Sep 19 13:13:22 2009 -0700
      	tty: make the kref destructor occur asynchronously
      Because tty release routines now run in a workqueue, an error like the
      following is reported while booting:
      INIT open /dev/console Input/output error
      It also causes hibernation regression to appear as reported at
      The reason is that there is now a latency issue with closing, and when
      we open a tty whose close has not finished, -EIO is returned.
      Fix it as per Alan's suggestion below:
        Fun but it's actually not a bug and the fix is wrong in itself as
        the port may be closing but not yet being destructed, in which case
        it seems to do the wrong thing.  Opening a tty that is closing (and
        could be closing for long periods) is supposed to return -EIO.
        I suspect a better way to deal with this and keep the old console
        timing is to split tty->shutdown into two functions.
        tty->shutdown() - called synchronously just before we dump the tty
        onto the waitqueue for destruction
        tty->cleanup() - called when the destructor runs.
        We would then do the shutdown part which can occur in IRQ context
        fine, before queueing the rest of the release (from tty->magic = 0
        ...  the end) to occur asynchronously
        The USB update in -next would then need a call like
             if (tty->cleanup)
        at the top of the async function and the USB shutdown to be split
        between shutdown and cleanup as the USB resource cleanup and final
        tidy cannot occur synchronously as it needs to sleep.
        In other words the logic becomes
             final kref put
                     make object unfindable
                     clean it up
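      The two-phase release suggested above can be sketched in user space like
      this. It is a simplified model under assumptions: a single pending slot
      stands in for the workqueue, and the struct fields and sketch_* names are
      hypothetical rather than the real tty structures.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_tty {
    int refcount;
    int findable; /* non-zero while lookups can still return this object */
    int cleaned;  /* set once the deferred cleanup has run */
    void (*shutdown)(struct sketch_tty *tty); /* synchronous part */
    void (*cleanup)(struct sketch_tty *tty);  /* deferred part, may sleep */
};

/* Single-slot stand-in for the workqueue that runs the destructor later. */
static struct sketch_tty *sketch_pending;

/* Drop a reference; on the final put, shut down now and defer the cleanup. */
static void sketch_put(struct sketch_tty *tty)
{
    assert(tty != NULL && tty->refcount > 0);
    if (--tty->refcount > 0)
        return;
    if (tty->shutdown)
        tty->shutdown(tty); /* runs synchronously */
    tty->findable = 0;      /* make object unfindable */
    sketch_pending = tty;   /* queue the rest of the release */
}

/* Run the deferred half: the optional cleanup hook, then the final teardown. */
static void sketch_run_deferred(void)
{
    struct sketch_tty *tty = sketch_pending;

    if (tty == NULL)
        return;
    sketch_pending = NULL;
    if (tty->cleanup)
        tty->cleanup(tty);
    tty->cleaned = 1;
}
```

      In this model the object can no longer be found as soon as the final put
      returns, while the part that may sleep runs later in
      sketch_run_deferred().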
      Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
      [ rjw: Rebased on top of 2.6.31-git, reworked the changelog. ]
      Signed-off-by: "Rafael J. Wysocki" <rjw@sisk.pl>
      [ Changed serial naming to match new rules, dropped tty_shutdown as per
        comments from Alan Stern  - Linus ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • const: mark struct vm_struct_operations · f0f37e2f
      Alexey Dobriyan authored
      * mark struct vm_area_struct::vm_ops as const
      * mark vm_ops in AGP code
      But leave TTM code alone, something is fishy there with global vm_ops
      being used.
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 26 Sep, 2009 2 commits
  13. 25 Sep, 2009 6 commits