Commit 63fe46da authored by David S. Miller

Merge branch 'master' of git://


parents 99dd1a2b 066b2118
......@@ -72,7 +72,7 @@
kgdb is a source level debugger for linux kernel. It is used along
with gdb to debug a linux kernel. The expectation is that gdb can
be used to "break in" to the kernel to inspect memory, variables
and look through a cal stack information similar to what an
and look through call stack information similar to what an
application developer would use gdb for. It is possible to place
breakpoints in kernel code and perform some limited execution
......@@ -93,8 +93,10 @@
<chapter id="CompilingAKernel">
<title>Compiling a kernel</title>
To enable <symbol>CONFIG_KGDB</symbol>, look under the "Kernel debugging"
and then select "KGDB: kernel debugging with remote gdb".
To enable <symbol>CONFIG_KGDB</symbol> you should first turn on
"Prompt for development and/or incomplete code/drivers"
(CONFIG_EXPERIMENTAL) in "General setup", then under the
"Kernel debugging" select "KGDB: kernel debugging with remote gdb".
Next you should choose one or more I/O drivers to interconnect debugging
......@@ -310,8 +310,8 @@ and then start a subshell 'sh' in that cgroup:
cd /dev/cgroup
mkdir Charlie
cd Charlie
/bin/echo 2-3 > cpus
/bin/echo 1 > mems
/bin/echo 2-3 > cpuset.cpus
/bin/echo 1 > cpuset.mems
/bin/echo $$ > tasks
# The subshell 'sh' is now running in cgroup Charlie
......@@ -289,6 +289,14 @@ Who: Glauber Costa <>
What: old style serial driver for ColdFire (CONFIG_SERIAL_COLDFIRE)
When: 2.6.28
Why: This driver still uses the old interface and has been replaced
Who: Sebastian Siewior <>
What: /sys/o2cb symlink
When: January 2010
Why: /sys/fs/o2cb is the proper location for this information - /sys/o2cb
......@@ -92,7 +92,6 @@ prototypes:
void (*destroy_inode)(struct inode *);
void (*dirty_inode) (struct inode *);
int (*write_inode) (struct inode *, int);
void (*put_inode) (struct inode *);
void (*drop_inode) (struct inode *);
void (*delete_inode) (struct inode *);
void (*put_super) (struct super_block *);
......@@ -115,7 +114,6 @@ alloc_inode: no no no
destroy_inode: no
dirty_inode: no (must not sleep)
write_inode: no
put_inode: no
drop_inode: no !!!inode_lock!!!
delete_inode: no
put_super: yes yes no
......@@ -205,7 +205,6 @@ struct super_operations {
void (*dirty_inode) (struct inode *);
int (*write_inode) (struct inode *, int);
void (*put_inode) (struct inode *);
void (*drop_inode) (struct inode *);
void (*delete_inode) (struct inode *);
void (*put_super) (struct super_block *);
......@@ -246,9 +245,6 @@ or bottom half).
inode to disc. The second parameter indicates whether the write
should be synchronous or not, not all filesystems check this flag.
put_inode: called when the VFS inode is removed from the inode
drop_inode: called when the last access to the inode is dropped,
with the inode_lock spinlock held.
......@@ -69,7 +69,8 @@ point2: Set the pwm speed at a higher temperature bound.
The ADT7473 will scale the pwm between the lower and higher pwm speed when
the temperature is between the two temperature boundaries. PWM values range
from 0 (off) to 255 (full speed).
from 0 (off) to 255 (full speed). Fan speed will be set to maximum when the
temperature sensor associated with the PWM control exceeds temp#_max.
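The scaling described above can be modelled in a few lines of C. This is only a sketch of the behaviour as the text states it, not code from the adt7473 driver: the struct and function names are hypothetical, linear interpolation between point1 and point2 is an assumption, and so is the choice to hold the point1 speed below the lower temperature bound.

```c
#include <assert.h>

/* Hypothetical container for one PWM channel's settings (not driver code). */
struct pwm_curve {
    int temp_lo, pwm_lo;   /* point1: lower temperature bound and its pwm */
    int temp_hi, pwm_hi;   /* point2: higher temperature bound and its pwm */
    int temp_max;          /* temp#_max: above this the fan runs flat out */
};

/* Model of the pwm value (0..255) the chip would pick for a temperature. */
int scaled_pwm(const struct pwm_curve *c, int temp)
{
    assert(c->temp_hi > c->temp_lo);

    if (temp > c->temp_max)
        return 255;                 /* full speed, as the text states */
    if (temp <= c->temp_lo)
        return c->pwm_lo;           /* assumption: hold the point1 speed */
    if (temp >= c->temp_hi)
        return c->pwm_hi;

    /* Assumed linear interpolation between the two points. */
    return c->pwm_lo + (c->pwm_hi - c->pwm_lo) * (temp - c->temp_lo)
                       / (c->temp_hi - c->temp_lo);
}
```

With point1 at (40, 100), point2 at (60, 200) and temp_max at 80, a reading of 50 lands halfway between the two pwm values.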
......@@ -51,26 +51,38 @@ A few combinations of the above flags are also defined for your convenience:
the transparent emulation layer)
When you write a new algorithm driver, you will have to implement a
function callback `functionality', that gets an i2c_adapter structure
pointer as its only parameter:
When you write a new adapter driver, you will have to implement a
function callback `functionality'. Typical implementations are given
struct i2c_algorithm {
/* Many other things of course; check <linux/i2c.h>! */
u32 (*functionality) (struct i2c_adapter *);
A typical SMBus-only adapter would list all the SMBus transactions it
supports. This example comes from the i2c-piix4 driver:
static u32 piix4_func(struct i2c_adapter *adapter)
A typically implementation is given below, from i2c-algo-bit.c:
A typical full-I2C adapter would use the following (from the i2c-pxa
static u32 bit_func(struct i2c_adapter *adap)
static u32 i2c_pxa_functionality(struct i2c_adapter *adap)
I2C_FUNC_SMBUS_EMUL includes all the SMBus transactions (with the
addition of I2C block transactions) which i2c-core can emulate using
I2C_FUNC_I2C without any help from the adapter driver. The idea is
to let the client drivers check for the support of SMBus functions
without having to care whether the said functions are implemented in
hardware by the adapter, or emulated in software by i2c-core on top
of an I2C adapter.
......@@ -78,36 +90,33 @@ CLIENT CHECKING
Before a client tries to attach to an adapter, or even do tests to check
whether one of the devices it supports is present on an adapter, it should
check whether the needed functionality is present. There are two functions
defined which should be used instead of calling the functionality hook
in the algorithm structure directly:
/* Return the functionality mask */
extern u32 i2c_get_functionality (struct i2c_adapter *adap);
/* Return 1 if adapter supports everything we need, 0 if not. */
extern int i2c_check_functionality (struct i2c_adapter *adap, u32 func);
check whether the needed functionality is present. The typical way to do
this is (from the lm75 driver):
This is a typical way to use these functions (from the writing-clients
int foo_detect_client(struct i2c_adapter *adapter, int address,
unsigned short flags, int kind)
static int lm75_detect(...)
/* Define needed variables */
/* As the very first action, we check whether the adapter has the
needed functionality: we need the SMBus read_word_data,
write_word_data and write_byte functions in this example. */
if (!i2c_check_functionality(adapter,I2C_FUNC_SMBUS_WORD_DATA |
goto ERROR0;
/* Now we can do the real detection */
/* Return an error */
if (!i2c_check_functionality(adapter, I2C_FUNC_SMBUS_BYTE_DATA |
goto exit;
Here, the lm75 driver checks if the adapter can do both SMBus byte data
and SMBus word data transactions. If not, then the driver won't work on
this adapter and there's no point in going on. If the check above is
successful, then the driver knows that it can call the following
functions: i2c_smbus_read_byte_data(), i2c_smbus_write_byte_data(),
i2c_smbus_read_word_data() and i2c_smbus_write_word_data(). As a rule of
thumb, the functionality constants you test for with
i2c_check_functionality() should match exactly the i2c_smbus_* functions
which your driver is calling.
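The rule of thumb above comes down to a bitmask comparison: every bit the client asks for must be present in the adapter's functionality mask. The following user-space sketch models that test. The DEMO_ constants are illustrative stand-ins modelled on <linux/i2c.h>, and demo_check_functionality() is a hypothetical name; a real driver would call i2c_check_functionality() with the kernel's own I2C_FUNC_* constants.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit values modelled on <linux/i2c.h>; not the real header. */
#define DEMO_FUNC_SMBUS_READ_BYTE_DATA   0x00080000u
#define DEMO_FUNC_SMBUS_WRITE_BYTE_DATA  0x00100000u
#define DEMO_FUNC_SMBUS_READ_WORD_DATA   0x00200000u
#define DEMO_FUNC_SMBUS_WRITE_WORD_DATA  0x00400000u

#define DEMO_FUNC_SMBUS_BYTE_DATA \
    (DEMO_FUNC_SMBUS_READ_BYTE_DATA | DEMO_FUNC_SMBUS_WRITE_BYTE_DATA)
#define DEMO_FUNC_SMBUS_WORD_DATA \
    (DEMO_FUNC_SMBUS_READ_WORD_DATA | DEMO_FUNC_SMBUS_WRITE_WORD_DATA)

/* Returns 1 only if every requested bit is set in the adapter's mask. */
int demo_check_functionality(uint32_t adapter_funcs, uint32_t needed)
{
    assert(needed != 0);   /* asking for nothing is probably a caller bug */
    return (adapter_funcs & needed) == needed;
}
```

An adapter advertising only byte data transactions would fail a check for byte plus word data, which is the situation in which the lm75 driver gives up.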
Note that the check above doesn't tell whether the functionalities are
implemented in hardware by the underlying adapter or emulated in
software by i2c-core. Client drivers don't have to care about this, as
i2c-core will transparently implement SMBus transactions on top of I2C
......@@ -116,19 +125,19 @@ CHECKING THROUGH /DEV
If you try to access an adapter from a userspace program, you will have
to use the /dev interface. You will still have to check whether the
functionality you need is supported, of course. This is done using
the I2C_FUNCS ioctl. An example, adapted from the lm_sensors i2cdetect
program, is below:
the I2C_FUNCS ioctl. An example, adapted from the i2cdetect program, is
int file;
if ((file = open("/dev/i2c-0",O_RDWR)) < 0) {
if ((file = open("/dev/i2c-0", O_RDWR)) < 0) {
/* Some kind of error handling */
if (ioctl(file,I2C_FUNCS,&funcs) < 0) {
if (ioctl(file, I2C_FUNCS, &funcs) < 0) {
/* Some kind of error handling */
if (! (funcs & I2C_FUNC_SMBUS_QUICK)) {
if (!(funcs & I2C_FUNC_SMBUS_QUICK)) {
/* Oops, the needed functionality (SMBus write_quick function) is
not available! */
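A complete version of the userspace check might look like the sketch below. It assumes the Linux headers <linux/i2c.h> and <linux/i2c-dev.h> are installed; the function name adapter_supports() and its return convention are hypothetical. Note the parentheses around the assignment in the open() call: `<` binds more tightly than `=`, so without them `file` would receive the result of the comparison rather than the descriptor.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/i2c.h>
#include <linux/i2c-dev.h>

/*
 * Hypothetical helper: returns 1 if the adapter behind `path` supports
 * every bit in `needed`, 0 if it does not, and -1 if the device cannot
 * be opened or queried.
 */
int adapter_supports(const char *path, unsigned long needed)
{
    unsigned long funcs = 0;
    int file, ret;

    assert(path);

    if ((file = open(path, O_RDWR)) < 0)
        return -1;                        /* no such adapter, or no permission */

    if (ioctl(file, I2C_FUNCS, &funcs) < 0)
        ret = -1;                         /* adapter did not answer I2C_FUNCS */
    else
        ret = (funcs & needed) == needed; /* all requested bits present? */

    close(file);
    return ret;
}
```

A caller could then write `adapter_supports("/dev/i2c-0", I2C_FUNC_SMBUS_QUICK)` and treat anything other than 1 as "do not probe with write_quick".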
SMBus Protocol Summary
The following is a summary of the SMBus protocol. It applies to
all revisions of the protocol (1.0, 1.1, and 2.0).
Certain protocol features which are not supported by
......@@ -8,6 +9,7 @@ this package are briefly described at the end of this document.
Some adapters understand only the SMBus (System Management Bus) protocol,
which is a subset of the I2C protocol. Fortunately, many devices use
only the same subset, which makes it possible to put them on an SMBus.
If you write a driver for some I2C device, please try to use the SMBus
commands if at all possible (if the device uses only that subset of the
I2C protocol). This makes it possible to use the device driver on both
......@@ -15,7 +17,12 @@ SMBus adapters and I2C adapters (the SMBus command set is automatically
translated to I2C on I2C adapters, but plain I2C commands can not be
handled at all on most pure SMBus adapters).
Below is a list of SMBus commands.
Below is a list of SMBus protocol operations, and the functions executing
them. Note that the names used in the SMBus protocol specifications usually
don't match these function names. For some of the operations which pass a
single data byte, the functions using SMBus protocol operation names execute
a different protocol operation entirely.
Key to symbols
......@@ -35,17 +42,16 @@ Count (8 bits): A data byte containing the length of a block operation.
[..]: Data sent by I2C device, as opposed to data sent by the host adapter.
SMBus Write Quick
SMBus Quick Command: i2c_smbus_write_quick()
This sends a single bit to the device, at the place of the Rd/Wr bit.
There is no equivalent Read Quick command.
A Addr Rd/Wr [A] P
SMBus Read Byte
SMBus Receive Byte: i2c_smbus_read_byte()
This reads a single byte from a device, without specifying a device
register. Some devices are so simple that this interface is enough; for
......@@ -55,17 +61,17 @@ the previous SMBus command.
S Addr Rd [A] [Data] NA P
SMBus Write Byte
SMBus Send Byte: i2c_smbus_write_byte()
This is the reverse of Read Byte: it sends a single byte to a device.
See Read Byte for more information.
This operation is the reverse of Receive Byte: it sends a single byte
to a device. See Receive Byte for more information.
S Addr Wr [A] Data [A] P
SMBus Read Byte Data
SMBus Read Byte: i2c_smbus_read_byte_data()
This reads a single byte from a device, from a designated register.
The register is specified through the Comm byte.
......@@ -73,30 +79,30 @@ The register is specified through the Comm byte.
S Addr Wr [A] Comm [A] S Addr Rd [A] [Data] NA P
SMBus Read Word Data
SMBus Read Word: i2c_smbus_read_word_data()
This command is very like Read Byte Data; again, data is read from a
This operation is very like Read Byte; again, data is read from a
device, from a designated register that is specified through the Comm
byte. But this time, the data is a complete word (16 bits).
S Addr Wr [A] Comm [A] S Addr Rd [A] [DataLow] A [DataHigh] NA P
SMBus Write Byte Data
SMBus Write Byte: i2c_smbus_write_byte_data()
This writes a single byte to a device, to a designated register. The
register is specified through the Comm byte. This is the opposite of
the Read Byte Data command.
the Read Byte operation.
S Addr Wr [A] Comm [A] Data [A] P
SMBus Write Word Data
SMBus Write Word: i2c_smbus_write_word_data()
This is the opposite operation of the Read Word Data command. 16 bits
This is the opposite of the Read Word operation. 16 bits
of data is written to a device, to the designated register that is
specified through the Comm byte.
......@@ -113,8 +119,8 @@ S Addr Wr [A] Comm [A] DataLow [A] DataHigh [A]
S Addr Rd [A] [DataLow] A [DataHigh] NA P
SMBus Block Read
SMBus Block Read: i2c_smbus_read_block_data()
This command reads a block of up to 32 bytes from a device, from a
designated register that is specified through the Comm byte. The amount
......@@ -124,8 +130,8 @@ S Addr Wr [A] Comm [A]
S Addr Rd [A] [Count] A [Data] A [Data] A ... A [Data] NA P
SMBus Block Write
SMBus Block Write: i2c_smbus_write_block_data()
The opposite of the Block Read command, this writes up to 32 bytes to
a device, to a designated register that is specified through the
......@@ -134,10 +140,11 @@ Comm byte. The amount of data is specified in the Count byte.
S Addr Wr [A] Comm [A] Count [A] Data [A] Data [A] ... [A] Data [A] P
SMBus Block Process Call
SMBus Block Write - Block Read Process Call
SMBus Block Process Call was introduced in Revision 2.0 of the specification.
SMBus Block Write - Block Read Process Call was introduced in
Revision 2.0 of the specification.
This command selects a device register (through the Comm byte), sends
1 to 31 bytes of data to it, and reads 1 to 31 bytes of data in return.
......@@ -159,13 +166,16 @@ alerting device's address.
Packet Error Checking (PEC)
Packet Error Checking was introduced in Revision 1.1 of the specification.
PEC adds a CRC-8 error-checking byte to all transfers.
PEC adds a CRC-8 error-checking byte to transfers using it, immediately
before the terminating STOP.
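For illustration, the CRC-8 used for PEC can be computed as in the sketch below. It assumes the commonly documented SMBus parameters (polynomial x^8 + x^2 + x + 1, i.e. 0x07, initial value 0, most significant bit first); the function name pec_crc8() is hypothetical and this is not the i2c-core implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise CRC-8 with polynomial 0x07 and initial value 0, processing the
 * most significant bit first. `data` is the byte sequence the checksum
 * should cover.
 */
uint8_t pec_crc8(const uint8_t *data, size_t len)
{
    uint8_t crc = 0;

    assert(data != NULL || len == 0);

    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x80)
                crc = (uint8_t)((crc << 1) ^ 0x07);
            else
                crc = (uint8_t)(crc << 1);
        }
    }
    return crc;
}
```

The standard check string "123456789" yields 0xF4 with these parameters, which is a convenient way to validate any table-driven variant against this bitwise one.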
Address Resolution Protocol (ARP)
The Address Resolution Protocol was introduced in Revision 2.0 of
the specification. It is a higher-layer protocol which uses the
messages above.
......@@ -177,14 +187,17 @@ require PEC checksums.
I2C Block Transactions
The following I2C block transactions are supported by the
SMBus layer and are described here for completeness.
They are *NOT* defined by the SMBus specification.
I2C block transactions do not limit the number of bytes transferred
but the SMBus layer places a limit of 32 bytes.
I2C Block Read
I2C Block Read: i2c_smbus_read_i2c_block_data()
This command reads a block of bytes from a device, from a
designated register that is specified through the Comm byte.
......@@ -203,8 +216,8 @@ S Addr Wr [A] Comm1 [A] Comm2 [A]
S Addr Rd [A] [Data] A [Data] A ... A [Data] NA P
I2C Block Write
I2C Block Write: i2c_smbus_write_i2c_block_data()
The opposite of the Block Read command, this writes bytes to
a device, to a designated register that is specified through the
......@@ -212,5 +225,3 @@ Comm byte. Note that command lengths of 0, 2, or more bytes are
supported as they are indistinguishable from data.
S Addr Wr [A] Comm [A] Data [A] Data [A] ... [A] Data [A] P
......@@ -1094,9 +1094,6 @@ and is between 256 and 4096 characters. It is defined in the file
mac5380= [HW,SCSI] Format:
mac53c9x= [HW,SCSI] Format:
machvec= [IA64] Force the use of a particular machine-vector
(machvec) in a generic kernel.
Example: machvec=hpzx1_swiotlb
......@@ -1525,6 +1522,8 @@ and is between 256 and 4096 characters. It is defined in the file
This is normally done in pci_enable_device(),
so this option is a temporary workaround
for broken drivers that don't call it.
skip_isa_align [X86] do not align io start addr, so can
handle more pci cards
firmware [ARM] Do not re-enumerate the bus but instead
just use the configuration from the
bootloader. This is currently used on
......@@ -994,7 +994,17 @@ The Linux kernel has eight basic CPU memory barriers:
DATA DEPENDENCY read_barrier_depends() smp_read_barrier_depends()
All CPU memory barriers unconditionally imply compiler barriers.
All memory barriers except the data dependency barriers imply a compiler
barrier. Data dependencies do not impose any additional compiler ordering.
Aside: In the case of data dependencies, the compiler would be expected to
issue the loads in the correct order (eg. `a[b]` would have to load the value
of b before loading a[b]), however there is no guarantee in the C specification
that the compiler may not speculate the value of b (eg. is equal to 1) and load
a before b (eg. tmp = a[1]; if (b != 1) tmp = a[b]; ). There is also the
problem of a compiler reloading b after having loaded a[b], thus having a newer
copy of b than a[b]. A consensus has not yet been reached about these problems,
however the ACCESS_ONCE macro is a good place to start looking.
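As a starting point, the sketch below shows how ACCESS_ONCE might be applied to the a[b] case from the aside. The macro body is what I believe the kernel used at the time (a volatile cast, written here with GNU `__typeof__`); load_indexed() is a hypothetical helper. The volatile access asks the compiler to load b exactly once, so it can neither guess the value nor reload it later. It does not by itself order the two loads on CPUs that need a data dependency barrier.

```c
#include <assert.h>

/* Assumed to mirror the kernel macro; relies on the GNU typeof extension. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical helper: read the index once, then use that single copy. */
int load_indexed(const int *a, int *b)
{
    assert(a && b);

    int idx = ACCESS_ONCE(*b);   /* one load of *b; no speculation, no reload */
    return a[idx];               /* element selected by the value just read */
}
```

Keeping the index in a local variable also means any later use in the same function sees the same value of b that selected the element.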
SMP memory barriers are reduced to compiler barriers on uniprocessor compiled
systems because it is assumed that a CPU will appear to be self-consistent,
......@@ -8,17 +8,6 @@ Command line parameters
Enable logging of debug information in case of ccw device timeouts.
* cio_msg = yes | no
Determines whether information on found devices and sensed device
characteristics should be shown during startup or when new devices are
found, i. e. messages of the types "Detected device 0.0.4711 on subchannel
0.0.0042" and "SenseID: Device 0.0.4711 reports: ...".
Default is off.
* cio_ignore = {all} |
{<device> | <range of devices>} |
{!<device> | !<range of devices>}
Goals, Design and Implementation of the
new ultra-scalable O(1) scheduler
This is an edited version of an email Ingo Molnar sent to
lkml on 4 Jan 2002. It describes the goals, design, and
implementation of Ingo's new ultra-scalable O(1) scheduler.
Last Updated: 18 April 2002.
The main goal of the new scheduler is to keep all the good things we know
and love about the current Linux scheduler:
- good interactive performance even during high load: if the user
types or clicks then the system must react instantly and must execute
the user tasks smoothly, even during considerable background load.
- good scheduling/wakeup performance with 1-2 runnable processes.
- fairness: no process should stay without any timeslice for any
unreasonable amount of time. No process should get an unjustly high
amount of CPU time.
- priorities: less important tasks can be started with lower priority,
more important tasks with higher priority.
- SMP efficiency: no CPU should stay idle if there is work to do.
- SMP affinity: processes which run on one CPU should stay affine to
that CPU. Processes should not bounce between CPUs too frequently.
- plus additional scheduler features: RT scheduling, CPU binding.
and the goal is also to add a few new things:
- fully O(1) scheduling. Are you tired of the recalculation loop
blowing the L1 cache away every now and then? Do you think the goodness
loop is taking a bit too long to finish if there are lots of runnable
processes? This new scheduler takes no prisoners: wakeup(), schedule(),
the timer interrupt are all O(1) algorithms. There is no recalculation
loop. There is no goodness loop either.
- 'perfect' SMP scalability. With the new scheduler there is no 'big'
runqueue_lock anymore - it's all per-CPU runqueues and locks - two
tasks on two separate CPUs can wake up, schedule and context-switch
completely in parallel, without any interlocking. All
scheduling-relevant data is structured for maximum scalability.
- better SMP affinity. The old scheduler has a particular weakness that
causes the random bouncing of tasks between CPUs if/when higher
priority/interactive tasks are running; this was observed and reported by
many people. The reason is that the timeslice recalculation loop first needs
every currently running task to consume its timeslice. But when this
happens on eg. an 8-way system, then this property starves an
increasing number of CPUs from executing any process. Once the last
task that has a timeslice left has finished using up that timeslice,
the recalculation loop is triggered and other CPUs can start executing
tasks again - after having idled around for a number of timer ticks.
The more CPUs, the worse this effect.
Furthermore, this same effect causes the bouncing effect as well:
whenever there is such a 'timeslice squeeze' of the global runqueue,
idle processors start executing tasks which are not affine to that CPU.
(because the affine tasks have finished off their timeslices already.)
The new scheduler solves this problem by distributing timeslices on a
per-CPU basis, without having any global synchronization or
- batch scheduling. A significant proportion of computing-intensive tasks
benefit from batch-scheduling, where timeslices are long and processes
are roundrobin scheduled. The new scheduler does such batch-scheduling
of the lowest priority tasks - so nice +19 jobs will get
'batch-scheduled' automatically. With this scheduler, nice +19 jobs are
in essence SCHED_IDLE, from an interactiveness point of view.
- handle extreme loads more smoothly, without breakdown and scheduling
- O(1) RT scheduling. For those RT folks who are paranoid about the
O(nr_running) property of the goodness loop and the recalculation loop.
- run fork()ed children before the parent. Andrea has pointed out the
advantages of this a few months ago, but patches for this feature
do not work with the old scheduler as well as they should,
because idle processes often steal the new child before the fork()ing
CPU gets to execute it.
The core of the new scheduler contains the following mechanisms:
- *two* priority-ordered 'priority arrays' per CPU. There is an 'active'
array and an 'expired' array. The active array contains all tasks that
are affine to this CPU and have timeslices left. The expired array
contains all tasks which have used up their timeslices - but this array
is kept sorted as well. The active and expired arrays are not accessed
directly; they are accessed through two pointers in the per-CPU runqueue
structure. If all active tasks are used up then we 'switch' the two
pointers and from now on the ready-to-go (former-) expired array is the
active array - and the empty active array serves as the new collector
for expired tasks.
- there is a 64-bit bitmap cache for array indices. Finding the highest
priority task is thus a matter of two x86 BSFL bit-search instructions.
the split-array solution enables us to have an arbitrary number of active
and expired tasks, and the recalculation of timeslices can be done
immediately when the timeslice expires. Because the arrays are always
accessed through the pointers in the runqueue, switching the two arrays can
be done very quickly.
this is a hybrid priority-list approach coupled with roundrobin
scheduling and the array-switch method of distributing timeslices.
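The two mechanisms above (bitmap lookup and array switch) can be sketched in user-space C as follows. This is a simplified model, not the scheduler's code: all names are hypothetical, per-priority task lists are reduced to counters, a lower index is assumed to mean higher priority, and the GCC/Clang builtin `__builtin_ctzll` stands in for the two BSFL instructions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NPRIO 64                  /* assumption: one bitmap bit per priority */

struct prio_array {
    uint64_t bitmap;              /* bit p set <=> priority p has queued tasks */
    int nr_queued[NPRIO];         /* stand-in for the per-priority task lists */
};

struct runqueue {
    struct prio_array arrays[2];
    struct prio_array *active, *expired;
};

void rq_init(struct runqueue *rq)
{
    memset(rq, 0, sizeof(*rq));
    rq->active = &rq->arrays[0];
    rq->expired = &rq->arrays[1];
}

void enqueue(struct prio_array *pa, int prio)
{
    assert(prio >= 0 && prio < NPRIO);
    pa->nr_queued[prio]++;
    pa->bitmap |= 1ULL << prio;
}

/* Dequeue one task of the best priority; returns its priority or -1. */
int pick_next(struct runqueue *rq)
{
    if (rq->active->bitmap == 0) {           /* active array used up: */
        struct prio_array *tmp = rq->active; /* switch the two pointers */
        rq->active = rq->expired;
        rq->expired = tmp;
    }
    if (rq->active->bitmap == 0)
        return -1;                           /* nothing runnable at all */

    int prio = __builtin_ctzll(rq->active->bitmap);  /* lowest set bit */
    if (--rq->active->nr_queued[prio] == 0)
        rq->active->bitmap &= ~(1ULL << prio);
    return prio;
}
```

A task that uses up its timeslice would be re-queued with `enqueue(rq->expired, prio)`. Neither the lookup nor the switch depends on how many tasks are queued, which is the O(1) property the text describes.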
- there is a per-task 'load estimator'.
one of the toughest things to get right is good interactive feel during
heavy system load. While playing with various scheduler variants i found
that the best interactive feel is achieved not by 'boosting' interactive
tasks, but by 'punishing' tasks that want to use more CPU time than there
is available. This method is also much easier to do in an O(1) fashion.
to establish the actual 'load' the task contributes to the system, a
complex-looking but pretty accurate method is used: there is a 4-entry
'history' ringbuffer of the task's activities during the last 4 seconds.
This ringbuffer is operated without much overhead. The entries tell the
scheduler a pretty accurate load-history of the task: has it used up more
CPU time or less during the past N seconds. [the size '4' and the interval
of 4x 1 seconds was found by lots of experimentation - this part is
flexible and can be changed in both directions.]
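A minimal sketch of such a history ringbuffer is shown below. Only the size (4 entries, one per second) comes from the text; the struct layout, the function names and the idea of summing the slots into a single load figure are assumptions made for illustration.

```c
#include <assert.h>

#define HIST_SLOTS 4     /* from the text: 4 entries of 1 second each */

/* Hypothetical per-task history: CPU ticks used in each 1-second window. */
struct load_hist {
    unsigned int slot[HIST_SLOTS];
    unsigned int cur;    /* index of the slot for the current second */
};

/* Charge `ticks` of CPU time to the current second. */
void hist_account(struct load_hist *h, unsigned int ticks)
{
    assert(h);
    h->slot[h->cur] += ticks;
}

/* Called once per second: reuse the oldest slot for the new second. */
void hist_advance(struct load_hist *h)
{
    assert(h);
    h->cur = (h->cur + 1) % HIST_SLOTS;
    h->slot[h->cur] = 0;
}

/* Assumed load estimate: total ticks used over the recorded window. */
unsigned int hist_load(const struct load_hist *h)
{
    unsigned int sum = 0;

    assert(h);
    for (int i = 0; i < HIST_SLOTS; i++)
        sum += h->slot[i];
    return sum;
}
```

Each operation touches a fixed number of slots, so the bookkeeping stays constant-time regardless of how many tasks exist.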
the penalty a task gets for generating more load than the CPU can handle
is a priority decrease - there is a maximum amount to this penalty
relative to their static priority, so even fully CPU-bound tasks will
observe each other's priorities, and will share the CPU accordingly.
the SMP load-balancer can be extended/switched with additional parallel
computing and cache hierarchy concepts: NUMA scheduling, multi-core CPUs
can be supported easily by changing the load-balancer. Right now it's
tuned for my SMP systems.
i skipped the prev->mm == next->mm advantage - no workload i know of shows
any sensitivity to this. It can be added back by sacrificing O(1)
schedule() [the current and one-lower priority list can be searched for a
that->mm == current->mm condition], but costs a fair number of cycles
during a number of important workloads, so i wanted to avoid this as much