Skip to content
  • Christian Borntraeger's avatar
    virtio net: fix oops on interface-up · 370076d9
    Christian Borntraeger authored
    I got the following oops during interface ifup. Unfortunately its not
    easily reproducable so I cant say for sure that my fix fixes this
    problem, but I am confident and I think its correct anyway:
    
       <2>kernel BUG at /space/kvm/drivers/virtio/virtio_ring.c:234!
        <4>illegal operation: 0001 [#1] PREEMPT SMP
        <4>Modules linked in:
        <4>CPU: 0 Not tainted 2.6.24zlive-guest-07293-gf1ca1512
    
    -dirty #91
        <4>Process swapper (pid: 0, task: 0000000000800938, ksp: 000000000084ddb8)
        <4>Krnl PSW : 0404300180000000 0000000000466374 (vring_disable_cb+0x30/0x34)
        <4>           R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:0 CC:3 PM:0 EA:3
        <4>Krnl GPRS: 0000000000000001 0000000000000001 0000000010003800 0000000000466344
        <4>           000000000e980900 00000000008848b0 000000000084e748 0000000000000000
        <4>           000000000087b300 0000000000001237 0000000000001237 000000000f85bdd8
        <4>           000000000e980920 00000000001137c0 0000000000464754 000000000f85bdd8
        <4>Krnl Code: 0000000000466368: e3b0b0700004        lg      %r11,112(%r11)
        <4>           000000000046636e: 07fe                bcr     15,%r14
        <4>           0000000000466370: a7f40001            brc     15,466372
        <4>          >0000000000466374: a7f4fff6            brc     15,466360
        <4>           0000000000466378: eb7ff0500024        stmg    %r7,%r15,80(%r15)
        <4>           000000000046637e: a7f13e00            tmll    %r15,15872
        <4>           0000000000466382: b90400ef            lgr     %r14,%r15
        <4>           0000000000466386: a7840001            brc     8,466388
        <4>Call Trace:
        <4>([<000201500f85c000>] 0x201500f85c000)
        <4> [<0000000000466556>] vring_interrupt+0x72/0x88
        <4> [<00000000004801a0>] kvm_extint_handler+0x34/0x44
        <4> [<000000000010d22c>] do_extint+0xbc/0xf8
        <4> [<0000000000113f98>] ext_no_vtime+0x16/0x1a
        <4> [<000000000010a182>] cpu_idle+0x216/0x238
        <4>([<000000000010a162>] cpu_idle+0x1f6/0x238)
        <4> [<0000000000568656>] rest_init+0xaa/0xb8
        <4> [<000000000084ee2c>] start_kernel+0x3fc/0x490
        <4> [<0000000000100020>] _stext+0x20/0x80
        <4>
        <4> <0>Kernel panic - not syncing: Fatal exception in interrupt
        <4>
    
    After looking at the code and the dump I think the following scenario
    happened: Ifup was running on cpu2 and the interrupt arrived on cpu0.
    Now virtnet_open on cpu 2 managed to execute napi_enable and disable_cb
    but did not execute rx_schedule. Meanwhile on cpu 0 skb_recv_done was
    called by vring_interrupt, executed netif_rx_schedule_prep, which
    succeeded and therefore called disable_cb. This triggered the BUG_ON,
    as interrupts were already disabled by cpu 2.
    
    I think the proper solution is to make the call to disable_cb depend on
    the atomic update of NAPI_STATE_SCHED by using netif_rx_schedule_prep
    in the same way as skb_recv_done.
    
    Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
    Acked-by: default avatarRusty Russell <rusty@rustcorp.com.au>
    Signed-off-by: default avatarJeff Garzik <jeff@garzik.org>
    370076d9