  • Leigh B Stoller authored · f1249179
    Another kludge for returning mounts to VMs. What a pain. Here
    are the details, so they are recorded someplace.
    
    The Racks do not have a real 172 router for the "jail" network.
    This is a mild pain, and one possibility would be to make the
    router be the physical node, so that each set of VMs is using its own
    router thus spreading the load.
    
    Well, that does not work because we use bridge mode on the physical
    host, and so the packets leave the node before they have a chance to
    go through the routing code. Yes, Linux does have something called
    a brouter via ebtables, but I could not make that work after a lot of
    trying and tearing my hair out.
    
    So the next-best thing is to make the control node be the router
    by sticking an alias for 172.16.0.1 on xenbr0. Fine, that works,
    although performance could suffer.
    
    But what about NFS traffic to ops? It would be really silly to send
    that through the routing code on the control node, just to end up
    bridging into the ops VM. So I figured I would optimize that by
    changing domounts to return mounts that reference the ops address
    on the jail network. And in fact this worked fine, but only for
    shared nodes.
    
    But it failed for exclusive VMs! In this case, we add a SNAT rule on
    the physical host that changes the source IP to be that of the
    physical host so that users cannot spoof a VM on a shared node and
    mount an NFS filesystem they should not have access to. In fact, it
    failed for UDP mounts but not for TCP mounts. When I looked at the
    traffic with tcpdump, it appeared that return TCP traffic from ops was
    using its jail IP, but return UDP traffic was using the public IP.
    This confuses SNAT and so the packets never get back into the VM.
    
    So, this change basically looks at the sharing mode of the node:
    if it is shared, we use the jail IP in the mounts, and if it is
    exclusive, we use the public IP (and thus that traffic gets routed
    through the control node). This sucks, but I am worn down on this.
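
The selection rule described above can be sketched roughly as follows. This is a minimal illustration in Python, not the actual domounts code: the names `OpsAddresses`, `mount_server_ip`, and `mount_source`, the example addresses, and the export path are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpsAddresses:
    """The two addresses ops is reachable at (both values are hypothetical)."""
    jail_ip: str    # ops address on the 172.16 "jail" network
    public_ip: str  # ops address on the public network


def mount_server_ip(node_is_shared: bool, ops: OpsAddresses) -> str:
    """Pick the ops address to hand back in the mount list.

    Shared node: use the jail IP, so NFS traffic is bridged straight to
    ops without passing through the routing code on the control node.
    Exclusive node: use the public IP, since the SNAT rule on the
    physical host breaks UDP mounts made against the jail IP.
    """
    if node_is_shared:
        return ops.jail_ip
    return ops.public_ip


def mount_source(node_is_shared: bool, ops: OpsAddresses, export_path: str) -> str:
    """Build an NFS 'server:/path' source string for one mount."""
    return f"{mount_server_ip(node_is_shared, ops)}:{export_path}"


if __name__ == "__main__":
    # Example addresses only; the real ones are site-specific.
    ops = OpsAddresses(jail_ip="172.16.0.2", public_ip="192.0.2.10")
    print(mount_source(True, ops, "/proj/example"))
    print(mount_source(False, ops, "/proj/example"))
```

The trade-off is the one noted above: exclusive VMs give up the direct bridged path to ops in exchange for mounts that work with the SNAT rule in place.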