Use control network warnings to modify extension behavior.
Just what it says. @gtw, @johnsond, and @ricci added the following:
Gary Wong [7:59 AM] Sounds good in principle but I think we need to be able to detect control net abuse with very low false positives before we start taking action in response...
[7:59] It's not at all clear what experiments that need to get a ton of data in/out of the cluster should do.
David Johnson [7:59 AM] Basically, experiments that blast the control network should perhaps not get free extensions? I think this should probably only apply if it happens on greater than 50% of the nodes in the experiment at a time. Some people legitimately must stage in/out large datasets, and the control net is the only way we provide to do that.
Leigh Stoller [8:01 AM] Yep, agreed. But we have plenty of experiments with no experimental network defined that send continuous warnings from the cnetwatch script. Also, I think the cnetwatch script triggers on packet rates, and it's hard to send 100000 packets a second when transferring a large file offsite.
David Johnson [8:06 AM] Ok, some combination of these thresholds sounds good to me. If people respond positively to the warnings, do we also automatically let them out of extension-jail?
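One way to picture the combined threshold discussed above is the minimal Python sketch below. It is only an illustration: the function name, the per-node packet-rate input, and the default values are assumptions. The 50% figure comes from the suggestion above, and 100000 packets per second is the rate mentioned in passing, not a confirmed cnetwatch setting.

```python
from typing import Mapping


def flags_control_net_abuse(
    node_pps: Mapping[str, float],
    pps_threshold: float = 100_000.0,  # assumed value, not a confirmed cnetwatch setting
    node_fraction: float = 0.5,        # "greater than 50% of the nodes"
) -> bool:
    """Return True when more than node_fraction of the nodes exceed pps_threshold.

    node_pps maps a node name to its observed control net packet rate
    (packets per second) over the same measurement window.
    """
    if not node_pps:
        return False
    over = sum(1 for pps in node_pps.values() if pps > pps_threshold)
    return over / len(node_pps) > node_fraction
```

With this shape, a single node staging a large dataset would not flag the experiment, while most nodes flooding the control net at once would.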
Robert Ricci [9:41 AM] I agree on locking down extensions if they have too much control net traffic
[9:42] I think what we might need to do is have a more sophisticated way to override extension policies
[9:42] Per-experiment, per-user, and per-project
[9:43] For example, if I see someone asking for bogus extensions, it would be nice if I can put a lock on them that reduces the "free" extensions they get on future experiments
[9:43] And maybe this lock could go away next time they get an extension actually granted
[9:44] I'd also like to be able to formally codify the "no extensions past a week" policy for cord-testdrive
Leigh Stoller [9:45 AM] So, something more fine-grained than the Lockout toggle, which says no more extension requests.
Robert Ricci [9:45 AM] Right
[9:45] And something that can carry some text to explain to the user
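The override idea from this thread (per-experiment, per-user, and per-project, carrying explanatory text, and optionally going away once an extension is actually granted) could be modeled roughly as in the Python sketch below. Every name here (`Scope`, `ExtensionOverride`, `free_extension_hours`, `after_extension_granted`) is hypothetical and does not reflect existing code; choosing the most restrictive matching override is also an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class Scope(Enum):
    EXPERIMENT = "experiment"
    USER = "user"
    PROJECT = "project"


@dataclass(frozen=True)
class ExtensionOverride:
    scope: Scope
    target: str                   # experiment, user, or project identifier
    max_free_hours: int           # cap on "free" extension time
    reason: str                   # text shown to the user
    clear_on_grant: bool = False  # drop this lock once an extension is granted


def _applies(o: ExtensionOverride, experiment: str, user: str, project: str) -> bool:
    ids = {Scope.EXPERIMENT: experiment, Scope.USER: user, Scope.PROJECT: project}
    return ids[o.scope] == o.target


def free_extension_hours(
    default_hours: int,
    overrides: List[ExtensionOverride],
    experiment: str,
    user: str,
    project: str,
) -> Tuple[int, Optional[str]]:
    """Return the most restrictive free-extension limit and its explanation."""
    hours, reason = default_hours, None
    for o in overrides:
        if _applies(o, experiment, user, project) and o.max_free_hours < hours:
            hours, reason = o.max_free_hours, o.reason
    return hours, reason


def after_extension_granted(
    overrides: List[ExtensionOverride],
    experiment: str,
    user: str,
    project: str,
) -> List[ExtensionOverride]:
    """Remove matching overrides that are marked to clear on a granted extension."""
    return [
        o for o in overrides
        if not (o.clear_on_grant and _applies(o, experiment, user, project))
    ]
```

Keeping the reason on the override itself means whatever limit is applied always comes with the text to show the user.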