
Opened May 17, 2016 by David Johnson (@johnsond), Maintainer

WFA failure

Can workflow agents recover from a crash without recreating or re-obtaining all caps they'd had?

Right now, if one crashes, we effectively have to kill and restart the controller, for several reasons: the controller can't handle the WFA's port dying, and the flows installed in the switch as a result of whatever the WFA has done cap-wise remain in place. Restarting the controller in turn requires restarting all the other WFAs and recreating the entire cap world (including any node state that was changed during cap operations!). This is truly a pain.

Perhaps, if the WFAs journaled bits of metadata each time they recv() a cap from an RP or a membrane, they could "resume" much more easily. This seems straightforward.
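As a rough illustration of the journaling idea, here is a minimal Python sketch, assuming a WFA can run a hook each time its recv() returns a cap. `CapRecord`, `CapJournal`, and all field names are hypothetical and not part of capnet's API. The sketch only rebuilds the agent's own record of which caps it held. It assumes the controller keeps those caps alive and lets a restarted WFA reattach to them by the same IDs, which (as described above) the controller does not currently do.

```python
import json
import os
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class CapRecord:
    # Hypothetical metadata about one received cap; field names are
    # illustrative, not taken from capnet.
    cap_id: str     # cap identifier, assumed stable across WFA restarts
    source: str     # "rp" or "membrane"
    source_id: str  # which RP or membrane the cap came from
    name: str       # tag the WFA uses for this cap


class CapJournal:
    """Append-only JSON-lines journal of caps a workflow agent has received."""

    def __init__(self, path):
        self.path = path

    def record_recv(self, rec):
        # Append one line per recv() and fsync so the entry survives a crash.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(asdict(rec)) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def replay(self):
        # Rebuild the WFA's view of its caps after a restart.
        caps = {}
        if not os.path.exists(self.path):
            return caps
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                try:
                    rec = CapRecord(**json.loads(line))
                except (json.JSONDecodeError, TypeError):
                    # Skip a torn last line left by a crash mid-write.
                    continue
                caps[rec.cap_id] = rec
        return caps


if __name__ == "__main__":
    import tempfile

    path = os.path.join(tempfile.mkdtemp(), "wfa.journal")
    journal = CapJournal(path)
    journal.record_recv(CapRecord("cap-1", "rp", "rp0", "node-a"))
    journal.record_recv(CapRecord("cap-2", "membrane", "m0", "node-b"))
    # Simulate a WFA restart: open a fresh journal on the same file.
    print(sorted(CapJournal(path).replay()))
```

Syncing on every entry costs one disk flush per recv(), which seems acceptable given how rarely caps are received; in exchange, replay only ever loses the single record that was being written when the crash happened.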

Assignee: None
Milestone: None
Time tracking: None
Due date: None
Reference: tcloud/capnet#13