-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Temporal kills #169
Temporal kills #169
Conversation
The failing test case is this:
Let me explain why I think this test is nonsense. First, we must agree that there is no guarantee as to how actions from two different fibers are to be interleaved by the scheduler. They fall as they may. Second, we must agree that there are two fibers at play in this test. One fiber comes from Therefore, it is conclusive that there is no guarantee as to whether the |
I agree that the test is vague. I had to go back to the original ticket to see what it was guarding against (#156). The original issue has to do with re-entry in the case of killing a fiber that is in the scheduler, which shouldn't be an issue. So maybe the question with this test then is not which error takes priority necessarily, but whether the off-stack exception is appropriate. There are no semantics for off-stack reporting, it's just meant to be a last-ditch reporting mechanism, so I'm inclined to replace the test with something else. |
src/Effect/Aff.js
Outdated
// It's possible to interrupt the fiber between enqueuing and | ||
// resuming, so we need to check that the runTick is still | ||
// valid. | ||
var asyncAction = step._1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tmp
register is for stuff like this. I'd prefer to not introduce a var
in the middle of a switch.
src/Effect/Aff.js
Outdated
if (runTick !== localRunTick) { | ||
return; | ||
} | ||
++runTick; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this ++runTick
is necessary. There's no reentry issue here because we aren't dealing with a user-invoked callback yet.
CHG: Applied Nate's code cleanup suggestions.
// This is initialized here (rather than in the Fiber constructor) because | ||
// otherwise, in V8, this Error object indefinitely hangs on to memory that | ||
// otherwise would be garbage collected. | ||
var early = new Error("[ParAff] Early exit"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good find. Any thoughts as to why this happens? Just curious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would guess it has something to do with stack frames and traces or whatever.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<symbol> in Error
is what is hanging onto the memory, whatever that is, and I do not know why Error
is being referenced from the root. I figure it is some special debugging sauce in V8 being quirky.
src/Effect/Aff.js
Outdated
return; | ||
} | ||
var issync = true; | ||
var resolved = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think two separate pieces of state are necessary for this check. This pattern is done in the sequential
code several times.
https://github.com/slamdata/purescript-aff/blob/202a3eac12a207b6f109c9708f3efb034791bd36/src/Effect/Aff.js#L776-L795
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Just so you know, I am doing this so that run(runTick)
is a tail call.
@eric-corumdigital You had some concerns about leaks with this PR? Aside from the failing test, which I can address, was there anything about this PR that concerned you? |
I feel compelled to mention, even if it's just anecdotally, that this this pull request causes spurious off-stack errors during cancellation in a fairly large / complex project we're developing. Even if I don't have a minimal repro case, here's what I witnessed in one of my fibers that is being killed: In a bracket, I allocate an AVar |
@natefaubion The only lingering concern is that if Error objects are instantiated in an Aff computation there is the potential for it to indefinitely retain memory on V8. I have moved all Error instantiations within the Aff implementation so as to not cause a problem. However, any user of Aff can instantiate Error themselves and cause a memory leak. Why this happens is beyond me. Even in the best case of the V8 developers immediately correcting this issue that will still leave millions of unpatched V8 instances which PureScript will still have to run on. @felixSchl What is an "off-stack error"? Also, is your program this?
@natefaubion By the way, that is an example of how |
In trying to write a test for this, I have a scenario where this fix might fail.
What state should I would think this is just another (harder to trigger) temporal violation. |
Maybe |
@natefaubion What is missing is a semantics for Aff. There is no cohesive way to answer this question yet. However, short of that, I think what is missing from your scenario is exactly how these Fibers come about. Effects of multiple Fibers have an undefined interleaving (though locking mechanisms can add some constraints). Therefore, there is no defined time ordering either. The current method of execution is to choose arbitrarily an ordering. In other words, the state of f2 is undefined, or can be defined nondeterministically as either killed or completed. The current implementation may always choose one over the other but there is no prescription. This scenario is different than the temporal consistency I am trying to address. Within a single Fiber, effects happen one after the other, and one of those effects are killing. This is not a concern between Fibers. It is only a concern within a single Fiber. The problem is that a kill after a Fiber reaches a success state should not retroactively kill it, especially because the effects that brought the Fiber to a success state have already happened. Note, when a Fiber receives a kill from another Fiber is undefined and that is not an issue or anything I am trying to address here. |
With respect to my previous comments, we have identified and fixed the issues in our codebase, where were implicitly relying on undefined behaviour in Aff. We've been running with this patch applied for a while now and have had no further issues. So for what it's worth, it has my blessing. |
That's good to hear @felixSchl. I would like to provide alternative APIs in |
@felixSchl good to hear. Have you run on V8? Are you instantiating Error objects repeatedly over time? I have replicated this issue reliably on three different V8 versions. Two Chrome versions and a Node.js version across two different machines. It would be interesting if it was not happening to you and I'd like to see your program. I am on this issue again and may soon have to isolate the exact circumstances that cause a problem in V8. That is the only way it is practically possible to devise a workaround, if one exists. |
I don't think that error objects leaking memory has anything to do with the specifics of this PR, would it? |
By the way, the tests are failing with this change https://travis-ci.com/purescript-contrib/purescript-aff/builds/111644315#L1934. |
Yes, as discussed above, there is a test that currently relies on undefined behavior in order to pass. |
Possibly my investigation was flawed. It may have been that Error objects were simply large objects in comparison to all else, and so a memory leak of a different cause was simply most obvious when many Error objects were instantiated. I have been instantiating Error objects within Aff computations for a couple months now and have not observed memory leak problems. What I did learn is that I had a wrong and lacking understanding of supervisor contexts, and that was a source of memory leaking. Possibly that is the cause of all the memory leaks I have seen with respect to this patch. I now think this patch is not a memory leak risk. Also @felixSchl has evidently been using it without issue, so that makes two of us. |
…rescript-aff into temporal_kills
It's a bit hard to understand what implications of this PR are. It would be great if there were some examples outlining issues of current behavior and implications of this change. If there was a way to even encode that as a test that would be even better. Also If this #168 is related to #166 can you please add more details on how? |
@safareli The implications of this PR are that this potentially changes the order of fiber scheduling, so if you have written code that depends on a specific ordering of interleaved fiber effects, you will likely have issues. I don't think there's really a way to write it as a test, because I don't consider scheduling strategy (how multiple concurrent fiber effects are interleaved) to be part of Aff's guarantees. That is, Aff the implementation must be able to freely choose a scheduling strategy. The problem is that in a single-threaded runtime, it's extremely easy and tempting to rely on an observable ordering since it's deterministic. The simplest example is: example = do
fiber <- forkAff blork
killFiber (error "Boom") fiber Can we reason about the effects here and the state of So what actually happens now? Well it depends on what blork does! The current evaluation strategy is basically:
If blork is defined in only in terms of pure operations and If blork is defined in terms of So in this case, with the above example:
So should the final state of the fiber be completed or killed? The callback was invoked first, so you might expect completed, but because it was scheduled, and the kill is not, then the final state is killed. This is a problem in that
This PR effectively does 1. So, say we want binds to always transition atomically. For the implementation as is, the options are:
I am not considering options that do things like "just schedule killFiber" because it isn't a general solution to fairness and I'm not confident that it wouldn't just break in another way. This PR implements 3, which is a minor change to the actual implementation. All of these solutions change the scheduling strategy, so any code that results on observable interleaving of effects (such as using This solution is not the only solution of course, so I'm happy to hear other thoughts. |
couple days ago as I was looking at purescript-contrib/purescript-concurrent-queues#4 and this purescript-contrib/purescript-aff-bus#17 (comment). I was thinking on them and I come up with this module: Effect.Aff.Finally This is how it can be used with the the AVarLock from #166 (comment) : newtype AVarLock a = AVarLock (AVar a)
modify :: forall a b. AVarLock a -> (a -> Aff (Tuple a b)) -> Aff b
modify (AVarLock var) f = F.runFinallyM do
a <- AVar.take var # F.finallyAff do
// onResult handler will be execute if take succeeds even on current version of Aff.
F.onResult (flip AVar.put var)
res <- f a # F.finallyAff do
// if `f` succeeds then we put `fst` of it's result
F.onResult (fst >>> flip AVar.put var)
// if it fails then we put back initial value
F.onError (const $ AVar.put a var)
pure $ snd res I tried to use it with Bus too you can take a look at it here. This approach might be considered as a bit hacky, but I think it sort of provides somewhat nice workaround for #166 and #168:
Would love to see what you think on this! If folks find this useful will publish as a lib and we can continue discussion on API specifics once there are more tests and we are sure it works 100%. |
@eric-corumdigital @erisco would especially be interested in this: -- | As acquisition cab fail `killed` and `failed` accept `Maybe a`
-- | instead of `a` like regular BracketConditions.
type BracketConditions' a b =
{ killed ∷ Error → Maybe a → Aff Unit
, failed ∷ Error → Maybe a → Aff Unit
, completed ∷ b → a → Aff Unit
}
-- | Like normal generalBracket but acquisition computation
-- | can be killed or it can fail.
generalBracket' ∷ ∀ a b. Aff a → BracketConditions' a b → (a → Aff b) → Aff b
generalBracket' acquisition b computation = runFinallyM do
handle <- acquisition # finallyAff do
{current, final} <- ask
lift case current of
Resolved (Left err) -> b.failed err Nothing
Resolved (Right handle) -> final # runExists case _ of
Resolved (Left err) -> b.failed err (Just handle)
Resolved (Right res) ->
-- Note this can be written without `unsafeCoerce`
-- it's just easier like this as proof of concept.
b.completed (unsafeCoerce res) handle
Killed err -> b.killed err (Just handle)
Killed err -> b.killed err Nothing
lift $ computation handle
and using that here is how newtype AVarLock a = AVarLock (AVar a)
modify :: forall a b. AVarLock a -> (a -> Aff (Tuple a b)) -> Aff b
modify (AVarLock var) f = snd <$> generalBracket'
(AVar.take var)
{ killed: \_ mbVal -> for_ mbVal \val -> AVar.put val var
, failed: \_ mbVal -> for_ mbVal \val -> AVar.put val var
, completed: \res _ -> AVar.put (fst res) var
}
f |
I suspect this is never going to be merged and that is okay, so I am closing the PR. Obviously the commit history is a disaster and so would need to be cleaned up, should anyone want to see this back again. What I have learned is that this patch is myopic. It cares about a level of detail that is missing in other ways and places. I think Aff should continue to be the fast and loose and mostly working creation that it is. At least, I have failed to retrofit it in a backwards compatible and elegant manner. My plan is to create a new library that learns from Aff but isn't bounded by it. |
Resolves #168
Instead of starting the asynchronous effect and scheduling processing of the result, the starting of the asynchronous effect is scheduled and the result is processed immediately.
Before, a Fiber could be PENDING even though the result was received, because it is waiting on the scheduler. Now, if a Fiber is PENDING it is guaranteed the result is not yet received.
Before, when a Fiber was killed in PENDING status it was possible that the result was already received. This meant that a kill which happened after the effect would retroactively kill it — a temporal inconsistency. Now, when a Fiber is killed in PENDING status it is necessarily the case that the kill occurred before the result was received.