This is going to be an analysis in order to fix the bug from https://codepad.co/snippet/EP7uOrlh
There are two mutex protected variables involved:
1) AIStatefulTask::mState
2) AIEngine::mEngineState
mState contains the base state of the task and a pointer to it's current engine (a AIEngine*) called current_engine.
mEngineState contains a boolean 'waiting' and a std::list with pointers to AIStatefulTask's that
are added to this engine.
Obviously mState needs to be locked while reading/writing current_engine, and mEngineState needs to be locked
while manipulating the std::list.
Moreover, if ever the two are simulateneously locked then that has to always happen in the same order to avoid a deadlock.
Most of the time we expect a correspondence between current_engine and the linked list's of engines:
the task should be added to the list of the engine current_engine (and only that one). If current_engine is nullptr
then it should not be in the list of any engine. However, a task can (want to) switch engines or go idle and (want to)
be removed from it's current list.
The way things currently work is that current_engine signals the wish of the task; this value can be changed
only while executing, but since a task can also execute without being added to an engine (yet), for example
at the moment run(), cont() or advance_state() is called, we have the situation that current_engine can change
from nullptr to engine1, from engine1 to engine2 and from engine2 to nullptr while being added to any engine
or none at all. Engines simply check, shortly before running a task, what the value of current_engine is and
when that isn't themselves they don't run the task but remove them from their list.
In other words, a task adds himself to an engine when it wants to run in it, but can rapidly change its mind;
while an engine only removes a task when it gets to that. So, as a result, a task can be added to multiple lists
and only the value of current_engine is the canonical engine that may run it.
However, a task is being run by an engine while mState is unlocked (because we want it unlocked most of the time),
so theoretically an engine can start to run a task, which is then run from elsewhere before that happens,
switches state/engine, and then continues to run by the (now wrong) engine. The only way to prevent this is by
doing the canonical check whether we're being run by the requested engine AFTER mRunMutex is already locked
(which shuts out any runs from elsewhere that might change engine again).
Things we definitely want to prevent are the following:
* current_engine is set, but the task is not added to the list of that engine.
This is easily avoided by setting current_engine before adding the task to that engine.
However, this leaves one problem: a task can still be added more than once to an engine. If this is an engine
that it shouldn't run in than all instances are simply (eventually) removed, but it can happen that having two
or more instances of that same task in the list of an engine poses a problem:
The engine namely first run a task, and when it returns checks the value of current_engine and removes the
task from its list when current_engine became the nullptr. But since that might not remove all instances
of that task it may happen that the same task is still run again.
Removing not one but ALL instances from a list has it's problems too, and here it becomes tricky:
Because of the (current) order in which mState and mEngineState may be locked (mState first), we
cannot first lock mEngineState and then read the value of current_engine from the task. Instead,
the code looks like this:
do
{
AIStatefulTask& stateful_task(queued_element->stateful_task());
stateful_task.multiplex(AIStatefulTask::normal_run); // Run the task.
// This locks mState shortly, so it must be called before locking mEngineState because
// add() locks mEngineState while holding mState.
bool active = stateful_task.active(this); // Is current_engine equal this engine?
engine_state_type::wat engine_state_w(mEngineState); // This locks mEngineState.
if (!active)
engine_state_w->list.erase(queued_element++);
else
++queued_element;
}
while (queued_element != end);
This code guarantees that the element that is removed is the once that just ran:
queued_element is an iterator into the list and it just ran. If current_engine is
nullptr after returning from multiplex then we can safe remove this instance (arguably
we want to remove all instances even) because if stays nullptr then we can not
remove once but all instances even, and if directly after setting the active boolean
current_engine is set to this engine again then there was also another add() and
we will end up with (at least) one instance in the list anyway after the erase.
However, if we'd erase all elements we could end up with a list that doesn't contain
this task while it's current_engine points to us.
One solution could be to keep the lock on mState while locking mEngineState, just
like we do when we're adding a task to an engine, and removing all instances of
the task from the list. In that case the task is removed while current_engine is
nullptr and nobody can interfere. That is, if some other thread tries to run the
task it will block until we're done with removing all instances and when a subsequent
run of the task by another thread would readd us to this list then that's ok
because it will be in sync with setting current_engine. I don't really like this
solution because it keeps mState locked a lot longer and we have to traverse the
whole list checking for other instances instead of just erasing a single iterator.
This solution would be viable if keeping the lock on mState would also guarantee
that a task is only ever added once to a list: in that case we wouldn't have to
traverse the whole list and the lock on mState would be kept shorter. But currently
it is possible that other threads run the task at will, ie with a series of
idle(), cont(), idle(), cont(), ... where each idle() sets current_engine to nullptr
and each cont() sets it back to the engine (assuming that is -say- its default engine)
and adds it to its list. So, for this to work we then also need to change the code
that adds a task to a list and make sure it is never added more than once which then
again means keeping the lock on mState while traversing the whole list.
A second solution is to accept that the engine lists can be temporarily out of
sync with the state of a task and thus run it while it should be idle. While the
only problem seems to be that an engine should only run a task when it is allowed
to; that is, we can pass a pointer to the engine that runs multiplex along with
multiplex and check if that is equal to current_engine before running the task.
The only demand here, I think, is that this decision is made at a point where
it is guaranteed that we are also the first thread that will run that task, otherwise
the decision can get out of sync again. And that means a considerable amount
of initialization code has already run; making aborting multiplex() because we
do NOT want to run not trivial.
Fixed a million grammar and typo errors.
Be the first to comment
You can use [html][/html], [css][/css], [php][/php] and more to embed the code. Urls are automatically hyperlinked. Line breaks and paragraphs are automatically generated.