TIL (22/12/2023)
I was thinking about what I can do next with the ESP32 and micro-ROS, so I thought to learn RTOS first, since micro-ROS builds on it, and then delve into micro-ROS.
For learning RTOS (going to start with FreeRTOS itself), I came across this tutorial which looks good.
Then I learned about Policy Gradient methods to solve MDPs from the Deep RL course I'm doing from HF. The math is a little involved, so I have to work through it step by step. While going through the Policy Gradient Theorem derivation, I came across a few tricks and assumptions (sketched together after this list), e.g.
- Reinforce Trick: $\frac{\nabla_{\theta}P(\tau)}{P(\tau)} = \nabla_{\theta}\log(P(\tau))$
- The state distribution is independent of the parameters ($\theta$) of the policy (I think this means the environment dynamics aren't part of the policy, so when we take $\nabla_{\theta}\log P(\tau)$ the transition-probability terms drop out and only the $\nabla_{\theta}\log\pi_{\theta}(a_t|s_t)$ terms remain).
- Sampling $m$ trajectories from the trajectory ($\tau$) distribution to estimate the expectation (Monte-Carlo style).
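To check my understanding, here's my sketch of how these pieces combine in the standard REINFORCE derivation (my notation: $J(\theta)$ is the expected return, $R(\tau)$ the return of trajectory $\tau$, and $\pi_{\theta}$ the policy):

```latex
\begin{align*}
\nabla_{\theta} J(\theta)
  &= \nabla_{\theta} \sum_{\tau} P(\tau;\theta)\, R(\tau)
   = \sum_{\tau} P(\tau;\theta)\, \frac{\nabla_{\theta} P(\tau;\theta)}{P(\tau;\theta)}\, R(\tau)
   && \text{multiply and divide by } P(\tau;\theta) \\
  &= \sum_{\tau} P(\tau;\theta)\, \nabla_{\theta} \log P(\tau;\theta)\, R(\tau)
   && \text{Reinforce trick} \\
  &= \mathbb{E}_{\tau}\!\left[\left(\sum_{t} \nabla_{\theta} \log \pi_{\theta}(a_t \mid s_t)\right) R(\tau)\right]
   && \text{dynamics independent of } \theta \\
  &\approx \frac{1}{m} \sum_{i=1}^{m} \left(\sum_{t} \nabla_{\theta} \log \pi_{\theta}\big(a_t^{(i)} \mid s_t^{(i)}\big)\right) R\big(\tau^{(i)}\big)
   && \text{sample } m \text{ trajectories}
\end{align*}
```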
Next I have to do the hands-on part and read more about it.
RTOS
- We can use an RTOS when we have to run many tasks concurrently or when timing is demanding, which can't be done in the general super-loop configuration (I mean the usual setup and loop parts).
- ESP32 uses a modified version of FreeRTOS that supports its SMP (Symmetric MultiProcessing) architecture to schedule tasks on both cores! (But this tutorial only covers multi-tasking on a single core; see the sketch after this list.)
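Here's a minimal sketch of what this looks like (my own example, assuming the Arduino-ESP32 core; the task names, stack sizes, and timings are arbitrary picks):

```cpp
// Two tasks on one core of the ESP32 (the Arduino core runs FreeRTOS underneath).
// xTaskCreatePinnedToCore is the ESP32 variant of xTaskCreate that pins a task
// to a single core, matching the single-core multitasking in the tutorial.

// App CPU is core 1 on dual-core chips (core 0 handles Wi-Fi/BT housekeeping).
static const BaseType_t app_cpu = 1;

// Task 1: blink the built-in LED every 500 ms (assumes the board defines LED_BUILTIN).
void blinkTask(void *parameter) {
  pinMode(LED_BUILTIN, OUTPUT);
  for (;;) {
    digitalWrite(LED_BUILTIN, HIGH);
    vTaskDelay(500 / portTICK_PERIOD_MS);  // yields the CPU instead of busy-waiting
    digitalWrite(LED_BUILTIN, LOW);
    vTaskDelay(500 / portTICK_PERIOD_MS);
  }
}

// Task 2: print a heartbeat every second.
void printTask(void *parameter) {
  for (;;) {
    Serial.println("still alive");
    vTaskDelay(1000 / portTICK_PERIOD_MS);
  }
}

void setup() {
  Serial.begin(115200);
  xTaskCreatePinnedToCore(blinkTask, "Blink", 1024, NULL, 1, NULL, app_cpu);
  xTaskCreatePinnedToCore(printTask, "Print", 2048, NULL, 1, NULL, app_cpu);
}

void loop() {
  // Nothing here: the scheduler interleaves both tasks, unlike the
  // super-loop where everything would share this one function.
}
```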
Task Scheduling
- Context switching: how tasks are switched from one to another.
- Task pre-emption: a higher-priority task that becomes ready can interrupt (pre-empt) the currently running lower-priority task (see the sketch below).
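And a rough sketch of pre-emption under the same Arduino-ESP32 assumptions: the higher-priority task wakes from vTaskDelay() and immediately pre-empts the busy lower-priority one.

```cpp
// Pre-emption demo: a high-priority task interrupts a low-priority busy task.
static const BaseType_t app_cpu = 1;

// Low-priority task: hogs the CPU and never blocks (only for demonstration --
// a real task should yield or block instead of busy-waiting like this).
void busyTask(void *parameter) {
  for (;;) {
    Serial.println("busy task working...");
    for (volatile uint32_t i = 0; i < 1000000; i++) {}  // fake work
  }
}

// High-priority task: sleeps most of the time, so busyTask gets the CPU,
// then pre-empts it the moment the delay expires.
void urgentTask(void *parameter) {
  for (;;) {
    vTaskDelay(1000 / portTICK_PERIOD_MS);  // blocked: scheduler runs busyTask
    Serial.println(">>> urgent task pre-empted the busy one");
  }
}

void setup() {
  Serial.begin(115200);
  xTaskCreatePinnedToCore(busyTask,   "Busy",   2048, NULL, 1, NULL, app_cpu);
  xTaskCreatePinnedToCore(urgentTask, "Urgent", 2048, NULL, 2, NULL, app_cpu);
}

void loop() {}
```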