Chapter 4: Problem 72
For a time reversible Markov chain, argue that the rate at which transitions from \(i\) to \(j\) to \(k\) occur must equal the rate at which transitions from \(k\) to \(j\) to \(i\) occur.
/*! This file is auto-generated */ .wp-block-button__link{color:#fff;background-color:#32373c;border-radius:9999px;box-shadow:none;text-decoration:none;padding:calc(.667em + 2px) calc(1.333em + 2px);font-size:1.125em}.wp-block-file__button{background:#32373c;color:#fff;text-decoration:none}
Learning Materials
Features
Discover
Chapter 4: Problem 72
For a time reversible Markov chain, argue that the rate at which transitions from \(i\) to \(j\) to \(k\) occur must equal the rate at which transitions from \(k\) to \(j\) to \(i\) occur.
All the tools & learning materials you need for study success - in one app.
Get started for free
Each day, one of \(n\) possible elements is requested, the \(i\) th one with probability \(P_{i}, i \geqslant 1, \sum_{1}^{n} P_{i}=1 .\) These elements are at all times arranged in an ordered list which is revised as follows: The element selected is moved to the front of the list with the relative positions of all the other elements remaining unchanged. Define the state at any time to be the list ordering at that time and note that there are \(n !\) possible states. (a) Argue that the preceding is a Markov chain. (b) For any state \(i_{1}, \ldots, i_{n}\) (which is a permutation of \(1,2, \ldots, n\) ), let \(\pi\left(i_{1}, \ldots, i_{n}\right)\) denote the limiting probability. In order for the state to be \(i_{1}, \ldots, i_{n}\), it is necessary for the last request to be for \(i_{1}\), the last non- \(i_{1}\) request for \(i_{2}\), the last non- \(i_{1}\) or \(i_{2}\) request for \(i_{3}\), and so on. Hence, it appears intuitive that $$ \pi\left(i_{1}, \ldots, i_{n}\right)=P_{i_{1}} \frac{P_{i_{2}}}{1-P_{i_{1}}} \frac{P_{i_{3}}}{1-P_{i_{1}}-P_{i_{2}}} \cdots \frac{P_{i_{n-1}}}{1-P_{i_{1}}-\cdots-P_{i_{n-2}}} $$ Verify when \(n=3\) that the preceding are indeed the limiting probabilities.
It follows from the argument made in Exercise 38 that state \(i\) is null recurrent if it is recurrent and \(\pi_{i}=0 .\) Consider the one-dimensional symmetric random walk of Example \(4.15\). (a) Argue that \(\pi_{i}=\pi_{0}\) for all \(i\). (b) Argue that all states are null recurrent.
Consider a branching process having \(\mu<1 .\) Show that if \(X_{0}=1\), then the expected number of individuals that ever exist in this population is given by \(1 /(1-\mu) .\) What if \(X_{0}=n ?\)
In a Markov decision problem, another criterion often used, different than the expected average return per unit time, is that of the expected discounted return. In this criterion we choose a number \(\alpha, 0<\alpha<1\), and try to choose a policy so as to maximize \(E\left[\sum_{i=0}^{\infty} \alpha^{i} R\left(X_{i}, a_{i}\right)\right]\) (that is, rewards at time \(n\) are discounted at rate \(\alpha^{n}\) ). Suppose that the initial state is chosen according to the probabilities \(b_{i} .\) That is, $$ P\left\\{X_{0}=i\right\\}=b_{i}, \quad i=1, \ldots, n $$ For a given policy \(\beta\) let \(y_{j a}\) denote the expected discounted time that the process is in state \(j\) and action \(a\) is chosen. That is, $$ y_{j a}=E_{\beta}\left[\sum_{n=0}^{\infty} \alpha^{n} I_{\left\\{X_{n}=j, a_{n}=a\right\\}}\right] $$ where for any event \(A\) the indicator variable \(I_{A}\) is defined by $$ I_{A}=\left\\{\begin{array}{ll} 1, & \text { if } A \text { occurs } \\ 0, & \text { otherwise } \end{array}\right. $$ (a) Show that $$ \sum_{a} y_{j a}=E\left[\sum_{n=0}^{\infty} \alpha^{n} I_{\left\\{X_{n}=j\right\\}}\right] $$ or, in other words, \(\sum_{a} y_{j a}\) is the expected discounted time in state \(j\) un\(\operatorname{der} \beta\). (b) Show that $$ \begin{aligned} \sum_{j} \sum_{a} y_{j a} &=\frac{1}{1-\alpha}, \\ \sum_{a} y_{j a} &=b_{j}+\alpha \sum_{i} \sum_{a} y_{i a} P_{i j}(a) \end{aligned} $$ (c) Let \(\left\\{y_{j a}\right\\}\) be a set of numbers satisfying $$ \begin{aligned} \sum_{j} \sum_{a} y_{j a} &=\frac{1}{1-\alpha}, \\ \sum_{a} y_{j a} &=b_{j}+\alpha \sum_{i} \sum_{a} y_{i a} P_{i j}(a) \end{aligned} $$ Argue that \(y_{j a}\) can be interpreted as the expected discounted time that the process is in state \(j\) and action \(a\) is chosen when the initial state is chosen according to the probabilities \(b_{j}\) and the policy \(\boldsymbol{\beta}\), given by $$ \beta_{i}(a)=\frac{y_{i a}}{\sum_{a} y_{i a}} $$ is employed. Hint: Derive a set of equations for the expected discounted times when policy \(\beta\) is used and show that they are equivalent to Equation ( \(4.38\) ). (d) Argue that an optimal policy with respect to the expected discounted return criterion can be obtained by first solving the linear program $$ \begin{array}{ll} \text { maximize } & \sum_{j} \sum_{a} y_{j a} R(j, a), \\ \text { such that } & \sum_{j} \sum_{a} y_{j a}=\frac{1}{1-\alpha}, \\ \sum_{a} y_{j a}=b_{j}+\alpha \sum_{i} \sum_{a} y_{i a} P_{i j}(a) \\ y_{j a} \geqslant 0, \quad \text { all } j, a \end{array} $$ and then defining the policy \(\beta^{*}\) by $$ \beta_{i}^{*}(a)=\frac{y_{i a}^{*}}{\sum_{a} y_{i a}^{*}} $$ where the \(y_{i a}^{*}\) are the solutions of the linear program.
For the Markov chain with states \(1,2,3,4\) whose transition probability matrix \(\mathbf{P}\) is as specified below find \(f_{i 3}\) and \(s_{i 3}\) for \(i=1,2,3\). $$ \mathbf{P}=\left[\begin{array}{llll} 0.4 & 0.2 & 0.1 & 0.3 \\ 0.1 & 0.5 & 0.2 & 0.2 \\ 0.3 & 0.4 & 0.2 & 0.1 \\ 0 & 0 & 0 & 1 \end{array}\right] $$
What do you think about this solution?
We value your feedback to improve our textbook solutions.