约定
$t时刻的状态记为x_t $
\(t 时刻的测量数据记为 z_t\)
\(t_1到t_2时刻的观测数据流(t_1\leq t_2)记为z_{t1:t2}=z_{t1},z_{t1+1},z_{t1+2},…,z_{t2}\)
\((t-1,t]时间内的控制数据记为u_t\)
\(t_1到t_2时刻的控制数据流(t_1\leq t_2)记为u_{t1:t2}=u_{t1},u_{t1+1},u_{t1+2},…,u_{t2}\)
\(状态变量x_t的置信度记为bel(x_t)\)
\(执行控制量u_t后,进行观测z_t之前的状态变量x_t的置信度记为\overline{b e l}(x_t)\)
引用
- \(\begin{array}{l} \mathrm{P}(\mathrm{x} \mid \mathrm{y}, \mathrm{z})\\ =\frac{\mathrm{p}(\mathrm{y}, \mathrm{z} \mid \mathrm{x}) \mathrm{p}(\mathrm{x})}{\mathrm{p}(\mathrm{y}, \mathrm{z})} \\ =\frac{\mathrm{p}(\mathrm{x}, \mathrm{y}, \mathrm{z})}{\mathrm{p}(\mathrm{y}, \mathrm{z})} \\ =\frac{\mathrm{p}(\mathrm{x}, \mathrm{y}, \mathrm{z})}{\mathrm{p}(\mathrm{y} \mid \mathrm{z}) \mathrm{p}(\mathrm{z})} \\ =\frac{\mathrm{p}(\mathrm{x}, \mathrm{y}, \mathrm{z}) \mathrm{p}(\mathrm{x}, \mathrm{z})}{\mathrm{p}(\mathrm{y} \mid \mathrm{z}) \mathrm{p}(\mathrm{z}) \mathrm{p}(\mathrm{x}, \mathrm{z})} \\ \because \frac{\mathrm{p}(\mathrm{x}, \mathrm{y}, \mathrm{z})}{\mathrm{p}(\mathrm{x}, \mathrm{z})}=\mathrm{P}(\mathrm{y} \mid \mathrm{x}, \mathrm{z}), \frac{\mathrm{p}(\mathrm{x}, \mathrm{z})}{\mathrm{p}(\mathrm{z})}=\mathrm{P}(\mathrm{x} \mid \mathrm{z}) \\ \therefore \mathrm{p}(\mathrm{x} \mid \mathrm{y}, \mathrm{z})=\frac{\mathrm{p}(\mathrm{y} \mid \mathrm{x}, \mathrm{z}) \mathrm{p}(\mathrm{x} \mid \mathrm{z})}{\mathrm{p}(\mathrm{y} \mid \mathrm{z})} \end{array}\)
- \(p(x)=\int p(x \mid y) p(y) d y\)
- \(p(x \mid y)=\frac{p(x, y)}{p(y)}\)
推导
前提:正确初始化初始时刻 \(t=0\) 的置信度 \(bel(x_0)\) 。
\(\begin{array}{l} \operatorname{bel}\left(x_{t}\right)\\ =p\left(x_{t} \mid z_{t}, z_{1: t-1}, u_{1: t}\right)\\ =p\left(z_{t} \mid x_{t}, z_{1: t-1}, u_{1: t}\right) \cdot p\left(x_{t} \mid z_{1: t-1}, u_{1: t} \right) \cdot (p\left(z_{t} \mid z_{1: t-1}, u_{1: t}\right))^{-1}\\ =\eta \cdot p\left(z_{t} \mid x_{t}, z_{1: t-1}, u_{1: t}\right) \cdot p\left(x_{t} \mid z_{1: t-1}, u_{1: t}\right) \end{array}\)
如果状态 $x_t $ 能很好的预测未来,那么就说状态 \(x_t\) 是完备complete的。状态的完备性意味着历史状态、测量值以及控制量都不再能够提供使我们更准确预测未来的额外信息。
关于完备性的定义并不要求未来是状态的一个确切的函数。未来是随机的,但除了状态 \(x_t\),之前的任何状态都不能对未来产生影响。 满足这一特性的过程就是人们常说的马尔可夫链Markov chains。
若状态 \(x_t\) 是完备complete的,有:
\(p\left(z_{t} \mid x_{t}, z_{1: t-1}, u_{1: t}\right)=p\left(z_{t} \mid x_{t}\right)\)
即,
\(\begin{array}{l} \operatorname{bel}\left(x_{t}\right)\\ =\eta \cdot p\left(z_{t} \mid x_{t}\right) \cdot p\left(x_{t} \mid z_{1: t-1}, u_{1: t}\right)\\ =\eta \cdot p\left(z_{t} \mid x_{t}\right) \cdot \overline{b e l}(x_t)\\ \begin{array}{l} =\eta \cdot p\left(z_{t} \mid x_{t}\right) \cdot \frac{p\left(x_{t} \cdot z_{1: t-1} \cdot u_{1: t}\right)}{p\left(z_{1: t-1} \cdot u_{1: t}\right)}\\ \begin{array}{l} =\eta \cdot p\left(z_{t} \mid x_{t}\right) \cdot \frac{\int_{x_{t-1}} p\left(x_{t} \cdot z_{1: t-1} \cdot u_{1: t} \mid x_{t-1}\right) \cdot p\left(x_{t-1}\right) d x_{t-1}}{p\left(z_{1: t-1} \cdot u_{1: t}\right)} \\ =\eta \cdot p\left(z_{t} \mid x_{t}\right) \cdot \frac{\int_{x_{t-1}} p\left(x_{t} \cdot z_{1: t-1} \cdot u_{1: t} \cdot x_{t-1}\right) d x_{t-1}}{p\left(z_{1: t-1} \cdot u_{1: t}\right)} \end{array}\\ =\eta \cdot p\left(z_{t} \mid x_{t}\right) \cdot \int_{x_{t-1}} \frac{p\left(x_{t} \cdot z_{1: t-1} \cdot u_{1: t} \cdot x_{t-1}\right)}{p\left(z_{1: t-1} \cdot u_{1: t} \cdot x_{t-1}\right)} \cdot \frac{p\left(z_{1: t-1} \cdot u_{1: t} \cdot x_{t-1}\right)}{p\left(z_{1: t-1} \cdot u_{1: t}\right)} d x_{t-1}\\ =\eta \cdot p\left(z_{t} \mid x_{t}\right) \cdot \int_{x_{t-1}} p\left(x_{t} \mid x_{t-1}, z_{1: t-1}, u_{1: t}\right) \cdot p\left(x_{t-1} \mid z_{1: t-1}, u_{1: t}\right) d x_{t-1}\\ \end{array} \end{array}\)
- 全概率公式 P(x)=∫P(x|y)P(y)dy
- 贝叶斯公式 P(x|y)=P(xy) / P(y)
因为假设了状态具有完备性,所以如果我们知道了 $x_{t-1} $ ,过去的观测量和控制量将不携带关于状态 \(x_t\) 的信息。因此,有:
\(p\left(x_{t} \mid x_{t-1}, z_{1: t-1}, u_{1: t}\right)=p\left(x_{t} \mid x_{t-1}, u_{t}\right)\)
且,考虑到t时刻的控制量 \(u_t\) 与t-1时刻的状态 \(x_{t-1}\) 无关,则:
\(\begin{array}{l} \operatorname{bel}\left(x_{t}\right)\\ =\eta \cdot p\left(z_{t} \mid x_{t}\right) \cdot \int_{x_{t-1}} p\left(x_{t} \mid x_{t-1}, u_{t}\right) \cdot p\left(x_{t-1} \mid z_{1: t-1}, u_{1: t}\right) d x_{t-1}\\ =\eta \cdot p\left(z_{t} \mid x_{t}\right) \cdot \int_{x_{t-1}} p\left(x_{t} \mid x_{t-1}, u_{t}\right) \cdot p\left(x_{t-1} \mid z_{1: t-1}, u_{1: t-1}\right) d x_{t-1}\\ =\eta \cdot p\left(z_{t} \mid x_{t}\right) \cdot \int_{x_{t-1}} p\left(x_{t} \mid x_{t-1}, u_{t}\right) \cdot b e l\left(x_{t-1}\right) d x_{t-1} \end{array}\)
结论
\(\begin{array}{l} \overline{\operatorname{bel}}\left(x_{t}\right)=\int p\left(x_{t} \mid x_{t-1},u_{t}\right) \text { bel }\left(x_{t-1}\right) d x_{t-1} \\ \operatorname{bel}\left(x_{t}\right)=\eta \cdot p\left(z_{t} \mid x_{t}\right) \overline{\operatorname{bel}}\left(x_{t}\right) \end{array}\)