Self-Organizing Kinematics

Adapt or perish, now as ever, is nature's inexorable imperative.

Introduction

Consider the case of a simple 2 DOF robotic arm. Using traditional control theory, the first thing an engineer would have to do is derive the forward and inverse kinematic equations for the end effector of the robot arm, so that task space goals can be converted into a joint space representation and vice versa. The forward kinematic equations for a 2 DOF robot arm are:

\[ x = l_1\cos(\theta_1) + l_2\cos(\theta_1 + \theta_2) \] \[ y = l_1\sin(\theta_1) + l_2\sin(\theta_1 + \theta_2) \]

While the forward kinematics may not be too challenging to work out by hand, the inverse kinematics are more demanding even for a system as simple as this, and they do not give unique solutions. In this case, each \((x,y)\) position admits two possible joint configurations (elbow-up and elbow-down):

\[ r = \sqrt{x^2 + y^2} \] \[ c_2 = \frac{r^2 - l_1^2 - l_2^2}{2 l_1 l_2} \] \[ s_2^{\pm} = \pm\sqrt{1 - c_2^2} \] \[ \alpha^{\pm} = \tan^{-1}\!\left(\frac{l_2 s_2^{\pm}}{l_1 + l_2 c_2}\right) \] \[ \theta_2^{\pm} = \tan^{-1}\!\left(\frac{s_2^{\pm}}{c_2}\right) \] \[ \theta_1^{\pm} = \tan^{-1}\!\left(\frac{y}{x}\right) - \alpha^{\pm} \]
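For concreteness, here is a minimal Python sketch of these equations, using atan2 in place of \(\tan^{-1}\) so that quadrants are handled correctly. The link lengths and test angles are illustrative assumptions, not values from the text.

```python
import numpy as np

def forward_kinematics(theta1, theta2, l1=1.0, l2=1.0):
    """End-effector position of a planar 2 DOF arm."""
    x = l1 * np.cos(theta1) + l2 * np.cos(theta1 + theta2)
    y = l1 * np.sin(theta1) + l2 * np.sin(theta1 + theta2)
    return x, y

def inverse_kinematics(x, y, l1=1.0, l2=1.0):
    """Both elbow solutions (+/-) for a reachable target (x, y)."""
    c2 = (x**2 + y**2 - l1**2 - l2**2) / (2 * l1 * l2)
    if abs(c2) > 1:
        raise ValueError("target is outside the workspace")
    solutions = []
    for s2 in (np.sqrt(1 - c2**2), -np.sqrt(1 - c2**2)):
        theta2 = np.arctan2(s2, c2)
        alpha = np.arctan2(l2 * s2, l1 + l2 * c2)
        theta1 = np.arctan2(y, x) - alpha
        solutions.append((theta1, theta2))
    return solutions

# Round trip: each inverse solution reproduces the original (x, y).
x, y = forward_kinematics(0.7, -0.4)
for t1, t2 in inverse_kinematics(x, y):
    print(t1, t2, forward_kinematics(t1, t2))
```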

While software packages exist to derive these relationships from a specification file and perform the computations efficiently, significantly reducing the burden on the engineer, they still depend heavily on accurate measurements of parameters such as \(l_1\) and \(l_2\). The advantage of this approach is that, done properly, there is theoretically no limit to the precision of the system. The trade-off is that the system cannot respond to dynamic changes in its configuration, such as the lengthening of one of the links or the addition of another link to the end.

Outside of traditional control theory, self-organizing systems can be used in interesting ways to create a sensorimotor mapping between the task space and the joint space with the following properties:

  • Adaptive to changes in configuration with continuous learning
  • Learns without supervision and with no model specification beforehand
  • Can learn a direct mapping between a vision sensor and the arm configuration without conversion to an intermediate task space.

Self-Organizing Maps

Self-organizing maps (SOMs) were first introduced by Kohonen in 1982 as a type of artificial neural network intended to model certain kinds of biological computation. However, the connection with actual neuron dynamics is thin, and the SOM is perhaps better thought of as a topological graph: it need not output a single value but can easily be extended to output vectors, logic, or functions, none of which has any biological plausibility.

The traditional SOM is self-organized as follows. Each node \(n_i\) contains a weight vector \( \mathbf{w}_i \) and has a spatial distance \(dist(n_i, n_j)\) between itself and every other node in the network. A sample vector \( \mathbf{x}(t) \) is drawn from the space to be learned and compared with each node in the network. A single winning node \( i^{*} \) is then chosen according to

\[ i^{*} = \underset{i \in A}{\arg\min}\,||\mathbf{w}_i(t) - \mathbf{x}(t)|| \]

Each node vector is then updated toward the input by

\[ \Delta \mathbf{w}_i(t) = \eta(t)\cdot h(i,i^{*};t) \cdot [\mathbf{x}(t) - \mathbf{w}_i(t) ] \]

where \(\eta(t)\) is a time-varying learning rate in \([0, 1]\) and \(h(i,i^{*};t)\) is a neighborhood function that modulates the degree of change according to the distance between nodes. Typically, a Gaussian is used for the neighborhood function:

\[ h(i,i^{*};t) = \exp\left(\frac{-||r_i-r_{i^{*}}||^2}{2\sigma(t)^2}\right) \]

Large values of the learning rate and the neighborhood width make the SOM more plastic and responsive to the input, while small values result in a more stable map. Typically, these time-varying parameters are annealed under a fixed schedule such as one of the following:

\[ \eta(t) = \eta_i\left(\frac{\eta_f}{\eta_i}\right)^{t/t_{max}} \] \[ \eta(t) = \eta_f + (\eta_i - \eta_f)e^{-t/\tau} \]
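Both schedules are simple to express in code. A minimal sketch, with illustrative initial and final rates:

```python
import numpy as np

def eta_power(t, t_max, eta_i=0.9, eta_f=0.01):
    # Geometric interpolation from eta_i down to eta_f over t_max steps.
    return eta_i * (eta_f / eta_i) ** (t / t_max)

def eta_exp(t, tau, eta_i=0.9, eta_f=0.01):
    # Exponential decay toward the floor eta_f with time constant tau.
    return eta_f + (eta_i - eta_f) * np.exp(-t / tau)

print(eta_power(500, 1000), eta_exp(500, 250))  # rates at the halfway point
```

The same forms are typically reused for the neighborhood width \(\sigma(t)\).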

These schedules permit only a single period of learning, since the network never becomes plastic again. Continuous learning over any time period can instead be achieved by creating a loopback term which tracks the average error

\[ \epsilon(t+1) = \epsilon(t) + \gamma\left(\left\|\mathbf{x} - \mathbf{w}_{i^{*}}\right\| - \epsilon(t)\right) \]

and then adjusting the other parameters based on the average error:

\[ \sigma(t+1) = \sigma(t) + \beta\left((\sigma_i - \sigma_f)\tanh(\lambda\epsilon(t)) + \sigma_f - \sigma(t)\right) \]

However, these update methods alter the plasticity of the entire network even when only a sparse subgroup of nodes contributes most of the error. This runs afoul of the 'credit assignment' problem of attributing error to the correct nodes. It is therefore best to change the overall network rates only very slowly after convergence, and instead to modulate each node's weight update step by a component representing the global network error:

\[ \Delta \mathbf{w}_i = \eta\cdot h(i,i^{*}) \cdot \left[\tau\tanh\!\left(\gamma\left\|\mathbf{x} -\mathbf{w}_{i^{*}}\right\|^2 \right)\right] \cdot [\mathbf{x} - \mathbf{w}_i ] \]
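A minimal sketch of this continuous-plasticity scheme, assuming the winner's quantization error \(\|\mathbf{x} - \mathbf{w}_{i^{*}}\|\) as the error signal and with illustrative constants:

```python
import numpy as np

def update_error(eps, x, w_winner, gamma=0.05):
    """Loopback term: running average of the quantization error."""
    return eps + gamma * (np.linalg.norm(x - w_winner) - eps)

def modulated_delta(w_i, x, w_winner, h_i, eta=0.1, tau=1.0, gamma=10.0):
    """Weight change gated by the current error at the winner, so that
    well-fit regions stay stable while poorly fit regions remain plastic."""
    gate = tau * np.tanh(gamma * np.linalg.norm(x - w_winner) ** 2)
    return eta * h_i * gate * (x - w_i)
```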

As an example SOM, consider a network of 50x50 nodes, each containing an RGB vector \([r,g,b]\). The network is initialized with random weights and updated by drawing random RGB values from \([0,255]\). The result is an organized map of the colors, with overlap.
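A minimal NumPy sketch of this experiment follows; the iteration count and annealing constants are assumptions of the example:

```python
import numpy as np

rng = np.random.default_rng(0)
side, t_max = 50, 20000
weights = rng.uniform(0, 255, (side, side, 3))        # random RGB start
rows, cols = np.indices((side, side))                 # node grid coordinates

for t in range(t_max):
    x = rng.uniform(0, 255, 3)                        # random color sample
    # Winner: the node whose weight vector is closest to the sample.
    d = np.linalg.norm(weights - x, axis=2)
    wi, wj = np.unravel_index(np.argmin(d), d.shape)
    # Annealed learning rate and neighborhood width.
    frac = t / t_max
    eta = 0.5 * (0.01 / 0.5) ** frac
    sigma = 10.0 * (1.0 / 10.0) ** frac
    # Gaussian neighborhood over grid distance to the winner.
    g = np.exp(-((rows - wi) ** 2 + (cols - wj) ** 2) / (2 * sigma ** 2))
    weights += eta * g[..., None] * (x - weights)

# weights now holds a smoothly organized map of RGB space.
```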


Naive SOM Kinematics

We can attempt to use the SOM to autonomously learn an approximation of the forward and inverse kinematics of the system without any a priori specification. Early work using SOMs for kinematic mapping took a naive approach, using only a single self-organizing map to learn the relationship between the two spaces. The general principle works as follows. Suppose we have an input sensor space V (such as camera vision) and a joint angle space U.

Taken from Ritter et al. (1992).

Every node in the single-layer network contains a vector that quantizes the input/output relationship between the two spaces:

\[ \mathbf{w_i} = [\mathbf{w_i}^{in}, \mathbf{w_i}^{out}] \]

The network is then trained on these composite vectors \( \mathbf{w_i} \) in the manner described in the prior section. After training, a query can be placed on the network by choosing a point in U or V space and asking the SOM for the appropriate response, using either winner-take-all or an average over the node responses.

Despite some published success in the early literature, this method has flaws that have proved difficult to escape. There is no clear way of handling many-to-one relationships in the mapping, such as are typically found in inverse kinematic solutions, and the averaging effects of the SOM update step typically create many nodes with poor solutions.

For example, in the case of a single DOF arm manipulator of unit length that can move between \( [0, \pi] \), the network is trained on vectors containing the sensor variables \((x,y)\) and the output angle \(\theta_1 \):

\[ \mathbf{w_i} = [x, y, \theta_1] \]

After training, the network can then be queried. 

A sample 50x50 network after training. A) The \(\theta\) component of the SOM vectors; red corresponds to values of \(\pi\) and black to 0. B) The \( (x,y) \) components of the SOM vectors; black corresponds to (-X, 0), green to (+X, 0), and blue-green to (X, +Y). C) The winner-take-all strategy gives poor results. D) Averaging over node activation gives better results but always produces an underestimate. Notice that the SOM has difficulty representing the endpoints.

While a rough approximation of the kinematics is learned autonomously, as shown in the figure above for a 1 DOF manipulator, these results are not particularly inspiring. There are some tricks to improve performance, such as replacing each node with a linear function approximator, but it is difficult to make this method accurate and efficient by itself.
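For concreteness, a sketch of this naive scheme for the 1 DOF arm, using a winner-take-all query on the \((x, y)\) components (panel C above); all constants are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
side, t_max = 50, 20000
w = rng.uniform(-1, 1, (side, side, 3))      # each node stores [x, y, theta]
rows, cols = np.indices((side, side))

for t in range(t_max):
    theta = rng.uniform(0, np.pi)            # random arm pose
    sample = np.array([np.cos(theta), np.sin(theta), theta])
    d = np.linalg.norm(w - sample, axis=2)   # match on the full vector
    wi, wj = np.unravel_index(np.argmin(d), d.shape)
    frac = t / t_max
    eta = 0.5 * (0.01 / 0.5) ** frac
    sigma = 10.0 * (1.0 / 10.0) ** frac
    g = np.exp(-((rows - wi) ** 2 + (cols - wj) ** 2) / (2 * sigma ** 2))
    w += eta * g[..., None] * (sample - w)

def query_theta(x, y):
    """Inverse query: winner-take-all on the (x, y) part of each node."""
    d = np.linalg.norm(w[..., :2] - np.array([x, y]), axis=2)
    wi, wj = np.unravel_index(np.argmin(d), d.shape)
    return w[wi, wj, 2]

print(query_theta(np.cos(1.0), np.sin(1.0)))  # roughly 1.0, with SOM error
```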


Associative Learning

Hebbian learning is a connection scheme, inspired by neural systems, that follows the maxim "neurons that fire together, wire together". This scheme is usually at the heart of simple associative neural systems. We can use this associative learning ability to learn a mapping between the sensor space and the joint space by creating two separate layers, one for the sensor variables V and one for the joint variables U. Each layer is activated by a kernel (or tuning function). Using a Gaussian, for example, the activation kernel for the sensor space is:

\[ K_{V}(v,w_i) = V_0 \exp\left(\frac{-\left \|v - w_i \right \|^2}{2\sigma_V^2}\right) \]

Example activation for the sensor network V with input (0.5,0) using a Gaussian kernel. 

Connection weights between the two layers are then updated according to a Hebbian learning rule with weight decay of the form

\[ \Delta w_{i,j} = \alpha \, x_j y_i - \delta \, w_{i,j} \]

where \(x_j \) is the activation of node \(j\) in the sensor space network V, \( y_i \) is the activation of node \(i\) in the joint space network U, \( w_{i,j} \) is the weight of the connection between them, \( \alpha \) is a learning rate, and \( \delta \) is a small decay rate. The weight decay term is an important component that helps the network unlearn false associations that occur due to noise, and it can be modulated depending on the accuracy of the system. This update rule learns the associative activity from the sensor space V to the joint space U. Moreover, each node in the sensor space V can activate any number of nodes in the joint space U, which provides an easy solution to the problem of redundancy when using high DOFs.

An input sensor vector \( \vec{v} \) activates the V-space network while, at the same time, the joint position vector \( \vec{u} \) activates the U-space network. The weight matrix W learns an associative relationship between these activations through the Hebbian learning rule. After training, the V-space network can be queried with a desired \( \vec{v} \), which will produce learned activity in the U-space network.


To test the efficacy of this learning rule, two networks of size 50x50, each with a \( \sigma \) value of 0.2, were trained using 10,000 sample positions of a 2 DOF manipulator with links of equal length 0.5 m.
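A scaled-down sketch of this experiment (20x20 grids rather than 50x50, to keep the run time modest); the learning rate and decay constants are assumptions of the example:

```python
import numpy as np

rng = np.random.default_rng(2)
side, l1, l2 = 20, 0.5, 0.5
sigma_v = sigma_u = 0.2

# Fixed grids covering each space: V is the (x, y) workspace and
# U is the (theta1, theta2) joint space.
g = np.linspace(-1, 1, side)
vx, vy = np.meshgrid(g, g)
v_nodes = np.stack([vx.ravel(), vy.ravel()], axis=1)
t1, t2 = np.meshgrid(np.linspace(0, np.pi, side),
                     np.linspace(-np.pi, np.pi, side))
u_nodes = np.stack([t1.ravel(), t2.ravel()], axis=1)

def activate(nodes, point, sigma):
    """Gaussian tuning-curve activation of an entire layer."""
    return np.exp(-np.sum((nodes - point) ** 2, axis=1) / (2 * sigma ** 2))

W = np.zeros((len(u_nodes), len(v_nodes)))
alpha, decay = 0.01, 0.0005                  # illustrative constants

for _ in range(10000):
    th = np.array([rng.uniform(0, np.pi), rng.uniform(-np.pi, np.pi)])
    xy = np.array([l1 * np.cos(th[0]) + l2 * np.cos(th[0] + th[1]),
                   l1 * np.sin(th[0]) + l2 * np.sin(th[0] + th[1])])
    x_act = activate(v_nodes, xy, sigma_v)   # presynaptic (sensor) activity
    y_act = activate(u_nodes, th, sigma_u)   # postsynaptic (joint) activity
    W += alpha * np.outer(y_act, x_act) - decay * W   # Hebb plus decay

# Query: a target (x, y) produces activity over joint configurations.
target = np.array([0.5, 0.5])
u_activity = W @ activate(v_nodes, target, sigma_v)
print(u_nodes[np.argmax(u_activity)])        # one joint solution for the target
```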

These results show that a Hebbian learning rule can be used effectively to learn the kinematic relationship between sensor and motor variables without a prior model. Note that while the activity generated in U-space is centered around the correct mapping, joint configurations close to the target V input are also activated, at a rate that falls off with distance. Although 10,000 examples may seem costly, the kinematic map between spaces could be precomputed using a model of the kinematics and then updated online on the real robot, resulting in learning that requires far fewer samples than starting from scratch.

At this point, we have seen a SOM implementation that is able to learn the topology of the U and V spaces but unable to learn an accurate kinematic representation, and a Hebbian network that is able to learn accurate kinematics but relies on a fixed topology designed ahead of time by the engineer. In the next section, we will examine putting these two ideas together.


Fully Adaptive Kinematics

It is unreasonable to expect the kinematics of an autonomous robot operating outside of a factory setting to remain exactly the same as the day it shipped. By combining the Hebbian connection rule from the previous section with the SOM, we can obtain an adaptive learning structure that responds more flexibly to the environment. A first approach is shown below, using the same 50x50 networks as before but this time replacing the fixed grids in the U and V spaces with self-organizing maps.
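A sketch of the combined architecture, reusing the SOM and Hebbian pieces from the earlier examples (grid size, sample count, and all rates are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
side, n_samples, l1, l2 = 20, 10000, 0.5, 0.5
rows, cols = np.indices((side, side))

def train_som(samples, dim):
    """Standard SOM training, as in the earlier sketches."""
    w = rng.uniform(-1, 1, (side, side, dim))
    t_max = len(samples)
    for t, x in enumerate(samples):
        d = np.linalg.norm(w - x, axis=2)
        wi, wj = np.unravel_index(np.argmin(d), d.shape)
        frac = t / t_max
        eta = 0.5 * (0.01 / 0.5) ** frac
        sigma = 8.0 * (1.0 / 8.0) ** frac
        g = np.exp(-((rows - wi) ** 2 + (cols - wj) ** 2) / (2 * sigma ** 2))
        w += eta * g[..., None] * (x - w)
    return w.reshape(-1, dim)

def activate(nodes, p, sigma=0.2):
    return np.exp(-np.sum((nodes - p) ** 2, axis=1) / (2 * sigma ** 2))

# Matched (joint, workspace) pairs from random arm poses.
thetas = rng.uniform([0, -np.pi], [np.pi, np.pi], (n_samples, 2))
xys = np.stack([l1 * np.cos(thetas[:, 0]) + l2 * np.cos(thetas.sum(1)),
                l1 * np.sin(thetas[:, 0]) + l2 * np.sin(thetas.sum(1))], axis=1)

u_nodes = train_som(thetas, 2)   # learned joint-space topology
v_nodes = train_som(xys, 2)      # learned sensor-space topology

# Hebbian association between the two learned maps, as before.
W = np.zeros((len(u_nodes), len(v_nodes)))
for th, xy in zip(thetas, xys):
    W += 0.01 * np.outer(activate(u_nodes, th), activate(v_nodes, xy)) - 0.0005 * W

target = np.array([0.5, 0.5])
print(u_nodes[np.argmax(W @ activate(v_nodes, target))])
```

Because both layer topologies are themselves learned, a change in link length only requires retraining the maps and the association, not redefining a model by hand.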

Is this a good solution? The answer is no - at least not without further modification. Some improvements worth looking into would be replacing the Hebbian connection rule with an ANN trained using backpropagation. Additionally, the SOM might be replaced with a different vector quantization method that forms a more faithful, uniform representation and better captures the edges of the parameter space. An entirely separate question remains as to how to get useful movement out of the network, although some prior work has looked at treating the U layer as a continuous attractor network (CAN). Nevertheless, I believe this provides an insightful first step toward a more complete solution.