Convolutional Networks on Graphs for Learning Molecular Fingerprints: Nonlinear Function
Created: June 06, 2020
Modified: June 06, 2020

Convolutional Networks on Graphs for Learning Molecular Fingerprints

This page is from my personal notes, and has not been specifically reviewed for public consumption. It might be incomplete, wrong, outdated, or stupid. Caveat lector.
  • Paper: Duvenaud et al., NIPS 2015.
  • Existing systems for extracting features ('fingerprints') from molecules look a lot like convnets. They repeatedly apply a hash function (playing the role of a convolution kernel) at each node, taking as input the features of the neighboring nodes. At each step, each node writes a '1' into a global feature vector at the index given by its current hash (see the first sketch after these notes).
    • Why hashing? The idea is that the feature vector represents the precise set of all substructures in this molecule. The hash at node i at level K corresponds to a 'unique' identification of the substructure in its K-neighborhood.
  • We can make this differentiable by replacing the hash with a neural-net layer: a linear transformation followed by a soft threshold. In the limit of hard thresholding (i.e. very large weights) we recover a plain hash function. Instead of indexing into the fingerprint, each node produces a vector whose softmax is added to the global vector; in the limit where the softmax becomes an argmax, this is again just indexing (see the second sketch after these notes).
  • I love that they compare the random-weight case directly to circular fingerprints. This validates that the neural approach is an equally expressive architecture, even without training.
  • Because the whole system is differentiable, we can a) train it by backprop, and b) compute the salience of any input datum to a prediction (see the last sketch below).
  • The test datasets involved predicting: solubility, in-vitro efficacy as an antimalarial drug, and photovoltaic activity. It works pretty well.
  • Limitations: computational complexity, it's unclear how much computation is needed at each layer, and you need N/2 layers to propagate information across the graph. 'A tree-structured network could examine the structure of the entire graph using only log(N) layers, but would require learning to parse molecules.' (Maybe self-attention graph networks do something like this?)
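
For reference, a minimal Python sketch of the hash-and-index scheme described above (ECFP-style circular fingerprints). The molecule representation (`atom_feats` as hashable per-atom features, `neighbors` as an adjacency list) and the parameter names are illustrative, not the paper's code.

```python
def circular_fingerprint(atom_feats, neighbors, fp_length=2048, radius=4):
    """Hash-based ('circular') fingerprint: each level hashes a node's identifier
    together with its neighbors', and every identifier marks one bit."""
    fp = [0] * fp_length
    ids = [hash(f) for f in atom_feats]          # level-0 identifiers from raw atom features
    for h in ids:
        fp[h % fp_length] = 1
    for _ in range(radius):
        # The hash plays the role of the convolution kernel: it combines a node's
        # identifier with its neighbors' identifiers (sorted for order invariance).
        ids = [hash((ids[i], tuple(sorted(ids[j] for j in nbrs))))
               for i, nbrs in enumerate(neighbors)]
        for h in ids:
            fp[h % fp_length] = 1                # index into the global bit vector
    return fp
```

Note that the level-K identifier of node i depends only on its K-neighborhood, which is the sense in which each bit stands for a specific substructure.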
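
And a sketch of the differentiable analogue, roughly in the spirit of the paper's neural fingerprint: the hash becomes a dense layer with a smooth nonlinearity, and bit-indexing becomes a softmax summed into the global vector. Written with jax.numpy here; the shapes, the self-loops in `adj`, and the per-level weight lists are my assumptions, not the paper's exact parameterization.

```python
import jax.numpy as jnp
from jax.nn import sigmoid, softmax

def neural_fingerprint(atom_feats, adj, W_levels, W_out_levels, fp_length=128):
    """Differentiable fingerprint sketch.
    atom_feats: (num_atoms, d) array; adj: (num_atoms, num_atoms) adjacency
    *including self-loops*, so `adj @ h` sums each atom's own and neighbors' features.
    W_levels[k]: (d, d) 'hash' weights; W_out_levels[k]: (d, fp_length) readout weights."""
    fp = jnp.zeros(fp_length)
    h = atom_feats
    for W, W_out in zip(W_levels, W_out_levels):
        h = sigmoid((adj @ h) @ W)                          # soft replacement for the hash
        fp = fp + softmax(h @ W_out, axis=-1).sum(axis=0)   # soft replacement for indexing
    return fp
```

With very large weights the sigmoid approaches a hard threshold and the softmax approaches a one-hot argmax, recovering something close to the hashing scheme above; with random moderate weights you get the random-weight baseline the paper compares against circular fingerprints.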
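
The backprop/salience point then becomes one line with an autodiff library: differentiate a prediction built on the fingerprint with respect to the input atom features. This reuses `neural_fingerprint` from the previous sketch; the linear readout and parameter names are again made up for illustration.

```python
import jax
import jax.numpy as jnp

def predict(atom_feats, adj, params):
    # Fingerprint followed by a linear readout to a scalar property (e.g. solubility).
    fp = neural_fingerprint(atom_feats, adj, params["W_levels"], params["W_out_levels"])
    return jnp.dot(fp, params["readout"])

# Gradient of the prediction w.r.t. every atom's input features: a per-atom salience map.
salience_fn = jax.grad(predict, argnums=0)
# salience = salience_fn(atom_feats, adj, params)   # shape (num_atoms, d)
```

Training by backprop is the same move with the gradient taken with respect to the parameters (of a loss rather than the raw prediction).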