Abstract
This paper pertains to an emerging machine learning paradigm: learning higher-order functions, i.e. functions whose inputs are functions themselves, particularly when these inputs are Neural Networks (NNs). With the growing interest in architectures that process NNs, a recurring design principle has permeated the field: adhering to the permutation symmetries arising from the connectionist structure of NNs. However, are these the sole symmetries present in NN parameterizations? Zooming into most practical activation functions (e.g. sine, ReLU, tanh) answers this question negatively and gives rise to intriguing new symmetries, which we collectively refer to as scaling symmetries, that is, non-zero scalar multiplications and divisions of weights and biases. In this work, we propose Scale Equivariant Graph MetaNetworks (ScaleGMNs), a framework that adapts the Graph Metanetwork (message-passing) paradigm by incorporating scaling symmetries and thus rendering neuron and edge representations equivariant to valid scalings. We introduce novel building blocks, of independent technical interest, that allow for equivariance or invariance with respect to individual scalar multipliers or their product and use them in all components of ScaleGMN. Furthermore, we prove that, under certain expressivity conditions, ScaleGMN can simulate the forward and backward pass of any input feedforward neural network. Experimental results demonstrate that our method advances the state-of-the-art performance for several datasets and activation functions, highlighting the power of scaling symmetries as an inductive bias for NN processing.
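The invariant and equivariant building blocks mentioned in the abstract can be illustrated with a minimal NumPy sketch. It assumes vector-valued features that are multiplied by a single scalar, and it uses illustrative constructions that are our assumptions rather than the paper's exact layers: symmetrization g(x) + g(-x) for sign invariance, normalization x / ||x|| for positive-scaling invariance, and a bias-free linear map gated by an invariant for equivariance. A bilinear map shows equivariance to the product of two scalars. The names `g`, `G` and all function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H = 4, 16
W1, b1 = rng.normal(size=(H, D)), rng.normal(size=H)
W2 = rng.normal(size=(D, H))
G = rng.normal(size=(D, D))  # bias-free linear map: G @ (s * x) == s * (G @ x)


def g(x):
    """Arbitrary function with no built-in symmetry (tiny MLP with biases)."""
    return W2 @ np.tanh(W1 @ x + b1)


def sign_invariant(x):
    # Symmetrization: g(x) + g(-x) is unchanged under x -> -x.
    return g(x) + g(-x)


def positive_scale_invariant(x):
    # Canonicalization (assumes x != 0): x / ||x|| is unchanged under x -> lam * x, lam > 0.
    return g(x / np.linalg.norm(x))


def scale_equivariant(x, invariant):
    # Bias-free linear part gated by an invariant: f(s * x) = s * f(x)
    # for every scalar s that `invariant` is invariant to.
    return (G @ x) * invariant(x)


def product_equivariant(x, y):
    # Bilinear combination: f(s * x, t * y) = s * t * f(x, y) for any scalars s, t.
    return (G @ x) * (G.T @ y)


# Usage: numerically check each property on random inputs.
x, y = rng.normal(size=D), rng.normal(size=D)
print(np.allclose(sign_invariant(-x), sign_invariant(x)))  # True
print(np.allclose(positive_scale_invariant(2.5 * x), positive_scale_invariant(x)))  # True
f_pos = lambda v: scale_equivariant(v, positive_scale_invariant)
print(np.allclose(f_pos(3.0 * x), 3.0 * f_pos(x)))  # True
print(np.allclose(product_equivariant(2.0 * x, -0.5 * y), -product_equivariant(x, y)))  # True
```

The gate-times-linear form is one simple way to get equivariance from an invariant; any bias term in the linear part would break it, which is why `G` is applied without one.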
Overview
Directly operating on the parameters of neural networks paves the way to a whole new spectrum of exciting applications, from the analysis and interpretation of trained NNs to editing or even generating neural networks. For example, we could apply domain adaptation to trained models by directly operating on their parameters, or build models that outperform classic optimization algorithms in a learned optimization setting. Furthermore, with the advent of Implicit Neural Representations (INRs), trained NN parameters are increasingly used to represent data signals such as images or 3D shapes, replacing raw representations, e.g. pixel grids or point clouds. Consequently, many tasks involving such data across domains such as computer vision and physics, which are currently tackled with domain-specific architectures (e.g. CNNs for grids, PointNets or GNNs for point clouds and meshes), could potentially be solved by NNs that process the parameters of other NNs.
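NN parameterizations exhibit two kinds of symmetries:
- Permutation symmetries: Hidden neurons do not possess any inherent ordering, so permuting the neurons of a hidden layer (together with their incoming and outgoing weights) leaves the function computed by the NN unchanged.
- Scaling symmetries: Activation functions have inherent symmetries that are bestowed on the NN. For ReLU, multiplying a hidden neuron's incoming weights and bias by λ > 0 and dividing its outgoing weights by λ leaves the function unchanged; for odd activations such as tanh and sine, the same holds with λ = ±1.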
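Both kinds of symmetry can be checked numerically. The following minimal NumPy sketch, assuming a hypothetical one-hidden-layer MLP with random weights, permutes the hidden neurons, rescales them by positive factors under ReLU, and flips their signs under tanh, then verifies that the output does not change.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid, d_out = 3, 5, 2
W1, b1 = rng.normal(size=(d_hid, d_in)), rng.normal(size=d_hid)
W2, b2 = rng.normal(size=(d_out, d_hid)), rng.normal(size=d_out)


def mlp(x, W1, b1, W2, b2, act):
    """One-hidden-layer MLP: W2 @ act(W1 @ x + b1) + b2."""
    return W2 @ act(W1 @ x + b1) + b2


def relu(z):
    return np.maximum(z, 0.0)


x = rng.normal(size=d_in)

# Permutation: reorder hidden neurons (rows of W1 and b1, columns of W2).
perm = rng.permutation(d_hid)
permuted = (W1[perm], b1[perm], W2[:, perm], b2)

# ReLU: relu(lam * z) = lam * relu(z) for lam > 0, so scale each hidden neuron's
# incoming weights and bias by lam and divide its outgoing weights by lam.
lam = rng.uniform(0.5, 2.0, size=d_hid)
relu_scaled = (W1 * lam[:, None], b1 * lam, W2 / lam[None, :], b2)

# tanh (and sine) are odd, act(-z) = -act(z), so the same holds with lam in {-1, +1}.
q = np.array([-1.0, 1.0, -1.0, -1.0, 1.0])
tanh_flipped = (W1 * q[:, None], b1 * q, W2 * q[None, :], b2)

ref_relu = mlp(x, W1, b1, W2, b2, relu)
ref_tanh = mlp(x, W1, b1, W2, b2, np.tanh)
print(np.allclose(mlp(x, *permuted, relu), ref_relu))         # True
print(np.allclose(mlp(x, *relu_scaled, relu), ref_relu))      # True
print(np.allclose(mlp(x, *tanh_flipped, np.tanh), ref_tanh))  # True
```

A metanetwork that ignores these transformations would treat the original and transformed parameter sets as different inputs, even though they encode exactly the same function.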
In this work we propose a new type of metanetwork, called ScaleGMN, which accounts for both permutation and scaling symmetries. After mapping the input NN to a graph and applying a suitable initialization, we feed the result to a Graph Neural Network (GNN). Since GNN layers are permutation equivariant by construction, we only need to extend the message-passing neural network (MPNN) paradigm to account for the scaling symmetries. To this end, we design scale equivariant message (MSG) and update (UPD) functions and a permutation and scale invariant READOUT function.
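To make the pipeline concrete, here is a minimal NumPy sketch of one scale equivariant message-passing step followed by an invariant readout, restricted to the sign symmetry of tanh networks. It is an illustration under stated assumptions, not the ScaleGMN implementation: nodes carry biases (input neurons get the constant 1.0), edges carry weights and point only forward, and the MSG, UPD and READOUT forms below (a product of bias-free linear maps, an invariant gate built by symmetrization, and a per-node symmetrized sum) are hypothetical choices. The key observation is that a hidden sign flip q multiplies a bias by q_i and a weight by q_i q_j, so a message built from h_j and e_ij picks up q_j · q_i q_j = q_i, the sign of the receiving neuron.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # hidden width of the metanetwork (hypothetical)


def to_graph(Ws, bs):
    """Neurons -> nodes (feature: bias, 1.0 for input neurons); weights -> edges."""
    sizes = [Ws[0].shape[1]] + [W.shape[0] for W in Ws]
    offsets = np.cumsum([0] + sizes)
    node_feat = np.concatenate([np.ones(sizes[0])] + list(bs))
    src, dst, edge_feat = [], [], []
    for l, W in enumerate(Ws):
        for i in range(W.shape[0]):          # neuron i in layer l + 1 (receiver)
            for j in range(W.shape[1]):      # neuron j in layer l (sender)
                src.append(offsets[l] + j)
                dst.append(offsets[l + 1] + i)
                edge_feat.append(W[i, j])
    return node_feat, np.array(src), np.array(dst), np.array(edge_feat)


# Hypothetical metanetwork parameters. Every map applied to features is
# bias-free, so it commutes with multiplication by a sign q in {-1, +1}.
a, c = rng.normal(size=d), rng.normal(size=d)
A, B, U, V = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(4))
Wg, bg = rng.normal(size=(2 * d, d)) / np.sqrt(2 * d), rng.normal(size=d)
R, br = rng.normal(size=(d, d)) / np.sqrt(d), rng.normal(size=d)


def sym(f, z):
    # Sign-invariant by symmetrization: f(z) + f(-z) is unchanged by z -> -z (row-wise).
    return f(z) + f(-z)


def scale_gmn_sign(Ws, bs):
    node_feat, src, dst, edge_feat = to_graph(Ws, bs)
    h = node_feat[:, None] * a               # (N, d): scales like the bias, q_i
    e = edge_feat[:, None] * c               # (E, d): scales like q_dst * q_src
    # MSG: q_src * (q_dst * q_src) = q_dst, i.e. equivariant to the receiver.
    m = (h[src] @ A) * (e @ B)
    agg = np.zeros_like(h)
    np.add.at(agg, dst, m)                   # sum aggregation keeps the q_dst factor
    # UPD: sign-equivariant linear part times a sign-invariant gate.
    z = np.concatenate([h, agg], axis=1)
    gate = sym(lambda t: np.tanh(t @ Wg + bg), z)
    h = (h @ U + agg @ V) * gate
    # READOUT: symmetrize per node, then sum over nodes (permutation invariant).
    return sym(lambda t: np.tanh(t @ R + br), h).sum(axis=0)


# Usage: a tiny 2 -> 4 -> 3 tanh MLP and a copy with hidden-neuron signs flipped.
Ws = [rng.normal(size=(4, 2)), rng.normal(size=(3, 4))]
bs = [rng.normal(size=4), rng.normal(size=3)]
q = np.array([-1.0, 1.0, -1.0, 1.0])
Ws_q = [q[:, None] * Ws[0], Ws[1] * q[None, :]]
bs_q = [q * bs[0], bs[1]]
print(np.allclose(scale_gmn_sign(Ws, bs), scale_gmn_sign(Ws_q, bs_q)))  # True
```

The same pattern extends to ReLU networks by replacing symmetrization with a positive-scaling invariant (e.g. normalization), since a weight then scales as q_i / q_j and the message still carries the factor q_i.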
Paper
BibTeX
@article{kalogeropoulos2024scale,
  title={Scale Equivariant Graph Metanetworks},
  author={Kalogeropoulos, Ioannis and Bouritsas, Giorgos and Panagakis, Yannis},
  journal={arXiv preprint arXiv:2406.10685},
  year={2024}
}