# Documentation

## Baseline solution

As a baseline solution, we trained a feedforward neural network on a sample of 5000 training examples selected uniformly at random. The network uses all 55 noisy light curves to predict the 55 relative radii directly. It does not use the stellar parameters, and it does not predict intermediate targets such as the inclination or the semi-major axis.

## Data Preprocessing

The noisy light curves have undergone the following preprocessing steps:

i) Each light curve was smoothed with a moving median of window size 3 (i.e. each value was replaced by the median of itself and its two adjacent values) to reduce the effect of outliers.

ii) Any value (relative flux) in any light curve that was above 1 was clipped to 1, because the maximal relative flux over an observation is 1 (the out-of-transit level).

iii) All values were normalized by the transit depths so that they lie roughly within the range [0,1]. The normalization was carried out per wavelength as follows:

First, we computed the average transit depth per wavelength, $\bar{\kappa}_{\lambda}$, as the mean of the squared target values $R_p^2/R_s^2$ over a sample of 10000 random training examples. Then, for every wavelength $\lambda$, we applied the transformation $x \leftarrow \frac{x - (1-2\bar{\kappa}_{\lambda})}{2\bar{\kappa}_{\lambda}}$.

This maps the maximal relative flux of 1 to exactly 1 and a flux of $1-2\bar{\kappa}_{\lambda}$ to 0, so a transit of average depth, whose bottom lies at $1-\bar{\kappa}_{\lambda}$, is mapped to 0.5.
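The three preprocessing steps can be sketched in NumPy as below. This is a minimal illustration, not the code used for the baseline: the function names are hypothetical, the input is assumed to be one example of shape `(n_wavelengths, n_time)`, and the edge handling of the moving median (repeating the first and last value) is an assumption, since the text does not say how the endpoints were treated.

```python
import numpy as np


def mean_transit_depth(radii: np.ndarray) -> np.ndarray:
    """Average transit depth per wavelength from relative radii Rp/Rs.

    radii: (n_examples, n_wavelengths) target values Rp/Rs.
    Returns kappa_bar: (n_wavelengths,), the mean of (Rp/Rs)**2.
    """
    return np.mean(np.asarray(radii, dtype=float) ** 2, axis=0)


def moving_median3(curves: np.ndarray) -> np.ndarray:
    """Moving median of window 3 along the time axis of (n_wavelengths, n_time).

    Assumption: endpoints are padded by repeating the first/last value.
    """
    padded = np.pad(curves, ((0, 0), (1, 1)), mode="edge")
    # Previous, current and next value for every time step.
    windows = np.stack([padded[:, :-2], padded[:, 1:-1], padded[:, 2:]])
    return np.median(windows, axis=0)


def preprocess(curves: np.ndarray, kappa_bar: np.ndarray) -> np.ndarray:
    """Apply steps i) to iii) to one example of shape (n_wavelengths, n_time)."""
    curves = np.asarray(curves, dtype=float)
    smoothed = moving_median3(curves)                 # i) reduce outliers
    clipped = np.minimum(smoothed, 1.0)               # ii) clip relative flux at 1
    k = np.asarray(kappa_bar, dtype=float)[:, None]   # broadcast over time steps
    return (clipped - (1.0 - 2.0 * k)) / (2.0 * k)    # iii) per-wavelength rescale
```

For example, with $\bar{\kappa}_{\lambda} = 0.05$ the transformation is $(x - 0.9)/0.1$, so a smoothed flux of 1 stays at 1 and a flux of 0.95 becomes 0.5.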

## Model & training hyperparameters

We used a fully connected feedforward neural network with 5 hidden layers, each producing a 2D output of 1024 units $\times$ 55 channels. These were followed by a flattening layer and a linear output layer with 55 units. All hidden layers used rectified linear unit (ReLU) activations. No batch normalization, regularization or dropout was applied.
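One possible reading of "hidden layers of 1024 units $\times$ 55 channels" is that each dense layer acts on the time axis of every light curve, with the same weights shared across the 55 channels; this is how a Keras `Dense` layer treats inputs of rank greater than 2, since it contracts only the last axis. The NumPy forward pass below is a sketch under that assumption, with random (untrained) weights. The number of time steps per light curve is not given in the text, so `n_time` is a hypothetical parameter, and the weight initialization is an arbitrary choice for illustration.

```python
import numpy as np

N_CHANNELS = 55   # wavelengths (from the text)
N_HIDDEN = 5      # hidden layers (from the text)
N_UNITS = 1024    # units per channel in each hidden layer (from the text)


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def init_params(n_time: int, n_units: int = N_UNITS, n_hidden: int = N_HIDDEN,
                n_channels: int = N_CHANNELS, seed: int = 0):
    """Random weights for the sketch; the scaling choice is arbitrary."""
    rng = np.random.default_rng(seed)
    sizes = [n_time] + [n_units] * n_hidden
    hidden = [
        (rng.normal(0.0, np.sqrt(2.0 / m), (m, n)).astype(np.float32),
         np.zeros(n, np.float32))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]
    flat = n_channels * n_units
    out = (rng.normal(0.0, np.sqrt(1.0 / flat), (flat, n_channels)).astype(np.float32),
           np.zeros(n_channels, np.float32))
    return hidden, out


def forward(x: np.ndarray, params) -> np.ndarray:
    """x: (batch, n_channels, n_time) -> (batch, n_channels) predicted radii."""
    hidden, (w_out, b_out) = params
    h = x
    for w, b in hidden:
        # Contracts only the last axis: weights are shared by all channels
        # and h has shape (batch, n_channels, n_units) after each layer.
        h = relu(h @ w + b)
    h = h.reshape(h.shape[0], -1)   # flatten to (batch, n_channels * n_units)
    return h @ w_out + b_out        # linear output layer, no activation
```

Under this reading, the final linear layer alone has $55 \times 1024 \times 55$ weights, so it dominates the parameter count unless the light curves are very long.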

The 5000 observations were split into 4020 training and 980 validation examples such that the two sets had no planets in common. The model was trained by minimizing the mean squared error (MSE) averaged across all wavelengths, using the Adam optimizer with a learning rate of $10^{-4}$, a learning-rate decay of 0.01 and a batch size of 128. All remaining hyperparameters were left at their Keras defaults. The model was trained for 5 epochs, without early stopping.
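A planet-disjoint split can be obtained by assigning whole planets, rather than individual observations, to the validation set. The sketch below assumes each observation carries a planet identifier (the `planet_ids` array is hypothetical) and keeps adding randomly chosen planets until at least `n_val` examples are collected, so the validation size may overshoot slightly. It also assumes that "decaying with a rate of 0.01" refers to the legacy Keras `decay` argument, which applies time-based decay $\eta_t = \eta_0 / (1 + 0.01\,t)$ per batch update $t$; that interpretation is not confirmed by the text.

```python
import numpy as np


def planet_disjoint_split(planet_ids, n_val: int, seed: int = 0):
    """Return (train_idx, val_idx) with no planet ID appearing in both sets.

    Whole planets are moved to validation in random order until at least
    n_val examples are collected.
    """
    planet_ids = np.asarray(planet_ids)
    rng = np.random.default_rng(seed)
    planets = rng.permutation(np.unique(planet_ids))
    val_planets = []
    count = 0
    for p in planets:
        if count >= n_val:
            break
        val_planets.append(p)
        count += int(np.sum(planet_ids == p))
    is_val = np.isin(planet_ids, val_planets)
    return np.flatnonzero(~is_val), np.flatnonzero(is_val)


def legacy_keras_lr(step: int, lr0: float = 1e-4, decay: float = 0.01) -> float:
    """Assumed time-based decay: lr0 / (1 + decay * step), step = batch updates."""
    return lr0 / (1.0 + decay * step)
```

With 4020 training examples and a batch size of 128, Keras would run $\lceil 4020/128 \rceil = 32$ updates per epoch, i.e. 160 over 5 epochs; under the assumed schedule the learning rate would end at about $10^{-4}/2.6 \approx 3.85 \times 10^{-5}$.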