# Documentation

## Data

The dataset can be found here.

The files are named according to the convention AAAA_BB_CC.txt.

Each name is unique to one observation (i.e. one data point): AAAA (0001 to 2097) is the index of the observed planet, BB (01 to 10) is the index of the stellar spot noise instance, and CC (01 to 10) is the index of the Gaussian photon noise instance.
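
The naming convention can be split into its three indices with a small helper. The following is a minimal sketch, not part of the dataset tooling: the names `ObservationId` and `parse_observation_name` are hypothetical, and the pattern assumes every file name has exactly four, two and two digits as described above.

```python
import re
from typing import NamedTuple

# Pattern for the AAAA_BB_CC.txt convention (assumed to be strictly followed).
_NAME_RE = re.compile(r"^(\d{4})_(\d{2})_(\d{2})\.txt$")


class ObservationId(NamedTuple):
    """Hypothetical container for the three indices in a file name."""
    planet: int        # AAAA, 1 to 2097
    spot_noise: int    # BB, 1 to 10
    photon_noise: int  # CC, 1 to 10


def parse_observation_name(filename: str) -> ObservationId:
    """Split a file name such as '0001_02_03.txt' into its three indices."""
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    planet, spot, photon = (int(group) for group in match.groups())
    return ObservationId(planet, spot, photon)
```

For example, `parse_observation_name("0001_02_03.txt").planet` gives the planet index, which may be useful for grouping all noise instances of the same planet.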

The dataset consists of two types of files: ‘noisy’ files (containing the features) and ‘parameters’ files (containing the targets of the training examples).

## ‘Noisy’ files

‘Noisy’ files contain the observed data (i.e. the features): 6 stellar and planet parameters and a 2D array of relative fluxes of dimension (55 x 300). Each row is the time series of one wavelength channel (there are 55 channels, denoted with w# below) over 300 time steps (denoted with t# below).

The file structure can be seen below (without the column and row names):

    #star_temp: 5196.0
    #star_logg: 4.5
    #star_mass: 0.91
    #star_k_mag: 4.015
    #period: 0.736539
          (t1)            (t2)            ...   (t300)
    (w1)  1.00010151742   1.00010218526   ...   1.00001215251
    (w2)  0.999857792623  1.00009976297   ...   1.00007764626
    (...) ...             ...             ...   ...
    (w55) 0.999523150082  0.999468565171  ...   0.999934661757
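
A file with this layout can be read with the Python standard library alone. The sketch below is one possible reader, not an official loader: the function name `read_noisy` is hypothetical, and it assumes that header lines start with `#` and have the form `name: value`, and that the flux values on each row are separated by whitespace.

```python
def read_noisy(lines):
    """Read one 'noisy' file into (header parameters, flux rows).

    `lines` is any iterable of text lines, e.g. an open file handle.
    """
    params = {}  # header parameters, e.g. {"star_temp": 5196.0, ...}
    flux = []    # one list of floats per wavelength channel
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip empty lines
        if line.startswith("#"):
            # Header line: "#name: value" (assumed format).
            key, _, value = line[1:].partition(":")
            params[key.strip()] = float(value)
        else:
            # Data line: the time series of one wavelength channel.
            flux.append([float(x) for x in line.split()])
    return params, flux
```

Called as `params, flux = read_noisy(open(path))` on a full file, `flux` should hold 55 rows of 300 values each; it can be passed to `numpy.asarray` if NumPy is available.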

‘Noisy’ files in the noisy_train directory are the training features; their target parameters are provided (see parameters files below).

‘Noisy’ files in the noisy_test directory are the test features: predict their target parameters and upload the predictions (see upload file format below).

## ‘Parameters’ files

‘Parameters’ files contain the retrieved data (i.e. the targets): 2 planet parameters (‘sma’ and ‘incl’, which can be used as intermediate targets or ignored) and a 1D array of relative radii (planet-to-star radius ratios) of dimension (1 x 55), where each column corresponds to one wavelength channel (there are 55 channels, denoted with w# below). The targets of the regression problem are the 55 relative radii.

The file structure can be seen below (without the column and row names):

    #sma: 2314065295
    #incl: 83.3
                  (w1)            (w2)            (...) (w55)
    (AAAA_BB_CC)  0.0195608058653 0.019439812298  ...   0.0271040897872
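
The targets can be read in the same spirit. This is a minimal sketch under the same assumptions (header lines of the form `#name: value`, whitespace-separated values); the function name `read_parameters` is hypothetical.

```python
def read_parameters(lines):
    """Read one 'parameters' file into (extra parameters, relative radii).

    `lines` is any iterable of text lines, e.g. an open file handle.
    """
    extras = {}  # e.g. {"sma": ..., "incl": ...}
    radii = []   # flat list of relative radii, one per wavelength channel
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip empty lines
        if line.startswith("#"):
            # Header line: "#name: value" (assumed format).
            key, _, value = line[1:].partition(":")
            extras[key.strip()] = float(value)
        else:
            # Data line: the relative radii of this observation.
            radii.extend(float(x) for x in line.split())
    return extras, radii
```

For a full file, `radii` should contain 55 values, which are the regression targets.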

‘Parameters’ files in the params_train directory should be used as targets for training, as they correspond to the noisy files in the noisy_train directory (see noisy files above).
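
Matching features to targets then comes down to matching file names across the two directories. The sketch below assumes that each ‘parameters’ file has exactly the same name as its ‘noisy’ file; the function name `pair_training_files` is hypothetical.

```python
from pathlib import Path


def pair_training_files(noisy_dir, params_dir):
    """Return sorted (noisy_path, params_path) pairs that share a file name."""
    noisy = {p.name: p for p in Path(noisy_dir).glob("*.txt")}
    params = {p.name: p for p in Path(params_dir).glob("*.txt")}
    # Keep only observations present in both directories.
    shared = sorted(noisy.keys() & params.keys())
    return [(noisy[name], params[name]) for name in shared]
```

A call such as `pair_training_files("noisy_train", "params_train")` returns the pairs in name order, so the features and targets of each training example stay aligned.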

Note: Two additional parameters, ‘sma’ (semi-major axis) and ‘incl’ (inclination), are provided ONLY for the training set examples. If you find them useful, you can use them as intermediate targets for predicting the actual 55 targets. Otherwise you can ignore them.

‘Parameters’ files for the test examples (i.e. the ground truth) will become available after the end of the competition.