Eukaryote example
=================

Preamble - Creating a eukaryote individual from a random genome
---------------------------------------------------------------

In the prokaryote version of Aevol, one can start a simulation either from a pre-evolved individual
(a.k.a. Wild-Type), given its DNA sequence, or from a randomly generated (though bootstrapped) naive individual.
This second option is not yet available in the eukaryote version of Aevol, due to a founding effect
that prevents individuals from actually being diploid.

The current way to start a eukaryote simulation from a random genome is to pre-evolve prokaryotic individuals
with a halved phenotypic target, _i.e._ with the target activation level of every trait divided by two.
Once a stable genome structure is reached in these conditions (typically after 10,000,000 generations),
we manually create a eukaryote organism with 2 copies of the prokaryote organism's single chromosome.
While not fully satisfactory, this process ensures that we obtain stable diploid organisms that evolve with
sexual reproduction and meiotic recombination.

Before exposing these newly diploid organisms to the different evolutionary conditions that we want to compare,
it is recommended to let them evolve within the eukaryote framework for a while.
This lets them adapt to these specific conditions and reach a stable state,
so that any differences observed later are due to the subsequent changes in evolutionary conditions
and not to the adaptation to the eukaryote framework itself.


Running a eukaryote simulation from a Wild-Type organism
--------------------------------------------------------

### Note
For more information about what each command does, please check out the "basic" example.

### Step 1: Create an experiment
Go into the examples/eukaryote directory and `create` an experiment:
```sh
cd examples/eukaryote
aevol_eukaryote_2b_create param.in --fasta WT0.fa
```
This will create a checkpoint for generation 0 in the `checkpoints` directory.

A checkpoint consists of 3 files:
- a setup file containing the setup of your experiment
- a population file containing the sequences of all the individuals in the population in fasta format
- a checkpoint file containing internal states of Aevol objects

An additional file named `last_gener.txt` is also generated. It keeps track of the latest generation that was computed.


### Step 2: Run an experiment
Once you have created an experiment, you can run it (let the population evolve in the specified setup).

The following command runs the first 1000 generations of evolution using the system-default number of parallel threads:
```sh
# run evolution from generation 0 to 1000 using <system-default> threads
aevol_eukaryote_2b_run --begin 0 --end 1000 --parallel -1
```

Note that you can set a specific number of threads with the `--parallel` option, or drop that option
altogether for single-threaded execution.

In addition to a checkpoint being generated at generation 1000 (CHECKPOINT_STEP is set to 1000 in param.in),
stats are written in the `stats` directory:
- file `stats/stats_best.csv` contains stats about the best individual of each generation
- file `stats/stats_means.csv` contains the same stats as `stats/stats_best.csv` (with exceptions) but averaged over the population
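
To keep an eye on a running experiment, it can be handy to print only the most recent row of a stats file. The helper below is a minimal sketch: `stats_latest` is a hypothetical name, and it assumes that the first line of the CSV file is a header row.

```sh
# Hypothetical helper: show the header and the most recent row of a stats file.
# Assumption: the first line of the CSV file is a header row.
stats_latest() {
    file=${1:-stats/stats_best.csv}
    head -n 1 "$file"   # column names
    tail -n 1 "$file"   # latest recorded generation
}
```

Called without argument from the experiment directory, `stats_latest` reads `stats/stats_best.csv`; pass `stats/stats_means.csv` to look at the population averages instead.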


Resume an experiment (from a checkpoint)
----------------------------------------
To resume a simulation from a checkpoint, simply set the `--begin` (or `-b`) option to the corresponding generation:
```sh
# run evolution from generation 1000 to 5000 using <system-default> threads
aevol_eukaryote_2b_run --begin 1000 --end 5000 --parallel -1
```

Note that whenever it produces a checkpoint, aevol writes the corresponding generation number to the file `last_gener.txt`.
The content of this file is in turn used as the generation at which to resume the simulation if the `--begin` (or `-b`) option is not present.
This means that, by default, `aevol_eukaryote_2b_run` resumes the simulation where it stopped.
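
This default can be used to extend a simulation by a fixed number of generations without looking up the last checkpoint by hand. The following is a minimal sketch, not part of Aevol: `last_generation` and `resume_for` are hypothetical helper names, and the sketch assumes that `last_gener.txt` sits in the current (experiment) directory and contains a single integer.

```sh
# Hypothetical helper: print the generation stored in last_gener.txt.
# Assumption: the file contains a single integer.
last_generation() {
    tr -d '[:space:]' < "${1:-last_gener.txt}"
}

# Hypothetical helper: resume from the last checkpoint and run N more generations.
# --begin is omitted on purpose, so the run starts from last_gener.txt.
resume_for() {
    extra=$1
    last=$(last_generation) || return 1
    aevol_eukaryote_2b_run --end "$((last + extra))" --parallel -1
}
```

For instance, `resume_for 4000` run after the first 1000 generations would be equivalent to passing `--end 5000`.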


[Optional] Generating a Wild-Type
---------------------------------

If you want to generate your own Wild-Type instead of using the one we provide here, follow these steps:

### Step 1: Create a STANDARD experiment
Go into the examples/eukaryote/prokaryote_bootstrap directory and modify the param.in file to suit your needs.
Do keep in mind that the phenotypic target will need to be doubled in height
when switching from prokaryote to eukaryote. Don't make it too high.
You can now `create` your experiment:
```sh
aevol_2b_create param.in
```

### Step 2: Run your STANDARD experiment
You can now run your experiment for a sufficiently long time.
Of course, before you do that you may want to validate the whole process with fewer generations.
```sh
aevol_2b_run --end 1100000 --parallel -1
```

### Step 3: Identify the common ancestor of the final population
You may have wondered why we run the simulation for 1,100,000 generations and not 1,000,000.
The 100,000 extra generations allow us to extract the common ancestor of the final population.

To identify that individual, we first need to reconstruct the lineage of an organism in the final population.
This is done using the lineage post-treatment:
```sh
aevol_2b_post_lineage -b 1000000
[...]
output written in lineage-best-b001000000-e001100000-i18.ae
log written in lineage-best-b001000000-e001100000-i18.log
```
This reconstructs the lineage of the final best individual and writes it in a lineage file
(here lineage-best-b001000000-e001100000-i18.ae) that contains all the information about that lineage.  
Here we only need the index of the organism of interest at generation 1,000,000.
Conveniently, the log file (here lineage-best-b001000000-e001100000-i18.log) contains
the indices of all the organisms in the lineage. Since we want the index of the first organism in the lineage,
we can read it from the first line of that file:
```sh
head -1 lineage-best-b001000000-e001100000-i18.log
1000000 809
```
Here the index of the ancestor at generation 1,000,000 is 809.

### Step 4: Create a eukaryote Wild-Type from the common ancestor of the final population

We have identified that the index of the organism of interest at generation 1,000,000 is 809.
The corresponding genome can be found in the file checkpoints/population_001000000.fa.

The following commands will create a file called `wt_eukaryote.fa` containing the generated Wild-Type.
Do not forget to replace the lineage filename with the one produced by your own run.
```sh
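# index of the ancestor at generation 1,000,000 (second field of the first log line)
ORGANISM_ID=$(head -1 lineage-best-b001000000-e001100000-i18.log | cut -d' ' -f 2)
# extract its header and sequence; \b (GNU grep) avoids matching e.g. 1809 or 8090
grep -A 1 "\[indexes=.*\b$ORGANISM_ID\b.*\]" checkpoints/population_001000000.fa > wt_prokaryote.fa
# the eukaryote Wild-Type is made of two copies of the prokaryote chromosome
cat wt_prokaryote.fa wt_prokaryote.fa > wt_eukaryote.fa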
```
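
Before using the generated file, it may be worth checking that it really holds two identical copies of a single chromosome. The function below is a minimal sketch of such a check. `check_diploid_fasta` is a hypothetical name, and the sketch assumes that each record is one header line starting with `>` followed by one sequence line, which is also what the `grep -A 1` extraction above relies on.

```sh
# Hypothetical sanity check for a FASTA file built by duplicating one chromosome.
# Assumption: each record is one header line (starting with '>') followed by
# exactly one sequence line.
check_diploid_fasta() {
    file=$1
    # number of records (header lines)
    n_headers=$(grep -c '^>' "$file" || true)
    # number of distinct sequence lines
    n_distinct=$(grep -v '^>' "$file" | sort -u | wc -l | tr -d ' ')
    if [ "$n_headers" -eq 2 ] && [ "$n_distinct" -eq 1 ]; then
        echo "ok: 2 identical chromosomes"
    else
        echo "unexpected: $n_headers record(s), $n_distinct distinct sequence(s)" >&2
        return 1
    fi
}
```

Running `check_diploid_fasta wt_eukaryote.fa` should print the `ok` message. A failure usually means that the `grep` pattern matched no organism or more than one.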
