README.txt

## Predict Entity Saliency

### Generate the features (package entity-salientness)

#### Enrich Dataset

For each candidate entity of a spot in the document, generate all the features useful for predicting its saliency [package].
There are 3 scripts:

* enrich-dataset.sh generates the features only for the spots that are relevant in the given documents;
* enrich-dataset-no-title.sh generates the features only for the spots that are relevant in the given documents, ignoring the title field;
* enrich-dataset-with-dexter.sh generates the features for all the spots in the document; if none of a spot's candidates is a relevant entity, all of its entities get a saliency of -1.

The scripts expect as input a JSON file containing a list of rated documents (it.cnr.isti.hpc.salient.dataset.RatedDocument).
(convert-dexter-eval-to-rated-documents.sh converts a dexter-eval JSON dataset to a list of rated documents, setting the saliency of relevant entities to 1.)

The scripts generate a TSV file with one line per entity, containing the entity, its spots, and its features.

### Train a model for saliency (package predict-saliency)

1. Convert the TSV file to an SVM file (tsv-to-svm.sh).
2. Split the SVM file into folds (set the dataset folder in config.sh, then run generate-folds.sh).
3. Train: train.sh training.svm/tsv model sampling[0,1] generates a model. Labels are cast to the binary case (< 0 and > 0). The sampling parameter controls the downsampling of the negative labels: 0.1 keeps a random 10% sample of the negative labels, 1 performs no downsampling.
4. Test: test.sh test.svm/tsv model tests the predictor and prints accuracy and other statistics.
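
The logic behind steps 1 and 3 can be sketched in Python as follows. This is only an illustration, not the code the scripts actually run. It assumes that the SVM file uses the common LIBSVM/SVMlight text layout (`label index:value ...`), that a label of exactly 0 counts as negative, and that each negative example is kept independently with probability `sampling`, so the kept fraction is only approximately the requested one. The function names are hypothetical.

```python
import random


def to_svm_line(label, features):
    """Format one example as 'label 1:v1 2:v2 ...' (assumed LIBSVM layout)."""
    pairs = " ".join(f"{i}:{v}" for i, v in enumerate(features, start=1))
    return f"{label} {pairs}"


def binarize(label):
    """Cast a saliency label to +1 or -1.

    Assumption: a label of exactly 0 is treated as negative.
    """
    return 1 if label > 0 else -1


def downsample_negatives(examples, sampling, seed=0):
    """Keep every positive example and a random share of the negatives.

    examples: list of (label, features) pairs.
    sampling: value in [0, 1]; 1 keeps all negatives, 0.1 keeps about 10%.
    """
    if not 0 <= sampling <= 1:
        raise ValueError("sampling must be in [0, 1]")
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    kept = []
    for label, features in examples:
        y = binarize(label)
        # Positives always survive; negatives survive with probability `sampling`.
        if y > 0 or sampling >= 1 or rng.random() < sampling:
            kept.append((y, features))
    return kept
```

For example, `to_svm_line(1, [0.5, 2.0])` returns `"1 1:0.5 2:2.0"`, and calling `downsample_negatives(examples, 1)` returns every example with its label binarized.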

#### Evaluate performance on entity linking: (package predict-saliency - dexter-eval)

1. predict-disambiguation-to-dexter.eval.sh file.svm model dexter-eval.tsv predicts the labels on the test set, annotates the entities using those labels, and writes the disambiguated entities to the results file dexter-eval.tsv.
2. Filter the dexter-eval.json assessment so that it contains only the documents in the golden truth.
3. Finally, in dexter-eval, run ./scripts/evaluate.sh dexter-eval.tsv golden.json Me ../saliency-predictor/conf-macro-measures.txt output.html to get the performance of the method.
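
The filtering in step 2 might look like the sketch below. It assumes that the assessment file is a single JSON list of records and that each record carries its document identifier under a key such as `docId`. Both are assumptions that should be checked against the actual dexter-eval file, and the function names are hypothetical.

```python
import json


def filter_assessments(records, golden_ids, id_key="docId"):
    """Return only the records whose document id is in golden_ids.

    id_key is a hypothetical field name; adjust it to the real file.
    """
    golden_ids = set(golden_ids)
    return [r for r in records if r.get(id_key) in golden_ids]


def filter_file(src_path, dst_path, golden_ids, id_key="docId"):
    """Read a JSON list from src_path and write the filtered list to dst_path."""
    with open(src_path, encoding="utf-8") as src:
        records = json.load(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        json.dump(filter_assessments(records, golden_ids, id_key), dst)
```

The set of identifiers to keep would come from the documents present in the test fold.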