AUCCalculator 0.2

A Java program for finding AUC-ROC and AUC-PR

AUCCalculator is a Java jar file that calculates the area under the curve (AUC) for both ROC graphs and precision-recall (PR) graphs. Input is a tab-delimited file in ROC, PR, or list form, as described below. Output files are written to the same directory as the input file: a .roc file and a .pr file, each with one point for every original and interpolated point, and a .spr file with precision values calculated at 100 recall points between 0 and 1. The AUC-ROC and AUC-PR values are printed to the console.

Update: Version 0.2 fixes a bug in the "list" entry option. Please download this new version if you use that option.

Usage

java -jar auc.jar <fileName> <fileType> <posCount> <negCount> <minRecall>

  fileName - the file storing the data to be processed

  fileType - the format of the input file; each line holds two columns:
	roc:  false positive rate  true positive rate
	pr:   recall  precision
	list: prob(example == true)  true classification (1 = positive, 0 = negative)

  posCount (omit for the list fileType) - number of positive examples in the data set

  negCount (omit for the list fileType) - number of negative examples in the data set

  minRecall (optional) - the recall at which to start calculating the area under the PR curve.
            For example, if you set minRecall = 0.2, you will get the AUC for recalls
            between 0.2 and 1. The default is to find the AUC for all levels of recall.
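
The roc and pr file types need posCount and negCount because a ROC point and a PR point can only be converted into each other when the class totals are known. The sketch below shows that standard relationship; it is an illustration, not the tool's source, and the class name RocToPr and the method toPr are hypothetical names chosen for this example.

```java
final class RocToPr {
    /**
     * Converts one ROC point (false positive rate, true positive rate)
     * into a PR point, returned as {recall, precision}.
     * Hypothetical helper for illustration; not part of auc.jar.
     */
    static double[] toPr(double fpr, double tpr, int posCount, int negCount) {
        double tp = tpr * posCount; // true positives at this threshold
        double fp = fpr * negCount; // false positives at this threshold
        if (tp + fp == 0.0) {
            // Nothing is predicted positive, so precision is undefined.
            // Returning NaN here is a choice made by this sketch.
            return new double[] {tpr, Double.NaN};
        }
        // Recall equals the true positive rate.
        return new double[] {tpr, tp / (tp + fp)};
    }

    public static void main(String[] args) {
        // ROC point (0.25, 0.5) with 6 positive and 4 negative examples.
        double[] pr = toPr(0.25, 0.5, 6, 4);
        System.out.println(pr[0] + "\t" + pr[1]); // prints 0.5 and 0.75, tab-separated
    }
}
```

With 6 positive and 4 negative examples, the ROC point (0.25, 0.5) corresponds to 3 true positives and 1 false positive, which gives recall 0.5 and precision 0.75.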

Examples

java -jar auc.jar testset.txt PR 20 2000 0.5

    will generate testset.txt.pr, testset.txt.roc and testset.txt.spr along with AUC-ROC and AUC-PR
   
java -jar auc.jar testsetlist.txt list

    will generate testsetlist.txt.pr, testsetlist.txt.roc and testsetlist.txt.spr along with AUC-ROC and AUC-PR
   

Example List file

0.9 1
0.8 1
0.7 0
0.6 1
0.55 1
0.54 1
0.53 0
0.52 0
0.51 1
0.505 0
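
A list file like the one above can be produced from a classifier's outputs with a few lines of Java. This is a minimal sketch, assuming the scores and labels are already in memory; the class name ListFileWriter and its methods are hypothetical, and the tab separator follows the "tab-delimited" description of the input format.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

final class ListFileWriter {
    /** Formats each example as "score<TAB>label", with label 1 for positive and 0 for negative. */
    static List<String> toLines(double[] scores, boolean[] labels) {
        if (scores.length != labels.length) {
            throw new IllegalArgumentException("scores and labels must have the same length");
        }
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < scores.length; i++) {
            lines.add(scores[i] + "\t" + (labels[i] ? 1 : 0));
        }
        return lines;
    }

    /** Writes the formatted lines to the given path, one example per line. */
    static void write(Path out, double[] scores, boolean[] labels) throws IOException {
        Files.write(out, toLines(scores, labels), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        double[] scores = {0.9, 0.8, 0.7, 0.6, 0.55, 0.54, 0.53, 0.52, 0.51, 0.505};
        boolean[] labels = {true, true, false, true, true, true, false, false, true, false};
        // The file name matches the one used in the Examples section.
        write(Paths.get("testsetlist.txt"), scores, labels);
    }
}
```

The resulting file can then be passed to the tool with the list fileType.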

Example ROC file for same dataset as above, using 6 pos and 4 neg examples

0       0
0.0     0.16666666666666666
0.0     0.3333333333333333
0.25    0.3333333333333333
0.25    0.5
0.25    0.6666666666666666
0.25    0.8333333333333334
0.5     0.8333333333333334
0.75    0.8333333333333334
0.75    1.0
1.0     1.0
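
The ROC points above follow from the list file by sorting the examples by decreasing score and counting true and false positives after each one. The sketch below reproduces them and computes the area with the trapezoidal rule. It is an illustration under stated assumptions, not the tool's source: the class RocFromList and its methods are hypothetical names, tied scores are treated as separate thresholds, and the data must contain at least one positive and one negative example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RocFromList {
    /** Returns ROC points as {fpr, tpr}, starting at (0, 0), one per example in descending score order. */
    static List<double[]> rocPoints(double[] scores, int[] labels) {
        Integer[] order = new Integer[scores.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        // Highest score first.
        Arrays.sort(order, (a, b) -> Double.compare(scores[b], scores[a]));

        int pos = 0;
        for (int y : labels) pos += y; // labels are assumed to be 0 or 1
        int neg = labels.length - pos;

        List<double[]> points = new ArrayList<>();
        points.add(new double[] {0.0, 0.0});
        int tp = 0, fp = 0;
        for (int idx : order) {
            if (labels[idx] == 1) tp++; else fp++;
            points.add(new double[] {(double) fp / neg, (double) tp / pos});
        }
        return points;
    }

    /** Area under a piecewise-linear curve given as {x, y} points with non-decreasing x. */
    static double trapezoidArea(List<double[]> points) {
        double area = 0.0;
        for (int i = 1; i < points.size(); i++) {
            double[] a = points.get(i - 1);
            double[] b = points.get(i);
            area += (b[0] - a[0]) * (a[1] + b[1]) / 2.0;
        }
        return area;
    }

    public static void main(String[] args) {
        double[] scores = {0.9, 0.8, 0.7, 0.6, 0.55, 0.54, 0.53, 0.52, 0.51, 0.505};
        int[] labels = {1, 1, 0, 1, 1, 1, 0, 0, 1, 0};
        List<double[]> roc = rocPoints(scores, labels);
        for (double[] p : roc) System.out.println(p[0] + "\t" + p[1]);
        System.out.println("AUC-ROC (trapezoidal): " + trapezoidArea(roc));
    }
}
```

For this dataset the trapezoidal area works out to 18/24 = 0.75, which equals the fraction of (positive, negative) pairs in which the positive example has the higher score.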

Example PR file from same dataset as above, using 6 pos and 4 neg examples

0.16666666666666666     1.0
0.3333333333333333      1.0
0.3333333333333333      0.6666666666666666
0.5     0.75
0.6666666666666666      0.8
0.8333333333333334      0.8333333333333334
0.8333333333333334      0.7142857142857143
0.8333333333333334      0.625
1.0     0.6666666666666666
1.0     0.6
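
The PR points above can be derived from the list file in the same way as ROC points: sort by decreasing score, then record recall and precision after each example. The following is a minimal sketch with hypothetical names (PrFromList, prPoints); it assumes 0/1 labels and at least one positive example, treats tied scores as separate thresholds, and does not perform the interpolation that the tool applies when writing its output files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PrFromList {
    /** Returns PR points as {recall, precision}, one per example in descending score order. */
    static List<double[]> prPoints(double[] scores, int[] labels) {
        Integer[] order = new Integer[scores.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        // Highest score first.
        Arrays.sort(order, (a, b) -> Double.compare(scores[b], scores[a]));

        int pos = 0;
        for (int y : labels) pos += y; // labels are assumed to be 0 or 1

        List<double[]> points = new ArrayList<>();
        int tp = 0, fp = 0;
        for (int idx : order) {
            if (labels[idx] == 1) tp++; else fp++;
            double recall = (double) tp / pos;
            double precision = (double) tp / (tp + fp); // tp + fp >= 1 inside the loop
            points.add(new double[] {recall, precision});
        }
        return points;
    }

    public static void main(String[] args) {
        double[] scores = {0.9, 0.8, 0.7, 0.6, 0.55, 0.54, 0.53, 0.52, 0.51, 0.505};
        int[] labels = {1, 1, 0, 1, 1, 1, 0, 0, 1, 0};
        for (double[] p : prPoints(scores, labels)) {
            System.out.println(p[0] + "\t" + p[1]);
        }
    }
}
```

Unlike the ROC file, the PR file has no point at recall 0, because precision is undefined before any example is predicted positive.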

This code is provided as-is for academic purposes only. Please let us know if you find any bugs or have any questions at richm@cs.wisc.edu and jdavis@cs.wisc.edu. If you use this software, we request that the following paper be cited as a reference.