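A classification model (also called a classifier, or a diagnosis) maps each instance to a particular class. ROC analysis deals with binary classification models, that is, models whose output takes only two classes, for example: positive/negative, diseased/healthy, spam/non-spam, enemy/non-enemy.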
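When the result of signal detection (or of a variable measurement) is a continuous value, the boundary between the classes must be set by a threshold. For example, when blood pressure is used to check whether a person has hypertension, the measured value is a continuous real number (anything from 0 to 200 is possible). Taking systolic 140 / diastolic 90 as the threshold, readings at or above it are diagnosed as hypertension and readings below it as no hypertension. Each individual prediction of a binary classifier has one of four outcomes: true positive (TP, predicted positive and actually positive), false positive (FP, predicted positive but actually negative), true negative (TN, predicted negative and actually negative), and false negative (FN, predicted negative but actually positive).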
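As a minimal sketch of thresholding, the Python function below turns continuous scores into positive/negative predictions and tallies the four outcomes. The function name `count_outcomes`, the readings, and the diagnoses are hypothetical; for simplicity only systolic pressure is thresholded (at 140), and values at or above the threshold are called positive.

```python
def count_outcomes(scores, labels, threshold):
    """Count (TP, FP, TN, FN) when every score >= threshold is predicted positive.

    scores: continuous measurements (here: hypothetical systolic pressures)
    labels: ground truth, True = actually positive (here: has hypertension)
    """
    tp = fp = tn = fn = 0
    for score, actual in zip(scores, labels):
        predicted = score >= threshold  # at or above the cut-off -> positive
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif not predicted and not actual:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


# Hypothetical readings and diagnoses, for illustration only
systolic = [118, 135, 142, 150, 128, 160, 139, 145]
has_htn = [False, True, True, True, False, True, False, False]
print(count_outcomes(systolic, has_htn, threshold=140))  # → (3, 1, 3, 1)
```

Moving the threshold trades one kind of error for the other: lowering it turns some FNs into TPs but also some TNs into FPs.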
Confusion matrix

Confusion matrix - Wikipedia
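In machine learning and statistical classification, a confusion matrix is a visualization tool used especially in supervised learning; in unsupervised learning it is usually called a matching matrix.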
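Each column of the matrix represents the instances of a predicted class, while each row represents the instances of an actual class (the transposed layout is also found in the literature).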
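The name comes from the fact that the matrix makes it easy to see whether the system is confusing two classes (for example, mistaking one class for another).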
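A confusion matrix (also called an error matrix) is a special kind of contingency table with two dimensions ("actual" and "predicted") and the same set of classes in both dimensions.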
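The definition can be made concrete with a small Python sketch that counts (actual, predicted) pairs over one shared list of classes, with rows for actual classes and columns for predicted ones. The function name `confusion_matrix` and the six labelled animals are hypothetical, chosen only to illustrate the layout.

```python
from collections import Counter


def confusion_matrix(actual, predicted, classes):
    """Rows = actual class, columns = predicted class, same class list on both axes."""
    counts = Counter(zip(actual, predicted))  # how often each (actual, predicted) pair occurs
    return [[counts[(a, p)] for p in classes] for a in classes]


classes = ["cat", "dog"]
# Hypothetical test results, for illustration only
actual = ["cat", "cat", "cat", "dog", "dog", "dog"]
predicted = ["cat", "dog", "cat", "dog", "dog", "dog"]
for name, row in zip(classes, confusion_matrix(actual, predicted, classes)):
    print(name, row)
# cat [2, 1]
# dog [0, 3]
```

The diagonal holds correct predictions; any off-diagonal count shows which class was confused with which.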
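Suppose a system has been trained to distinguish cats from dogs; the confusion matrix then summarizes the algorithm's test results for later inspection. Given a sample of 13 animals, 8 cats and 5 dogs, the resulting confusion matrix could look like the table below.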

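Several predictions come out "right" the way two negatives make a positive; the actual predictive performance is in fact poor.
Youden’s J statistic - Wikipedia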
Youden’s J statistic (also called Youden’s index) is a single statistic that captures the performance of a dichotomous diagnostic test. Informedness is its generalization to the multiclass case and estimates the probability of an informed decision.
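A dichotomous diagnostic test is a binary diagnostic test, that is, a diagnostic test with only two possible results. In medicine, common binary results include positive/negative test results and diseased/not diseased.
Binary diagnostic tests are typically used to assess the presence or absence of a disease or symptom, or to judge whether a treatment is effective. Common examples include blood tests, urine tests, and X-ray examinations.
Binary diagnostic tests also have wide application in machine learning, for example in evaluating models and selecting the optimal threshold for binary classification problems.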
Youden’s J statistic is:
$$J = \text{sensitivity} + \text{specificity} - 1$$
$$\text{sensitivity} = \frac{\text{true positives}}{\text{true positives} + \text{false negatives}} = \frac{TP}{TP+FN}$$
$$\text{specificity} = \frac{\text{true negatives}}{\text{true negatives} + \text{false positives}} = \frac{TN}{TN+FP}$$
where TP + FN is the number of actual positives and TN + FP is the number of actual negatives.
Here the two right-hand quantities (the terms on the right-hand side of the equation) are sensitivity and specificity. Thus the expanded formula is:
$$J = \frac{\text{true positives}}{\text{true positives} + \text{false negatives}} + \frac{\text{true negatives}}{\text{true negatives} + \text{false positives}} - 1 = \frac{TP}{TP+FN} + \frac{TN}{TN+FP} - 1$$
$$J \in [-1, 1]$$
Proof:
Since $TP, FP, TN, FN \geq 0$ (and assuming $TP+FN > 0$ and $TN+FP > 0$, i.e. both classes occur):
$$TP \leq TP + FN, \qquad TN \leq TN + FP$$
$$0 \leq \frac{TP}{TP+FN} \leq 1, \qquad 0 \leq \frac{TN}{TN+FP} \leq 1$$
$$\therefore\ 0 \leq \frac{TP}{TP+FN} + \frac{TN}{TN+FP} \leq 2 \quad\Rightarrow\quad -1 \leq J \leq 1$$
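A minimal Python sketch of the formula and of the range just proved follows. The function name `youden_j` and the counts are illustrative, and it assumes both classes are present so neither denominator is zero.

```python
def youden_j(tp, fp, tn, fn):
    """Youden's J = sensitivity + specificity - 1.

    Assumes tp + fn > 0 (some actual positives) and tn + fp > 0 (some actual negatives).
    """
    sensitivity = tp / (tp + fn)  # TP / (TP + FN), the true positive rate
    specificity = tn / (tn + fp)  # TN / (TN + FP), the true negative rate
    return sensitivity + specificity - 1


print(youden_j(tp=3, fp=1, tn=3, fn=1))    # → 0.5  (0.75 + 0.75 - 1)
print(youden_j(tp=10, fp=0, tn=10, fn=0))  # → 1.0  (perfect test, upper bound)
print(youden_j(tp=0, fp=10, tn=0, fn=10))  # → -1.0 (every prediction wrong, lower bound)
```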
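In the formula for Youden's J statistic, sensitivity and specificity are two important metrics of binary classification: sensitivity is the proportion of samples whose true label is positive that are correctly predicted as positive, and specificity is the proportion of samples whose true label is negative that are correctly predicted as negative.
In this formula, "the two right-hand quantities" refers to sensitivity and specificity, the two quantities that appear on the right-hand side of the equation.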
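In binary classification, J equals the true positive rate minus the false positive rate (because specificity = 1 − false positive rate): it is 0 for a test that does no better than chance and 1 for a perfect test. When a classifier outputs a continuous score, J can be computed at every threshold along the ROC curve, and the threshold that maximizes J (the point farthest above the chance diagonal) is often chosen as the optimal cut-off.
In practice, Youden's J statistic can serve as one metric for evaluating model performance.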
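Selecting a cut-off by maximizing J can be sketched in Python as below. The function name `best_threshold_by_youden` and the readings are hypothetical; the sketch assumes that scores at or above the cut-off count as positive and that both classes are present. It tries every observed score as a cut-off and keeps the one with the largest J = TPR − FPR. Ties go to the lowest cut-off.

```python
def best_threshold_by_youden(scores, labels):
    """Return (threshold, J) for the cut-off with the largest Youden's J."""
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    best_t, best_j = None, float("-inf")
    for t in sorted(set(scores)):  # each observed score is a candidate cut-off
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        j = tp / positives - fp / negatives  # sensitivity - (1 - specificity)
        if j > best_j:  # strict '>' keeps the lowest cut-off on ties
            best_t, best_j = t, j
    return best_t, best_j


# Hypothetical systolic readings and diagnoses, for illustration only
systolic = [118, 135, 142, 150, 128, 160, 139, 145]
has_htn = [False, True, True, True, False, True, False, False]
print(best_threshold_by_youden(systolic, has_htn))  # → (135, 0.5)
```

For these readings the cut-offs 135, 142 and 150 all reach J = 0.5, so the choice among them has to come from other considerations, such as the relative cost of missed cases versus false alarms.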
F-score - Wikipedia
In statistical analysis of binary classification, the F-score or F-measure is a measure of a test’s accuracy. It is calculated from the precision and recall of the test, where:
the precision is the number of true positive results divided by the number of all positive results, including those not identified correctly
the recall is the number of true positive results divided by the number of all samples that should have been identified as positive. Precision is also known as positive predictive value, and recall is also known as sensitivity in diagnostic binary classification.
Precision (also called positive predictive value) is the fraction of relevant instances among the retrieved instances,
recall (also known as sensitivity) is the fraction of relevant instances that were retrieved. Both precision and recall are therefore based on relevance.
Consider a computer program for recognizing dogs (the relevant element) in a digital photograph. Upon processing a picture which contains ten cats and twelve dogs, the program identifies eight dogs. Of the eight elements identified as dogs, only five actually are dogs (true positives), while the other three are cats (false positives). Seven dogs were missed (false negatives), and seven cats were correctly excluded (true negatives). The program’s precision is then 5/8 (true positives / selected elements) while its recall is 5/12 (true positives / relevant elements).
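The dog-recognition arithmetic can be checked with a short Python sketch. The function name `precision_recall_f1` is illustrative, and it assumes tp + fp > 0 and tp + fn > 0 so both ratios are defined.

```python
def precision_recall_f1(tp, fp, fn):
    """Assumes tp + fp > 0 and tp + fn > 0 so both ratios are defined."""
    precision = tp / (tp + fp)  # correct positives among everything predicted positive
    recall = tp / (tp + fn)     # correct positives among everything actually positive
    # F1 is the harmonic mean of precision and recall (0 when both are 0)
    f1 = 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts from the dog-recognition example: 5 TP, 3 FP (cats), 7 FN (missed dogs)
precision, recall, f1 = precision_recall_f1(tp=5, fp=3, fn=7)
print(precision, recall, f1)  # 5/8, 5/12 and an F1 of 0.5, up to floating-point rounding
```

The 7 correctly excluded cats (true negatives) never enter these formulas. Precision, recall and F1 look only at the positive class.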
The $F_1$ score is the harmonic mean of the precision and recall. It thus symmetrically represents both precision and recall in one metric. The more generic $F_\beta$ score applies additional weights, valuing one of precision or recall more than the other.
The highest possible value of an F-score is 1.0, indicating perfect precision and recall, and the lowest possible value is 0, if either precision or recall is zero.
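The F-score, also called the F-measure, is a commonly used metric of an algorithm's accuracy. Recognition and detection algorithms usually report precision and recall separately; the F-score takes both into account at once and gives a balanced picture of the algorithm's accuracy.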
A more general F score, $F_\beta$, that uses a positive real factor $\beta$, where $\beta$ is chosen such that recall is considered $\beta$ times as important as precision, is:
$$F_\beta = (1 + \beta^2) \cdot \frac{\text{precision} \cdot \text{recall}}{(\beta^2 \cdot \text{precision}) + \text{recall}}$$
The $F_1$ score is the special case of $F_\beta$ with $\beta = 1$.
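In terms of type I and type II errors (that is, written with TP, FN, and FP), this becomes:
$$F_\beta = \frac{(1 + \beta^2) \cdot TP}{(1 + \beta^2) \cdot TP + \beta^2 \cdot FN + FP}$$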
Two commonly used values for $\beta$ are 2, which weighs recall higher than precision, and 0.5, which weighs recall lower than precision.
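Both forms of $F_\beta$ can be compared in a small Python sketch, reusing the counts from the dog-recognition example. The function names `f_beta` and `f_beta_from_counts` are illustrative, not a library API.

```python
def f_beta(precision, recall, beta):
    """F_beta from precision and recall; recall counts beta times as much as precision."""
    b2 = beta ** 2
    denom = b2 * precision + recall
    return 0.0 if denom == 0 else (1 + b2) * precision * recall / denom


def f_beta_from_counts(tp, fp, fn, beta):
    """Same quantity written with TP, FP (type I errors) and FN (type II errors)."""
    b2 = beta ** 2
    return (1 + b2) * tp / ((1 + b2) * tp + b2 * fn + fp)


tp, fp, fn = 5, 3, 7  # counts from the dog-recognition example
precision, recall = tp / (tp + fp), tp / (tp + fn)
for beta in (0.5, 1, 2):
    print(beta, f_beta(precision, recall, beta), f_beta_from_counts(tp, fp, fn, beta))
```

Here recall (5/12) is lower than precision (5/8), so $F_2 = 25/56 \approx 0.446$ falls below $F_1 = 0.5$, while $F_{0.5} = 6.25/11 \approx 0.568$ rises above it.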
Harmonic mean: in mathematics, the harmonic mean is one of several kinds of average, and in particular, one of the Pythagorean means. It is sometimes appropriate for situations when the average rate is desired.
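The harmonic mean is obtained by taking the reciprocal of every value, computing the arithmetic mean of those reciprocals, and then taking the reciprocal of that mean; equivalently, it equals the number of values divided by the sum of their reciprocals.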
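The recipe just described translates directly into a few lines of Python. The function name `harmonic_mean` and the speed example are illustrative; Python's standard library also provides `statistics.harmonic_mean`.

```python
def harmonic_mean(values):
    """Number of values divided by the sum of their reciprocals (all values must be positive)."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("harmonic mean needs a non-empty list of positive numbers")
    return len(values) / sum(1 / v for v in values)


# Average speed over two equal-length legs driven at 60 and 40 km/h: about 48 km/h, not 50
print(harmonic_mean([60, 40]))
# F1 is the harmonic mean of precision and recall: about 0.5 for 5/8 and 5/12
print(harmonic_mean([5 / 8, 5 / 12]))
```

The harmonic mean is pulled toward the smaller value, which is why a high F1 requires both precision and recall to be high.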
The harmonic mean $H$ of a set of positive numbers $x_1, x_2, \ldots, x_n$ is computed as: