Neural Networks: Types and Applications

Evaluation metrics

Evaluation metrics adopted in DL tasks play a crucial role in obtaining an optimized classifier. They are used at the two main stages of a typical data classification procedure: training and testing. During the training stage, the evaluation metric is used to optimize the classification algorithm; that is, it acts as a discriminator that selects the optimized solution, one expected to produce a more accurate forecast of future evaluations of a particular classifier. During the testing stage, the evaluation metric is used to measure the efficiency of the created classifier on unseen data, i.e., it acts as an evaluator. In the equations below, TP and TN denote the number of positive and negative instances, respectively, that are correctly classified, while FN and FP denote the number of misclassified positive and negative instances, respectively. A short code sketch of how these counts are obtained is given directly below, followed by a list of the most widely used evaluation metrics.
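To make these counts concrete, the following Python sketch tallies TP, TN, FP, and FN from a pair of label vectors. It is a minimal illustration rather than code from the text: the function name confusion_counts, the positive_label argument, and the example labels are assumptions introduced here.

```python
import numpy as np

def confusion_counts(y_true, y_pred, positive_label=1):
    """Count TP, TN, FP, FN for a binary classification task.

    y_true and y_pred are 1-D arrays of class labels; positive_label
    marks which label is treated as the positive class.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    pos = y_true == positive_label        # actual positives
    pred_pos = y_pred == positive_label   # predicted positives

    tp = int(np.sum(pos & pred_pos))      # positives correctly classified
    tn = int(np.sum(~pos & ~pred_pos))    # negatives correctly classified
    fp = int(np.sum(~pos & pred_pos))     # negatives misclassified as positive
    fn = int(np.sum(pos & ~pred_pos))     # positives misclassified as negative
    return tp, tn, fp, fn

# Example: six ground-truth labels and the classifier's predictions
tp, tn, fp, fn = confusion_counts([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
print(tp, tn, fp, fn)   # 2 2 1 1
```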

  1. Accuracy: Calculates the ratio of correctly predicted samples to the total number of samples evaluated (Eq. 20).

    Accuracy = \frac{TP+TN}{TP+TN+FP+FN} (20)

  2. Sensitivity or Recall: Utilized to calculate the fraction of positive patterns that are correctly classified (Eq. 21).

    Sensitivity = \frac{TP}{TP+FN} (21)

  3. Specificity: Utilized to calculate the fraction of negative patterns that are correctly classified (Eq. 22).

    Specificity = \frac{TN}{FP+TN} (22)

  4. Precision: Utilized to calculate the fraction of correctly predicted positive patterns among all patterns predicted to be in the positive class (Eq. 23).

    Precision = \frac{TP}{TP+FP} (23)

  5. F1-Score: Calculates the harmonic mean of the precision and recall rates (Eq. 24).

    F1_{score} = 2\times \frac{Precision\times Recall}{Precision+Recall} (24)

  6. J Score: This metric is also known as Youden's J statistic and is given in Eq. 25.

    J_{score} = Sensitivity + Specificity -1 (25)

  7. False Positive Rate (FPR): This metric refers to the probability of a false alarm, i.e., the fraction of negative patterns incorrectly classified as positive, as calculated in Eq. 26.

    FPR = 1- Specificity (26)

  8. Area Under the ROC Curve: AUC is a common ranking-type metric. It is utilized to compare learning algorithms as well as to construct an optimal learning model. In contrast to probability and threshold metrics, the AUC value reflects the overall ranking performance of a classifier. For a two-class problem, the AUC value is calculated as in Eq. 27.

    AUC = \frac{S_{p}-n_{p}(n_{p}+1)/2}{n_{p}n_{n}} (27)

    Here, S_p represents the sum of the ranks of all positive samples, while n_n and n_p denote the numbers of negative and positive samples, respectively. Compared with the accuracy metric, the AUC value has been verified, both empirically and theoretically, to be better suited for identifying an optimized solution and for evaluating classifier performance during training.

    When considering the discrimination and evaluation processes, the AUC has shown excellent performance. For multiclass problems, however, computing the AUC becomes costly when a large number of generated solutions must be discriminated. Moreover, the time complexity of computing the AUC is O\left(|C|^{2} \; n\log n\right) for the Hand and Till AUC model and O\left(|C| \; n\log n\right) for the Provost and Domingos AUC model. A code sketch covering Eqs. 20-27 follows this list.
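The sketch below evaluates Eqs. 20-26 from the four confusion-matrix counts and implements the two-class AUC of Eq. 27 via sample ranks. It is a minimal illustration under stated assumptions: the function names threshold_metrics and auc_from_scores are invented here, positive samples are assumed to carry label 1 with higher scores indicating the positive class, and tied scores are broken arbitrarily rather than by average ranks.

```python
import numpy as np

def threshold_metrics(tp, tn, fp, fn):
    """Evaluate Eqs. 20-26 from the four confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)                      # Eq. 20
    sensitivity = tp / (tp + fn)                                    # Eq. 21 (recall)
    specificity = tn / (fp + tn)                                    # Eq. 22
    precision = tp / (tp + fp)                                      # Eq. 23
    f1 = 2 * precision * sensitivity / (precision + sensitivity)    # Eq. 24
    j_score = sensitivity + specificity - 1                         # Eq. 25 (Youden's J)
    fpr = 1 - specificity                                           # Eq. 26
    return dict(accuracy=accuracy, sensitivity=sensitivity,
                specificity=specificity, precision=precision,
                f1=f1, j_score=j_score, fpr=fpr)

def auc_from_scores(y_true, scores):
    """Two-class AUC via the rank formula of Eq. 27.

    Samples are ranked by score (rank 1 = lowest score); S_p is the sum
    of the ranks of the positive samples. Ties are broken arbitrarily
    in this sketch; a full implementation would use average ranks.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    order = scores.argsort()                       # indices from lowest to highest score
    ranks = np.empty_like(order, dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)   # 1-based ranks
    n_p = int(np.sum(y_true == 1))                 # number of positive samples
    n_n = int(np.sum(y_true == 0))                 # number of negative samples
    s_p = float(ranks[y_true == 1].sum())          # sum of positive-sample ranks
    return (s_p - n_p * (n_p + 1) / 2) / (n_p * n_n)

# Example usage with the counts from the earlier sketch
print(threshold_metrics(tp=2, tn=2, fp=1, fn=1))
print(auc_from_scores([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))   # 0.75
```

On the toy example, the rank formula yields 0.75, which matches the area under the empirical ROC curve for the same labels and scores and serves as a quick sanity check of Eq. 27.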