When we want to evaluate a set of predicted labels, or the performance of an ML model, we use performance measures such as Accuracy, Precision, Recall, and F-beta (most often F-1). Of these, only Accuracy applies directly to multi-class data, where the class label can take more than two values. Precision, Recall, and F-beta are, in their basic form, defined for a binary (positive vs. negative) problem, so they have to be computed per class and then averaged before they give a single multi-class score. So why not just use Accuracy? Accuracy is calculated as the proportion of correctly labeled instances out of the total number of instances. The question is: what is wrong with Accuracy that we need …