
characterized the Army’s current AI efforts as technology driven and warfighter focused. Overall, she assessed that AI will play a critical role in the Army’s new operating concept to be prepared to fight anytime, anywhere, and achieve overmatch. AI, she noted, is integral to the overmatch goal of “avoiding a fair fight.”

Dr. Hwang delineated several overarching goals for the workshop: identify methods to improve the robustness of AI and ML tools in C2, as well as ways to foster soldier trust in the technology; study AI/ML vulnerabilities and limitations; and examine opportunities for materiel and non-materiel solutions to AI challenges.

ROBUST AND EQUITABLE UNCERTAINTY ESTIMATION

Aaron Roth, University of Pennsylvania, noted that while many successful black box methods for making predictions currently exist, they are imperfect, and it can thus be desirable to predict ahead of time where these methods are likely to make mistakes. One way to achieve this, stated Dr. Roth, is by creating prediction sets. A prediction set is a set of labels that is likely to contain the true label, and prediction sets are useful when exact point prediction is not possible. For example, given three grainy images of small rodents, it may not be clear whether they are squirrels, weasels, or muskrats, but it can be confidently stated that they are not trucks. In addition to providing a reasonable range of answers, prediction sets convey uncertainty in two ways: the size of the prediction set quantifies the degree of uncertainty, and its contents indicate where the uncertainty lies. Overall, the goal is for the prediction set to contain the true label with a selected probability (e.g., 95 percent).
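
As a minimal sketch of the idea, assuming hypothetical class probabilities rather than any particular model from the workshop, a naive prediction set simply keeps every label the model cannot confidently rule out, so ambiguous inputs yield larger sets:

```python
# Naive illustration of prediction sets (hypothetical probabilities; not conformal prediction).
# Ambiguous inputs produce larger sets; confident inputs produce smaller ones.
labels = ["squirrel", "weasel", "muskrat", "truck"]

def naive_prediction_set(probs, cutoff=0.05):
    """Keep every label whose estimated probability exceeds the cutoff."""
    return {lab for lab, p in zip(labels, probs) if p > cutoff}

grainy_rodent = [0.40, 0.35, 0.24, 0.01]   # uncertain among rodents, clearly not a truck
clear_truck   = [0.01, 0.01, 0.01, 0.97]   # confident single label

print(naive_prediction_set(grainy_rodent))  # {'squirrel', 'weasel', 'muskrat'}
print(naive_prediction_set(clear_truck))    # {'truck'}
```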

Dr. Roth characterized conformal prediction as a simple, elegant method to affix prediction sets to black box models. He stated that conformal prediction serves as an add-on to existing point-prediction models and proceeds in several steps. First, start with an arbitrary model that makes point predictions. Second, pick a nonconformity score, a function that evaluates how well a candidate label fits a given feature vector. Large values of the nonconformity score indicate that a label is very different from what the model predicts, while small values indicate similarity to the model’s predictions.
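
One common concrete choice, shown as a hedged sketch (the probabilities here are assumptions for illustration, not a specific model from the workshop), is to score a candidate label by one minus the model’s estimated probability of that label:

```python
import numpy as np

def nonconformity_score(probs: np.ndarray, label: int) -> float:
    """Score a candidate label as 1 - model's estimated probability of that label.

    Large scores mean the label disagrees with the model's prediction;
    small scores mean it conforms to what the model expects.
    """
    return 1.0 - float(probs[label])

# Hypothetical softmax output for one input over four classes.
probs = np.array([0.70, 0.20, 0.08, 0.02])
print(nonconformity_score(probs, 0))  # 0.30 -> conforms to the model's prediction
print(nonconformity_score(probs, 3))  # 0.98 -> very different from the prediction
```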

Third, on a holdout set (a labeled data set drawn from the same distribution as the data the model will encounter), compute the nonconformity score at each point and identify a threshold value such that a specified percentage (e.g., 95 percent) of the holdout scores fall below it. After these steps, it is possible to compute the nonconformity score for any candidate label on a new, unlabeled example and include in the prediction set every label whose score falls below the threshold. The promise of conformal prediction is a marginal guarantee, that is, a probability statement that averages over the randomness of examples, that the prediction set will contain the true label on a new example (e.g., with 95 percent probability).
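
A hedged sketch of this calibration step, assuming a classifier that outputs class probabilities and the "one minus probability" nonconformity score from the previous sketch (the holdout data here are simulated purely for illustration):

```python
import numpy as np

def calibrate_threshold(cal_probs, cal_labels, alpha=0.05):
    """Find the score threshold such that ~(1 - alpha) of holdout scores fall below it."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]      # nonconformity of each true label
    q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)  # finite-sample correction
    return np.quantile(scores, q_level, method="higher")

def prediction_set(probs, threshold):
    """Every candidate label whose nonconformity score is at or below the threshold."""
    return set(np.flatnonzero(1.0 - probs <= threshold))

# Hypothetical holdout set: predicted probabilities and true labels for 500 examples.
rng = np.random.default_rng(0)
cal_probs = rng.dirichlet(np.ones(4) * 2, size=500)
cal_labels = np.array([rng.choice(4, p=p) for p in cal_probs])

tau = calibrate_threshold(cal_probs, cal_labels, alpha=0.05)
new_probs = np.array([0.55, 0.30, 0.10, 0.05])
print(prediction_set(new_probs, tau))   # labels the model cannot confidently rule out
```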

Conformal prediction has shortcomings, including its marginal guarantees and its assumptions about distributions, argued Dr. Roth. Marginal guarantees are averages over all data points—that is, “for 95 percent of people on which we make predictions, our prediction set contains their true label.” The issue is that a specific data point or subgroup may receive far weaker coverage than the stated level. For instance, a demographic group comprising less than 5 percent of a population might have zero percent coverage under the model. One potential way to mitigate this, noted Dr. Roth, is by calibrating separately for each group. Dr. Roth pointed out, however, that groups of interest often overlap; the goal, he asserted, is to give meaningful statements about data points that are in multiple relevant groups. Furthermore, for conformal prediction to work, new data must be drawn from the same distribution as past data—posing a problem when the distribution of new data shifts in unanticipated ways.
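
A hedged sketch of the group-calibration idea (the groups and scores here are invented for illustration): calibrate one threshold per group from that group’s holdout scores, which immediately raises the question of which threshold applies to a point that belongs to several overlapping groups:

```python
import numpy as np

def per_group_thresholds(scores, group_masks, alpha=0.05):
    """Calibrate one threshold per group from that group's holdout nonconformity scores.

    group_masks maps a group name to a boolean mask over the holdout set. Groups may
    overlap, which is exactly the difficulty raised above: a new point belonging to
    several groups faces several competing thresholds, and no single one of them
    guarantees coverage simultaneously for all of its groups.
    """
    thresholds = {}
    for name, mask in group_masks.items():
        s = scores[mask]
        n = len(s)
        q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
        thresholds[name] = np.quantile(s, q_level, method="higher")
    return thresholds

# Hypothetical holdout scores and two overlapping groups.
rng = np.random.default_rng(1)
scores = rng.uniform(0, 1, size=1000)
groups = {"group_a": rng.random(1000) < 0.3, "group_b": rng.random(1000) < 0.5}
print(per_group_thresholds(scores, groups))
```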

Dr. Roth stated that prediction set multivalidity is one way to create stronger-than-marginal guarantees. Prediction set multivalidity involves dividing the data into different groups that might intersect, so that a particular data point can be in multiple groups simultaneously. For any prediction, the goal is to have the true label in the prediction set 95 percent of the time, not merely overall but conditional on membership in any of the pre-specified groups.
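
The following sketch is not Dr. Roth’s algorithm; it only illustrates, with invented data, how one might audit whether a prediction-set procedure meets the multivalid goal by measuring empirical coverage within each (possibly intersecting) group:

```python
import numpy as np

def group_conditional_coverage(pred_sets, true_labels, group_masks):
    """Empirical coverage of prediction sets within each (possibly overlapping) group.

    Multivalidity asks that the coverage target hold conditional on membership in
    every pre-specified group, not just on average over the whole population.
    """
    covered = np.array([y in s for s, y in zip(pred_sets, true_labels)])
    return {name: covered[mask].mean() for name, mask in group_masks.items() if mask.any()}

# Hypothetical prediction sets, true labels, and two intersecting groups.
pred_sets = [{0, 1}, {2}, {0, 2, 3}, {1}, {0}]
true_labels = [1, 2, 3, 0, 0]
masks = {"group_a": np.array([1, 1, 0, 1, 0], bool),
         "group_b": np.array([0, 1, 1, 1, 1], bool)}
print(group_conditional_coverage(pred_sets, true_labels, masks))
```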

Dr. Roth presented an algorithm that can parameterize with an arbitrary collection of intersecting groups. The algorithm takes, as input, any sequence of models