CASA

Causality-driven Argument Sufficiency Assessment

¹Peking University, ²University of California, Los Angeles

Introduction

Argument sufficiency assessment is the task of determining whether the premises of an argument provide sufficient support for its conclusion. Previous works train classifiers on human annotations; however, the sufficiency criteria are vague and vary across annotators.

An example of the argument sufficiency assessment task

To tackle this problem, we propose CASA, a zero-shot Causality-driven Argument Sufficiency Assessment framework that formulates the task with the Probability of Sufficiency (PS), a concept borrowed from causality:

PS quantifies the probability that introducing X would produce Y in the case where X and Y are in fact absent.
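Written in the counterfactual notation used in the challenges below, where Y(X = 1) denotes the value Y would take if X were set to 1, this definition is:

    PS = P(Y(X = 1) = 1 | X = 0, Y = 0)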


Measuring the PS of a given argument presents the following challenges:

  • How to measure the probabilities without observational data
        - How can we estimate P(Y(X = 1) = 1 | X = 0, Y = 0) if we do not have data points satisfying X = 0 and Y = 0?
  • How to intervene in the argument
        - How can we estimate P(Y(X = 1) = 1) given data conforming to the conditions X = 0 and Y = 0?
We hypothesize, and verify in this work, that given their commonsense knowledge and reasoning abilities, LLMs can be used to sample such data and simulate interventions.

The CASA Framework

  • Claim Extraction: extract the premise and conclusion from a given argument
  • Context Sampling: sample contexts that are consistent with ¬premise and ¬conclusion
  • Revision under Intervention: make interventions on the contexts to meet the premise
  • Probability Estimation: estimate the probability of the conclusion for each sampled situation
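A minimal sketch of how these four steps might be wired together is shown below. The prompts and helper names are hypothetical, not the paper's released code; it assumes an `llm(prompt)` callable that returns the model's text response.

    # Hypothetical sketch of the four CASA steps (assumed prompts and helpers).
    def casa_probability_of_sufficiency(argument: str, llm, n_samples: int = 5) -> float:
        # Claim Extraction: split the argument into premise X and conclusion Y.
        premise = llm(f"Extract the premise of this argument:\n{argument}")
        conclusion = llm(f"Extract the conclusion of this argument:\n{argument}")

        # Context Sampling: sample situations consistent with ¬premise and ¬conclusion.
        contexts = [
            llm("Write a short situation in which neither the premise nor the conclusion holds.\n"
                f"Premise: {premise}\nConclusion: {conclusion}")
            for _ in range(n_samples)
        ]

        # Revision under Intervention: minimally edit each situation so the premise holds.
        revised = [
            llm("Minimally revise the situation so that the premise becomes true, "
                "changing as little else as possible.\n"
                f"Situation: {c}\nPremise: {premise}")
            for c in contexts
        ]

        # Probability Estimation: estimate P(conclusion) in each revised situation.
        probs = []
        for r in revised:
            answer = llm("On a scale from 0 to 1, how likely is the conclusion to hold in this situation? "
                         "Reply with a single number.\n"
                         f"Situation: {r}\nConclusion: {conclusion}")
            try:
                probs.append(float(answer.strip()))
            except ValueError:
                probs.append(0.5)  # uninformative fallback when the reply cannot be parsed

        # PS is approximated by the average probability of the conclusion under intervention.
        return sum(probs) / len(probs)

An argument can then be flagged as insufficient when the estimated PS falls below a chosen threshold (one possible decision rule on top of the sketch above).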

Cases

Results

We first compare CASA with baseline methods on two logical fallacy detection datasets, BIG-bench-LFD and Climate. CASA significantly outperforms all corresponding zero-shot baselines (significance level α = 0.02) and also surpasses the one-shot baselines.


To examine whether LLMs behave as expected in each step of CASA, we conduct a step-wise human evaluation. We ask human annotators to rate three aspects individually: 1) In the claim extraction step, do LLMs extract the correct premises and conclusion from the argument? 2) In the context sampling step, are the contexts generated by LLMs consistent with ¬Premise and ¬Conclusion? 3) In the revision step, are the revised situations consistent with the Premise?

The accuracy on all three aspects is above 90%, showing that LLMs are capable of generating textual data that conform to specified conditions and of making interventions on situations expressed in natural language.

Application: Writing Assistance ✏️

We apply CASA to a realistic scenario: providing writing suggestions for essays written by students. If CASA identifies an argument in an essay as insufficient, we extract explainable reasons from CASA's reasoning process and provide them as suggestions for revision.

Specifically, we generate objection situations (situations that challenge the sufficiency of the argument) from the intervened situations R that contradict the Conclusion, by removing the Premise from R.
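A minimal sketch of this objection-generation step, continuing the hypothetical pipeline above (assumed helper names and prompts, not the paper's actual code); `revised` and `probs` are the intermediate outputs of the earlier sketch:

    # Hypothetical sketch: derive objection situations from intervened situations
    # that contradict the Conclusion, by removing the Premise from them.
    def generate_objections(revised, probs, premise, llm, threshold=0.5):
        objections = []
        for situation, p in zip(revised, probs):
            if p < threshold:  # this intervened situation contradicts the Conclusion
                objections.append(llm(
                    "Rewrite the situation so that it no longer states or implies the premise, "
                    "keeping everything else unchanged.\n"
                    f"Situation: {situation}\nPremise: {premise}"
                ))
        return objections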


Question 1: Is CASA capable of generating rational and feasible objection situations for the essays?

Compared with directly prompting the base model, the objection situations generated by CASA are more rational and feasible. The gap in feasibility is larger: when prompted directly, LLMs tend to generate abstract objections, whereas CASA provides more practical objections that are easier to address.

Question 2: Will revising based on the generated objection situations improve the sufficiency of the essays?

For both methods tested, the Revised is Better proportion exceeds the Original is Better proportion, indicating an improvement in writing sufficiency. Moreover, with the same base model, CASA obtains a higher Revised − Original ratio (the Revised is Better proportion minus the Original is Better proportion) than the prompting method. This suggests that, even if we do not consider the difficulty of revision, CASA helps more in the revision process.

BibTeX


    @article{liu2024casa,
      title={CASA: Causality-driven Argument Sufficiency Assessment},
      author={Liu, Xiao and Feng, Yansong and Chang, Kai-Wei},
      journal={arXiv preprint arXiv:2401.05249},
      year={2024}
    }