Prompt Compression

2024-04-29

Prompt compression techniques shorten the input prompt while preserving performance across a range of tasks.

Selective Context

From "Compressing Context to Enhance Inference Efficiency of Large Language Models" (EMNLP 2023). It uses a very simple yet effective method.

Algorithm

The authors use self-information, a concept from information theory, to measure how much information a piece of text carries with respect to a language model.

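Computing self-information is also extremely simple.
$$
I(x) = -\log P(x)
$$
Here P is the probability assigned by a language model. It does not have to be the large model itself; a smaller LLM works, for example OPT at 1.3B or OpenAI Curie (6.3B).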
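As a quick illustration of the formula, the sketch below evaluates it for two made-up probabilities. The helper name `self_information` and the probability values are assumptions chosen for illustration, not taken from the paper; the natural logarithm is used, so the result is in nats.

```python
import math


def self_information(prob: float) -> float:
    """Self-information in nats: I(x) = -log P(x)."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("prob must be in (0, 1]")
    return -math.log(prob)


# Hypothetical next-token probabilities, chosen only for illustration.
likely = self_information(0.9)     # predictable token: little information
unlikely = self_information(0.01)  # surprising token: a lot of information
print(f"{likely:.3f} {unlikely:.3f}")
```

A token the model already expects contributes almost nothing, which is why such tokens are candidates for removal.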

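The overall prompt compression procedure is very simple.

First, compute the self-information of every phrase in the text.
Then pick a percentile according to the desired compression ratio, which gives a threshold.
Finally, delete the phrases whose self-information falls below the threshold.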
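The three steps above can be sketched roughly as follows. This is a minimal sketch, not the authors' implementation: the names `percentile` and `filter_by_self_info`, the `reduce_ratio` parameter, and the example scores are all hypothetical, and the self-information values are assumed to have been computed already.

```python
import math
from typing import List


def percentile(values: List[float], q: float) -> float:
    """q-th percentile (0 to 100) with linear interpolation between ranks."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def filter_by_self_info(units: List[str], self_info: List[float],
                        reduce_ratio: float) -> List[str]:
    """Drop units whose self-information is below the percentile threshold."""
    if len(units) != len(self_info):
        raise ValueError("units and self_info must have the same length")
    if not units:
        return []
    # Step 2: the desired compression ratio sets the percentile, hence the threshold.
    threshold = percentile(self_info, reduce_ratio * 100)
    # Step 3: keep only the units at or above the threshold, in original order.
    return [u for u, s in zip(units, self_info) if s >= threshold]


# Made-up phrases and scores, for illustration only.
units = ["The", "quick", "fox", "is", "here"]
scores = [0.2, 3.1, 4.0, 0.5, 2.2]
kept = filter_by_self_info(units, scores, reduce_ratio=0.4)
print(kept)
```

Because the threshold comes from a percentile of the scores in the text itself, the same `reduce_ratio` removes roughly the same fraction of units regardless of how the scores are scaled.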

We can also go straight to the code for the self-information step:

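import torch
from typing import List, Tuple

# Method of the compressor class; self.model, self.tokenizer, self.device and self.lang are set elsewhere.
def _get_self_info_via_gpt2(self, text: str) -> Tuple[List[str], List[float]]:
    # Prepend a start token so that the first real token also has a left context.
    if self.lang == 'en':
        text = f"<|endoftext|>{text}"
    elif self.lang == 'zh':
        text = f"[CLS]{text}"
    with torch.no_grad():
        encoding = self.tokenizer(text, add_special_tokens=False, return_tensors='pt')
        encoding = encoding.to(self.device)
        outputs = self.model(**encoding)
        logits = outputs.logits
        # log_softmax is numerically more stable than log(softmax(...)).
        self_info = -torch.log_softmax(logits, dim=-1)

    input_ids = encoding['input_ids']
    # Logits at position t predict token t+1, so shift the targets by one.
    input_ids_expanded = input_ids[:, 1:].unsqueeze(-1)

    tokens = [self.tokenizer.decode(token_) for token_ in input_ids.squeeze(0).tolist()[1:]]
    # Pick, for each position, the self-information of the token that actually follows.
    return tokens, self_info[:, :-1].gather(-1, input_ids_expanded).squeeze(-1).squeeze(0).tolist()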

Experimental results

The following three data sources are used as test data:

http://ShareGPT.com: a website that collects users' conversation logs with ChatGPT
arXiv
BBC News

LLMLingua

LLMLingua is a family of prompt compression algorithms proposed by Huiqiang Jiang, Qianhui Wu, Xufang Luo, and other researchers at Microsoft.

LLMLingua

Placeholder

LongLLMLingua

Placeholder

LLMLingua-2

Nothing yet