20d

IndexCache, a new sparse attention optimizer, delivers 1.82x faster inference on long-context AI models

Researchers at Tsinghua University and Z.ai built IndexCache to eliminate redundant computation in sparse attention models like DeepSeek and GLM. The training-free technique cuts 75% of indexer ...
