Hakaze Cho

@Beijing Inst. Tech. 2023

Ph.D. 2nd Year Student @ Graduate School of Information Science, Japan Advanced Institute of Science and Technology
Fully-funded Research Assistant & Mentor @ RebelsNLU, PI: Assoc. Prof. Naoya Inoue

Alias: Yufeng Zhao; both names derive from the same Chinese characters, “趙 羽風”
Born: Beijing, 1999

E-mail: yfzhao [at] jaist.ac.jp
Phone: +81-70-8591-1495
Links: Twitter · GitHub · Google Scholar · ORCID · Researchmap · Semantic Scholar · Blog
Physical Address: Laboratory I-52, Information Science Building I, 1-1 Asahidai, Nomi, Ishikawa, Japan

I graduated from Beijing Institute of Technology, a top-ranking university in China, with a Master’s degree in Software Engineering in 2023 and a Bachelor’s degree in Chemistry in 2021. I am currently pursuing a Ph.D. at JAIST, with expected early graduation in March 2026. My research explores the internal mechanisms of artificial neural networks, particularly Transformer-based language models, during both training and inference, using mathematical and representation-learning methods, and aims to robustly improve their performance through this deeper understanding. I have published over 20 papers and presentations in this area since 2023, some of which have been presented at top-tier international conferences such as ICLR and NAACL.

I am actively seeking productive research collaborations in the areas mentioned above. If you are interested in working together, please do not hesitate to contact me. I welcome collaborations with both experts and motivated beginners; being a novice is not a drawback if you are eager and quick to learn. I am also open to exploring collaborations in other areas.

Japanese Site (日本語版)

Research Interests

Keywords: Representation Learning, Mechanistic Interpretability, In-context Learning

  • Interpretability of Artificial Neural Networks: Mechanistic Interpretability, Low-resource Model Control
  • Large Language Models: Mechanisms of / Improvements to Transformer Large Language Models
  • Misc.: Manifold Learning, Low-precision Neural Networks, Neural Network Training Dynamics

Publications

Total Publications: 27, Cumulative IF: 73.1, Total Pages: 434.

International Conference

  1. Revisiting In-context Learning Inference Circuit in Large Language Models
    Hakaze Cho, Mariko Kato, Yoshihiro Sakai, Naoya Inoue
    International Conference on Learning Representations (ICLR). 2025. 37 pages. [h5=304, IF=48.9]
    [OpenReview] [PDF] [arXiv] [Github] [Poster] [Abstract] [Bibtex]
    In-context Learning (ICL) is an emerging few-shot learning paradigm on Language Models (LMs) whose inner mechanisms remain unexplored. Existing works describe the inner processing of ICL, but they struggle to capture all the inference phenomena in large language models. Therefore, this paper proposes a comprehensive circuit to model the inference dynamics and to explain the observed phenomena of ICL. In detail, we divide ICL inference into 3 major operations: (1) Input Text Encode: LMs encode every input text (in the demonstrations and queries) into a linear representation in the hidden states with sufficient information to solve ICL tasks. (2) Semantics Merge: LMs merge the encoded representations of demonstrations with their corresponding label tokens to produce joint representations of labels and demonstrations. (3) Feature Retrieval and Copy: LMs search the joint representations of demonstrations similar to the query representation on a task subspace, and copy the searched representations into the query. Then, language model heads capture these copied label representations to a certain extent and decode them into predicted labels. Through careful measurements, the proposed inference circuit successfully captures and unifies many fragmented phenomena observed during the ICL process, making it a comprehensive and practical explanation of the ICL inference process. Moreover, ablation analysis disabling the proposed steps seriously damages ICL performance, suggesting that the proposed inference circuit is a dominant mechanism. Additionally, we confirm and list some bypass mechanisms that solve ICL tasks in parallel with the proposed circuit.
    @inproceedings{cho2025revisiting,
        title={Revisiting In-context Learning Inference Circuit in Large Language Models},
        author={Hakaze Cho and Mariko Kato and Yoshihiro Sakai and Naoya Inoue},
        booktitle={The Thirteenth International Conference on Learning Representations},
        year={2025},
        url={https://openreview.net/forum?id=xizpnYNvQq}
    }
  2. Token-based Decision Criteria Are Suboptimal in In-context Learning
    Hakaze Cho, Yoshihiro Sakai, Mariko Kato, Kenshiro Tanaka, Akira Ishii, Naoya Inoue
    Annual Conference of the Nations of the Americas Chapter of the ACL (NAACL.main). 2025. 24 pages. [h5=132, IF=16.5]
    [ACL Anthology] [PDF] [arXiv] [Github] [Poster] [Abstract] [Bibtex]
    In-Context Learning (ICL) typically derives classification criteria from the output probabilities of manually selected label tokens. However, we argue that such token-based classification criteria lead to suboptimal decision boundaries, even after delicate calibrations through translation and constrained rotation are applied. To address this problem, we propose Hidden Calibration, which renounces token probabilities and uses the nearest-centroid classifier on the LM’s last hidden states. In detail, we assign to the test sample, as the predicted label, the label of the nearest centroid estimated beforehand from a calibration set. Our experiments on 6 models and 10 classification datasets indicate that Hidden Calibration consistently outperforms current token-based baselines by about 20%~50%, achieving a strong state-of-the-art in ICL. Our further analysis demonstrates that Hidden Calibration finds better classification criteria with less inter-class overlap, and that LMs provide linearly separable intra-class clusters with the help of demonstrations, which supports Hidden Calibration and gives new insights into the principle of ICL. Our official code implementation can be found at https://github.com/hc495/Hidden_Calibration. (A minimal illustrative sketch of the nearest-centroid rule appears after this list.)
    @inproceedings{cho2025token,
        title={Token-based Decision Criteria Are Suboptimal in In-context Learning},
        author={Hakaze Cho and Yoshihiro Sakai and Mariko Kato and Kenshiro Tanaka and Akira Ishii and Naoya Inoue},
        booktitle={Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)},
        year={2025},
        url={https://aclanthology.org/2025.naacl-long.278/}
    }
  3. Understanding Token Probability Encoding in Output Embeddings
    Hakaze Cho, Yoshihiro Sakai, Kenshiro Tanaka, Mariko Kato, Naoya Inoue
    International Conference on Computational Linguistics (COLING). 2025. 16 pages. [h5=65, IF=7.7]
    [ACL Anthology] [PDF] [arXiv] [Poster] [Abstract] [Bibtex]
    In this paper, we investigate the output token probability information in the output embedding of language models. We find an approximate common log-linear encoding of output token probabilities within the output embedding vectors and empirically demonstrate that it is accurate and sparse. As a causality examination, we steer the encoding in the output embedding to modify the output probability distribution accurately. Moreover, the sparsity we find in the output probability encoding suggests that a large number of dimensions in the output embedding do not contribute to causal language modeling. Therefore, we attempt to delete these output-unrelated dimensions and find that more than 30% of the dimensions can be deleted without significant change in the output distribution or sequence generation. Additionally, in the pre-training dynamics of language models, we find that the output embeddings capture corpus token frequency information in early steps, even before an obvious convergence of parameters starts.
    @inproceedings{cho2025understanding,
        title={Understanding Token Probability Encoding in Output Embeddings},
        author={Hakaze Cho and Yoshihiro Sakai and Kenshiro Tanaka and Mariko Kato and Naoya Inoue},
        booktitle={Proceedings of the 31st International Conference on Computational Linguistics},
        year={2025},
        url={https://aclanthology.org/2025.coling-main.708/}
    }
  4. Find-the-Common: A Benchmark for Explaining Visual Patterns from Images
    Yuting Shi, Naoya Inoue, Houjing Wei, Yufeng Zhao, Tao Jin
    International Conference on Language Resources and Evaluation (LREC). 2024. 7 pages. [h5=59]
    [ACL Anthology] [PDF] [Abstract] [Bibtex]
    Recent advances in Instruction-fine-tuned Vision and Language Models (IVLMs), such as GPT-4V and InstructBLIP, have prompted several studies to begin in-depth analyses of the reasoning capabilities of IVLMs. However, Inductive Visual Reasoning, a vital skill for text-image understanding, remains underexplored due to the absence of benchmarks. In this paper, we introduce Find-the-Common (FTC): a new vision and language task for Inductive Visual Reasoning. In this task, models are required to identify an answer that explains the common attributes across visual scenes. We create a new dataset for the FTC and assess the performance of several contemporary approaches, including Image-Based Reasoning, Text-Based Reasoning, and Image-Text-Based Reasoning, with various models. Extensive experiments show that even state-of-the-art models like GPT-4V can only achieve 48% accuracy on the FTC, making the FTC a new challenge for the visual reasoning research community. Our dataset has been released and is available online: https://github.com/SSSSSeki/Find-the-common.
    @inproceedings{shi2024find,
        title={Find-the-Common: A Benchmark for Explaining Visual Patterns from Images},
        author={Yuting Shi and Naoya Inoue and Houjing Wei and Yufeng Zhao and Tao Jin},
        booktitle={Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
        year={2024},
        url={https://aclanthology.org/2024.lrec-main.642/}
    }
  5. Methods to Enhance BERT in Aspect-Based Sentiment Classification
    Yufeng Zhao, Evelyn Soerjodjojo, Haiying Che
    IEEE Euro-Asia Conference on Frontiers of Computer Science and Information Technology (FCSIT). 2022. 7 pages. Outstanding Oral Presentation Award.
    [PDF] [Abstract] [Bibtex]
    BERT is a widely used pre-trained model in Natural Language Processing tasks, including Aspect-Based Sentiment Classification. BERT carries substantial prior language knowledge in its enormous number of pre-trained parameters, which makes fine-tuning it a critical issue. Previous works mainly focused on specialized downstream networks or additional knowledge to fine-tune BERT for sentiment classification tasks. In this paper, we design experiments to find fine-tuning techniques that can be used by all BERT-based models in Aspect-Based Sentiment Classification tasks. Through these experiments, we verify different feature extraction, regularization, and continual learning methods, and summarize 8 universally applicable conclusions to enhance the training and performance of the BERT model.
    @inproceedings{zhao2022methods,
        title={Methods to Enhance {BERT} in Aspect-Based Sentiment Classification},
        author={Zhao, Yufeng and Soerjodjojo, Evelyn and Che, Haiying},
        booktitle={2022 Euro-Asia Conference on Frontiers of Computer Science and Information Technology (FCSIT)},
        pages={21--27},
        year={2022},
        organization={IEEE}
    }
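
A minimal sketch of the nearest-centroid rule behind Hidden Calibration (paper 2 above; see the note in its abstract). The model choice (“gpt2”), the toy calibration prompts, and the Euclidean metric are illustrative assumptions made here for brevity, not the paper’s official implementation:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL = "gpt2"  # assumption: any causal LM that exposes hidden states works
    tok = AutoTokenizer.from_pretrained(MODEL)
    lm = AutoModelForCausalLM.from_pretrained(MODEL, output_hidden_states=True).eval()

    def last_hidden(prompt):
        # Last layer's hidden state at the final token position.
        with torch.no_grad():
            out = lm(**tok(prompt, return_tensors="pt"))
        return out.hidden_states[-1][0, -1]

    # Hypothetical calibration set: (prompt, label) pairs.
    calib = [("Review: great movie. Sentiment:", 1),
             ("Review: boring and slow. Sentiment:", 0),
             ("Review: loved every minute. Sentiment:", 1),
             ("Review: a total mess. Sentiment:", 0)]

    # Estimate one centroid per class from the calibration hidden states.
    feats = {}
    for prompt, label in calib:
        feats.setdefault(label, []).append(last_hidden(prompt))
    centroids = {c: torch.stack(v).mean(0) for c, v in feats.items()}

    def predict(prompt):
        # Assign the label of the nearest centroid (Euclidean distance).
        h = last_hidden(prompt)
        return min(centroids, key=lambda c: torch.dist(h, centroids[c]).item())

    print(predict("Review: a wonderful film. Sentiment:"))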

Pre-print

  1. Unifying Attention Heads and Task Vectors via Hidden State Geometry in In-Context Learning
    Haolin Yang, Hakaze Cho, Yiqiao Zhong, Naoya Inoue
    Pre-print. 2025. 45 pages. 
    [PDF] [arXiv] [Abstract] [Bibtex]
    The unusual properties of in-context learning (ICL) have prompted investigations into the internal mechanisms of large language models. Prior work typically focuses on either special attention heads or task vectors at specific layers, but lacks a unified framework linking these components to the evolution of hidden states across layers that ultimately produce the model's output. In this paper, we propose such a framework for ICL in classification tasks by analyzing two geometric factors that govern performance: the separability and alignment of query hidden states. A fine-grained analysis of layer-wise dynamics reveals a striking two-stage mechanism: separability emerges in early layers, while alignment develops in later layers. Ablation studies further show that Previous Token Heads drive separability, while Induction Heads and task vectors enhance alignment. Our findings thus bridge the gap between attention heads and task vectors, offering a unified account of ICL's underlying mechanisms.
    @article{yang2025unifying,
        title={Unifying Attention Heads and Task Vectors via Hidden State Geometry in In-Context Learning},
        author={Yang, Haolin and Cho, Hakaze and Zhong, Yiqiao and Inoue, Naoya},
        journal={arXiv preprint arXiv:2505.18752},
        year={2025}
    }
  2. Mechanistic Fine-tuning for In-context Learning
    Hakaze Cho, Peng Luo, Mariko Kato, Rin Kaenbyou, Naoya Inoue
    Pre-print. 2025. 28 pages. 
    [PDF] [arXiv] [Abstract] [Bibtex]
    In-context Learning (ICL) utilizes structured demonstration-query inputs to induce few-shot learning on Language Models (LMs), which are not originally pre-trained on ICL-style data. To bridge the gap between ICL and pre-training, some approaches fine-tune LMs on large ICL-style datasets in an end-to-end paradigm with massive computational costs. To reduce such costs, in this paper we propose Attention Behavior Fine-Tuning (ABFT), which utilizes previous findings on the inner mechanism of ICL and builds training objectives on the attention scores instead of the final outputs, forcing the attention scores to focus on the correct label tokens presented in the context and mitigating attention scores on the wrong label tokens. Our experiments on 9 modern LMs and 8 datasets empirically find that ABFT outperforms previous methods in performance, robustness, unbiasedness, and efficiency, with only around 0.01% of the data cost of previous methods. Moreover, our subsequent analysis finds that the end-to-end training objective contains the ABFT objective, suggesting an implicit bias of ICL-style data toward the emergence of induction heads. Our work demonstrates the possibility of controlling specific module sequences within LMs to improve their behavior, opening up future applications of mechanistic interpretability.
    @article{cho2025mechanistic,
        title={Mechanistic Fine-tuning for In-context Learning},
        author={Cho, Hakaze and Luo, Peng and Kato, Mariko and Kaenbyou, Rin and Inoue, Naoya},
        journal={arXiv preprint arXiv:2505.14233},
        year={2025}
    }
  3. Measuring Intrinsic Dimension of Token Embeddings
    Takuya Kataiwa, Hakaze Cho, Tetsushi Ohki
    Pre-print. 2025. 6 pages. 
    [PDF] [arXiv] [Abstract] [Bibtex]
    In this study, we measure the Intrinsic Dimension (ID) of token embeddings to estimate the intrinsic dimensions of the manifolds spanned by the representations, so as to quantitatively evaluate their redundancy relative to their extrinsic dimensionality. In detail, (1) we estimate the ID of token embeddings in small-scale language models as well as modern large language models, finding that the embedding spaces often reside on manifolds of lower dimension than their extrinsic dimensionality; (2) we measure the ID across various model sizes and observe an increase in redundancy rates as the model scale grows; (3) we measure the dynamics of IDs during the training process, and find a rapid ID drop in the early stages of training. Moreover, (4) when LoRA is applied to the embedding layers, we observe a sudden drop in perplexity around the estimated IDs, suggesting that the ID can serve as a useful guideline for LoRA application.
    @article{kataiwa2025measuring,
        title={Measuring Intrinsic Dimension of Token Embeddings},
        author={Kataiwa, Takuya and Cho, Hakaze and Ohki, Tetsushi},
        journal={arXiv preprint arXiv:2503.02142},
        year={2025}
    }
  4. Affinity and Diversity: A Unified Metric for Demonstration Selection via Internal Representations
    Mariko Kato, Hakaze Cho, Yoshihiro Sakai, Naoya Inoue
    Pre-print. 2025. 8 pages. 
    [PDF] [arXiv] [Abstract] [Bibtex]
    The performance of In-Context Learning (ICL) is highly sensitive to the selected demonstrations. Existing approaches to demonstration selection optimize different objectives, yielding inconsistent results. To address this, we propose a unified metric (affinity and diversity) that leverages the ICL model’s internal representations. Our experiments show that both affinity and diversity strongly correlate with test accuracies, indicating their effectiveness for demonstration selection. Moreover, we show that our proposed metrics align well with various previous works, unifying their inconsistencies.
    @article{kato2025affinity,
        title={Affinity and Diversity: A Unified Metric for Demonstration Selection via Internal Representations},
        author={Kato, Mariko and Cho, Hakaze and Sakai, Yoshihiro and Inoue, Naoya},
        journal={arXiv preprint arXiv:2502.14380},
        year={2025}
    }
  5. StaICC: Standardized Evaluation for Classification Task in In-context Learning
    Hakaze Cho, Naoya Inoue
    Pre-print. 2025. 20 pages. 
    [PDF] [arXiv] [Github] [PyPI] [Abstract] [Bibtex]
    Classification tasks are widely investigated in the In-Context Learning (ICL) paradigm. However, current efforts are evaluated on disjoint benchmarks and settings, and their performance is significantly influenced by trivial variables such as prompt templates, data sampling, and instructions, which leads to significant inconsistencies in the results reported across the literature and prevents fair comparison or meta-analysis across papers. Therefore, this paper proposes a standardized and easy-to-use evaluation toolkit (StaICC) for in-context classification. For the normal classification task, we provide StaICC-Normal, which selects 10 widely used datasets and generates prompts in a fixed form to mitigate variance across experimental implementations. To enrich the usage of our benchmark, we also provide a sub-benchmark, StaICC-Diag, for diagnosing ICL from several aspects, aiming at more robust inference processing.
    @article{cho2025staicc,
        title={StaICC: Standardized Evaluation for Classification Task in In-context Learning},
        author={Cho, Hakaze and Inoue, Naoya},
        journal={arXiv preprint arXiv:2501.15708},
        year={2025}
    }
  6. NoisyICL: A Little Noise in Model Parameters Calibrates In-context Learning
    Yufeng Zhao, Yoshihiro Sakai, Naoya Inoue
    Pre-print. 2024. 20 pages. 
    [PDF] [arXiv] [Github] [Abstract] [Bibtex]
    In-Context Learning (ICL) suffers from unsatisfactory performance and under-calibration due to high prior bias and unfaithful confidence. Some previous works fine-tuned language models for better ICL performance, at enormous dataset and computing costs. In this paper, we propose NoisyICL, which simply perturbs the model parameters with random noise to strive for better performance and calibration. Our experiments on two models and 12 downstream datasets show that NoisyICL can help ICL produce more accurate predictions. Our further analysis indicates that NoisyICL enables the model to provide fairer predictions with more faithful confidence. Therefore, we believe that NoisyICL is an effective calibration of ICL. Our experimental code is uploaded to Github. (A minimal sketch of this parameter perturbation appears after this list.)
    @article{zhao2024noisyicl,
        title={NoisyICL: A Little Noise in Model Parameters Calibrates In-context Learning},
        author={Zhao, Yufeng and Sakai, Yoshihiro and Inoue, Naoya},
        journal={arXiv preprint arXiv:2402.05515},
        year={2024}
    }
  7. SkIn: Skimming-Intensive Long-Text Classification Using BERT for Medical Corpus
    Yufeng Zhao, et al.
    Pre-print. 2022. 14 pages. 
    [PDF] [arXiv] [Abstract] [Bibtex]
    BERT is a widely used pre-trained model in natural language processing. However, since its cost is quadratic in the text length, BERT is difficult to apply directly to long-text corpora. In some fields, such as health care, the collected text data can be quite long. Therefore, to apply BERT’s pre-trained language knowledge to long text, this paper proposes the Skimming-Intensive Model (SkIn), which imitates the skimming-intensive reading method used by humans when reading a long paragraph. SkIn dynamically selects the critical information in the text, significantly shortening the sentence input to the BERT-Base model and thus effectively saving the cost of the classification algorithm. Experiments show that SkIn achieves higher accuracy than the baselines on long-text classification datasets in the medical field, while its time and space requirements increase linearly with text length, alleviating the time and space overflow problem of basic BERT on long-text data.
    @article{zhao2022skin,
        title={SkIn: Skimming-Intensive Long-Text Classification Using BERT for Medical Corpus},
        author={Zhao, Yufeng and others},
        journal={arXiv preprint arXiv:2209.05741},
        year={2022}
    }
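
A minimal sketch of the parameter perturbation behind NoisyICL (pre-print 6 above; see the note in its abstract). The model choice (“gpt2”), the noise strength, and the per-tensor scaling are assumptions for illustration, not necessarily the paper’s exact formulation:

    import torch
    from transformers import AutoModelForCausalLM

    lm = AutoModelForCausalLM.from_pretrained("gpt2")  # assumed model choice

    lam = 1e-3  # assumed noise strength; NoisyICL treats this as a hyperparameter
    with torch.no_grad():
        for p in lm.parameters():
            # Add Gaussian noise scaled to each tensor's own average magnitude
            # (one plausible scaling; the paper's exact formula may differ).
            p.add_(lam * p.abs().mean() * torch.randn_like(p))

    # The perturbed model is then used for ordinary in-context inference.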

Domestic Conferences / Journal / Miscellaneous
(† = Japan-domestic secondary publication of an international conference paper; default: non-refereed; ▲ = refereed)

  1. ▲†Measuring Intrinsic Dimension of Token Embeddings
    Takuya Kataiwa, Hakaze Cho, Tetsushi Ohki
    Annual Conference of the Japanese Society for Artificial Intelligence (JSAI). 2025. 4 pages. 
    [PDF] [Abstract]
    In this paper, we measure the Intrinsic Dimension (ID), the number of dimensions necessary and sufficient for a representation, of word vectors and embedding layers, and quantitatively evaluate their degree of redundancy. Specifically, (1) we estimate the ID of embeddings from small-scale models such as Word2Vec and GloVe, and (2) analyze the ID of the embedding layers of large language models, represented by the Pythia series, across model scales and training stages. Experiments show that the embedding spaces tend to be distributed on manifolds of lower dimension than their extrinsic dimensionality. We also observe changes in the redundancy rate as the model scale grows, and a rapid convergence of the ID in the early stages of training. Furthermore, we show that the estimated ID may be useful for rank selection when applying LoRA. (A toy intrinsic-dimension estimator is sketched after this list.)
  2. Analysis of Internal Representations of Knowledge with Expressions of Familiarity
    Kenshiro Tanaka, Yoshihiro Sakai, Hakaze Cho, Naoya Inoue, Kai Sato, Ryosuke Takahashi, Benjamin Heinzerling, Kentaro Inui
    Annual Conference of the Japanese Society for Artificial Intelligence (JSAI). 2025. 4 pages. 
    [PDF] [Abstract]
    Research on the ability of large language models (LLMs) to judge the familiarity of knowledge is advancing, but whether an LLM can judge, at inference time, the familiarity of knowledge it has learned together with linguistic expressions of familiarity such as “It is known that…” has not been examined. In this study, we train a pre-trained LLM on descriptions of knowledge annotated with linguistic expressions of familiarity and analyze the internal representations of that knowledge, investigating how familiarity can be represented inside the LLM. The results reveal that (1) the internal representations of knowledge retain familiarity information separately for each linguistic expression attached during training, and (2) familiarity information is retained separately for each position where the expression appears in the description. This study provides a foothold for elucidating the mechanism behind LLMs’ ability to judge familiarity.
  3. Internal Representations of Knowledge Recognition in Language Models
    Kai Sato, Ryosuke Takahashi, Benjamin Heinzerling, Kenshiro Tanaka, Hakaze Cho, Yoshihiro Sakai, Naoya Inoue, Kentaro Inui
    Annual Conference of the Japanese Society for Artificial Intelligence (JSAI). 2025. 4 pages. 
    [PDF] [Abstract]
    The knowledge-acquisition ability of language models (LMs) has been widely studied, but the mechanism by which they judge whether acquired knowledge is known to them remains poorly understood. In this study, we compare the internal states of an LM when it generates output about specific knowledge and when it judges the familiarity of that knowledge. The results show that language models can indeed possess the ability to judge familiarity: (1) once knowledge is learned, information for judging its familiarity exists in the internal representations, and (2) the LM exhibits distinct activation patterns for knowledge it judges as known versus unknown. These findings provide clues toward understanding the familiarity-judgment mechanism of LMs.
  4. Revisiting In-context Learning Inference Circuit in Large Language Models
    Hakaze Cho, Mariko Kato, Yoshihiro Sakai, Naoya Inoue
    Annual Conference of the Association for Natural Language Processing (NLP). 2025. 6 pages. Oral, Outstanding Paper.
    [PDF] [Slides] [Abstract]
    In-context Learning (ICL) has attracted attention as a new few-shot learning paradigm for language models, but its underlying mechanism remains insufficiently understood. In this study, we decompose the inference dynamics of ICL into three basic operations, construct an inference circuit upon them, and perform precise measurements, attempting to explain in a unified way the phenomena observed in previous studies. Furthermore, ablation analysis disabling the proposed circuit confirms a marked degradation of ICL performance, suggesting that the proposed inference circuit is a dominant mechanism of ICL.
  5. Beyond the Induction Circuit: A Mechanistic Prototype for Out-of-domain In-context Learning
    Hakaze Cho, Naoya Inoue
    Annual Conference of the Association for Natural Language Processing (NLP). 2025. 5 pages. 
    [PDF] [Poster] [Abstract]
    In-context Learning (ICL) is a promising few-shot learning paradigm with unclear mechanisms. Existing explanations heavily rely on Induction Heads, which fail to account for out-of-domain ICL, where query labels are absent from demonstrations. To address this, we model ICL as attribute resolution, where queries are mixtures of some attributes, and ICL identifies and resolves the relevant attributes for predictions. In this paper, we propose a mechanistic prototype using toy models trained on synthetic data, and observe: (1) even 1-layer Transformers achieve non-trivial accuracy, with limited benefit from additional demonstrations; (2) scaling models effectively improves accuracy; and (3) inference operations can be decomposed into label space identification and generalized induction, warranting further exploration.
  6. Measuring Intrinsic Dimension of Token Embeddings
    Takuya Kataiwa, Hakaze Cho, Tetsushi Ohki
    Annual Conference of the Association for Natural Language Processing (NLP). 2025. 5 pages. 
    [PDF] [Abstract]
    In this study, we measure the Intrinsic Dimension (ID), the number of dimensions necessary and sufficient for a representation, of word vectors and embedding layers, and quantitatively evaluate their degree of redundancy. Specifically, (1) we estimate the ID of embeddings from small-scale models such as Word2Vec and GloVe, and (2) analyze the ID of the embedding layers of large language models, represented by the Pythia series, across model scales and training stages. Experiments show that the embedding spaces tend to be distributed on manifolds of lower dimension than their extrinsic dimensionality. We also observe changes in the redundancy rate as the model scale grows, and a rapid formation of the ID in the early stages of training.
  7. Affinity and Diversity: A Unified Metric for Demonstration Selection via Internal Representations
    Mariko Kato, Hakaze Cho, Yoshihiro Sakai, Naoya Inoue
    Annual Conference of the Association for Natural Language Processing (NLP). 2025. 6 pages. 
    [PDF] [Abstract]
    In In-Context Learning (ICL), the selection of demonstrations has a large impact on task performance. Existing studies have investigated procedures for selecting demonstrations, but the properties of demonstrations that serve as selection criteria have not been examined sufficiently. In this study, we propose two new properties of demonstrations, “affinity” and “diversity”, and show that affinity is a desirable property for demonstration selection across multiple models and datasets. Furthermore, we show that demonstrations chosen by existing methods concentrate in the direction of improving task performance on the two properties, yielding insights toward elucidating the mechanism linking demonstration selection and task performance.
  8. StaICC: Standardized Evaluation for Classification Task in In-context Learning
    Hakaze Cho, Yoshihiro Sakai, Mariko Kato, Naoya Inoue
    Symposium of Young Researcher Association for NLP Studies (YANS). 2025. Poster Only.
    [Poster]
  9. Image Feature Vectors are Frozen Informative Tokens for Language Models
    Mariko Kato, Hakaze Cho, Zhenzhu Yan, Yuting Shi, Naoya Inoue
    Symposium of Young Researcher Association for NLP Studies (YANS). 2025. Poster Only.
  10. Token-based Decision Criteria Are Suboptimal in In-context Learning
    Hakaze Cho, Yoshihiro Sakai, Mariko Kato, Kenshiro Tanaka, Akira Ishii, Naoya Inoue
    The 260th SIG for Natural Language, Information Processing Society of Japan (SIG-NL260, IPSJ). 2024. 17 pages. Oral, Research Award for Young Scholars.
    [PDF] [Slides] [Abstract]
    In In-Context Learning (ICL) tasks, the inference result is usually determined by comparing the generation probabilities of the label tokens in the label space, but these label tokens are chosen arbitrarily by humans. Several prior studies have shown that calibrating the generation probabilities of these label tokens improves ICL performance, yet such methods still suffer from the problem that humans may choose suboptimal label tokens. In this study, we first (1) analyze the hidden states of LLMs and show that current token-based calibration methods cannot adequately express the useful information carried by the hidden states. We then (2) propose a new ICL method that reduces the influence of human label-token selection and effectively exploits the useful information contained in the hidden states. Experiments on 3 models and 10 classification datasets show that our proposed method outperforms current token-based calibration methods by about 20%.
  11. NoisyICL: A Little Noise in Model Parameters Can Calibrate In-context Learning
    Yufeng Zhao, Yoshihiro Sakai, Naoya Inoue
    Annual Conference of the Association for Natural Language Processing (NLP). 2024. 6 pages. Oral.
    [PDF] [Slides] [Abstract]
    In-Context Learning (ICL), where language models learn tasks in a generative form from few-shot demonstrations without parameter updates, has been emerging as language models scale up. Nevertheless, the performance of ICL is still unsatisfactory. Some previous studies attributed this to under-calibration and fine-tuned language models for better ICL performance, at enormous dataset and computing costs. In this paper, we propose NoisyICL, which simply perturbs the model parameters with random noise to strive for calibration. Our experiments on 2 models and 7 downstream task datasets show that NoisyICL helps ICL perform better. Our further analysis indicates that NoisyICL can enable the model to provide fairer predictions with more faithful confidence. NoisyICL can therefore be considered an effective calibration.
  12. Can LLM Learn Prompt Format in In-context Learning?
    Yoshihiro Sakai, Hakaze Cho, Naoya Inoue
    Annual Conference of the Association for Natural Language Processing (NLP). 2024. 6 pages. SB Intuitions Awards.
    [PDF] [Abstract]
    In-Context Learning (ICL) is the ability of LLMs to learn a task from a few demonstrations given in the prompt, without parameter updates, but its mechanism is not yet fully understood. Experiments in prior work suggest that showing the LLM the format “output the label after the task input” may be particularly important. In this study, we directly visualize how an LLM learns the answer format from the given demonstrations. We find that (1) the LLM indeed learns the answer format from the demonstrations, (2) format learning is possible even with meaningless labels, and (3) even the worst labels greatly improve the Macro-F1 of ICL.
  13. Find-the-Common: Benchmarking Inductive Reasoning Ability on Vision-Language Models
    Yuting Shi, Naoya Inoue, Houjing Wei, Yufeng Zhao, Tao Jin
    Annual Conference of the Association for Natural Language Processing (NLP). 2024. 6 pages. 
    [PDF] [Abstract]
    Recent advances in Instruction-fine-tuned Vision and Language Models (IVLMs) have revolutionized the landscape of integrated vision and language understanding. However, Inductive Visual Reasoning, a vital skill for text-image understanding, remains underexplored due to the absence of benchmarks. So, in this paper, we introduce Find-the-Common (FTC): a new vision and language task for Inductive Visual Reasoning. In this task, models are required to identify an answer that explains the common attributes across visual scenes. We create a new dataset for the FTC and assess the performance of several contemporary approaches, including implicit reasoning, symbolic reasoning, and implicit-symbolic reasoning, with various models. Extensive experiments show that even state-of-the-art models like GPT-4V can only achieve 48% accuracy on the FTC, making the FTC a new challenge for the visual reasoning research community. Our dataset is available online.
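
The intrinsic-dimension measurements above (items 1 and 6; see the note in item 1) can be illustrated with the TwoNN estimator (Facco et al., 2017), a standard ID estimator based on the ratio of each point’s two nearest-neighbor distances; whether these papers use exactly this estimator is an assumption here. A self-contained toy sketch:

    import numpy as np

    def twonn_id(x):
        # TwoNN maximum-likelihood estimate: ID = N / sum(log(r2 / r1)),
        # where r1, r2 are each point's 1st and 2nd nearest-neighbor distances.
        sq = np.sum(x ** 2, axis=1)
        d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)
        np.fill_diagonal(d2, np.inf)           # a point is not its own neighbor
        d2.sort(axis=1)
        mu = np.sqrt(d2[:, 1] / d2[:, 0])      # ratio of the two NN distances
        mu = mu[np.isfinite(mu) & (mu > 1.0)]  # drop duplicates/degenerate pairs
        return len(mu) / np.log(mu).sum()

    # Toy check: 1000 points on an 8-dim linear manifold embedded in R^128;
    # the estimate should come out near 8, far below the extrinsic 128.
    rng = np.random.default_rng(0)
    z = rng.normal(size=(1000, 8))
    print(twonn_id(z @ rng.normal(size=(8, 128))))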

(Theses)

  1. Fine-tuning with Randomly Initialized Downstream Network: Finding a Stable Convex-loss Region in Parameter Space
    Yufeng Zhao
    Master’s Thesis - Rank A @ Beijing Institute of Technology. 2023. 81 pages.
  2. Synthesis and Self-Assembly of Aggregation-induced Emission Compounds
    Yufeng Zhao
    Bachelor’s Thesis @ Beijing Institute of Technology. 2021. 52 pages.

Resume

Awards

  • Outstanding Paper @ The 31st Annual Conference of the Japanese Association for Natural Language Processing (NLP2025, ANLP). 2025. (top 14 of 765, 1.8%)
  • Research Award for Young Scholars @ The 260th SIG for Natural Language, Information Processing Society of Japan (SIG-NL260, IPSJ). 2024.
  • SB Intuitions Awards @ The 30th Annual Conference of the Japanese Association for Natural Language Processing (NLP2024, ANLP). 2024.
  • Monbukagakusho Honors Scholarship @ Japanese Ministry of Education, Culture, Sports, Science and Technology. 2023.
  • Outstanding Oral Presentation @ 2022 Euro-Asia Conference on Frontiers of Computer Science and Information Technology. 2022.
  • Annual Outstanding Academic Scholarship @ Beijing Institute of Technology. 2018, 2019, 2021, 2022, 2023.

Copyright © 2025 Hakaze Cho / Yufeng Zhao. All rights reserved. Icon generated by StableDiffusion.
Updated on 2025-06-09 20:16:22 +0900.