Hakaze Cho

@Beijing Inst. Tech. 2023

Ph.D. 2nd Year Student @ Graduate School of Information Science, Japan Advanced Institute of Science and Technology
Fully-funded Research Assistant & Mentor @ RebelsNLU, PI: Assoc. Prof. Naoya Inoue

Alias: Yufeng Zhao; both names derive from the same Chinese characters, “趙 羽風”
Born: Beijing, 1999

E-mail: yfzhao [at] jaist.ac.jp
Phone: +81-70-8591-1495
Links: Twitter · GitHub · Google Scholar · ORCID · Researchmap · Semantic Scholar · Blog
Physical Address: Laboratory I-52, Information Science Building I, 1-1 Asahidai, Nomi, Ishikawa, Japan

I graduated from Beijing Institute of Technology, a top-ranking university in China, with a Master’s degree in Software Engineering in 2023 and a Bachelor’s degree in Chemistry in 2021. I am currently pursuing a Ph.D. at JAIST, with expected early graduation in March 2026. My research explores the internal mechanisms of artificial neural networks, particularly Transformer-based language models, during both training and inference, using mathematical and representation-learning methods, and aims to robustly improve their performance through this deeper understanding. I have published over 20 papers and presentations in this area since 2023, some of which have been presented at top-tier international conferences such as ICLR and NAACL.

I am actively seeking productive research collaborations in the areas mentioned above. If you are interested in working together, please do not hesitate to contact me. I welcome collaborations with both experts and motivated beginners; being a novice is not a drawback if you are eager and quick to learn. I am also open to exploring collaborations in other areas.

Japanese Site (日本語版)

Research Interests

Keywords: Representation Learning, Mechanistic Interpretability, In-context Learning

  • Interpretability of Artificial Neural Networks: Mechanistic Interpretability, Low-resource Model Control
  • Large Language Models: Mechanisms of / Improvements to Transformer Large Language Models
  • Misc.: Manifold Learning, Low-precision Neural Networks, Neural Network Training Dynamics

Publications

Total Publications: 27, Cumulative IF: 73.1, Total Pages: 434.

International Conference

  1. Revisiting In-context Learning Inference Circuit in Large Language Models
    Hakaze Cho, Mariko Kato, Yoshihiro Sakai, Naoya Inoue
    International Conference on Learning Representations (ICLR). 2025. 37 pages. [h5=304, IF=48.9]
    [OpenReview] [PDF] [arXiv] [Github] [Poster] [Abstract] [Bibtex]
    In-context Learning (ICL) is an emerging few-shot learning paradigm on Language Models (LMs) whose inner mechanisms remain unexplored. Existing works describe the inner processing of ICL, but they struggle to capture all the inference phenomena in large language models. Therefore, this paper proposes a comprehensive circuit to model the inference dynamics and to explain the observed phenomena of ICL. In detail, we divide ICL inference into 3 major operations: (1) Input Text Encode: LMs encode every input text (in the demonstrations and queries) into a linear representation in the hidden states with sufficient information to solve ICL tasks. (2) Semantics Merge: LMs merge the encoded representations of demonstrations with their corresponding label tokens to produce joint representations of labels and demonstrations. (3) Feature Retrieval and Copy: LMs search the joint representations of demonstrations similar to the query representation on a task subspace, and copy the searched representations into the query. Then, language model heads capture these copied label representations to a certain extent and decode them into predicted labels. Through careful measurements, the proposed inference circuit successfully captures and unifies many fragmented phenomena observed during the ICL process, making it a comprehensive and practical explanation of the ICL inference process. Moreover, ablation analysis disabling the proposed steps seriously damages ICL performance, suggesting that the proposed inference circuit is a dominant mechanism. Additionally, we confirm and list some bypass mechanisms that solve ICL tasks in parallel with the proposed circuit.
    @inproceedings{cho2025revisiting,
        title={Revisiting In-context Learning Inference Circuit in Large Language Models},
        author={Hakaze Cho and Mariko Kato and Yoshihiro Sakai and Naoya Inoue},
        booktitle={The Thirteenth International Conference on Learning Representations},
        year={2025},
        url={https://openreview.net/forum?id=xizpnYNvQq}
    }
  2. Token-based Decision Criteria Are Suboptimal in In-context Learning
    Hakaze Cho, Yoshihiro Sakai, Mariko Kato, Kenshiro Tanaka, Akira Ishii, Naoya Inoue
    Annual Conference of the Nations of the Americas Chapter of the ACL (NAACL.main). 2025. 24 pages. [h5=132, IF=16.5]
    [ACL Anthology] [PDF] [arXiv] [Github] [Poster] [Abstract] [Bibtex]
    In-Context Learning (ICL) typically derives classification criteria from the output probabilities of manually selected label tokens. However, we argue that such token-based classification criteria lead to suboptimal decision boundaries, even after delicate calibrations through translation and constrained rotation are applied. To address this problem, we propose Hidden Calibration, which renounces token probabilities and uses the nearest-centroid classifier on the LM’s last hidden states. In detail, we assign to the test sample, as the predicted label, the label of the nearest centroid estimated beforehand from a calibration set. Our experiments on 6 models and 10 classification datasets indicate that Hidden Calibration consistently outperforms current token-based baselines by about 20%~50%, achieving a strong state-of-the-art in ICL. Our further analysis demonstrates that Hidden Calibration finds better classification criteria with less inter-class overlap, and that LMs provide linearly separable intra-class clusters with the help of demonstrations, which supports Hidden Calibration and gives new insights into the principle of ICL. Our official code implementation can be found at https://github.com/hc495/Hidden_Calibration. (A minimal illustrative sketch of the nearest-centroid rule appears after this list.)
    @inproceedings{cho2025token,
        title={Token-based Decision Criteria Are Suboptimal in In-context Learning},
        author={Hakaze Cho and Yoshihiro Sakai and Mariko Kato and Kenshiro Tanaka and Akira Ishii and Naoya Inoue},
        booktitle={Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)},
        year={2025},
        url={https://aclanthology.org/2025.naacl-long.278/}
    }
  3. Understanding Token Probability Encoding in Output Embeddings
    Hakaze Cho, Yoshihiro Sakai, Kenshiro Tanaka, Mariko Kato, Naoya Inoue
    International Conference on Computational Linguistics (COLING). 2025. 16 pages. [h5=65, IF=7.7]
    [ACL Anthology] [PDF] [arXiv] [Poster] [Abstract] [Bibtex]
    In this paper, we investigate the output token probability information in the output embedding of language models. We find an approximate common log-linear encoding of output token probabilities within the output embedding vectors and empirically demonstrate that it is accurate and sparse. As a causality examination, we steer the encoding in the output embedding to modify the output probability distribution accurately. Moreover, the sparsity we find in the output probability encoding suggests that a large number of dimensions in the output embedding do not contribute to causal language modeling. Therefore, we attempt to delete these output-unrelated dimensions and find that more than 30% of the dimensions can be deleted without significant change in the output distribution or sequence generation. Additionally, in the pre-training dynamics of language models, we find that the output embeddings capture corpus token frequency information in early steps, even before an obvious convergence of parameters starts.
    @inproceedings{cho2025understanding,
        title={Understanding Token Probability Encoding in Output Embeddings},
        author={Hakaze Cho and Yoshihiro Sakai and Kenshiro Tanaka and Mariko Kato and Naoya Inoue},
        booktitle={Proceedings of the 31st International Conference on Computational Linguistics},
        year={2025},
        url={https://aclanthology.org/2025.coling-main.708/}
    }
  4. Find-the-Common: A Benchmark for Explaining Visual Patterns from Images
    Yuting Shi, Naoya Inoue, Houjing Wei, Yufeng Zhao, Tao Jin
    International Conference on Language Resources and Evaluation (LREC). 2024. 7 pages. [h5=59]
    [ACL Anthology] [PDF] [Abstract] [Bibtex]
    Recent advances in Instruction-fine-tuned Vision and Language Models (IVLMs), such as GPT-4V and InstructBLIP, have prompted several studies to begin in-depth analyses of the reasoning capabilities of IVLMs. However, Inductive Visual Reasoning, a vital skill for text-image understanding, remains underexplored due to the absence of benchmarks. In this paper, we introduce Find-the-Common (FTC): a new vision and language task for Inductive Visual Reasoning. In this task, models are required to identify an answer that explains the common attributes across visual scenes. We create a new dataset for the FTC and assess the performance of several contemporary approaches, including Image-Based Reasoning, Text-Based Reasoning, and Image-Text-Based Reasoning, with various models. Extensive experiments show that even state-of-the-art models like GPT-4V can only achieve 48% accuracy on the FTC, making the FTC a new challenge for the visual reasoning research community. Our dataset has been released and is available online: https://github.com/SSSSSeki/Find-the-common.
    @inproceedings{shi2024find,
        title={Find-the-Common: A Benchmark for Explaining Visual Patterns from Images},
        author={Yuting Shi and Naoya Inoue and Houjing Wei and Yufeng Zhao and Tao Jin},
        booktitle={Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
        year={2024},
        url={https://aclanthology.org/2024.lrec-main.642/}
    }
  5. Methods to Enhance BERT in Aspect-Based Sentiment Classification
    Yufeng Zhao, Evelyn Soerjodjojo, Haiying Che
    IEEE Euro-Asia Conference on Frontiers of Computer Science and Information Technology (FCSIT). 2022. 7 pages. Outstanding Oral Presentation Award.
    [PDF] [Abstract] [Bibtex]
    BERT is a widely used pre-trained model in Natural Language Processing tasks, including Aspect-Based Sentiment Classification. BERT carries substantial prior language knowledge in its enormous number of pre-trained parameters, which makes fine-tuning it a critical issue. Previous works mainly focused on specialized downstream networks or additional knowledge to fine-tune BERT for sentiment classification tasks. In this paper, we design experiments to find fine-tuning techniques that can be used by all BERT-based models in Aspect-Based Sentiment Classification tasks. Through these experiments, we verify different feature extraction, regularization, and continual learning methods, and summarize 8 universally applicable conclusions to enhance the training and performance of the BERT model.
    @inproceedings{zhao2022methods,
        title={Methods to Enhance {BERT} in Aspect-Based Sentiment Classification},
        author={Zhao, Yufeng and Soerjodjojo, Evelyn and Che, Haiying},
        booktitle={2022 Euro-Asia Conference on Frontiers of Computer Science and Information Technology (FCSIT)},
        pages={21--27},
        year={2022},
        organization={IEEE}
    }
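
A minimal sketch of the nearest-centroid rule behind Hidden Calibration (paper 2 above; see the note in its abstract). The model choice (“gpt2”), the toy calibration prompts, and the Euclidean metric are illustrative assumptions made here for brevity, not the paper’s official implementation:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL = "gpt2"  # assumption: any causal LM that exposes hidden states works
    tok = AutoTokenizer.from_pretrained(MODEL)
    lm = AutoModelForCausalLM.from_pretrained(MODEL, output_hidden_states=True).eval()

    def last_hidden(prompt):
        # Last layer's hidden state at the final token position.
        with torch.no_grad():
            out = lm(**tok(prompt, return_tensors="pt"))
        return out.hidden_states[-1][0, -1]

    # Hypothetical calibration set: (prompt, label) pairs.
    calib = [("Review: great movie. Sentiment:", 1),
             ("Review: boring and slow. Sentiment:", 0),
             ("Review: loved every minute. Sentiment:", 1),
             ("Review: a total mess. Sentiment:", 0)]

    # Estimate one centroid per class from the calibration hidden states.
    feats = {}
    for prompt, label in calib:
        feats.setdefault(label, []).append(last_hidden(prompt))
    centroids = {c: torch.stack(v).mean(0) for c, v in feats.items()}

    def predict(prompt):
        # Assign the label of the nearest centroid (Euclidean distance).
        h = last_hidden(prompt)
        return min(centroids, key=lambda c: torch.dist(h, centroids[c]).item())

    print(predict("Review: a wonderful film. Sentiment:"))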

Pre-print

  1. Unifying Attention Heads and Task Vectors via Hidden State Geometry in In-Context Learning
    Haolin Yang, Hakaze Cho, Yiqiao Zhong, Naoya Inoue
    Pre-print. 2025. 45 pages. 
    [PDF] [arXiv] [Abstract] [Bibtex]
    The unusual properties of in-context learning (ICL) have prompted investigations into the internal mechanisms of large language models. Prior work typically focuses on either special attention heads or task vectors at specific layers, but lacks a unified framework linking these components to the evolution of hidden states across layers that ultimately produce the model's output. In this paper, we propose such a framework for ICL in classification tasks by analyzing two geometric factors that govern performance: the separability and alignment of query hidden states. A fine-grained analysis of layer-wise dynamics reveals a striking two-stage mechanism: separability emerges in early layers, while alignment develops in later layers. Ablation studies further show that Previous Token Heads drive separability, while Induction Heads and task vectors enhance alignment. Our findings thus bridge the gap between attention heads and task vectors, offering a unified account of ICL's underlying mechanisms.
    @article{yang2025unifying,
        title={Unifying Attention Heads and Task Vectors via Hidden State Geometry in In-Context Learning},
        author={Yang, Haolin and Cho, Hakaze and Zhong, Yiqiao and Inoue, Naoya},
        journal={arXiv preprint arXiv:2505.18752},
        year={2025}
    }
  2. Mechanistic Fine-tuning for In-context Learning
    Hakaze Cho, Peng Luo, Mariko Kato, Rin Kaenbyou, Naoya Inoue
    Pre-print. 2025. 28 pages. 
    [PDF] [arXiv] [Abstract] [Bibtex]
    In-context Learning (ICL) utilizes structured demonstration-query inputs to induce few-shot learning on Language Models (LMs), which are not originally pre-trained on ICL-style data. To bridge the gap between ICL and pre-training, some approaches fine-tune LMs on large ICL-style datasets in an end-to-end paradigm with massive computational costs. To reduce such costs, in this paper we propose Attention Behavior Fine-Tuning (ABFT), which utilizes previous findings on the inner mechanism of ICL and builds training objectives on the attention scores instead of the final outputs, forcing the attention scores to focus on the correct label tokens presented in the context and mitigating attention scores on the wrong label tokens. Our experiments on 9 modern LMs and 8 datasets empirically find that ABFT outperforms previous methods in performance, robustness, unbiasedness, and efficiency, with only around 0.01% of the data cost of previous methods. Moreover, our subsequent analysis finds that the end-to-end training objective contains the ABFT objective, suggesting an implicit bias of ICL-style data toward the emergence of induction heads. Our work demonstrates the possibility of controlling specific module sequences within LMs to improve their behavior, opening up future applications of mechanistic interpretability.
    @article{cho2025mechanistic,
        title={Mechanistic Fine-tuning for In-context Learning},
        author={Cho, Hakaze and Luo, Peng and Kato, Mariko and Kaenbyou, Rin and Inoue, Naoya},
        journal={arXiv preprint arXiv:2505.14233},
        year={2025}
    }
  3. Measuring Intrinsic Dimension of Token Embeddings
    Takuya Kataiwa, Hakaze Cho, Tetsushi Ohki
    Pre-print. 2025. 6 pages. 
    [PDF] [arXiv] [Abstract] [Bibtex]
    In this study, we measure the Intrinsic Dimension (ID) of token embeddings to estimate the intrinsic dimensions of the manifolds spanned by the representations, so as to quantitatively evaluate their redundancy relative to their extrinsic dimensionality. In detail, (1) we estimate the ID of token embeddings in small-scale language models as well as modern large language models, finding that the embedding spaces often reside on manifolds of lower dimension than their extrinsic dimensionality; (2) we measure the ID across various model sizes and observe an increase in redundancy rates as the model scale grows; (3) we measure the dynamics of IDs during the training process, and find a rapid ID drop in the early stages of training. Moreover, (4) when LoRA is applied to the embedding layers, we observe a sudden drop in perplexity around the estimated IDs, suggesting that the ID can serve as a useful guideline for LoRA application.
    @article{kataiwa2025measuring,
        title={Measuring Intrinsic Dimension of Token Embeddings},
        author={Kataiwa, Takuya and Cho, Hakaze and Ohki, Tetsushi},
        journal={arXiv preprint arXiv:2503.02142},
        year={2025}
    }
  4. Affinity and Diversity: A Unified Metric for Demonstration Selection via Internal Representations
    Mariko Kato, Hakaze Cho, Yoshihiro Sakai, Naoya Inoue
    Pre-print. 2025. 8 pages. 
    [PDF] [arXiv] [Abstract] [Bibtex]
    The performance of In-Context Learning (ICL) is highly sensitive to the selected demonstrations. Existing approaches to demonstration selection optimize different objectives, yielding inconsistent results. To address this, we propose a unified metric (affinity and diversity) that leverages the ICL model’s internal representations. Our experiments show that both affinity and diversity strongly correlate with test accuracies, indicating their effectiveness for demonstration selection. Moreover, we show that our proposed metrics align well with various previous works, unifying their inconsistencies.
    @article{kato2025affinity,
        title={Affinity and Diversity: A Unified Metric for Demonstration Selection via Internal Representations},
        author={Kato, Mariko and Cho, Hakaze and Sakai, Yoshihiro and Inoue, Naoya},
        journal={arXiv preprint arXiv:2502.14380},
        year={2025}
    }
  5. StaICC: Standardized Evaluation for Classification Task in In-context Learning
    Hakaze Cho, Naoya Inoue
    Pre-print. 2025. 20 pages. 
    [PDF] [arXiv] [Github] [PyPI] [Abstract] [Bibtex]
    Classification tasks are widely investigated in the In-Context Learning (ICL) paradigm. However, current efforts are evaluated on disjoint benchmarks and settings, and their performance is significantly influenced by trivial variables such as prompt templates, data sampling, and instructions, which leads to significant inconsistencies in the results reported across the literature and prevents fair comparison or meta-analysis across papers. Therefore, this paper proposes a standardized and easy-to-use evaluation toolkit (StaICC) for in-context classification. For the normal classification task, we provide StaICC-Normal, which selects 10 widely used datasets and generates prompts in a fixed form to mitigate variance across experimental implementations. To enrich the usage of our benchmark, we also provide a sub-benchmark, StaICC-Diag, for diagnosing ICL from several aspects, aiming at more robust inference processing.
    @article{cho2025staicc,
        title={StaICC: Standardized Evaluation for Classification Task in In-context Learning},
        author={Cho, Hakaze and Inoue, Naoya},
        journal={arXiv preprint arXiv:2501.15708},
        year={2025}
    }
  6. NoisyICL: A Little Noise in Model Parameters Calibrates In-context Learning
    Yufeng Zhao, Yoshihiro Sakai, Naoya Inoue
    Pre-print. 2024. 20 pages. 
    [PDF] [arXiv] [Github] [Abstract] [Bibtex]
    In-Context Learning (ICL) suffers from unsatisfactory performance and under-calibration due to high prior bias and unfaithful confidence. Some previous works fine-tuned language models for better ICL performance, at enormous dataset and computing costs. In this paper, we propose NoisyICL, which simply perturbs the model parameters with random noise to strive for better performance and calibration. Our experiments on two models and 12 downstream datasets show that NoisyICL can help ICL produce more accurate predictions. Our further analysis indicates that NoisyICL enables the model to provide fairer predictions with more faithful confidence. Therefore, we believe that NoisyICL is an effective calibration of ICL. Our experimental code is uploaded to Github. (A minimal sketch of this parameter perturbation appears after this list.)
    @article{zhao2024noisyicl,
        title={NoisyICL: A Little Noise in Model Parameters Calibrates In-context Learning},
        author={Zhao, Yufeng and Sakai, Yoshihiro and Inoue, Naoya},
        journal={arXiv preprint arXiv:2402.05515},
        year={2024}
    }
  7. SkIn: Skimming-Intensive Long-Text Classification Using BERT for Medical Corpus
    Yufeng Zhao, et al.
    Pre-print. 2022. 14 pages. 
    [PDF] [arXiv] [Abstract] [Bibtex]
    BERT is a widely used pre-trained model in natural language processing. However, since its cost is quadratic in the text length, BERT is difficult to apply directly to long-text corpora. In some fields, such as health care, the collected text data can be quite long. Therefore, to apply BERT’s pre-trained language knowledge to long text, this paper proposes the Skimming-Intensive Model (SkIn), which imitates the skimming-intensive reading method used by humans when reading a long paragraph. SkIn dynamically selects the critical information in the text, significantly shortening the sentence input to the BERT-Base model and thus effectively saving the cost of the classification algorithm. Experiments show that SkIn achieves higher accuracy than the baselines on long-text classification datasets in the medical field, while its time and space requirements increase linearly with text length, alleviating the time and space overflow problem of basic BERT on long-text data.
    @article{zhao2022skin,
        title={SkIn: Skimming-Intensive Long-Text Classification Using BERT for Medical Corpus},
        author={Zhao, Yufeng and others},
        journal={arXiv preprint arXiv:2209.05741},
        year={2022}
    }
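
A minimal sketch of the parameter perturbation behind NoisyICL (pre-print 6 above; see the note in its abstract). The model choice (“gpt2”), the noise strength, and the per-tensor scaling are assumptions for illustration, not necessarily the paper’s exact formulation:

    import torch
    from transformers import AutoModelForCausalLM

    lm = AutoModelForCausalLM.from_pretrained("gpt2")  # assumed model choice

    lam = 1e-3  # assumed noise strength; NoisyICL treats this as a hyperparameter
    with torch.no_grad():
        for p in lm.parameters():
            # Add Gaussian noise scaled to each tensor's own average magnitude
            # (one plausible scaling; the paper's exact formula may differ).
            p.add_(lam * p.abs().mean() * torch.randn_like(p))

    # The perturbed model is then used for ordinary in-context inference.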

Domestic Conferences / Journal / Miscellaneous
(† = Japan-domestic secondary publication of an international conference paper; default: non-refereed; ▲ = refereed)

  1. ▲†Measuring Intrinsic Dimension of Token Embeddings
    Takuya Kataiwa, Hakaze Cho, Tetsushi Ohki
    Annual Conference of the Japanese Society for Artificial Intelligence (JSAI). 2025. 4 pages. 
    [PDF] [Abstract]
    In this paper, we measure the Intrinsic Dimension (ID), the number of dimensions necessary and sufficient for a representation, of word vectors and embedding layers, and quantitatively evaluate their degree of redundancy. Specifically, (1) we estimate the ID of embeddings from small-scale models such as Word2Vec and GloVe, and (2) analyze the ID of the embedding layers of large language models, represented by the Pythia series, across model scales and training stages. Experiments show that the embedding spaces tend to be distributed on manifolds of lower dimension than their extrinsic dimensionality. We also observe changes in the redundancy rate as the model scale grows, and a rapid convergence of the ID in the early stages of training. Furthermore, we show that the estimated ID may be useful for rank selection when applying LoRA. (A toy intrinsic-dimension estimator is sketched after this list.)
  2. Analysis of Internal Representations of Knowledge with Expressions of Familiarity
    Kenshiro Tanaka, Yoshihiro Sakai, Hakaze Cho, Naoya Inoue, Kai Sato, Ryosuke Takahashi, Benjamin Heinzerling, Kentaro Inui
    Annual Conference of the Japanese Society for Artificial Intelligence (JSAI). 2025. 4 pages. 
    [PDF] [Abstract]
    Research on the ability of large language models (LLMs) to judge the familiarity of knowledge is advancing, but whether an LLM can judge, at inference time, the familiarity of knowledge it has learned together with linguistic expressions of familiarity such as “It is known that…” has not been examined. In this study, we train a pre-trained LLM on descriptions of knowledge annotated with linguistic expressions of familiarity and analyze the internal representations of that knowledge, investigating how familiarity can be represented inside the LLM. The results reveal that (1) the internal representations of knowledge retain familiarity information separately for each linguistic expression attached during training, and (2) familiarity information is retained separately for each position where the expression appears in the description. This study provides a foothold for elucidating the mechanism behind LLMs’ ability to judge familiarity.
  3. Internal Representations of Knowledge Recognition in Language Models
    Kai Sato, Ryosuke Takahashi, Benjamin Heinzerling, Kenshiro Tanaka, Hakaze Cho, Yoshihiro Sakai, Naoya Inoue, Kentaro Inui
    Annual Conference of the Japanese Society for Artificial Intelligence (JSAI). 2025. 4 pages. 
    [PDF] [Abstract]
    The knowledge-acquisition ability of language models (LMs) has been widely studied, but the mechanism by which they judge whether acquired knowledge is known to them remains poorly understood. In this study, we compare the internal states of an LM when it generates output about specific knowledge and when it judges the familiarity of that knowledge. The results show that language models can indeed possess the ability to judge familiarity: (1) once knowledge is learned, information for judging its familiarity exists in the internal representations, and (2) the LM exhibits distinct activation patterns for knowledge it judges as known versus unknown. These findings provide clues toward understanding the familiarity-judgment mechanism of LMs.
  4. Revisiting In-context Learning Inference Circuit in Large Language Models
    Hakaze Cho, Mariko Kato, Yoshihiro Sakai, Naoya Inoue
    Annual Conference of the Association for Natural Language Processing (NLP). 2025. 6 pages. Oral, Outstanding Paper.
    [PDF] [Slides] [Abstract]
    In-context Learning (ICL) has attracted attention as a new few-shot learning paradigm for language models, but its underlying mechanism remains insufficiently understood. In this study, we decompose the inference dynamics of ICL into three basic operations, construct an inference circuit upon them, and perform precise measurements, attempting to explain in a unified way the phenomena observed in previous studies. Furthermore, ablation analysis disabling the proposed circuit confirms a marked degradation of ICL performance, suggesting that the proposed inference circuit is a dominant mechanism of ICL.
  5. Beyond the Induction Circuit: A Mechanistic Prototype for Out-of-domain In-context Learning
    Hakaze Cho, Naoya Inoue
    Annual Conference of the Association for Natural Language Processing (NLP). 2025. 5 pages. 
    [PDF] [Poster] [Abstract]
    In-context Learning (ICL) is a promising few-shot learning paradigm with unclear mechanisms. Existing explanations heavily rely on Induction Heads, which fail to account for out-of-domain ICL, where query labels are absent from demonstrations. To address this, we model ICL as attribute resolution, where queries are mixtures of some attributes, and ICL identifies and resolves the relevant attributes for predictions. In this paper, we propose a mechanistic prototype using toy models trained on synthetic data, and observe: (1) even 1-layer Transformers achieve non-trivial accuracy, with limited benefit from additional demonstrations; (2) scaling models effectively improves accuracy; and (3) inference operations can be decomposed into label space identification and generalized induction, warranting further exploration.
  6. Measuring Intrinsic Dimension of Token Embeddings
    Takuya Kataiwa, Hakaze Cho, Tetsushi Ohki
    Annual Conference of the Association for Natural Language Processing (NLP). 2025. 5 pages. 
    [PDF] [Abstract]
    In this study, we measure the Intrinsic Dimension (ID), the number of dimensions necessary and sufficient for a representation, of word vectors and embedding layers, and quantitatively evaluate their degree of redundancy. Specifically, (1) we estimate the ID of embeddings from small-scale models such as Word2Vec and GloVe, and (2) analyze the ID of the embedding layers of large language models, represented by the Pythia series, across model scales and training stages. Experiments show that the embedding spaces tend to be distributed on manifolds of lower dimension than their extrinsic dimensionality. We also observe changes in the redundancy rate as the model scale grows, and a rapid formation of the ID in the early stages of training.
  7. Affinity and Diversity: A Unified Metric for Demonstration Selection via Internal Representations
    Mariko Kato, Hakaze Cho, Yoshihiro Sakai, Naoya Inoue
    Annual Conference of the Association for Natural Language Processing (NLP). 2025. 6 pages. 
    [PDF] [Abstract]
    In In-Context Learning (ICL), the selection of demonstrations has a large impact on task performance. Existing studies have investigated procedures for selecting demonstrations, but the properties of demonstrations that serve as selection criteria have not been examined sufficiently. In this study, we propose two new properties of demonstrations, “affinity” and “diversity”, and show that affinity is a desirable property for demonstration selection across multiple models and datasets. Furthermore, we show that demonstrations chosen by existing methods concentrate in the direction of improving task performance on the two properties, yielding insights toward elucidating the mechanism linking demonstration selection and task performance.
  8. StaICC: Standardized Evaluation for Classification Task in In-context Learning
    Hakaze Cho, Yoshihiro Sakai, Mariko Kato, Naoya Inoue
    Symposium of Young Researcher Association for NLP Studies (YANS). 2025. Poster Only.
    [Poster]
  9. Image Feature Vectors are Frozen Informative Tokens for Language Models
    Mariko Kato, Hakaze Cho, Zhenzhu Yan, Yuting Shi, Naoya Inoue
    Symposium of Young Researcher Association for NLP Studies (YANS). 2025. Poster Only.
  10. Token-based Decision Criteria Are Suboptimal in In-context Learning
    Hakaze Cho, Yoshihiro Sakai, Mariko Kato, Kenshiro Tanaka, Akira Ishii, Naoya Inoue
    The 260th SIG for Natural Language, Information Processing Society of Japan (SIG-NL260, IPSJ). 2024. 17 pages. Oral, Research Award for Young Scholars.
    [PDF] [Slides] [Abstract]
    In In-Context Learning (ICL) tasks, the inference result is usually determined by comparing the generation probabilities of the label tokens in the label space, but these label tokens are chosen arbitrarily by humans. Several prior studies have shown that calibrating the generation probabilities of these label tokens improves ICL performance, yet such methods still suffer from the problem that humans may choose suboptimal label tokens. In this study, we first (1) analyze the hidden states of LLMs and show that current token-based calibration methods cannot adequately express the useful information carried by the hidden states. We then (2) propose a new ICL method that reduces the influence of human label-token selection and effectively exploits the useful information contained in the hidden states. Experiments on 3 models and 10 classification datasets show that our proposed method outperforms current token-based calibration methods by about 20%.
  11. NoisyICL: A Little Noise in Model Parameters Can Calibrate In-context Learning
    Yufeng Zhao, Yoshihiro Sakai, Naoya Inoue
    Annual Conference of the Association for Natural Language Processing (NLP). 2024. 6 pages. Oral.
    [PDF] [Slides] [Abstract]
    In-Context Learning (ICL), where language models learn tasks in a generative form from few-shot demonstrations without parameter updates, has been emerging as language models scale up. Nevertheless, the performance of ICL is still unsatisfactory. Some previous studies attributed this to under-calibration and fine-tuned language models for better ICL performance, at enormous dataset and computing costs. In this paper, we propose NoisyICL, which simply perturbs the model parameters with random noise to strive for calibration. Our experiments on 2 models and 7 downstream task datasets show that NoisyICL helps ICL perform better. Our further analysis indicates that NoisyICL can enable the model to provide fairer predictions with more faithful confidence. NoisyICL can therefore be considered an effective calibration.
  12. Can LLM Learn Prompt Format in In-context Learning?
    Yoshihiro Sakai, Hakaze Cho, Naoya Inoue
    Annual Conference of the Association for Natural Language Processing (NLP). 2024. 6 pages. SB Intuitions Awards.
    [PDF] [Abstract]
    In-Context Learning (ICL) is the ability of LLMs to learn a task from a few demonstrations given in the prompt, without parameter updates, but its mechanism is not yet fully understood. Experiments in prior work suggest that showing the LLM the format “output the label after the task input” may be particularly important. In this study, we directly visualize how an LLM learns the answer format from the given demonstrations. We find that (1) the LLM indeed learns the answer format from the demonstrations, (2) format learning is possible even with meaningless labels, and (3) even the worst labels greatly improve the Macro-F1 of ICL.
  13. Find-the-Common: Benchmarking Inductive Reasoning Ability on Vision-Language Models
    Yuting Shi, Naoya Inoue, Houjing Wei, Yufeng Zhao, Tao Jin
    Annual Conference of the Association for Natural Language Processing (NLP). 2024. 6 pages. 
    [PDF] [Abstract]
    Recent advances in Instruction-fine-tuned Vision and Language Models (IVLMs) have revolutionized the landscape of integrated vision and language understanding. However, Inductive Visual Reasoning, a vital skill for text-image understanding, remains underexplored due to the absence of benchmarks. So, in this paper, we introduce Find-the-Common (FTC): a new vision and language task for Inductive Visual Reasoning. In this task, models are required to identify an answer that explains the common attributes across visual scenes. We create a new dataset for the FTC and assess the performance of several contemporary approaches, including implicit reasoning, symbolic reasoning, and implicit-symbolic reasoning, with various models. Extensive experiments show that even state-of-the-art models like GPT-4V can only achieve 48% accuracy on the FTC, making the FTC a new challenge for the visual reasoning research community. Our dataset is available online.
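
The intrinsic-dimension measurements above (items 1 and 6; see the note in item 1) can be illustrated with the TwoNN estimator (Facco et al., 2017), a standard ID estimator based on the ratio of each point’s two nearest-neighbor distances; whether these papers use exactly this estimator is an assumption here. A self-contained toy sketch:

    import numpy as np

    def twonn_id(x):
        # TwoNN maximum-likelihood estimate: ID = N / sum(log(r2 / r1)),
        # where r1, r2 are each point's 1st and 2nd nearest-neighbor distances.
        sq = np.sum(x ** 2, axis=1)
        d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)
        np.fill_diagonal(d2, np.inf)           # a point is not its own neighbor
        d2.sort(axis=1)
        mu = np.sqrt(d2[:, 1] / d2[:, 0])      # ratio of the two NN distances
        mu = mu[np.isfinite(mu) & (mu > 1.0)]  # drop duplicates/degenerate pairs
        return len(mu) / np.log(mu).sum()

    # Toy check: 1000 points on an 8-dim linear manifold embedded in R^128;
    # the estimate should come out near 8, far below the extrinsic 128.
    rng = np.random.default_rng(0)
    z = rng.normal(size=(1000, 8))
    print(twonn_id(z @ rng.normal(size=(8, 128))))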

(Theses)

  1. Fine-tuning with Randomly Initialized Downstream Network: Finding a Stable Convex-loss Region in Parameter Space
    Yufeng Zhao
    Master’s Thesis - Rank A @ Beijing Institute of Technology. 2023. 81 pages.
  2. Synthesis and Self-Assembly of Aggregation-induced Emission Compounds
    Yufeng Zhao
    Bachelor’s Thesis @ Beijing Institute of Technology. 2021. 52 pages.

Resume

Awards

  • Outstanding Paper @ The 31st Annual Conference of the Japanese Association for Natural Language Processing (NLP2025, ANLP). 2025. (top 14 of 765, 1.8%)
  • Research Award for Young Scholars @ The 260th SIG for Natural Language, Information Processing Society of Japan (SIG-NL260, IPSJ). 2024.
  • SB Intuitions Awards @ The 30th Annual Conference of the Japanese Association for Natural Language Processing (NLP2024, ANLP). 2024.
  • Monbukagakusho Honors Scholarship @ Japanese Ministry of Education, Culture, Sports, Science and Technology. 2023.
  • Outstanding Oral Presentation @ 2022 Euro-Asia Conference on Frontiers of Computer Science and Information Technology. 2022.
  • Annual Outstanding Academic Scholarship @ Beijing Institute of Technology. 2018, 2019, 2021, 2022, 2023.

Copyright © 2025 Hakaze Cho / Yufeng Zhao. All rights reserved. Icon generated by StableDiffusion.
Updated on 2025-06-09 20:16:22 +0900.