Abstract
[Translated from the Chinese abstract] Word embedding is a fundamental technique in natural language processing. Its core idea is to use machine learning algorithms such as neural networks to automatically extract semantic features of limited dimensionality from the associations between words and their contexts in large-scale corpora, representing each word as a low-dimensional, dense numeric vector (word vector) for subsequent analysis. In psychological research, word vectors and the various semantic association indices derived from them can be used to investigate questions such as human semantic processing, cognitive judgment, divergent thinking, social biases and stereotypes, and social and cultural psychological change. In the future, psychological research based on word embedding should distinguish between implicit and explicit components of psychology, deepen and expand the application of contextualized word embeddings and large pre-trained language models (e.g., GPT, BERT), build fine-grained word vector databases along temporal and spatial dimensions, and conduct more word-embedding-based research on social change and cross-cultural comparison.
[English abstract] As a basic technique in natural language processing (NLP), word embedding represents a word with a low-dimensional, dense, and continuous numeric vector (i.e., word vector). Word embeddings can be obtained by using neural network algorithms to predict words from the surrounding words or vice versa (Word2Vec and FastText), or from words’ probability of co-occurrence (GloVe), in large-scale text corpora. In this case, the values of the dimensions of a word vector denote the pattern of how a word can be predicted in a context, substantially connoting its semantic information. Therefore, word embeddings can be utilized for semantic analyses of text. In recent years, word embeddings have been rapidly employed to study human psychology, including human semantic processing, cognitive judgment, individual divergent thinking (creativity), group-level social cognition, sociocultural changes, and so forth. We have developed the R package “PsychWordVec” to help researchers utilize and analyze word embeddings in a tidy framework. Future research using word embeddings should (1) distinguish between implicit and explicit components of social cognition, (2) train fine-grained word vectors in terms of time and region to facilitate cross-temporal and cross-cultural research, and (3) deepen and expand the application of contextualized word embeddings and large pre-trained language models such as GPT and BERT.
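To make these techniques concrete, below is a minimal sketch in Python using gensim and numpy rather than the authors’ PsychWordVec package; the pretrained model name and all word lists are illustrative assumptions, not materials from the paper. It loads pretrained GloVe vectors, queries semantic similarity between words, and computes a WEAT-style effect size of the kind used to measure implicit social biases in word embeddings.

```python
# Minimal, illustrative sketch (assumed setup): pip install gensim numpy
import numpy as np
import gensim.downloader as api

# Load pretrained 50-dimensional GloVe vectors (one of gensim's built-in
# downloads; any pretrained embedding model would work the same way).
kv = api.load("glove-wiki-gigaword-50")

# 1. Semantic similarity: cosine similarity between word vectors.
print(kv.similarity("doctor", "nurse"))       # a single cosine similarity
print(kv.most_similar("psychology", topn=3))  # nearest semantic neighbors

# 2. A WEAT-style effect size for measuring implicit associations.
def association(w, A, B):
    """Differential association of word w with attribute sets A vs. B."""
    return (np.mean([kv.similarity(w, a) for a in A]) -
            np.mean([kv.similarity(w, b) for b in B]))

def weat_effect_size(X, Y, A, B):
    """Cohen's-d-like effect size over target sets X, Y and attributes A, B."""
    x = [association(w, A, B) for w in X]
    y = [association(w, A, B) for w in Y]
    return (np.mean(x) - np.mean(y)) / np.std(x + y, ddof=1)

# Illustrative word lists (hypothetical, not from the paper).
flowers    = ["rose", "tulip", "daisy", "lily"]
insects    = ["ant", "wasp", "moth", "beetle"]
pleasant   = ["love", "peace", "happy", "friend"]
unpleasant = ["hatred", "war", "sad", "enemy"]
print(weat_effect_size(flowers, insects, pleasant, unpleasant))
```

The effect size is a Cohen’s-d-like statistic: a positive value indicates that the first target set (here, flowers) is more strongly associated with the pleasant attributes than the second target set is, which is the basic logic behind embedding-based measures of social bias.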
From: 包寒吴霜
Journal: 心理科学进展 (Advances in Psychological Science)
Recommended citation:
包寒吴霜, 王梓西, 程曦, 苏展, 杨盈, 张光耀, 王博, 蔡华俭. (2023). 基于词嵌入技术的心理学研究: 方法及应用 [Psychological research based on word embedding: Methods and applications]. 心理科学进展 (Advances in Psychological Science). doi: 10.12074/202301.00197V1
Version History
[V1] 2023-01-30 13:59:36 · chinaXiv:202301.00197V1