
Learning Cross-lingual Word Embeddings via Matrix Co-factorization

Tianze Shi, Zhiyuan Liu, Yang Liu and Maosong Sun

In ACL (2015)

Abstract
A joint-space model for cross-lingual distributed representations generalizes language-invariant semantic features. In this paper, we present a matrix co-factorization framework for learning cross-lingual word embeddings. We explicitly define monolingual training objectives in the form of matrix decomposition, and induce cross-lingual constraints for simultaneously factorizing monolingual matrices. The cross-lingual constraints can be derived from parallel corpora, with or without word alignments. Empirical results on a cross-lingual document classification task show that our method effectively encodes cross-lingual knowledge as constraints for cross-lingual word embeddings.
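As a rough illustration of the idea in the abstract, the sketch below factorizes two toy monolingual matrices at the same time while pulling the embeddings of aligned word pairs toward each other. It is a minimal sketch under stated assumptions, not the paper's objective: the plain squared-error loss, the squared-distance tying penalty, the function name `cofactorize`, and the parameters `dim`, `lam`, `lr` and `steps` are all hypothetical choices made here for illustration.

```python
import numpy as np


def cofactorize(M_a, M_b, pairs, dim=2, lam=1.0, lr=0.01, steps=2000, seed=0):
    """Jointly factorize two word-by-context matrices (illustrative only).

    Assumed objective (not taken from the paper):
        ||M_a - W_a C_a^T||^2 + ||M_b - W_b C_b^T||^2
        + lam * sum over aligned (i, j) of ||W_a[i] - W_b[j]||^2
    minimized with plain gradient descent.
    """
    rng = np.random.default_rng(seed)
    W_a = 0.1 * rng.standard_normal((M_a.shape[0], dim))  # word vectors, language A
    C_a = 0.1 * rng.standard_normal((M_a.shape[1], dim))  # context vectors, language A
    W_b = 0.1 * rng.standard_normal((M_b.shape[0], dim))  # word vectors, language B
    C_b = 0.1 * rng.standard_normal((M_b.shape[1], dim))  # context vectors, language B

    idx_a = np.array([i for i, _ in pairs])
    idx_b = np.array([j for _, j in pairs])
    history = []

    for _ in range(steps):
        # Monolingual reconstruction residuals.
        R_a = W_a @ C_a.T - M_a
        R_b = W_b @ C_b.T - M_b
        # Cross-lingual term: difference between aligned word vectors.
        diff = W_a[idx_a] - W_b[idx_b]
        history.append(float((R_a ** 2).sum() + (R_b ** 2).sum()
                             + lam * (diff ** 2).sum()))

        # Gradients of the monolingual terms.
        gW_a, gC_a = 2 * R_a @ C_a, 2 * R_a.T @ W_a
        gW_b, gC_b = 2 * R_b @ C_b, 2 * R_b.T @ W_b
        # Gradients of the tying penalty, accumulated per aligned row.
        np.add.at(gW_a, idx_a, 2 * lam * diff)
        np.add.at(gW_b, idx_b, -2 * lam * diff)

        W_a -= lr * gW_a
        C_a -= lr * gC_a
        W_b -= lr * gW_b
        C_b -= lr * gC_b

    return W_a, W_b, history


# Toy data: M_b is M_a with words 1 and 2 swapped, so the alignment
# (0, 0), (1, 2), (2, 1) links words that play the same role.
M_a = np.array([[2.0, 1.0, 0.0], [1.0, 2.0, 0.0], [0.0, 0.0, 1.0]])
M_b = np.array([[2.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 2.0]])
pairs = [(0, 0), (1, 2), (2, 1)]

W_a, W_b, history = cofactorize(M_a, M_b, pairs)
print("objective at first and last step:", history[0], history[-1])
```

Setting `lam=0` in this sketch decouples the two factorizations, which is one way to see what the cross-lingual term contributes: without it, nothing encourages the two languages to share a space.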


Bibtex

@InProceedings{shi-EtAl:2015:ACL-IJCNLP1,
    author    = {Shi, Tianze  and  Liu, Zhiyuan  and  Liu, Yang  and  Sun, Maosong},
    title     = {Learning Cross-lingual Word Embeddings via Matrix Co-factorization},
    booktitle = {Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)},
    month     = {July},
    year      = {2015},
    address   = {Beijing, China},
    publisher = {Association for Computational Linguistics},
    pages     = {567--572},
    url       = {http://www.aclweb.org/anthology/P15-2093}
}

Tianze Shi @ Cornell University.