Tianze Shi, Zhiyuan Liu, Yang Liu and Maosong Sun
In ACL (2015)
Abstract
A joint-space model for cross-lingual distributed representations generalizes language-invariant semantic features. In this paper, we present a matrix co-factorization framework for learning cross-lingual word embeddings. We explicitly define monolingual training objectives in the form of matrix decomposition, and induce cross-lingual constraints for simultaneously factorizing monolingual matrices. The cross-lingual constraints can be derived from parallel corpora, with or without word alignments. Empirical results on a task of cross-lingual document classification show that our method effectively encodes cross-lingual knowledge as constraints for cross-lingual word embeddings.
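As a rough illustration of the general idea of co-factorization, the following Python sketch factorizes two monolingual matrices at the same time while a penalty term pulls the embeddings of translation pairs together. It is not the paper's implementation: the squared-error objectives, the form of the cross-lingual penalty, the function name `cofactorize`, the toy matrices, and all hyperparameters are assumptions chosen for brevity.

```python
import numpy as np


def cofactorize(M1, M2, pairs, dim=8, lam=0.5, lr=0.01, steps=300, seed=0):
    """Jointly factorize M1 ~ W1 @ C1.T and M2 ~ W2 @ C2.T (hypothetical helper).

    M1, M2 : word-by-context matrices for language 1 and language 2.
    pairs  : list of (i, j); row i of M1 and row j of M2 are assumed translations.
    lam    : weight of the cross-lingual penalty (an assumed hyperparameter).
    Returns the word embeddings W1, W2 and the loss value at every step.
    """
    rng = np.random.default_rng(seed)
    W1 = 0.1 * rng.standard_normal((M1.shape[0], dim))
    C1 = 0.1 * rng.standard_normal((M1.shape[1], dim))
    W2 = 0.1 * rng.standard_normal((M2.shape[0], dim))
    C2 = 0.1 * rng.standard_normal((M2.shape[1], dim))
    idx1 = np.array([i for i, _ in pairs])
    idx2 = np.array([j for _, j in pairs])

    losses = []
    for _ in range(steps):
        # Monolingual reconstruction residuals.
        R1 = W1 @ C1.T - M1
        R2 = W2 @ C2.T - M2
        # Difference between embeddings of each translation pair.
        D = W1[idx1] - W2[idx2]
        losses.append(
            0.5 * (R1 ** 2).sum() + 0.5 * (R2 ** 2).sum() + 0.5 * lam * (D ** 2).sum()
        )

        # Gradients of the two monolingual squared-error terms.
        gW1, gC1 = R1 @ C1, R1.T @ W1
        gW2, gC2 = R2 @ C2, R2.T @ W2
        # Gradient of the cross-lingual penalty; np.add.at accumulates
        # correctly when a word appears in several pairs.
        np.add.at(gW1, idx1, lam * D)
        np.add.at(gW2, idx2, -lam * D)

        # Plain gradient-descent update on all four factor matrices.
        W1 -= lr * gW1
        C1 -= lr * gC1
        W2 -= lr * gW2
        C2 -= lr * gC2
    return W1, W2, losses


# Toy usage with random matrices standing in for real co-occurrence statistics.
toy_rng = np.random.default_rng(1)
toy_M1 = toy_rng.random((6, 5))
toy_M2 = toy_rng.random((7, 4))
toy_W1, toy_W2, toy_losses = cofactorize(toy_M1, toy_M2, [(0, 0), (1, 2), (1, 3)])
print(toy_losses[0], toy_losses[-1])
```

Because the penalty ties rows of `W1` and `W2` together, both languages end up in one shared embedding space. In this sketch the translation pairs are given directly; how such constraints are derived from parallel corpora, with or without word alignments, is what the paper itself addresses.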
Bibtex
@InProceedings{shi-EtAl:2015:ACL-IJCNLP1,
author = {Shi, Tianze and Liu, Zhiyuan and Liu, Yang and Sun, Maosong},
title = {Learning Cross-lingual Word Embeddings via Matrix Co-factorization},
booktitle = {Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)},
month = {July},
year = {2015},
address = {Beijing, China},
publisher = {Association for Computational Linguistics},
pages = {567--572},
url = {http://www.aclweb.org/anthology/P15-2093}
}