Overview
Abstract: On e-commerce platforms, reviews are highly useful to users, yet little research has examined the usefulness of reviews. This paper proposes the Neural Attentional Regression model with Review-level Explanations (NARRE), which accounts for review usefulness while predicting ratings and explains its recommendations at the review level.
Introduction
Rating prediction is a common task in recommender systems. Most approaches are based on collaborative filtering, e.g. PMF; their biggest weakness is that they cannot explain their recommendations. Many studies have shown that providing explanations in recommendation is highly valuable: a recommendation without an explanation fails to truly convince users. Most e-commerce sites let users review products and assign ratings, and these reviews usually contain useful product attributes such as quality, material, and color. In this paper, the usefulness of a review means whether a user can make a purchase decision based on it. The figure below gives examples of useful and useless reviews.
Prior work has integrated user reviews into latent factor models to improve their performance [3, 25–27, 39, 46] or to generate explanations for recommendations [11, 32, 44]. Despite good results, two problems remain. First, these methods do not model how much each review contributes to the recommended item, nor how useful it is to other users. Second, the explanations in prior work are usually extracted words or phrases, which may distort the meaning of the review. This paper is the first to exploit review usefulness to improve both recommendation accuracy and explainability.
This paper proposes the Neural Attentional Regression model with Review-level Explanations (NARRE), which uses an attention mechanism to assign a weight to each review. Taking the user, the item, and the reviews as inputs to a multi-layer neural network, it derives a weighting formula. Following [46], two parallel neural networks learn latent factors for users and items: one models the user's tendencies as reflected in the reviews they write, the other models the item's characteristics as reflected in the reviews it receives. In the last layer, the latent factor model [21] is extended into a neural network that outputs the rating prediction. Experiments on four real-world datasets show that the proposed method outperforms state-of-the-art baselines such as PMF [29], NMF [24], SVD++ [20], HFT [27], and DeepCoNN [46].
Related Work
The most closely related recent work also combines neural networks with collaborative filtering. He et al. [13] proposed the Neural Collaborative Filtering (NCF) framework, which models non-linear interactions between users and items. Neural Factorization Machines (NFM) [12] then extended classic Factorization Machines with higher-order, non-linear interactions. Collaborative Deep Learning [41] proposed a hierarchical Bayesian model that couples deep representation learning on item content (via a stacked denoising autoencoder) with collaborative filtering. DeepCoNN [46] uses convolutional neural networks over review text, modeling users and items in two parallel networks that are coupled by a factorization machine for rating prediction. NRT [25] combines GRUs with collaborative filtering to predict ratings and generate abstractive tips that simulate the user's feelings and reactions. None of these works, however, addresses review-level explainability.
Methodology
- Latent Factor Model: in the latent factor model, the rating of any user u on item i is modeled as $\hat{R}_{u,i} = q_u^\top p_i + b_u + b_i + \mu$, where $q_u$ and $p_i$ are the user and item latent factors, $b_u$ and $b_i$ are the user and item bias terms, and $\mu$ is the global average rating.
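As a concrete illustration of the formula above, here is a minimal NumPy sketch of LFM prediction; the 3-dimensional factors and bias values are made-up toy numbers, not learned parameters:

```python
import numpy as np

def lfm_predict(q_u, p_i, b_u, b_i, mu):
    """Latent factor model: global average + user bias + item bias
    + inner product of the user and item latent factors."""
    return mu + b_u + b_i + float(np.dot(q_u, p_i))

# Hypothetical 3-dimensional factors for one user-item pair.
q_u = np.array([0.5, -0.2, 0.1])
p_i = np.array([0.4, 0.3, -0.1])
rating = lfm_predict(q_u, p_i, b_u=0.1, b_i=-0.05, mu=3.5)  # 3.5 + 0.1 - 0.05 + 0.13
```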
- CNN Text Processor: given an input text of length T with embedding matrix $V_{1:T}$, the j-th neuron extracts features as $z_j = \mathrm{ReLU}(V_{1:T} * K_j + b_j)$, where $*$ denotes the convolution operation and $z_{j,t}$ is the result of the j-th neuron at sliding-window position t. The final feature of the neuron is obtained by max pooling, $o_j = \max(z_{j,1}, \ldots, z_{j,T-t+1})$; the point of max pooling is to keep the most important feature, i.e. the one with the largest value. The CNN output is the concatenation of its m neurons' outputs, $O = [o_1, o_2, \ldots, o_m]$. Finally, O is fed into a fully connected layer to obtain the final representation.
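The convolution-plus-max-pooling step can be sketched as follows in NumPy; the kernel count, window size, and random inputs are illustrative assumptions, not values from the paper:

```python
import numpy as np

def cnn_text_processor(V, kernels, biases):
    """One CNN layer over a T x d word-embedding matrix V. For each of the
    m neurons: slide its t x d kernel over every window, apply ReLU, then
    max-pool over positions to keep the single strongest feature."""
    T, _ = V.shape
    feats = []
    for K, b in zip(kernels, biases):
        t = K.shape[0]
        z = [max(0.0, float(np.sum(V[j:j + t] * K)) + b)   # ReLU(V_window * K_j + b_j)
             for j in range(T - t + 1)]
        feats.append(max(z))                               # max pooling over positions
    return np.array(feats)                                 # O = [o_1, ..., o_m]

rng = np.random.default_rng(0)
V = rng.standard_normal((6, 4))                            # T=6 words, d=4 embedding dims
kernels = [rng.standard_normal((3, 4)) for _ in range(5)]  # m=5 neurons, window t=3
O = cnn_text_processor(V, kernels, biases=[0.0] * 5)
```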
- NARRE Model: the model consists of two parallel neural networks, Net_u for modeling users and Net_i for modeling items. On top of the two networks sits a prediction layer that lets the user and item latent factors interact to predict the final rating. During training, the inputs are the user, the item, and the review texts; during testing, only the user and the item are given as inputs. Take Net_i as an example (Net_u works the same way). First, the words of each review are converted, via embedding lookup, into the embedding matrices $V_{i1}, V_{i2}, \ldots, V_{ik}$, which are fed through the CNN's convolution and pooling layers, yielding the outputs $O_{i1}, O_{i2}, \ldots, O_{ik}$. A common way to obtain the item representation would be to simply average these outputs. Instead, this paper proposes an attention-based method whose goal is to learn a different weight for each of these output features.
To compute the attention scores, the model uses a two-layer neural network whose inputs are the feature vector $O_{il}$ of the l-th review of item i and the ID embedding $u_{il}$ of the review's author. The ID embedding serves to model the usefulness of a user's reviews and to identify users who consistently write useless ones. The attention network is defined as $a^*_{il} = \mathbf{h}^\top \mathrm{ReLU}(W_O O_{il} + W_u u_{il} + b_1) + b_2$, where $W_O$, $W_u$, $b_1$, $\mathbf{h}$, and $b_2$ are model parameters. The scores are then normalized with softmax: $a_{il} = \exp(a^*_{il}) \,/\, \sum_{l'=1}^{k} \exp(a^*_{il'})$. Using $a_{il}$ as the final weights, the review features are aggregated by weighted average: $O_i = \sum_{l=1}^{k} a_{il} O_{il}$. Finally, a fully connected layer yields the item's final representation: $Y_i = W_0 O_i + b_0$.
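The attention step described above can be sketched in NumPy as follows; the dimensions (k = 4 reviews, toy feature/hidden sizes), the random parameter values, and the helper name `review_attention` are all illustrative assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())       # subtract max for numerical stability
    return e / e.sum()

def review_attention(O_reviews, u_embs, W_O, W_u, b1, h, b2):
    """Two-layer attention over the k reviews of one item:
    a*_il = h^T ReLU(W_O O_il + W_u u_il + b1) + b2, softmax-normalized,
    followed by a weighted average of the review features."""
    scores = np.array([
        h @ np.maximum(0.0, W_O @ O_l + W_u @ u_l + b1) + b2
        for O_l, u_l in zip(O_reviews, u_embs)
    ])
    a = softmax(scores)                              # attention weights a_il
    O_i = (a[:, None] * O_reviews).sum(axis=0)       # weighted average of O_il
    return a, O_i

rng = np.random.default_rng(1)
k, d_o, d_u, d_a = 4, 6, 5, 8    # reviews, feature, ID-embedding, hidden sizes
O_reviews = rng.standard_normal((k, d_o))            # CNN outputs O_i1..O_ik
u_embs = rng.standard_normal((k, d_u))               # reviewer ID embeddings
a, O_i = review_attention(O_reviews, u_embs,
                          W_O=rng.standard_normal((d_a, d_o)),
                          W_u=rng.standard_normal((d_a, d_u)),
                          b1=np.zeros(d_a), h=rng.standard_normal(d_a), b2=0.0)
```

Note that the softmax guarantees the weights are non-negative and sum to one, so the aggregation is a proper weighted average of the review features.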
Net_u is modeled in exactly the same way. The two parts are then combined via an element-wise product, $h_0 = (q_u + X_u) \odot (p_i + Y_i)$, and the final rating is computed as $\hat{R}_{u,i} = \mathbf{w}_1^\top h_0 + b_u + b_i + \mu$. The model is trained in the same way as conventional rating-prediction tasks, with the squared loss $L_{sqr} = \sum_{(u,i)\in\Omega} (\hat{R}_{u,i} - R_{u,i})^2$, where $\Omega$ is the set of observed user-item pairs.
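The prediction layer and the squared loss can be sketched as follows; the 2-dimensional toy factors are illustrative values, not learned parameters:

```python
import numpy as np

def narre_predict(q_u, X_u, p_i, Y_i, w1, b_u, b_i, mu):
    """Prediction layer: element-wise interaction of the (ID-based +
    review-based) user and item factors, mapped to a scalar rating."""
    h0 = (q_u + X_u) * (p_i + Y_i)
    return float(w1 @ h0) + b_u + b_i + mu

def squared_loss(preds, truths):
    """Squared error summed over the observed user-item pairs."""
    diff = np.asarray(preds) - np.asarray(truths)
    return float(np.sum(diff ** 2))

# Toy 2-d factors: h0 = [1, 1] * [1, 1] = [1, 1], w1·h0 = 2, so rating = 5.0.
pred = narre_predict(np.array([1.0, 0.0]), np.array([0.0, 1.0]),
                     np.array([1.0, 1.0]), np.zeros(2),
                     w1=np.array([1.0, 1.0]), b_u=0.0, b_i=0.0, mu=3.0)
loss = squared_loss([3.8, 4.1], [4.0, 4.0])   # 0.04 + 0.01
```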
This concludes the description of the NARRE model.
Summary
Methodologically, this paper is fairly conventional; its clever twist is the notion of Rate_Usefulness, ingeniously exploiting the review-usefulness signal already present in the data, which is the most interesting part. Setting that concept aside, the approach is broadly similar to many sentiment-analysis-based rating-prediction models.
References
[1] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint
arXiv:1409.0473 (2014).
[2] David M Blei, Andrew Y Ng, and Michael I Jordan. 2003. Latent dirichlet allocation.
Journal of machine Learning research 3, Jan (2003), 993–1022.
[3] Rose Catherine and William Cohen. 2017. TransNets: Learning to Transform for
Recommendation. arXiv preprint arXiv:1704.02298 (2017).
[4] Jingyuan Chen, Hanwang Zhang, Xiangnan He, Liqiang Nie, Wei Liu, and Tat-Seng Chua. 2017. Attentive collaborative filtering: Multimedia recommendation with item- and component-level attention. In SIGIR. 335–344.
[5] Long Chen, Hanwang Zhang, Jun Xiao, Liqiang Nie, Jian Shao, and Tat-Seng
Chua. 2016. SCA-CNN: Spatial and Channel-wise Attention in Convolutional
Networks for Image Captioning. arXiv preprint arXiv:1611.05594 (2016).
[6] J Cohen. 1968. Weighted kappa: nominal scale agreement with provision for
scaled disagreement or partial credit. Psychological Bulletin 70, 4 (1968), 213.
[7] Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu,
and Pavel Kuksa. 2011. Natural language processing (almost) from scratch.
Journal of Machine Learning Research 12, Aug (2011), 2493–2537.
[8] Qiming Diao, Minghui Qiu, Chao-Yuan Wu, Alexander J Smola, Jing Jiang, and
Chong Wang. 2014. Jointly modeling aspects, ratings and sentiments for movie
recommendation (jmars). In SIGKDD. 193–202.
[9] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep learning. MIT
press.
[10] Ruining He and Julian McAuley. 2016. Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering. In WWW. 507–517.
[11] Xiangnan He, Tao Chen, Min-Yen Kan, and Xiao Chen. 2015. Trirank: Review-aware explainable recommendation by modeling aspects. In CIKM. 1661–1670.
[12] Xiangnan He and Tat-Seng Chua. 2017. Neural Factorization Machines for Sparse Predictive Analytics. In SIGIR.
[13] Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural collaborative filtering. In WWW. 173–182.
[14] G Hinton, N Srivastava, and K Swersky. 2012. RMSProp: Divide the gradient by
a running average of its recent magnitude. Neural networks for machine learning,
Coursera lecture 6e (2012).
[15] Armand Joulin, Edouard Grave, Piotr Bojanowski, and Tomas Mikolov. 2016. Bag of tricks for efficient text classification. arXiv preprint arXiv:1607.01759 (2016).
[16] Nal Kalchbrenner, Edward Grefenstette, and Phil Blunsom. 2014. A Convolutional Neural Network for Modelling Sentences. arXiv preprint arXiv:1404.2188 (2014).
[17] Soo Min Kim, Patrick Pantel, Tim Chklovski, and Marco Pennacchiotti. 2006.
Automatically assessing review helpfulness. In EMNLP. 423–430.
[18] Yoon Kim. 2014. Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882 (2014).
[19] Diederik P Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In ICLR.
[20] Yehuda Koren. 2008. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In SIGKDD. 426–434.
[21] Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization techniques for recommender systems. Computer 42, 8 (2009).
[22] J Richard Landis and Gary G Koch. 1977. The measurement of observer agreement
for categorical data. biometrics (1977), 159–174.
[23] Quoc V. Le and Tomas Mikolov. 2014. Distributed Representations of Sentences and Documents. In ICML. 1188–1196.
[24] Daniel D Lee and H Sebastian Seung. 2001. Algorithms for non-negative matrix
factorization. In Advances in neural information processing systems. 556–562.
[25] Piji Li, Zihao Wang, Zhaochun Ren, Lidong Bing, and Wai Lam. 2017. Neural Rating Regression with Abstractive Tips Generation for Recommendation. In SIGIR.
[26] Guang Ling, Michael R Lyu, and Irwin King. 2014. Ratings meet reviews, a
combined approach to recommend. In RecSys. 105–112.
[27] Julian McAuley and Jure Leskovec. 2013. Hidden factors and hidden topics:
understanding rating dimensions with review text. In RecSys. 165–172.
[28] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Distributed representations of words and phrases and their compositionality. In NIPS. 3111–3119.
[29] Andriy Mnih and Ruslan R Salakhutdinov. 2008. Probabilistic matrix factorization.
In Advances in neural information processing systems. 1257–1264.
[30] Vinod Nair and Geoffrey E. Hinton. 2010. Rectified linear units improve restricted boltzmann machines. In ICML. 807–814.
[31] Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. Glove: Global Vectors for Word Representation. In EMNLP. 1532–1543.
[32] Zhaochun Ren, Shangsong Liang, Piji Li, Shuaiqiang Wang, and Maarten de Rijke.
2017. Social collaborative viewpoint regression with explainable recommendations. In WSDM. 485–494.
[33] Steffen Rendle. 2010. Factorization machines. In ICDM. 995–1000.
[34] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. Why should i trust you?: Explaining the predictions of any classifier. In SIGKDD. 1135–1144.
[35] Francesco Ricci, Lior Rokach, and Bracha Shapira. 2011. Introduction to recommender systems handbook. In Recommender systems handbook. Springer,
1–35.
[36] Alexander M Rush, Sumit Chopra, and Jason Weston. 2015. A neural attention
model for abstractive sentence summarization. arXiv preprint arXiv:1509.00685
(2015).
[37] Nitish Srivastava, Geoffrey E Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: a simple way to prevent neural networks from overfitting. Journal of machine learning research 15, 1 (2014), 1929–1958.
[38] Xiaoyuan Su and Taghi M Khoshgoftaar. 2009. A survey of collaborative filtering techniques. Advances in artificial intelligence 2009 (2009), 4.
[39] Yunzhi Tan, Min Zhang, Yiqun Liu, and Shaoping Ma. 2016. Rating-Boosted
Latent Topics: Understanding Users and Items with Ratings and Reviews.. In
IJCAI. 2640–2646.
[40] Jesse Vig, Shilad Sen, and John Riedl. 2009. Tagsplanations: explaining recommendations using tags. In IUI. 47–56.
[41] Hao Wang, Naiyan Wang, and Dit-Yan Yeung. 2015. Collaborative deep learning
for recommender systems. In SIGKDD. 1235–1244.
[42] Jun Xiao, Hao Ye, Xiangnan He, Hanwang Zhang, Fei Wu, and Tat-Seng Chua.
2017. Attentional factorization machines: Learning the weight of feature interactions via attention networks. arXiv preprint arXiv:1708.04617 (2017).
[43] Chenyan Xiong, Jamie Callan, and Tie-Yan Liu. 2017. Learning to attend and to rank with word-entity duets. In SIGIR.
[44] Yongfeng Zhang, Guokun Lai, Min Zhang, Yi Zhang, Yiqun Liu, and Shaoping
Ma. 2014. Explicit factor models for explainable recommendation based on
phrase-level sentiment analysis. In SIGIR. 83–92.
[45] Yongfeng Zhang, Yunzhi Tan, Min Zhang, Yiqun Liu, Tat-Seng Chua, and Shaoping Ma. 2015. Catch the Black Sheep: Unified Framework for Shilling Attack Detection Based on Fraudulent Action Propagation. In IJCAI. 2408–2414.
[46] Lei Zheng, Vahid Noroozi, and Philip S Yu. 2017. Joint deep modeling of users
and items using reviews for recommendation. In WSDM. 425–434.
Finally
This concludes these reading notes on the WWW 2018 paper "Neural Attentional Rating Regression with Review-level Explanations"; I hope they are helpful.