Guo, Huifeng, et al. 2017 “DeepFM: a factorization-machine based neural network for CTR prediction.”

This paper proposes a new model, ‘DeepFM’, which combines the power of factorization machines (FM) for recommendation with the power of deep learning for feature learning in a single neural network architecture. FM in practice learns only low-order feature interactions, whereas DeepFM can learn both low- and high-order feature interactions. This lets it capture more sophisticated feature interactions behind user behavior in click-through-rate (CTR) prediction for recommender systems. Also, unlike the ‘Wide&Deep’ model [2], DeepFM feeds a shared input to its ‘wide’ and ‘deep’ parts, so no feature engineering is needed beyond the raw features.


1. Problem Setup and Notation

The task of CTR prediction is to build a prediction model $\hat{y} = \mathrm{CTR\_model}(x)$ that estimates the probability that a user clicks a given item.

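Suppose the training data set consists of $n$ instances $(X, y)$, where:

$$y = \begin{cases} 1 & \text{if the user clicked the item} \\ 0 & \text{otherwise} \end{cases}$$

and $X$ is an $m$-field data record, usually involving a pair of user and item.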

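Each instance is converted to $(x, y)$, where:

$$x = [x_{field_1}, x_{field_2}, \dots, x_{field_m}] \in \mathbb{R}^{1 \times d}, \qquad x_{field_i} \in \mathbb{R}^{1 \times m_i}, \qquad \sum_{i=1}^{m} m_i = d$$

with $x_{field_i}$ being the vector representation of the $i$-th field of $X$. A categorical field is represented as a one-hot vector, and a continuous field as the value itself (or as a one-hot vector after discretization), so $x$ is usually high-dimensional and extremely sparse.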
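To make the notation concrete, here is a minimal Python sketch of turning one $m$-field record $X$ into $x$. The field names, vocabularies, and the normalized age value are hypothetical. A real pipeline would use indexed or hashed sparse vectors rather than dense Python lists.

```python
from typing import Dict, List, Union

# Hypothetical field schema: two categorical fields with small vocabularies
# and one continuous field kept as its raw value.
FIELDS = [
    ("gender", ["female", "male"]),             # categorical, m_1 = 2
    ("location", ["berlin", "paris", "tokyo"]),  # categorical, m_2 = 3
    ("age", None),                               # continuous, m_3 = 1
]


def encode_record(record: Dict[str, Union[str, float]]) -> List[List[float]]:
    """Return the per-field vectors [x_field_1, ..., x_field_m]."""
    field_vectors = []
    for name, vocab in FIELDS:
        if vocab is None:
            # Continuous field: represented by the value itself (length 1).
            field_vectors.append([float(record[name])])
        else:
            # Categorical field: one-hot vector of length len(vocab).
            one_hot = [0.0] * len(vocab)
            one_hot[vocab.index(record[name])] = 1.0
            field_vectors.append(one_hot)
    return field_vectors


def concat_fields(field_vectors: List[List[float]]) -> List[float]:
    """Concatenate the field vectors into x, whose length is d = sum of m_i."""
    return [v for field in field_vectors for v in field]


fields = encode_record({"gender": "male", "location": "paris", "age": 0.3})
x = concat_fields(fields)
print(x)  # [0.0, 1.0, 0.0, 1.0, 0.0, 0.3]
```

Here $d = 2 + 3 + 1 = 6$. With realistic vocabularies (user IDs, item IDs, and so on), $d$ reaches millions while only $m$ entries are non-zero.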


2. DeepFM

DeepFM consists of two components that share the same input. The FM component learns low-order feature interactions, and the deep component learns high-order feature interactions. This brings two benefits:

  • it learns both low- and high-order feature interactions from raw features
  • it needs no expert feature engineering of the input

Input

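The raw feature input vector for CTR prediction is usually highly sparse, extremely high-dimensional, a mix of categorical and continuous features, and grouped into fields (e.g. gender, location, age). For this reason, an embedding layer is used to compress the input vector into a low-dimensional ($k$-dimensional), dense, real-valued vector. The input field vectors can have different lengths $m_i$, but every field's embedding has the same size $k$.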

  • the output of the embedding layer:
$$a^{(0)} = [e_1, e_2, \dots, e_m], \qquad e_i \in \mathbb{R}^{1 \times k}$$
where $e_i$ is the embedding of the $i$-th field and $m$ is the number of fields.
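The embedding layer can be sketched as one $m_i \times k$ weight matrix per field, so that $e_i = x_{field_i} V_i$. This is a minimal sketch assuming a hypothetical three-field schema (field lengths 2, 3, 1) and random initialization. For a one-hot field the product simply selects one row of that field's matrix, which is why implementations typically use a table lookup instead of a multiplication.

```python
import random
from typing import List

Matrix = List[List[float]]


def row_times_matrix(x_field: List[float], V_field: Matrix) -> List[float]:
    """Compute e_i = x_field_i V_i: a (1 x m_i) row vector times an (m_i x k) matrix."""
    k = len(V_field[0])
    return [sum(x_field[r] * V_field[r][c] for r in range(len(x_field))) for c in range(k)]


def embedding_layer(field_vectors: List[List[float]], field_tables: List[Matrix]) -> List[List[float]]:
    """Map every field vector (of any length m_i) to an embedding of the same size k."""
    return [row_times_matrix(xf, Vf) for xf, Vf in zip(field_vectors, field_tables)]


k = 4                      # embedding size (hypothetical)
field_lengths = [2, 3, 1]  # m_i per field, e.g. gender, location, age (hypothetical)
rng = random.Random(0)
# One (m_i x k) weight matrix per field, randomly initialized for illustration.
field_tables = [[[rng.gauss(0.0, 0.01) for _ in range(k)] for _ in range(m_i)]
                for m_i in field_lengths]

# x_field_i for one record: one-hot gender, one-hot location, scalar age.
field_vectors = [[0.0, 1.0], [0.0, 1.0, 0.0], [0.3]]
a0 = embedding_layer(field_vectors, field_tables)  # a^(0) = [e_1, e_2, e_3]
```

Here `a0[0]` equals row 1 of the gender table, and `a0[2]` is 0.3 times the single row of the age table. Every `e_i` has length `k` even though the field lengths differ.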


Combined prediction model

$$\hat{y} = \mathrm{sigmoid}(y_{FM} + y_{DNN})$$

where $y_{FM}$ is the output of the FM component and $y_{DNN}$ is the output of the deep component.


2.1 FM Component

For background on FM, see Rendle's FM paper [3]. In DeepFM, the latent feature vectors $V_i$ of FM serve as network weights. They are learned jointly with the rest of the network and used to compress the input field vectors into the embedding vectors.

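$$y_{FM} = \langle w, x \rangle + \sum_{j_1=1}^{d} \sum_{j_2=j_1+1}^{d} \langle V_{j_1}, V_{j_2} \rangle \, x_{j_1} \cdot x_{j_2}$$

where $w \in \mathbb{R}^d$ and $V_i \in \mathbb{R}^k$. The first term (the addition unit) reflects the importance of order-1 features. The second term (the inner product units) captures order-2 feature interactions.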
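The $y_{FM}$ formula can be evaluated literally with a double loop over feature pairs, which costs $O(kd^2)$. The sketch below does that and, for comparison, also uses the reformulation from Rendle's FM paper [3], which computes the same pairwise term in $O(kd)$. The sizes, random weights, and input vector are hypothetical. In DeepFM the rows of `V` would be the same weights that produce the field embeddings.

```python
import random
from typing import List


def fm_naive(w: List[float], V: List[List[float]], x: List[float]) -> float:
    """y_FM = <w, x> + sum_{j1 < j2} <V_j1, V_j2> x_j1 x_j2, evaluated literally in O(k d^2)."""
    d = len(x)
    order1 = sum(w_j * x_j for w_j, x_j in zip(w, x))  # addition unit
    order2 = 0.0
    for j1 in range(d):
        for j2 in range(j1 + 1, d):
            dot = sum(a * b for a, b in zip(V[j1], V[j2]))  # inner product unit
            order2 += dot * x[j1] * x[j2]
    return order1 + order2


def fm_fast(w: List[float], V: List[List[float]], x: List[float]) -> float:
    """Same value in O(k d) via Rendle's identity:
    sum_{j1<j2} <V_j1, V_j2> x_j1 x_j2
        = 1/2 * sum_f [ (sum_j V[j][f] x_j)^2 - sum_j (V[j][f] x_j)^2 ].
    """
    order1 = sum(w_j * x_j for w_j, x_j in zip(w, x))
    k = len(V[0])
    order2 = 0.0
    for f in range(k):
        s = sum(V[j][f] * x[j] for j in range(len(x)))
        s_sq = sum((V[j][f] * x[j]) ** 2 for j in range(len(x)))
        order2 += 0.5 * (s * s - s_sq)
    return order1 + order2


rng = random.Random(42)
d, k = 6, 4  # hypothetical sizes
w = [rng.gauss(0.0, 0.1) for _ in range(d)]
V = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(d)]
x = [0.0, 1.0, 0.0, 1.0, 0.0, 0.3]  # a hypothetical sparse input

print(fm_naive(w, V, x), fm_fast(w, V, x))  # the two values agree up to rounding
```

The identity follows from $(\sum_j a_j)^2 = \sum_j a_j^2 + 2\sum_{j_1<j_2} a_{j_1} a_{j_2}$ with $a_j = V_{j,f}\,x_j$, summed over the $k$ latent dimensions $f$.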


2.2 Deep Component

The deep component is a feed-forward neural network, which is used to learn high-order feature interactions.

  • $a^{(0)}$ is fed into the deep neural network, and the forward process is:
$$a^{(l+1)} = \sigma\left(W^{(l)} a^{(l)} + b^{(l)}\right)$$
where $l$ is the layer depth, $\sigma$ is an activation function, and $a^{(l)}$, $W^{(l)}$, $b^{(l)}$ are the output, model weight, and bias of the $l$-th layer.
  • a dense real-valued feature vector is generated, which is finally fed into the sigmoid function for CTR prediction:
$$y_{DNN} = \sigma\left(W^{(|H|+1)} a^{(|H|)} + b^{(|H|+1)}\right)$$

where $|H|$ is the number of hidden layers.
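Here is a minimal sketch of the deep component's forward pass. It assumes ReLU as the activation $\sigma$ in the hidden layers, plus hypothetical layer sizes and weights. One interpretive choice: the sketch returns the linear output of the last layer rather than applying $\sigma$ to it, because the combined model already applies a sigmoid to $y_{FM} + y_{DNN}$. This is our reading of how the two formulas fit together.

```python
import math
import random
from typing import List, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (W, b), W has shape out x in


def relu(z: float) -> float:
    return max(0.0, z)


def dense(W: List[List[float]], b: List[float], a: List[float]) -> List[float]:
    """Compute W a + b for one layer."""
    return [sum(w_ij * a_j for w_ij, a_j in zip(row, a)) + b_i for row, b_i in zip(W, b)]


def deep_component(embeddings: List[List[float]], layers: List[Layer]) -> float:
    """Apply a^(l+1) = relu(W^(l) a^(l) + b^(l)) for each hidden layer, then the
    output layer W^(|H|+1) a^(|H|) + b^(|H|+1). Returns the pre-sigmoid value."""
    a = [v for e in embeddings for v in e]  # a^(0): concatenated field embeddings
    *hidden, (W_out, b_out) = layers
    for W, b in hidden:
        a = [relu(z) for z in dense(W, b, a)]
    return dense(W_out, b_out, a)[0]


def init_layer(rng: random.Random, n_in: int, n_out: int) -> Layer:
    scale = math.sqrt(2.0 / n_in)  # He-style scaling, an arbitrary choice here
    W = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


rng = random.Random(0)
m, k = 3, 4               # number of fields and embedding size (hypothetical)
sizes = [m * k, 8, 8, 1]  # two hidden layers, so |H| = 2 (hypothetical)
layers = [init_layer(rng, n_in, n_out) for n_in, n_out in zip(sizes, sizes[1:])]
embeddings = [[rng.gauss(0.0, 0.01) for _ in range(k)] for _ in range(m)]
y_dnn = deep_component(embeddings, layers)
```

With these sizes the network has widths 12, 8, 8, 1, where 12 = m·k is the length of the concatenated embedding $a^{(0)}$.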


3. Experiments

The paper compares DeepFM with Logistic Regression (LR), Factorization Machines (FM), FNN, PNN, and Wide&Deep (LR & DNN, FM & DNN), using AUC and logloss as evaluation metrics.
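For reference, the two evaluation metrics can be computed as shown below. This is a minimal sketch: logloss is the average binary cross-entropy, and AUC is computed by directly counting correctly ordered positive/negative pairs. That pair count is $O(PN)$ and fine for small examples; production code would use a sort-based method or a library function such as scikit-learn's `roc_auc_score`.

```python
import math
from typing import List


def logloss(y_true: List[int], p: List[float], eps: float = 1e-15) -> float:
    """Average binary cross-entropy; predictions are clipped to avoid log(0)."""
    total = 0.0
    for y, q in zip(y_true, p):
        q = min(max(q, eps), 1.0 - eps)
        total += -(y * math.log(q) + (1 - y) * math.log(1.0 - q))
    return total / len(y_true)


def auc(y_true: List[int], p: List[float]) -> float:
    """Probability that a random positive is scored above a random negative (ties count 1/2)."""
    pos = [q for y, q in zip(y_true, p) if y == 1]
    neg = [q for y, q in zip(y_true, p) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 0, 1, 0]
p = [0.9, 0.3, 0.4, 0.6]
print(auc(y_true, p))      # 0.75
print(logloss(y_true, p))  # about 0.5737
```

Three of the four positive/negative pairs are ordered correctly (only 0.4 vs. 0.6 is not), which gives the AUC of 0.75.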

Results

  • Learning feature interactions improves the performance of CTR prediction models: LR performs worse than the other models.
  • Learning high- and low-order feature interactions simultaneously and properly improves performance: DeepFM outperforms the models that learn only low-order feature interactions (FM) or only high-order feature interactions (FNN, PNN).
  • Learning high- and low-order feature interactions simultaneously while sharing the same feature embedding improves performance: DeepFM outperforms the models that learn high- and low-order feature interactions using separate feature embeddings (LR & DNN, FM & DNN).


4. Summary

  • DeepFM jointly trains a deep component and an FM component that share the same input.
  • It needs no pre-training.
  • It learns both high- and low-order feature interactions.
  • It needs no expert feature engineering of the input, unlike Wide&Deep.

Reference

[1] Guo, Huifeng, et al. “DeepFM: a factorization-machine based neural network for CTR prediction.” arXiv preprint arXiv:1703.04247 (2017).
[2] Cheng, Heng-Tze, et al. “Wide & deep learning for recommender systems.” Proceedings of the 1st Workshop on Deep Learning for Recommender Systems. 2016.
[3] Rendle, Steffen. “Factorization machines.” 2010 IEEE International Conference on Data Mining. IEEE, 2010.