Introduction
User modeling is the process of creating mathematical and computational representations of users that encode relevant characteristics (e.g., preferences, needs) for downstream applications like content recommendation, malicious behavior detection, and churn forecasting. An increasingly popular approach leverages user behavioral logs, such as click streams on web pages and interaction logs in apps. These logs inherently take the form of sequences, similar to sentences in natural languages (e.g., English, German, Chinese), with the caveat that the tokens correspond to units of user behavior rather than words or word-pieces. Given the recent prominence of foundational language models for downstream applications, exploring foundational behavioral models is an appealing direction with relatively untapped potential.
In our recent work, we offer a case study of how foundational behavioral models can support various user-centric prediction tasks. Figure 1 illustrates the framework. We leverage language modeling techniques from natural language processing to learn from user behavioral logs, enabling the effective learning of general-purpose user representations. However, applying language modeling methods to use cases beyond natural language requires careful adaptation. While the approach has proven effective for search engines and e-commerce platforms, less is understood about how and when it generalizes to social platforms like Snapchat, where users interact with multiple product surfaces and there are multiple tasks of interest (beyond conventional next-item prediction). Our recent work, General-Purpose User Modeling with Behavioral Logs: A Snapchat Case Study, to be presented at SIGIR 2024, aims to address these two research gaps with a case study using Snapchat data.
Modeling Choices
To guide our model design and evaluation, we define five criteria for our user model:
Criterion 1 concerns the fact that user logs typically contain noisy events unrelated to user actions (e.g., app notifications, error reports), which should be left out. In our study, we meticulously examine all log events and curate a shortlist of events that are purely behavioral and initiated by user actions. Additionally, we take care to use a large and randomly selected sample of Snapchat users and their behavioral sequences to ensure diverse representation of the user community.
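The filtering step above can be sketched as a simple allowlist over log events. Note that the event names and the allowlist below are purely illustrative, not Snapchat's actual log schema:

```python
# Hypothetical sketch of keeping only user-initiated events from a raw log.
# The event names are made up for illustration; the paper curates its own
# shortlist of purely behavioral events.
USER_INITIATED = {"open_camera", "send_snap", "view_story", "open_chat"}

def clean_sequence(raw_events):
    """Drop system-generated events (notifications, error reports, etc.),
    keeping only events triggered directly by user actions."""
    return [e for e in raw_events if e in USER_INITIATED]

raw = ["open_camera", "push_notification", "send_snap",
       "error_report", "view_story"]
print(clean_sequence(raw))  # ['open_camera', 'send_snap', 'view_story']
```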
Criteria 2, 3, and 4 concern the choice of model training objectives. Masked Behavior Prediction and User Contrastive Learning are two suitable objectives. The former involves randomly masking parts of users' behavior sequences, compelling the model to predict the masked behaviors from their context and thus allowing the model to learn user behavioral information (fulfilling Criterion 3). The latter uses a contrastive loss function to maximize the distance between representations of different users and minimize the distance between representations of the same user, based on behavioral sequences from different time points. Hence, the model learns user-specific information that distinguishes one user from another (fulfilling Criterion 4). Since these two objectives are not tied to any downstream goal, Criterion 2 is also fulfilled.
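The two objectives can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the masking rate, temperature, and loss details are assumptions, and the contrastive loss shown is a generic InfoNCE-style formulation where two views of the same user (behavioral sequences from different time windows) form positive pairs:

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_sequence(tokens, mask_rate=0.15, mask_token="[MASK]"):
    """Masked Behavior Prediction: randomly hide tokens so the model must
    recover them from the surrounding behavioral context."""
    masked, targets = [], []
    for t in tokens:
        if rng.random() < mask_rate:
            masked.append(mask_token)
            targets.append(t)       # prediction target
        else:
            masked.append(t)
            targets.append(None)    # not predicted
    return masked, targets

def user_contrastive_loss(z_a, z_b, temperature=0.1):
    """InfoNCE-style User Contrastive Learning sketch: row i of z_a and
    row i of z_b represent the same user at different times (positives);
    all other rows in the batch serve as negatives."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature           # (n_users, n_users)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # positives on the diagonal

views_a = rng.normal(size=(4, 8))
views_b = views_a + 0.05 * rng.normal(size=(4, 8))  # same users, later window
print(user_contrastive_loss(views_a, views_b))
```

Minimizing this loss pulls the two views of each user together while pushing different users apart, which is exactly the user-distinguishing behavior the criterion calls for.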
Criterion 5 concerns model evaluation, emphasizing that the learned user representations should generalize to different downstream tasks. To this end, we introduce three distinct downstream tasks: Reported Account Prediction, Ad View Time Prediction, and Account Self-deletion Prediction. These tasks involve predicting, respectively, users who get reported by other users (e.g., for malicious behavior), users who view an ad beyond a certain duration threshold, and users who voluntarily delete their own accounts.
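A common way to test whether frozen representations generalize is to fit a lightweight classifier on top of them for each downstream task. The sketch below is an assumption for illustration (the paper's actual evaluation protocol may use a different classifier and real embeddings); it trains a logistic probe on synthetic "user embeddings" for a binary label such as reported vs. not reported:

```python
import numpy as np

def train_linear_probe(X, y, lr=0.5, steps=500):
    """Fit a logistic-regression probe on frozen user representations X
    for a binary downstream label y, via plain gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

# Toy frozen embeddings: two separable clusters standing in for two classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 0.3, (50, 4)), rng.normal(1, 0.3, (50, 4))])
y = np.array([0] * 50 + [1] * 50)
w, b = train_linear_probe(X, y)
acc = np.mean(((1.0 / (1.0 + np.exp(-(X @ w + b)))) > 0.5) == y)
print(acc)
```

If the representations encode task-relevant information, even this simple probe should separate the classes well; poor probe performance across tasks would signal representations that do not generalize.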
Results
Our Model
We utilize the Transformer architecture with two customized training objectives: Masked Behavior Prediction and User Contrastive Learning. Additionally, we apply Attention with Linear Biases (ALiBi).
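ALiBi replaces learned position embeddings with a fixed, head-specific linear penalty added to the attention logits, which is what lets the model handle sequences longer than those seen during training. The sketch below assumes a bidirectional (BERT-style) encoder, so it penalizes absolute relative distance; the head slopes follow the standard geometric schedule from the ALiBi formulation:

```python
import numpy as np

def alibi_bias(seq_len, n_heads):
    """Per-head linear bias added to attention logits (ALiBi sketch).
    Head h uses slope 2**(-8 * (h + 1) / n_heads); more distant
    query-key pairs receive a larger penalty."""
    slopes = 2.0 ** (-8.0 * np.arange(1, n_heads + 1) / n_heads)
    # Absolute relative distance between query position i and key position j
    # (bidirectional variant, suitable for a masked-prediction encoder).
    pos = np.arange(seq_len)
    dist = np.abs(pos[:, None] - pos[None, :])
    return -slopes[:, None, None] * dist[None, :, :]  # (n_heads, L, L)

bias = alibi_bias(seq_len=5, n_heads=8)
print(bias.shape)      # (8, 5, 5)
print(bias[0, 0, :3])  # [ 0.  -0.5 -1. ] for the steepest head (slope 0.5)
```

Because the penalty is a fixed function of distance rather than a learned table indexed by position, it is defined for any sequence length at inference time.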
Baseline Approaches
Selected Findings
Figure 2 shows the results of our model and baseline models on the three downstream tasks across different time gaps, demonstrating the usefulness of user representations at different levels of staleness. We highlight two observations. First, our model consistently outperforms all baselines (except for time gaps 1 and 5 in Reported Account Prediction). Second, our model can detect malicious accounts (those reported by other users) and predict user account self-deletion with high AUC scores up to one week in advance. However, our model predicts ad view time less accurately, though still better than the baselines and chance. This is expected, as ad view time also depends on other important factors like ad content and users' cognitive states.
Table 1 shows the impact of User Contrastive Learning (UCL) and ALiBi on our model's performance. Without UCL, our model improves slightly on Masked Behavior Prediction but significantly underperforms on the downstream tasks. Likewise, without ALiBi, performance suffers across all evaluation tasks. These results further show that naively applying language modeling techniques to user modeling should be avoided.
Conclusion
This work is a case study exploring the use of language modeling techniques for modeling user behavior on Snapchat. We show that naive application of language modeling techniques to behavioral tasks is suboptimal, and that incorporating user distinguishability into the loss function improves task performance. Moreover, we show that ALiBi overcomes the inference challenges posed by behavioral sequences longer than those seen during training. Our work only scratches the surface of what's possible with foundational behavior models, and we plan to explore this area more deeply through diverse token definitions, more complex and feature-rich event sequences, and better strategies for self-supervision. If you're interested in this line of work, come find us at SIGIR 2024!