With the advance of modern artificial intelligence, especially deep learning techniques, researchers have made substantial progress in developing recommender systems that help people make personalized decisions. On the one hand, recommender systems efficiently discover and capture user interests amid massive information overload. On the other hand, they address the problem of maximizing user engagement and conversion rate in pursuit of the business goal of continuous growth. Despite this success, current recommender systems still require specific model architectures and training objectives designed for each recommendation task. Meanwhile, many diverse features, such as visual features and knowledge graphs, are not fully utilized and integrated into existing recommender systems.

Recently, the rapid growth of large language models has not only revolutionized NLP tasks but also contributed to radical changes in other domains such as vision, robotics, and reasoning. Trained on large-scale data, representative large models such as BERT, GPT-3, and CLIP have demonstrated emergent abilities. They can serve as foundation models for fast adaptation to a wide spectrum of downstream tasks. The advantages of foundation models are three-fold: 1) they can conduct multiple tasks with a shared basic model architecture and training loss; 2) they exhibit broad inclusivity for multimodal information and can be regarded as a general-purpose interface; 3) they possess the capability for zero-shot or few-shot generalization to unseen tasks. However, foundation models for personalized decision-making, especially recommendation tasks, remain underexplored.
Motivated by these issues, this thesis first explores language modeling as the core medium for unifying multiple recommendation tasks into a shared foundation model that accommodates diverse features and versatile application scenarios. By converting all personalized data and task formulations into natural language prompts, we treat every recommendation task as a conditional text generation problem, thus fulfilling “one data format, one model, one loss” for all recommendation tasks. Moreover, we explore unifying vision, language, and personalized information into a multimodal foundation model with the help of parameter-efficient tuning and multimodal personalized prompts. Since providing reasons behind recommendation decisions is also crucial for personalized foundation models, we develop visually-enhanced and path-based language modeling approaches to facilitate explainable recommendation with richer contextual information. We evaluate the effectiveness of the proposed approaches on real-world benchmarks across six popular recommendation tasks. Our results demonstrate that personalized foundation models offer a promising technical route across different decision-making scenarios in recommender systems.
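To make the “one data format, one model, one loss” idea concrete, the following is a minimal, hypothetical sketch (not the thesis implementation; all function and identifier names are illustrative) of how two otherwise distinct recommendation tasks — rating prediction and sequential next-item recommendation — can be cast as (prompt, target) text pairs, so that a single sequence-to-sequence model trained with token-level cross-entropy can serve both:

```python
# Hypothetical sketch: casting two different recommendation tasks as
# (prompt, target) text pairs so a single text-generation model and a
# single loss function can serve both. Names are illustrative only.

def rating_prompt(user_id: str, item_id: str, rating: int) -> tuple[str, str]:
    """Rating prediction expressed as conditional text generation."""
    prompt = f"What star rating will user_{user_id} give item_{item_id}?"
    target = str(rating)
    return prompt, target

def sequential_prompt(user_id: str, history: list[str], next_item: str) -> tuple[str, str]:
    """Next-item (sequential) recommendation expressed as text generation."""
    seq = ", ".join(f"item_{i}" for i in history)
    prompt = f"user_{user_id} has interacted with {seq}. Predict the next item."
    target = f"item_{next_item}"
    return prompt, target

# Both tasks now share one data format: plain text in, plain text out.
print(rating_prompt("17", "421", 5))
print(sequential_prompt("17", ["3", "8", "42"], "99"))
```

Because every task is reduced to the same text-in, text-out interface, adding a new task only requires designing new prompt templates rather than a new model architecture or objective.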