DescriptionModern computer vision models mostly rely on massive human annotated datasets for supervised training. The models are typically learned from the supervision of static datasets in a passive learning manner. As the performance on classical computer vision tasks tends to saturate, novel visual tasks emerged and posed challenges to the traditional passive learning paradigm. We explored such new settings where huge dataset supervisions are scarce, and novellearning paradigms beyond passive training are proposed. We specifically focused on the following three visual learning scenarios, in which we showed active and interactive learning paradigms are better suited than traditional passive learning. First, we focused on histopathological image classification with a limited annotation budget. We proposed an active selection algorithm via constrained submodular function maximization. The proposed method encourages uncertainty reduction as well as selection diversity. We also show the greedy-like algorithm has near optimal theoretical guarantee and scalable to large scale unlabeled data. Second, we proposed a novel semantic amodal segmentation task in which occluded object segmentation masks are predicted. To address the challenge of inadequate hard examples, we proposed to actively generate hard synthetic examples for training. Experiment results demonstrate improved performance against baselines. We also show the amodal segmentation can be applied to spatial depth ordering. Third, we proposed an interactive learning approach to generate natural language dialogue between two conversation agents, in order to accomplish a visual ground task. Experiment results showed that the interactive learning significantly improved the supervised training baseline, and the performance gains most when multiple models are simultaneously updated through mutual interaction. The analysis on the generated conversations showed the thorough interactive training, two agents learned to evolve the communication towards a more efficient direction, and improved the task success rate.