Description: Hit Song Science aims to predict a song's popularity based on song structure and external features. To provide an efficient and accurate tool for Annual Top-100 Billboard song classification, we apply a fine-tuned BERT transformer and a joint learning approach. We also explore different audio descriptors and lyrics-based embeddings for modeling the hit song classification task on a new Western song dataset that we created. We address the class imbalance and data scarcity issues associated with traditional datasets by employing a shared-layer architecture and penalizing the loss. We present a comparison study of three distinctly designed neural network architectures. All models yield high overall accuracy with relatively low training cost.
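
As a rough illustration of what penalizing the loss under class imbalance can look like, the sketch below computes a class-weighted binary cross-entropy in plain Python. The inverse-frequency weighting scheme, the function names `inverse_frequency_weights` and `weighted_bce`, and the toy labels and probabilities are all assumptions made for illustration; the description above does not specify the exact penalty used.

```python
import math

def inverse_frequency_weights(labels):
    """Return {class: weight}, weight = n_samples / (n_classes * class_count).

    Assumption: inverse-frequency weighting; other penalty schemes are possible.
    """
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n_classes = len(counts)
    return {c: len(labels) / (n_classes * k) for c, k in counts.items()}

def weighted_bce(labels, probs, weights, eps=1e-12):
    """Mean binary cross-entropy with each sample scaled by its class weight."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        sample_loss = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        total += weights[y] * sample_loss
    return total / len(labels)

# Hypothetical toy data: 1 = hit, 0 = non-hit; hits are the minority class.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
probs = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.2, 0.1, 0.4, 0.3]

weights = inverse_frequency_weights(labels)
loss = weighted_bce(labels, probs, weights)
unweighted = weighted_bce(labels, probs, {0: 1.0, 1: 1.0})
print(weights, loss, unweighted)
```

With these weights, errors on the rare hit class contribute more to the loss than errors on the common non-hit class, so a model can no longer reach a low loss by predicting "non-hit" for everything.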