This thesis consists of two studies on the evolution and genomic signatures of viruses. Viruses are obligate intracellular parasites that have a great impact on human, animal, and plant health. The first study concerns influenza A (H5N1) viruses that infect humans. H5N1 is an avian virus that occasionally infects humans, with a mortality rate of 50-60% among reported human cases. Human-to-human transmission is limited, and most human H5N1 infections are acquired from birds. Under such a transmission scheme, the transmission of H5N1 strains from birds to humans may be biased. Such a bias could arise, for example, because some avian strains infect humans more efficiently, or because the human immune response is better able to clear some of the human-infecting avian strains. We developed a novel strategy to identify signatures of such biased transmission and applied it to publicly available H5N1 hemagglutinin sequences from China, Egypt, and Indonesia. In each geographic region, human-infecting strains were found to arise from a subset of the avian viral pool characterized by geography-specific mutations. These mutations lie in functionally important regions of the hemagglutinin protein, including regions involved in viral attachment to host cells and in the immune response. After correcting for this transmission bias, no further widespread bias was observed. This research also showed that vaccine-evasion mutant viruses are unlikely to infect humans, a finding with significant implications for rational vaccine design.

As a separate project, we developed a new method to detect novel capsid sequences. A large part of the virosphere is expected to remain uncharacterized, and viruses show remarkably high levels of sequence diversity. Hence, methods based on sequence similarity have had limited success in detecting novel viral sequences in metagenomic studies.
Despite this sequence diversity, the capsid proteins from diverse families of icosahedral viruses share a conserved eight-stranded beta barrel known as the "jelly-roll" fold. Motivated by this structural conservation, we sought to classify such capsid protein sequences by applying machine learning to alignment-free features. The alignment-free features suitable for this problem are discussed first. Using these features, a high-accuracy Support Vector Machine classifier (SVM-Caps) was developed to distinguish jelly-roll capsid proteins from other proteins. Its predictive power was compared with that of BLAST, a popular tool based on sequence similarity. SVM-Caps had comparable, though somewhat lower, power than BLAST to detect capsid sequences of known viral families, but significantly higher power to detect capsid sequences from novel families. As an application of this method, the viral metagenomic data from the French Lake Bourget study were analyzed, and many potential novel capsid sequences were found.
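To illustrate what "alignment-free features" can mean in practice, here is a minimal sketch of one common family of such features: normalized amino-acid (k=1) and dipeptide (k=2) composition. This abstract does not state which features SVM-Caps actually uses, so the choice of k-mer composition, and the names `kmer_composition` and `feature_vector`, are illustrative assumptions rather than the thesis method. The key property is that every protein, whatever its length, maps to a vector of the same fixed dimension without being aligned to any reference.

```python
from itertools import product

# The 20 standard amino acids; windows containing any other symbol (X, B, Z, ...) are skipped.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_composition(seq: str, k: int) -> list[float]:
    """Return the normalized frequency of every possible amino-acid k-mer in seq.

    Hypothetical helper: the vector has 20**k entries in a fixed lexicographic
    order, so sequences of different lengths yield vectors of equal dimension.
    """
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # ignore windows with non-standard residues
            counts[j] += 1
            total += 1
    # Divide by the number of counted windows; an empty count gives an all-zero vector.
    return [c / total if total else 0.0 for c in counts]


def feature_vector(seq: str) -> list[float]:
    """Hypothetical feature set: amino-acid plus dipeptide composition (20 + 400 = 420 values)."""
    return kmer_composition(seq, 1) + kmer_composition(seq, 2)
```

For example, `feature_vector("MSTNPKPQRK")` returns a 420-element list whose first 20 entries sum to 1. Vectors like these, computed for labeled capsid and non-capsid proteins, could then be passed to an off-the-shelf SVM implementation such as scikit-learn's `SVC`; that pairing is likewise an assumption for illustration, not a description of how SVM-Caps was built.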