Using bioinformatics to investigate functional diversity: a case study of MHC diversity in koalas

Type: Journal article

Reference: Silver LW, McLennan EA, Beaman J, da Silva KB, Timms P, Hogg CJ, Belov K. Using bioinformatics to investigate functional diversity: a case study of MHC diversity in koalas. Immunogenetics. 2024 Dec;76(5-6):381-395. doi: 10.1007/s00251-024-01356-6

Abstract

Conservation genomics can greatly improve conservation outcomes of threatened populations, including those impacted by disease. Understanding diversity within immune gene families, including the major histocompatibility complex (MHC) and toll-like receptors (TLR), is important due to the role they play in disease resilience and susceptibility. With recent advancements in sequencing technologies and bioinformatic tools, the cost of generating high-quality sequence data has significantly decreased and made it possible to investigate diversity across entire gene families in large numbers of individuals compared to investigating only a few genes or a few populations previously. Here, we use the koala as a case study for investigating functional diversity across populations. We utilised previous target enrichment data and 438 whole genomes to firstly, determine the level of sequencing depth required to investigate MHC diversity and, secondly, determine the current level of diversity in MHC genes in koala populations. We determined for low complexity, conserved genes such as TLR genes 10 × sequencing depth is sufficient to reliably genotype more than 90% of variants, whereas for complex genes such as the MHC greater than 20 × and preferably 30 × sequencing depth is required. We used whole genome data to identify 270 biallelic SNPs across 24 MHC genes as well as copy number variation (CNV) within class I and class II genes and conduct supertype analysis. Overall, we have provided a bioinformatic workflow for investigating variation in a complex immune gene family from whole genome sequencing data and determined current levels of diversity within koala MHC genes.


The error in your way: a beginner’s guide to troubleshooting command error messages

by Adele Gonsalvez

As a bioinformatics newbie, there is a lot to wrap your head around, from understanding basic programming languages to working out which commands you need to use. In my experience, one particular gem is when you try to run a command and receive one of a series of often uninformative error messages. Troubleshooting will end up dominating your time when you are doing any kind of coding, and it can be incredibly frustrating. So, instead of swearing at your computer (although that can be therapeutic at times), here are some handy tips I’ve picked up that can be more effective in addressing that pesky error message.

It may seem like a minor issue, but in my experience most command errors come from typos, and they can be tricky to spot. Step through your command or script to ensure there aren’t any spelling mistakes or extra spaces at the end of commands. Also ensure file paths are correct, and input files exist and are correctly named.
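If you would rather not hunt for stray spaces and missing files by eye, the checks above can be roughly automated. The following is only a minimal sketch in Python: the function names `find_trailing_whitespace` and `find_missing_files` are made up for illustration, and the example commands and file names are placeholders rather than anything from a real workflow.

```python
from pathlib import Path


def find_trailing_whitespace(script_text):
    """Return the 1-based numbers of lines that end in a space or tab."""
    flagged = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        if line != line.rstrip(" \t"):
            flagged.append(number)
    return flagged


def find_missing_files(paths):
    """Return the paths that do not point to an existing file."""
    return [str(p) for p in paths if not Path(p).is_file()]


if __name__ == "__main__":
    # Hypothetical two-line script; the first line ends with a stray space.
    example = "samtools index sample.bam \nsamtools flagstat sample.bam\n"
    print("Lines with trailing whitespace:", find_trailing_whitespace(example))

    # Hypothetical input file name; replace with the inputs your command expects.
    print("Missing input files:", find_missing_files(["sample.bam"]))
```

This will not catch a misspelt command name, but it does rule out two of the most common culprits before you start reading the error message in detail.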

ChatGPT is an incredibly useful tool for troubleshooting error messages and for generating commands in general. If you give it the error message, ChatGPT can outline the possible causes and suggest how to go about addressing the issue.

Leave it for a couple of hours. This is the human version of “Did you try turning it off and on again?”. Like any form of editing, if you have been staring at the same bit of text for too long, it is easy to gloss over misspelt words or extra spaces. Revisiting it later can help you find issues that you previously overlooked.

Ask your co-workers to look over your command or script. It’s likely that some of them will be more experienced in bioinformatics and can shed some light on what’s going wrong. Even if none of your co-workers are familiar with coding, a fresh set of eyes can often spot little mistakes much better than your own. I once spent hours trying to solve an error in a script, which took my friend only 30 seconds to solve (it was an extra space at the end of a command).

Adele Gonsalvez (2022 Honours Student) is investigating the expression and antimicrobial activity of defensins from the platypus and short-beaked echidna.


Making bioinformatics more accessible

by Dr. Kate Farquharson (Post-doc)

In the AWGG lab, we are generating genomic resources for diverse Australian vertebrates, including birds, marsupials, amphibians and reptiles. However, following bioinformatics instructions can sometimes feel a bit like this:

And for non-model organisms, it can feel like being asked to draw an owl when you don’t even know what one looks like (or worse, imagine being given a picture of a human as a reference point). So, how do we make bioinformatics more accessible to people getting started? We have been working hard to carefully document our in-house workflows and contribute to public how-to guides, such as the Genome Assembly with Galaxy guide.

Documenting your work not only helps others but can be a useful way to remember what you have done before! Good documentation can help you to train others, present your methods and ensure your analysis is reproducible. Some tips for documenting your work include:

  • Always keep track of the software and versions used
  • Try out an editor such as Visual Studio Code, which allows you to easily insert code and scripts and integrates well with GitHub
  • Don’t forget your science brain! It can be very easy to follow a tutorial from start to finish but have no idea what the end result means. A few sentences to justify your approach and explain how you interpret your results will help others use your guide correctly
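As an illustration of the first tip, here is one way version tracking could be scripted. It is a minimal sketch in Python, not our in-house workflow: the helper names (`get_version`, `record_versions`, `format_log`) are made up for this example, and it assumes each tool prints its version when called with a flag such as `--version`, which you should check for every program you add.

```python
import subprocess
import sys
from datetime import date


def get_version(command):
    """Run a version command and return the first line it prints."""
    try:
        result = subprocess.run(command, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        # The program is not installed, not on the PATH, or did not respond.
        return "not found"
    # Some tools print their version to stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else "unknown"


def record_versions(tools):
    """Map each tool name to the version string it reports."""
    return {name: get_version(command) for name, command in tools.items()}


def format_log(versions):
    """Build a dated, tab-separated log that can be pasted into your notes."""
    lines = [f"Software versions recorded on {date.today().isoformat()}"]
    lines += [f"{name}\t{version}" for name, version in sorted(versions.items())]
    return "\n".join(lines)


if __name__ == "__main__":
    # Only the Python interpreter is listed here; add your own tools and
    # the version flag each one actually uses.
    tools = {"python": [sys.executable, "--version"]}
    print(format_log(record_versions(tools)))
```

Saving this output alongside your scripts each time you run an analysis gives you a dated record of exactly what was used.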

Good documentation is just one step we are working on as part of the Threatened Species Initiative and the ARC Centre of Excellence for Innovations in Peptide & Protein Science to make genomics and bioinformatics more accessible to conservation end-users.

Author

Dr Kate Farquharson

Dr Kate Farquharson is a Postdoctoral Research Associate in Bioinformatics within the ARC Centre of Excellence for Innovations in Peptide & Protein Science. She applies bioinformatic approaches to the assembly and annotation of genomes and transcriptomes of Australian species to identify targets for peptide discovery. Kate completed her PhD in the AWGG lab in 2020, where she used statistical and molecular genetic approaches to investigate adaptation to captivity in conservation breeding programs. Kate specialises in synthesising, analysing and interpreting data, and in communicating results clearly to a range of audiences.