David Arndt

Bioinformatician, Software Developer

Bioinformatician and software developer with experience in metabolomics, metagenomics, cheminformatics, and protein structural biology. Contributed to the development of several popular databases and tools (HMDB, PHASTER, Heatmapper, CFM-ID, METAGENassist, and others).

Work Experience

The Metabolomics Innovation Centre / Wishart Research Group

Edmonton
Bioinformatician, Research Assistant
May 2011–Apr 2014, Aug 2014–Dec 2020, Feb 2021–Present

Development work on several leading metabolomics databases:

  • Databases include The Human Metabolome Database, FooDB, The Toxic Exposome Database, The Small Molecule Pathway Database, and ContaminantDB.
  • Collected and curated data from third-party sources and published literature, including chemical structures, chemical identifiers (SMILES, InChI, InChIKey), NMR and MS spectra, gene regulation data, health effects, compound descriptions, mechanisms of toxicity, and biochemical pathways. Used Ruby on Rails, MySQL, and JChem.
  • Used Python, RDKit, and the SMARTS language to generate large sets of chemical structures.

Metagenomics and genomics projects:

  • Lead developer of PHASTER, a leading web server for prophage prediction in bacterial genomes and metagenomic contigs that processes ~60,000 submissions per month (1.7 million to date). Optimized job processing on a high-performance computing cluster for a 3x increase in throughput.
  • Lead developer of METAGENassist, a web app for metagenomic data analysis. Adapted R functions for univariate analysis, multivariate analysis (PCA, PLS-DA), heatmaps, clustering and supervised learning.
  • Analyzed metagenomic samples.

Cheminformatics projects:

  • Contributed to ClassyFire, a 1.5 TB database of chemical classifications. Used Docker to prepare a portable version with microservices, supporting scalability for high-throughput automatic chemical classification on high-performance computing clusters.
  • Contributed to CFM-ID (v4.0 release forthcoming).

Contributed to Heatmapper, a web app for creating gene expression and other heatmaps. Used R and Shiny.

>5 years as primary manager of IT infrastructure in a bioinformatics lab, for >70 servers and >110 web applications and internal services.

Regent College

Vancouver
TA, Introductory New Testament Greek
Jun–Aug 2009

Marked exams and quizzes, prepared exercises, and answered students' questions.

Wishart Research Group, University of Alberta

Edmonton
Programmer/Analyst - Bioinformatics
May 2006–May 2008

Worked on several protein structural biology projects, including:

  • Algorithm development and implementation for a protein structure refinement program incorporating NMR chemical shifts (written in C).
  • Contributed to development of protein secondary and 3D structure prediction pipelines, using C, Perl, Java.

Wishart Research Group, University of Alberta

Edmonton
Programmer/Analyst - Bioinformatics
May–Aug 2004

Worked on an interactive graphical cellular metabolism simulator (Java).

Education

BSc Honors in Molecular Genetics

University of Alberta
1997–2001
Including laboratory methods and a research project.

BSc in Computing Science with Specialization in Bioinformatics

University of Alberta
2002–2005
Including courses in databases, AI, algorithms and bioinformatics.

MA in Theological Studies

Regent College
2008–2015
Historical research thesis.
Ancient languages: Hebrew, Greek.

Professional Skills

Top Skills

Bioinformatics (90%)

>10 years
Experience preparing several metabolomics databases, using cheminformatics tools, analyzing metagenomic data, and developing structural biology tools.

Ruby on Rails (90%)

Advanced, 6 years
Contributed to development of numerous websites, including PHASTER, CFM-ID, and HMDB. Wrote how-to guides and led tutorials for new staff.

R/Shiny (80%)

Experienced, 8 years
Includes adaptation of statistical and machine learning methods in development of METAGENassist, and using Shiny to develop Heatmapper.

Docker (85%)

Experienced, 3 years
Containerized several applications, adopting microservices architecture. Prepared containerized application for running on computing cluster. Led tutorials for staff.

Cloud Infrastructure (85%)

Experienced, 6 years
Primary Cloud/IT infrastructure manager at TMIC for >110 web applications and internal services. Experience with AWS, Google Cloud, OpenStack. System administration, network configuration, backup solution implementation.

Other Skills

Python, SQL, Perl/CGI, C, Singularity, Vagrant, HPC Clusters, Bash, Git, Java, JSP, HTML, CSS, French

Portfolio Highlights

Led or worked on teams developing the following projects:

  • PHASTER: HPC Cluster, Rails/Perl/Bash
  • Heatmapper: R/Shiny, Docker
  • METAGENassist: R/Python/Perl, Java/JSP
  • The Toxic Exposome Database (TEDB): Ruby on Rails
  • CFM-ID: Ruby on Rails, Docker
  • ClassyFire: Ruby on Rails, Docker
  • The Human Metabolome Database (HMDB): Ruby on Rails
  • ContaminantDB (in development): Rails/Python/RDKit

Publications

Arndt, D., Marcu, A., Liang, Y. and Wishart, D.S. (2019) PHAST, PHASTER and PHASTEST: Tools for finding prophage in bacterial genomes. Brief. Bioinformatics, 20(4), 1560–1567.
Djoumbou-Feunang, Y., Pon, A., Karu, N., Zheng, J., Li, C., Arndt, D., Gautam, M., Allen, F. and Wishart, D.S. (2019) CFM-ID 3.0: Significantly Improved ESI-MS/MS Prediction and Compound Identification. Metabolites, 9(4), 72.
Wishart, D.S., Feunang, Y.D., Marcu, A., Guo, A.C., Liang, K., Vázquez-Fresno, R., Sajed, T., Johnson, D., Li, C., Karu, N., et al. (2018) HMDB 4.0: the human metabolome database for 2018. Nucleic Acids Res., 46(D1), D608–D617.
Hafsa, N.E., Berjanskii, M.V., Arndt, D. and Wishart, D.S. (2018) Rapid and reliable protein structure determination via chemical shift threading. J. Biomol. NMR, 70, 33–51.
Ramirez-Gaona, M., Marcu, A., Pon, A., Guo, A.C., Sajed, T., Wishart, N.A., Karu, N., Djoumbou Feunang, Y., Arndt, D. and Wishart, D.S. (2017) YMDB 2.0: a significantly expanded version of the yeast metabolome database. Nucleic Acids Res., 45(D1), D440–D445.
Arndt, D., Grant, J.R., Marcu, A., Sajed, T., Pon, A., Liang, Y. and Wishart, D.S. (2016) PHASTER: a better, faster version of the PHAST phage search tool. Nucleic Acids Res., 44, W16–21.
Babicki, S., Arndt, D., Marcu, A., Liang, Y., Grant, J.R., Maciejewski, A. and Wishart, D.S. (2016) Heatmapper: web-enabled heat mapping for all. Nucleic Acids Res., 44(W1), W147–153.
Berjanskii, M., Arndt, D., Liang, Y. and Wishart, D.S. (2015) A robust algorithm for optimizing protein structures with NMR chemical shifts. J. Biomol. NMR, 63(3), 255–264.
Hafsa, N.E., Arndt, D. and Wishart, D.S. (2015) CSI 3.0: a web server for identifying secondary and super-secondary structure in proteins using NMR chemical shifts. Nucleic Acids Res., 43(W1), W370–377.
Hafsa, N.E., Arndt, D. and Wishart, D.S. (2015) Accessible surface area from NMR chemical shifts. J. Biomol. NMR, 62(3), 387–401.
Wishart, D., Arndt, D., Pon, A., Sajed, T., Guo, A.C., Djoumbou, Y., Knox, C., Wilson, M., Liang, Y., Grant, J., et al. (2015) T3DB: the toxic exposome database. Nucleic Acids Res., 43(D1), D928–934.
Law, V., Knox, C., Djoumbou, Y., Jewison, T., Guo, A.C., Liu, Y., Maciejewski, A., Arndt, D., Wilson, M., Neveu, V., et al. (2014) DrugBank 4.0: shedding new light on drug metabolism. Nucleic Acids Res., 42(D1), D1091–1097.
Jewison, T., Su, Y., Disfany, F.M., Liang, Y., Knox, C., Maciejewski, A., Poelzer, J., Huynh, J., Zhou, Y., Arndt, D., et al. (2014) SMPDB 2.0: big improvements to the Small Molecule Pathway Database. Nucleic Acids Res., 42(D1), D478–484.
Wishart, D.S., Jewison, T., Guo, A.C., Wilson, M., Knox, C., Liu, Y., Djoumbou, Y., Mandal, R., Aziat, F., Dong, E., Bouatra, S., Sinelnikov, I., Arndt, D., et al. (2013) HMDB 3.0—The Human Metabolome Database in 2013. Nucleic Acids Res., 41(D1), D801–807.
Wishart, D.S., Arndt, D., Berjanskii, M., Tang, P., Zhou, J. and Lin, G. (2008) CS23D: a web server for rapid protein structure generation using NMR chemical shifts and sequence data. Nucleic Acids Res., 36(Web Server issue), W496–502.
Montgomerie, S., Cruz, J.A., Shrivastava, S., Arndt, D., Berjanskii, M. and Wishart, D.S. (2008) PROTEUS2: a web server for comprehensive protein structure prediction and structure-based annotation. Nucleic Acids Res., 36(Web Server issue), W202–209.
Arndt, D., Xia, J., Liu, Y., Zhou, Y., Guo, A.C., Cruz, J.A., Sinelnikov, I., Budwill, K., Nesbø, C.L. and Wishart, D.S. (2012) METAGENassist: a comprehensive web server for comparative metagenomics. Nucleic Acids Res., 40(Web Server issue), W88–95.
Wishart, D.S., Arndt, D., Berjanskii, M., Guo, A.C., Shi, Y., Shrivastava, S., Zhou, J., Zhou, Y. and Lin, G. (2008) PPT-DB: the protein property prediction and testing database. Nucleic Acids Res., 36(Database issue), D222–229.
Shi, Y., Zhou, J., Arndt, D., Wishart, D.S. and Lin, G. (2008) Protein contact order prediction from primary sequences. BMC Bioinformatics, 9, 255.
Wishart, D.S., Tzur, D., Knox, C., Eisner, R., Guo, A.C., Young, N., Cheng, D., Jewell, K., Arndt, D., Sawhney, S., et al. (2007) HMDB: the Human Metabolome Database. Nucleic Acids Res., 35(Database issue), D521–526.
Wishart, D.S., Yang, R., Arndt, D., Tang, P. and Cruz, J. (2005) Dynamic cellular automata: an alternative approach to cellular simulation. In Silico Biol., 5(2), 139–161.

Get in Touch

I'm always interested in getting involved in new projects and in applying and expanding my skills in new areas.

I have strengths in the following areas:

  • Working with bioinformatics data from a range of fields
  • Web app development (Ruby on Rails, Shiny, etc.)
  • Big data processing on cluster/cloud environments
  • Database design and implementation
  • Working collaboratively with partners and on small teams

I can be reached at [email protected]