The Ficklin research program in the Dept. of Horticulture at Washington State University is a computational dry lab dedicated to the creation of software tools, computational approaches and systems-level models that address basic and applied hypotheses at the molecular level of agricultural systems.
Areas of Focus
Biosignature discovery: identification of dynamic molecular markers (e.g., gene expression, metabolite abundance) for environmentally controlled traits in plants using machine learning.
Machine learning and image processing for automated rating of physiological traits in horticultural crops.
Systems genetics: using multiomic networks to link the molecular underpinnings to traits of interest.
Community biological database development using the Tripal software.
Whole genome assembly and annotation.
Stephen P. Ficklin, Ph.D.
Associate Professor, Dept. of Horticulture, Washington State University
Department of Horticulture
Washington State University
PO Box 646414
Pullman, WA 99164-6414
Educational Background
Ph.D. Plant and Environmental Sciences, Clemson University (2013)
M.S. Computer Science, Clemson University (2003)
Projects
Summary Measures of Health for Dairy Cattle
Funded by the USDA NIFA IDEAS program, this project seeks to create a time-based summary measure of dairy cow health from transcriptomic, microbiome and trait data. The measure can be used to assess the comparative importance of diseases and injuries affecting animal wellbeing and economic losses across dairy populations.
NRSP10: National Database Resources for Crop Genomics, Genetics and Breeding Research
NRSP10 (https://www.nrsp10.org/) is one of seven National Research Support Projects (NRSPs) funded by the State Agricultural Experiment Stations (SAES) through the Hatch Multistate Research Fund (MRF) provided by the National Institute of Food and Agriculture (NIFA). Its mission is to establish a robust, dynamic, and widely available online genomics, genetics and breeding database platform for crops of national significance that are currently underserved (Citrus, Cool Season Food Legumes, Cotton, Rosaceae, and Vaccinium), one that is flexible enough to be readily implemented for other crops and organisms valuable to U.S. agriculture. The role of our program is to provide core development support and outreach for Tripal (http://tripal.info).
Analysis of the Antagonistic and Mutualistic Interactions Within Potato, Protist & Virus
Awarded jointly by the NSF and USDA, this project seeks to explore the mutualistic relationship between the soilborne protist parasite Spongospora subterranea f. sp. subterranea and potato mop-top virus (PMTV) as they antagonistically interact with potato plants. A systems-level time-series analysis will be performed to identify candidate gene sets that underlie disease susceptibility, resistance and mutualism.
Assessment of smoke taint risk in vineyards exposed to smoke from wildfires
Funded by the Washington State Department of Agriculture Specialty Crop Block Grant Program, this project addresses the grape and wine industry's need for methods that assess the risk to grape and wine quality associated with vineyard exposure to smoke from wildfires.
Apple genomes for postharvest fruit quality biomarkers
This project, funded by the Washington Tree Fruit Research Commission, seeks to develop tools for identifying postharvest biomarkers in apple fruit that indicate response to storage conditions and predict risk of disorders or loss of quality.
"Big Data" Tree Crop Cyberinfrastructure
Standards and Cyberinfrastructure that Enable "Big-Data" Driven Discovery for Tree Crop Research is a project funded by the US National Science Foundation (award #1444573) to develop standards and infrastructure for the integration of high-quality, curated phenotypic and genotypic data with geo-location and environmental data. This project will both leverage and coordinate funded efforts to enhance tree crop databases (Genome Database for Rosaceae, Citrus Genome Database, TreeGenes and Hardwood Genomics Web) or update them to Tripal, supporting cross-site communication, adoption of existing standards, and "big data" integration and analysis.
Precision Dairying: Transcriptomics/Phenomics Pilot Project
SciDAS: Scientific Data Analysis at Scale
SciDAS is a multi-institutional project funded by the National Science Foundation (award #1659300). Its goal is to provide advanced cyberinfrastructure supporting a national-level distributed compute infrastructure for the efficient injection of data and workflows into compute environments. The Ficklin Lab works with the project team to develop a systems biology use case: large-scale construction of gene co-expression networks across the tree of life. The project also contains a Tripal component to integrate Tripal sites with the SciDAS infrastructure. See the official SciDAS home page for more information.
Tripal Gateway Project
The Tripal Gateway Project is a US National Science Foundation (NSF) funded project (award #1443040) designed to create infrastructure supporting two important needs within the Tripal community: data exchange and big data analysis. Modern sequencing technologies have expanded the need for workflow-based analytics to meet community expectations, and the ability to move data between the community database and a high-performance computing cluster is critical for meeting performance expectations. The Tripal Gateway Project addresses these needs through, first, the addition of RESTful web services to Tripal; second, integration of Tripal with Galaxy so that Tripal sites can provide analytical workflows to their users; and third, development and exploration of methods to improve data transfer between Tripal sites and the computing centers where Galaxy jobs are executed.
People
The Ficklin Lab comprises full-time technical and research staff, postdocs, graduate students and undergraduate students. Current members of the Ficklin Lab are listed below in alphabetical order.
Active
P. Layton Ashmore
Postdoctoral Researcher
Focus Areas
Application of data science to identify biomarkers for wildfire smoke-related compounds in wine grapes from large untargeted mass spectrometry datasets. Co-advised by Tom Collins.
Aden Athar
Undergraduate Researcher
Focus Areas
Image analysis using Python and machine learning.
Zach Hall
Undergraduate Researcher
Focus Areas
Data analytics, biomarker development
Bianca Ortiz-Uriarte
Laboratory Technician
Focus Areas
Molecular biology laboratory and postharvest physiology
José Luis Perez-Olmos
Undergraduate Researcher
Focus Areas
RNA-seq library preparation in support of the smoke-tainted wine project.
Joel Alejandro Velasco
Horticulture PhD Student
Focus Areas
Identification of genes underlying root rot (Aphanomyces) resistance in lentils
Huiting Zhang
Research Assistant Professor
Focus Areas
Use of functional genomics and bioinformatics to explore the genetic control of post-harvest fruit quality of pome fruits. Works in both the Ficklin and Honaas research programs.
Alumni
Tyler Biggs
Postdoctoral Researcher
Focus Areas
Python development for Pynome and GSForge, and development of workflows for massive-scale gene co-expression network construction for the SciDAS project. Data analysis, machine learning, high-performance computing. Ph.D. in Organic Chemistry.
Sean Buehler
Scientific Application Web Developer
Focus Areas
Tripal v3 and v4 Core Development and Tripal Help Desk support.
Josh Burns
Research Associate
Focus Areas
Software developer for ACE & KINC using C++, OpenCL, Qt and OpenMPI; GPU optimization. Development of the AnnoTater workflow for execution on Kubernetes clusters.
Mitchell Greer
Undergraduate in EECS
Focus Areas
C++, CUDA and OpenCL developer. Assisted in development of KINC.
John Hadish
MPS Ph.D. Graduate 8/2023
Focus Areas
Exploration of improved computational methods towards development of biosignatures for post-harvest fruit quality in apples. Lead developer of GEMmaker.
Abdur-Rahman Muhammad
Undergraduate Researcher
Focus Areas
Developer of Granny, a machine learning tool for pome fruit trait ratings.
Matt McGowan
MPS Ph.D. Graduate 5/2022
Focus Areas
Noise reduction strategies, Network & GWAS integration, condition-specific subnetworks. Works in both the Ficklin and Zhang research programs.
Nhan Nguyen
Machine Learning Software Development
Focus Areas
Lead developer of Granny, a machine learning tool for pome fruit trait ratings.
Sai Oruganti Sai Prakash
Former Hort Ph.D. Student
Focus Areas
Top-down metabolic network construction for identification of condition-specific interactions and integration with gene expression data.
Risharde Ramnath
Scientific Application Web Developer
Focus Areas
Tripal v3 and v4 Core Development and Tripal Help Desk support.
Yue Shang
Hort MS Graduate 5/2023
Focus Areas
Researches chemical composition changes in smoke-tainted grape and wine using GC-MS and Q-TOF. Works in the Collins lab, co-advised by the Ficklin lab.
The Ficklin Lab in the Department of Horticulture at WSU began in July 2015. The following is a list of peer-reviewed publications since 2015 with lab members as primary authors or co-authors.
Yocca A, Akinyuwa M, Bailey N, Cliver B, Estes H, Guillemette A, Hasannin O, Hutchison J, Jenkins W, Kaur I, Khanna RR, Loftin M, Lopes L, Moore-Pollard E, Olofintila O, Oyebode GO, Patel J, Thapa P, Waldinger M, Zhang J, Zhang Q, Goertzen L, Carey SB, Hargarten H, Mattheis J, Zhang H, Jones T, Boston L, Grimwood J, Ficklin S, Honaas L, Harkess A. A chromosome-scale assembly for 'd'Anjou' pear (2024). G3: Genes, Genomes, Genetics. Jan 8:jkae003
Zhang H, Ko I, Eaker A, Haney S, Khuu N, Ryan K, Appleby AB, Hoffman B, Landis H, Pierro K, Wilsea N, Hargarten H, Yocca A, Harkess A, Honaas L, Ficklin SP. A Phased, Chromosome-scale Genome for Malus domestica 'WA 38' (2024). G3: Genes, Genomes, Genetics. Sept 17:jkae222
The Ficklin lab actively develops software that implements new approaches for Systems Genetics and the Tripal database platform. A list of these software packages is provided below.
ACE
The Accelerated Computational Engine (ACE) is a C++ library that provides a generic interface for construction of analytical tools. It provides a common interface for GPU utilization, visualization using the Qt package, and multi-node execution using OpenMPI. ACE provides an open file format for all output files that supports metadata and provenance. ACE was created as the base for KINC, but can be used for any scientific application.
blend4php is a PHP library that interacts directly with the Galaxy Project API. This tool was developed for use by the Tripal Galaxy Module, but was designed to be independent so that anyone with a PHP-based site can interact directly with workflows housed in Galaxy. The blend4php package allows a site to add, modify and launch workflows, view and download histories, create datasets and more.
FUNC-E provides a DAVID-style command-line tool for functional enrichment of gene sets. It performs Fisher's exact test, multiple-testing correction, and kappa statistics for term clustering. FUNC-E allows users to provide their own genome background and annotation sets.
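The three statistics named above can be illustrated with a short, standard-library-only Python sketch. It is not FUNC-E's code: it assumes a one-sided Fisher's exact test (the hypergeometric upper tail), uses Benjamini-Hochberg as the multiple-testing correction, and computes Cohen's kappa between two terms' gene memberships, in the spirit of DAVID-style term clustering. All function names, gene IDs and term IDs are hypothetical.

```python
from math import comb


def fisher_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher's exact test p-value for over-representation.

    k: query genes annotated with the term
    n: total query genes
    K: background genes annotated with the term
    N: total background genes
    Returns P(X >= k) under the hypergeometric distribution.
    """
    upper = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return upper / comb(N, n)


def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def kappa(a: set[str], b: set[str], background: set[str]) -> float:
    """Cohen's kappa for agreement between two terms' gene memberships.

    Assumes both term gene sets are subsets of the background.
    """
    n = len(background)
    both = len(a & b)
    neither = len(background - a - b)
    observed = (both + neither) / n
    pa, pb = len(a) / n, len(b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)
    if expected == 1:
        return 1.0  # degenerate case: both terms cover all or none of the background
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    background = {f"g{i}" for i in range(1, 21)}  # hypothetical 20-gene genome
    query = {"g1", "g2", "g3", "g4", "g5"}
    terms = {  # hypothetical term -> annotated genes
        "term_A": {"g1", "g2", "g3", "g4", "g6"},
        "term_B": {"g1", "g2", "g3", "g7", "g8"},
        "term_C": {"g10", "g11", "g12"},
    }
    names = sorted(terms)
    raw = [
        fisher_upper_tail(len(query & terms[t]), len(query), len(terms[t]), len(background))
        for t in names
    ]
    for name, p, q in zip(names, raw, benjamini_hochberg(raw)):
        print(f"{name}\tp={p:.4g}\tadj={q:.4g}")
    print("kappa(A, B) =", round(kappa(terms["term_A"], terms["term_B"], background), 3))
```

Computing binomial coefficients with exact integers keeps the tail sum free of rounding until the final division, which is adequate for the small gene counts shown here.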
GEMmaker is a Nextflow workflow for large-scale gene expression sample processing, expression-level quantification and Gene Expression Matrix (GEM) construction. Results from GEMmaker are useful for differential gene expression (DGE) and gene co-expression network (GCN) analyses. The GEMmaker workflow currently supports Illumina RNA-seq datasets.
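The final GEM construction step, merging per-sample expression values into one genes-by-samples matrix, can be sketched as follows. This is a hypothetical Python illustration, not GEMmaker's Nextflow code: it assumes each sample's quantification output has already been parsed into a gene-to-value mapping (for example, TPM values), and it fills genes missing from a sample with 0.0, a choice made here only for simplicity.

```python
import csv
import io


def build_gem(samples: dict[str, dict[str, float]], missing: float = 0.0):
    """Merge per-sample expression values into a gene-by-sample matrix.

    samples maps a sample ID to {gene ID: expression value} (hypothetical input).
    Genes absent from a sample receive `missing`.
    """
    sample_ids = sorted(samples)
    gene_ids = sorted({g for values in samples.values() for g in values})
    matrix = [[samples[s].get(g, missing) for s in sample_ids] for g in gene_ids]
    return gene_ids, sample_ids, matrix


def write_gem(gene_ids, sample_ids, matrix, handle) -> None:
    """Write the GEM as tab-delimited text: a header of sample IDs, then one row per gene."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow([""] + sample_ids)
    for gene, row in zip(gene_ids, matrix):
        writer.writerow([gene] + row)


if __name__ == "__main__":
    # Hypothetical per-sample values for two samples.
    per_sample = {
        "SRR_B": {"geneA": 5.0, "geneB": 1.5},
        "SRR_A": {"geneA": 3.2, "geneC": 7.0},
    }
    out = io.StringIO()
    write_gem(*build_gem(per_sample), out)
    print(out.getvalue(), end="")
```

Sorting gene and sample IDs keeps the matrix layout deterministic, which makes GEMs from separate runs easier to compare.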
GSForge is a Python software package that uses data management, visualization and machine learning approaches to help researchers select gene sets with potential association to an experimental condition or phenotypic trait. The selected gene sets offer new potential hypotheses for gene-trait causality.
The Knowledge Independent Network Construction (KINC) package generates gene co-expression networks using Pearson correlation, Spearman correlation and mutual information, employs Random Matrix Theory (RMT) for automated network thresholding, and optionally employs Gaussian Mixture Models (GMMs) to identify potential condition-specific gene expression. KINC v3.0 is built on the Accelerated Computational Engine (ACE), another Ficklin Lab software product.
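The correlation step can be illustrated in plain Python. This is only a sketch, not KINC's C++/ACE implementation: it computes Pearson or Spearman correlation for every gene pair in a small GEM and keeps pairs whose absolute correlation meets a cutoff. The fixed, user-chosen `threshold` is a stand-in for KINC's RMT-based thresholding; mutual information and GMM-based condition-specific clustering are omitted. Function names and expression values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: correlation is undefined, treat as uncorrelated
    return sxy / sqrt(sxx * syy)


def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def coexpression_edges(gem: dict[str, list[float]], method: str = "spearman",
                       threshold: float = 0.9) -> list[tuple[str, str, float]]:
    """Return (gene1, gene2, r) for every gene pair with |r| >= threshold."""
    corr = spearman if method == "spearman" else pearson
    edges = []
    for g1, g2 in combinations(sorted(gem), 2):
        r = corr(gem[g1], gem[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


if __name__ == "__main__":
    gem = {  # hypothetical expression values across five samples
        "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
        "geneB": [2.0, 4.0, 6.0, 8.0, 10.0],
        "geneC": [5.0, 3.0, 4.0, 1.0, 2.0],
    }
    for edge in coexpression_edges(gem, method="pearson", threshold=0.75):
        print(edge)
```

Using the absolute value keeps strong negative correlations as edges, since anti-correlated genes can be just as informative in a co-expression network.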
Pynome is a product of the NSF-funded SciDAS project. It is used to automate retrieval and preparation of whole genome sequences for a variety of eukaryotic species. Pynome integrates with iRODS to prepare large-scale genomic analytical workflows.
Tripal is a toolkit for constructing online biological community databases (genetics, genomics, breeding, etc.) and is a member of the GMOD family of tools. By default, Tripal v3 integrates with the GMOD Chado database. Tripal is used by species and clade genome databases all over the world and has an active, distributed community of open-source developers.
The Tripal Galaxy Module is an extension module for Tripal that integrates a Tripal-based site with the Galaxy workflow tool. It allows a site to provide workflows to end users and lets site developers use Galaxy workflows to power computation for complex analytical tools.
The following courses are offered to graduate-level students by the Ficklin Lab:
AFS 505: Topics in Computing and Analytical Methods for Scientists
Formerly a Horticulture 503 (Special Topics) course, this course offers:
Applied computational methods for researchers processing, managing, and analyzing data in scientific and engineering fields.
Variable-credit (1-6) course: each module lasts 5 weeks and earns 1 credit.
Select from non-sequential modules to meet program needs.
The general prerequisite is graduate standing in an agricultural, life, environmental or economic science, or in engineering. Other recommended preparation is specific to individual modules.
Modules offered in the Fall
Data Structures in R
Data Visualization in R
Data Wrangling in R
Instructor: David Brown, Ph.D.
Modules offered in the Spring
Programming in Python
Data Analysis with Python
Computing for Big Data
Instructor: Stephen Ficklin, Ph.D.
Semesters Taught:
AFS 505 Units 1-3 Spring 2020
Hort 503 (Advanced Topics), Section 1 Spring 2019
Hort 503 (Advanced Topics), Section 1 Spring 2018
Data Analysis in Systems Biology
This course offers an introduction to approaches for modeling and analysis in systems biology. Topics include:
Review of gene, protein, metabolic, and signaling systems
Methods for modeling biological systems
UNIX Basics
High Performance Computing (HPC) introduction
Graph theory for network modeling
Network visualization
Throughout the course students work towards the generation of gene co-expression networks from RNA-seq data they select for organisms and biological functions of their own interest. These networks are constructed using HPC and existing bioinformatics tools.
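As a small illustration of the graph-theory topic, the sketch below treats a co-expression network as an undirected edge list and computes node degree (to spot hub genes) and connected components. It is a hypothetical standard-library Python example, not course material; the function names and gene IDs are invented.

```python
from collections import defaultdict, deque


def adjacency(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Build an undirected adjacency list from (gene1, gene2) pairs."""
    adj: dict[str, set[str]] = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return dict(adj)


def degrees(adj: dict[str, set[str]]) -> dict[str, int]:
    """Number of neighbors of each node."""
    return {node: len(neighbors) for node, neighbors in adj.items()}


def connected_components(adj: dict[str, set[str]]) -> list[list[str]]:
    """Breadth-first search from each unvisited node; returns sorted components."""
    seen: set[str] = set()
    components = []
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(sorted(component))
    return components


if __name__ == "__main__":
    edges = [("g1", "g2"), ("g2", "g3"), ("g1", "g3"), ("g3", "g4"), ("g5", "g6")]
    adj = adjacency(edges)
    deg = degrees(adj)
    hub = max(deg, key=deg.get)
    print("hub:", hub, "degree:", deg[hub])
    print("components:", connected_components(adj))
```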
Semesters Taught:
Hort 503 (Advanced Topics), Section 2 Fall 2019
Hort 503 (Advanced Topics), Section 2 Fall 2017
Hort 503 (Advanced Topics), Section 2 Fall 2016
Join the Lab
Graduate Studies
Graduate degrees with an emphasis on Systems Genetics and Computational Biology are available in the Ficklin lab through the Department of Horticulture and the Molecular Plant Sciences (MPS) program. Both programs offer world-class graduate-level education. Dr. Ficklin is currently looking for students interested in graduate research at both the M.S. (Horticulture) and Ph.D. (Horticulture and MPS) levels. Please contact Dr. Ficklin directly to express interest.
Undergraduate Research
Undergraduate research opportunities are available for motivated students with background in computer programming. If interested, please contact Dr. Ficklin directly.
Research Staff / Postdoctoral Researchers
The Ficklin lab offers full-time employment for data scientists and software developers as needed by funded projects. At times, positions are available for Research Associates (with a B.S. or M.S. degree and relevant experience) and Postdoctoral Researchers. When available, these positions are posted online at WSU's career website; to apply for an existing opening, please use that site. If you would like to inquire about potential employment, please contact Dr. Ficklin directly.