Friday, June 23 • 2:54pm - 3:06pm
JSeqArray: Data Manipulation of Whole-genome Sequencing Variants in Julia

Whole-genome sequencing (WGS) data is being generated at an unprecedented rate. Analysis of WGS data requires a flexible data format to store the different types of DNA variation. A new WGS variant data format, “SeqArray”, was recently proposed (Zheng X, et al., 2017, Bioinformatics); it outperforms the text-based Variant Call Format (VCF) in access efficiency and file size. I introduce a new Julia package, “JSeqArray”, for manipulating genotypes and annotations in an array-oriented manner (https://github.com/CoreArray/JSeqArray.jl). It enables users to write portable and immediately usable code in the wider scientific ecosystem. When used in conjunction with the multiprocessing and job-oriented functions for parallel execution, the JSeqArray package provides users with a flexible and high-performance programming environment for analysis of WGS variant data. In the presentation, examples of calculating allele frequencies and performing principal component analysis will be given.

Xiuwen Zheng

Senior Research Fellow, University of Washington
He received a Ph.D. degree in Biostatistics at the University of Washington. He has been working on the NHLBI Trans-Omics for Precision Medicine (TOPMed) whole-genome sequencing (WGS) project since June 2015. He developed the SeqArray package for data management of terabyte-sized sequ...

Friday June 23, 2017 2:54pm - 3:06pm
East Pauley Pauley Ballroom, Berkeley, CA

