Qizhen Zhang

I am an incoming Assistant Professor in Computer Science at the University of Toronto, starting Fall 2023. I am broadly interested in data management and computer systems. My research bridges cloud data processing systems and data center networks to address emerging challenges in hyperscale data processing.
Before joining UofT, I am spending a year in industry at Microsoft Research, Redmond, as a postdoctoral researcher in the data systems group.

Email, CV, Google Scholar, LinkedIn, Twitter

I am actively looking for students to work on cloud data systems, data center networking, and the interactions between ML and systems. If you are interested, please apply to UofT CS and mention my name. Feel free to drop me an email.

News

Oct. 2022: TeShu is accepted at CIDR 2023
Aug. 2022: I am now at Microsoft Research
Aug. 2022: FlexChain is accepted at VLDB 2023
Aug. 2022: I defended my dissertation and received my Ph.D.
Jun. 2022: I will join the University of Toronto as an Assistant Professor in Fall 2023
Mar. 2022: My work is recognized with Penn's best CS dissertation award (Rubinoff Award)
Dec. 2021: TELEPORT is accepted at SIGMOD 2022 (happening in Philly next year!)
Oct. 2021: Redy is accepted at VLDB 2022
Oct. 2021: CompuCache is accepted at CIDR 2022
Apr. 2021: MimicNet is accepted at SIGCOMM 2021
Apr. 2020: Understanding DBMSs in DDCs is accepted at VLDB 2020
Oct. 2019: Rethinking data processing systems in DDCs is accepted at CIDR 2020
Mar. 2019: I am selected as the 2019-2020 Jonathan M. Smith Fellow
Nov. 2018: GraphRex is accepted at SIGMOD 2019
Aug. 2017: Predicting startup crowdfunding success is accepted at CIKM 2017
Jul. 2017: Analyzing the performance and cost of graph systems is accepted at SoCC 2017
Aug. 2016: I started Ph.D. studies at the University of Pennsylvania

Research

Today's largest data processing workloads are hosted in cloud data centers. Due to unprecedented data growth, these workloads have ballooned to hyperscale, encompassing billions to trillions of data items and hundreds to thousands of machines per query. They are enabled by, and have grown alongside, highly scalable data center networks that connect up to hundreds of thousands of servers. At hyperscale, the classic layered designs are no longer sustainable: without knowing how their data is transferred in the network, applications can make egregiously poor execution decisions, and the cloud infrastructure performs poorly unless the interfaces and services it exposes to applications are rethought. Rather than optimizing these massive layers in silos, I build systems across them with principled network-centric designs for efficient hyperscale data processing.

  • In current networks, we design data processing systems by adopting network awareness. Considering data center network characteristics in optimizing distributed operations effectively reduces communication cost at hyperscale. We built GraphRex [SIGMOD '19], the first system that systematically adopts network awareness for large-scale graph processing, and a data center network-aware DBMS query optimizer [manuscript under preparation].

I have also worked on other aspects of data processing, including data science applications with machine learning [CIKM '17], data processing cost efficiency [SoCC '17], and fault-tolerant computing [manuscript under preparation]. Recently, I proposed CompuCache [CIDR '22], a new cloud service that exploits spot VMs for data caching and memory-intensive compute offloading.

Publications

Templating Shuffles
Qizhen Zhang, Jiacheng Wu, Ang Chen, Vincent Liu, Boon Thau Loo
Conference on Innovative Data Systems Research, CIDR 2023 [To appear]

FlexChain: An Elastic Disaggregated Blockchain
Chenyuan Wu, Mohammad Javad Amiri, Jared Asch, Heena Nagda, Qizhen Zhang, Boon Thau Loo
International Conference on Very Large Data Bases, VLDB 2023 [To appear]

Optimizing Data-intensive Systems in Disaggregated Data Centers with TELEPORT
Qizhen Zhang, Xinyi Chen, Sidharth Sankhe, Zhilei Zheng, Ke Zhong, Sebastian Angel, Ang Chen, Vincent Liu, Boon Thau Loo
ACM International Conference on Management of Data, SIGMOD 2022

Redy: Remote Dynamic Memory Cache
Qizhen Zhang, Philip Bernstein, Daniel Berger, Badrish Chandramouli
International Conference on Very Large Data Bases, VLDB 2022

CompuCache: Remote Computable Caching using Spot VMs
Qizhen Zhang, Philip Bernstein, Daniel Berger, Badrish Chandramouli, Vincent Liu, Boon Thau Loo
Conference on Innovative Data Systems Research, CIDR 2022

MimicNet: Fast Performance Estimates for Data Center Networks with Machine Learning
Qizhen Zhang, Kelvin K.W. Ng, Charles W. Kazer, Shen Yan, João Sedoc, Vincent Liu
Annual Conference of the ACM Special Interest Group on Data Communication, SIGCOMM 2021

Understanding the Effect of Data Center Resource Disaggregation on Production DBMSs
Qizhen Zhang, Yifan Cai, Xinyi Chen, Sebastian Angel, Ang Chen, Vincent Liu, Boon Thau Loo
International Conference on Very Large Data Bases, VLDB 2020

Rethinking Data Management Systems for Disaggregated Data Centers
Qizhen Zhang, Yifan Cai, Sebastian Angel, Ang Chen, Vincent Liu, Boon Thau Loo
Conference on Innovative Data Systems Research, CIDR 2020

Optimizing Declarative Graph Queries at Large Scale
Qizhen Zhang, Akash Acharya, Hongzhi Chen, Simran Arora, Ang Chen, Vincent Liu, Boon Thau Loo
ACM International Conference on Management of Data, SIGMOD 2019

Predicting Startup Crowdfunding Success through Longitudinal Social Engagement Analysis
Qizhen Zhang, Tengyuan Ye, Meryem Essaidi, Shivani Agarwal, Vincent Liu, Boon Thau Loo
ACM International Conference on Information and Knowledge Management, CIKM 2017

Architectural Implications on the Performance and Cost of Graph Analytics Systems
Qizhen Zhang, Hongzhi Chen, Da Yan, James Cheng, Boon Thau Loo, Purushotham Bangalore
ACM Symposium on Cloud Computing, SoCC 2017

Earlier Publications

Quegel: A General-Purpose System for Querying Big Graphs
Qizhen Zhang, Da Yan, James Cheng
Proceedings of the 2016 International Conference on Management of Data, SIGMOD 2016 Demo

A General-Purpose Query-Centric Framework for Querying Big Graphs
Da Yan, James Cheng, M. Tamer Özsu, Fan Yang, Yi Lu, John C. S. Lui, Qizhen Zhang, Wilfred Ng
International Conference on Very Large Data Bases, VLDB 2016

A Shapley Value Approach for Cost Allocation in the Cloud
Qizhen Zhang, Haoran Wang, Yang Chen, Tao Qin, Ying Yan, Thomas Moscibroda
ACM Symposium on Cloud Computing, SoCC 2015 Poster

Professional Activities

Industrial Experiences

Microsoft Research, Redmond, August 2022 - August 2023
Cloud data management

Microsoft Research, Redmond, Summer 2021
CompuCache: fast and cheap compute offloading for remote memory caching with spot VMs

Microsoft Research, Redmond, Summer 2020
Redy: high-performance RDMA-accessible caching with remote dynamic memory

Microsoft Research, Redmond, Summer 2019
Large-scale SQL query optimization with a focus on search completeness and efficiency

NEC Labs, America, Summer 2018
Software behavior analysis based on provenance graphs for anomaly detection

Microsoft Research Asia (recognized as an excellent intern), September 2014 - June 2015
Resource cost allocation for multi-tenant clouds with game theory

Talks

Redy: Remote Dynamic Memory Cache
VLDB 2022, Virtual, September 2022

Optimizing Data-intensive Systems in Disaggregated Data Centers with TELEPORT
SIGMOD 2022, Philadelphia, Pennsylvania, United States, June 2022

Hyperscale Data Processing with Network-centric Designs
Invited Talk 2022
CMU, HKUST, NUS, Ohio State U., Simon Fraser U., UC Irvine, UIUC, U. Minnesota Twin Cities, U. Toronto, U. Virginia, U. Waterloo

CompuCache: Remote Computable Caching using Spot VMs
CIDR 2022, Virtual, January 2022

MimicNet: Fast Performance Estimates for Data Center Networks with Machine Learning
SIGCOMM 2021, Virtual, August 2021

Understanding the Effect of Data Center Resource Disaggregation on Production DBMSs
VLDB 2020, Virtual, September 2020
Microsoft Research, Virtual, August 2020

Rethinking Data Management Systems for Disaggregated Data Centers
CIDR 2020, Amsterdam, Netherlands, January 2020

Optimizing Declarative Graph Queries at Large Scale
Microsoft Research, Redmond, Washington, United States, August 2019
SIGMOD 2019, Amsterdam, Netherlands, July 2019

Predicting Startup Crowdfunding Success through Longitudinal Social Engagement Analysis
CIKM 2017, Singapore, November 2017

Architectural Implications on the Performance and Cost of Graph Analytics Systems
SoCC 2017, Santa Clara, California, United States, September 2017

Service

Program Committee: SIGMOD 2024, EDBT 2024, ICDE 2023, SoCC 2022, CIKM 2022

Session Chair: SoCC 2022, SIGMOD 2022

Journal Reviewer: IEEE/ACM Transactions on Networking, IEEE Transactions on Parallel and Distributed Systems, IEEE Transactions on Knowledge and Data Engineering

Organizer: Penn DSL Seminar, Spring 2017 and Fall 2017

Teaching

Teaching Assistant for MCIT Online 595: Computer Systems Programming
University of Pennsylvania, Fall 2019

Course Designer and Teaching Assistant for MCIT Online 595: Computer Systems Programming
University of Pennsylvania, Spring 2019

Teaching Assistant for CIS 553: Networked Systems
University of Pennsylvania, Spring 2019

Teaching Assistant for CIS 553: Networked Systems
University of Pennsylvania, Spring 2018

Last edited in November 2022