Session 7-D

## Network Intelligence V

Jul 9 Thu, 9:00 AM — 10:30 AM EDT

### Automating Cloud Deployment for Deep Learning Inference of Real-time Online Services

Yang Li (Tsinghua University, China); Zhenhua Han (University of Science and Technology of China, China); Quanlu Zhang (MSRA, China); Zhenhua Li (Tsinghua University, China); Haisheng Tan (University of Science and Technology of China, China)

Real-time online services using pre-trained deep neural network (DNN) models, e.g., Siri and Instagram, require low latency and cost efficiency for quality of service and commercial competitiveness. When deployed in a cloud environment, such services call for an appropriate selection of cloud configurations (i.e., specific types of VM instances), as well as a careful device placement plan that places the operations of a DNN model onto multiple computation devices such as GPUs and CPUs. Currently, deployment relies mainly on service providers' manual efforts, which is not only onerous but often far from satisfactory (for the same service, a poor deployment can cost tens of times more than a good one). In this paper, we attempt to automate the cloud deployment of real-time online DNN inference at minimum cost under the constraint of acceptably low latency. This is enabled by jointly leveraging Bayesian Optimization and Deep Reinforcement Learning to adaptively find the (nearly) optimal cloud configuration and device placement within limited search time. We implement a prototype of our solution based on TensorFlow and conduct extensive experiments on Microsoft Azure. The results show that our solution outperforms non-trivial baselines in terms of inference speed and cost efficiency.
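The deployment goal stated above (minimum cost subject to a latency bound) can be made concrete with a brute-force baseline over a small candidate set. This is a sketch under assumptions: the `Candidate` fields and any VM names, prices, or latencies are hypothetical placeholders, and exhaustive search over every pair is exactly what the paper avoids by combining Bayesian Optimization and Deep Reinforcement Learning when the joint configuration/placement space is large.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """One (cloud configuration, device placement) pair with measured metrics.

    All field names are hypothetical; they are not taken from the paper.
    """
    vm_type: str          # VM instance type label
    placement: str        # device placement label, e.g. "all-GPU"
    latency_ms: float     # measured per-request inference latency
    cost_per_hour: float  # VM price


def cheapest_feasible(candidates: Sequence[Candidate],
                      latency_slo_ms: float) -> Optional[Candidate]:
    """Return the lowest-cost candidate that meets the latency bound, or None."""
    feasible = [c for c in candidates if c.latency_ms <= latency_slo_ms]
    if not feasible:
        return None
    return min(feasible, key=lambda c: c.cost_per_hour)


if __name__ == "__main__":
    # Hypothetical measurements for illustration, not results from the paper.
    table = [
        Candidate("vm-gpu-large", "all-GPU", 12.0, 3.00),
        Candidate("vm-gpu-small", "GPU+CPU", 28.0, 0.90),
        Candidate("vm-cpu-only", "all-CPU", 95.0, 0.40),
    ]
    print(cheapest_feasible(table, latency_slo_ms=30.0))
```

Each candidate here requires a real measurement, which is why the number of evaluated configurations, not the selection rule itself, dominates the search time.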

### Geryon: Accelerating Distributed CNN Training by Network-Level Flow Scheduling

Shuai Wang, Dan Li and Jinkun Geng (Tsinghua University, China)

Increasingly rich data sets and complicated models make distributed machine learning more and more important. However, the cost of extensive and frequent parameter synchronization can easily diminish the benefits of distributed training across multiple machines. In this paper, we present Geryon, a network-level flow scheduling scheme to accelerate distributed Convolutional Neural Network (CNN) training. Geryon leverages multiple flows with different priorities to transfer parameters of different urgency levels, which naturally coordinates multiple parameter servers and prioritizes urgent parameter transfers across the entire network fabric. Geryon requires no modification to CNN models and does not affect training accuracy. In experiments with four representative CNN models on a testbed of eight GPU (NVIDIA K40) servers, Geryon achieves up to 95.7% scaling efficiency even with 10GbE bandwidth. In contrast, for most models, the scaling efficiency of vanilla TensorFlow is no more than 37%, and that of TensorFlow with parameter partition and slicing is around 80%. In terms of training throughput, Geryon enhanced with parameter partition and slicing achieves up to 4.37x speedup, where the flow scheduling algorithm itself achieves up to 1.2x speedup over parameter partition and slicing alone.
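The effect of urgency-based prioritization can be illustrated with an in-process priority queue on a single simulated link. This is a minimal sketch under our own assumptions, not stated in the abstract: a layer's urgency is its index in the next forward pass (layers nearer the input are needed first), backpropagation finishes one layer per time step from the last layer backwards, and the link sends one chunk per step. Geryon itself realizes priorities as network-level flows across parameter servers, which this sketch does not model; `schedule` and the layer names are hypothetical.

```python
import heapq
from typing import List, Tuple


def schedule(layers: List[Tuple[str, int]], use_priority: bool = True) -> List[str]:
    """Simulate the sends on one link during a single synchronization round.

    `layers` lists (name, n_chunks) in forward-pass order. Backpropagation
    produces one layer's gradients per time step, last layer first, and the
    link sends one chunk per time step. With `use_priority`, a chunk's priority
    is its layer's forward-pass index (smaller = more urgent); otherwise chunks
    are sent in arrival (FIFO) order.
    """
    queue: List[Tuple[int, int, str]] = []
    sent: List[str] = []
    seq = 0  # arrival counter; breaks ties so equal priorities stay FIFO
    # Gradients become ready in reverse layer order during backpropagation.
    pending = list(enumerate(layers))[::-1]
    while pending or queue:
        if pending:
            index, (name, n_chunks) = pending.pop(0)
            for i in range(n_chunks):
                priority = index if use_priority else 0
                heapq.heappush(queue, (priority, seq, f"{name}[{i}]"))
                seq += 1
        if queue:
            sent.append(heapq.heappop(queue)[2])
    return sent
```

For example, with `[("conv1", 1), ("conv2", 1), ("fc", 3)]` the prioritized schedule sends `conv1[0]` third instead of last, so the next forward pass can begin before the large, less urgent `fc` layer has fully arrived.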

### Neural Tensor Completion for Accurate Network Monitoring

Kun Xie (Hunan University, USA); Huali Lu (Hunan University, China); Xin Wang (Stony Brook University, USA); Gaogang Xie (Institute of Computing Technology, Chinese Academy of Sciences, China); Yong Ding (Guilin University of Electronic Technology, China); Dongliang Xie (State University of New York at Stony Brook, USA); Jigang Wen (Chinese Academy of Science & Institute of Computing Technology, China); Dafang Zhang (Hunan University, China)

Monitoring the performance of a large network is very costly. Instead, a subset of paths or time intervals of the network can be measured while the remaining network data are inferred by leveraging their spatio-temporal correlations. The quality of missing data recovery relies heavily on the inference algorithm. Tensor completion has recently attracted attention for its ability to exploit the multi-dimensional data structure for more accurate missing data inference. However, current tensor completion algorithms only model the third-order interaction of data features through the inner product, which is insufficient to capture high-order, nonlinear correlations across different feature dimensions. In this paper, we propose a novel Neural Tensor Completion (NTC) scheme that effectively models third-order interaction among data features with the outer product and builds a 3D interaction map. On this map, we apply 3D convolution to learn features of high-order interaction from the local range to the global range. We show theoretically that this leads to good learning ability. We further conduct extensive experiments on two real-world network monitoring datasets, Abilene and WS-DREAM, demonstrating that NTC can significantly reduce the error in missing data recovery.
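The difference between the inner-product and outer-product formulations fits in a few lines. This is a minimal sketch, assuming each tensor entry (i, j, k), for example (origin, destination, time interval), has learned embedding vectors `u`, `v`, `w` of equal length; the function names and values are hypothetical. The CP-style inner product collapses the triple into one scalar, while the outer product keeps every cross-dimension term as a 3D interaction map, on which NTC then applies 3D convolution (not shown).

```python
from typing import List

Vector = List[float]
Cube = List[List[List[float]]]


def inner_interaction(u: Vector, v: Vector, w: Vector) -> float:
    """CP-style prediction: sum over r of u[r] * v[r] * w[r] (a single scalar)."""
    return sum(a * b * c for a, b, c in zip(u, v, w))


def outer_interaction(u: Vector, v: Vector, w: Vector) -> Cube:
    """3D interaction map: cube[p][q][r] = u[p] * v[q] * w[r]."""
    return [[[a * b * c for c in w] for b in v] for a in u]


if __name__ == "__main__":
    u, v, w = [1.0, 2.0], [3.0, 0.5], [1.0, -1.0]
    print(inner_interaction(u, v, w))  # 2.0: only the "diagonal" terms survive
    cube = outer_interaction(u, v, w)
    print(len(cube), len(cube[0]), len(cube[0][0]))  # 2 2 2
```

The inner product equals the sum of the cube's diagonal entries `cube[r][r][r]`, so the outer-product map retains everything the inner product uses plus all off-diagonal interactions.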

### Optimizing Federated Learning on Non-IID Data with Reinforcement Learning

Hao Wang and Zakhary Kaplan (University of Toronto, Canada); Di Niu (University of Alberta, Canada); Baochun Li (University of Toronto, Canada)

In this paper, we propose Favor, an experience-driven federated learning framework that actively selects client devices for training to deal with non-IID data. Through both empirical studies and mathematical analysis, we find an implicit connection between the distribution of training data and the weights of the model trained on that data. Using this connection, Favor can profile the data distribution on each device without accessing the raw data. In Favor, we propose a new mechanism based on reinforcement learning that learns to construct a specific subset of client devices in each communication round. Updated with the aggregated model weights produced by this subset of devices, the global model obtained through federated learning can counterbalance the bias introduced by non-IID data. In an extensive array of experiments using PyTorch, Favor reduces the number of communication rounds by up to 49% on MNIST, up to 23% on FashionMNIST, and up to 42% on CIFAR-10, compared to the Federated Averaging algorithm.
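The aggregation step that the selected subset feeds into can be sketched as FedAvg-style weighted averaging. This is a minimal sketch, assuming model weights are flattened lists of floats and each client is weighted by its number of training samples, as in Federated Averaging; the reinforcement learning agent that chooses `selected` is not modeled, and `aggregate` and the client IDs are hypothetical.

```python
from typing import Dict, List, Sequence


def aggregate(client_weights: Dict[str, List[float]],
              client_samples: Dict[str, int],
              selected: Sequence[str]) -> List[float]:
    """Sample-weighted average of the model weights of the selected clients.

    Clients outside `selected` contribute nothing this round, which is how
    the choice of subset can shift the global model.
    """
    if not selected:
        raise ValueError("at least one client must be selected")
    total = sum(client_samples[c] for c in selected)
    dim = len(client_weights[selected[0]])
    global_w = [0.0] * dim
    for c in selected:
        share = client_samples[c] / total
        for i, x in enumerate(client_weights[c]):
            global_w[i] += share * x
    return global_w
```

Because unselected clients are simply absent from the average, choosing a subset whose combined data is closer to balanced is one way a selector can counteract non-IID skew.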

###### Session Chair

Ruidong Li (National Institute of Information and Communications Technology (NICT))

Session 8-D

## Video Streaming

Jul 9 Thu, 11:00 AM — 12:30 PM EDT

### FastVA: Deep Learning Video Analytics Through Edge Processing and NPU in Mobile

Tianxiang Tan and Guohong Cao (The Pennsylvania State University, USA)

Many mobile applications have been developed that apply deep learning to video analytics. Although advanced deep learning models can provide better results, they also incur high computational overhead, which means longer delay and more energy consumption when running on mobile devices. To address this issue, we propose a framework called FastVA, which supports deep learning video analytics through edge processing and the Neural Processing Unit (NPU) in mobile devices. The major challenge is to determine when to offload the computation and when to use the NPU. Based on the processing time and accuracy requirements of the mobile application, we study two problems: Max-Accuracy, where the goal is to maximize accuracy under time constraints, and Max-Utility, where the goal is to maximize a utility defined as a weighted function of processing time and accuracy. We formulate both as integer programming problems and propose heuristic-based solutions. We have implemented FastVA on smartphones and demonstrated its effectiveness through extensive evaluations.
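The two objectives can be illustrated for a single frame. This is a minimal sketch under stated assumptions: each frame can run on the local CPU, on the NPU, or be offloaded to the edge; each option has a known processing time and accuracy; and utility is taken as `alpha * accuracy - (1 - alpha) * time / deadline`. The abstract only says utility is a weighted function of time and accuracy, so this exact form, the normalization, and `alpha` are our assumptions. FastVA formulates the problems as integer programs across frames and solves them heuristically, which this per-frame choice does not capture.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Option:
    name: str         # hypothetical label, e.g. "npu", "edge", "cpu"
    time_ms: float    # estimated processing (or upload plus remote) time
    accuracy: float   # expected model accuracy in [0, 1]


def max_accuracy(options: Sequence[Option], deadline_ms: float) -> Optional[Option]:
    """Max-Accuracy: most accurate option that finishes within the deadline."""
    feasible = [o for o in options if o.time_ms <= deadline_ms]
    return max(feasible, key=lambda o: o.accuracy) if feasible else None


def max_utility(options: Sequence[Option], deadline_ms: float,
                alpha: float = 0.5) -> Option:
    """Max-Utility: best weighted trade-off of accuracy and normalized time.

    The linear utility form is an assumption made for this sketch.
    """
    def utility(o: Option) -> float:
        return alpha * o.accuracy - (1 - alpha) * o.time_ms / deadline_ms
    return max(options, key=utility)
```

Raising `alpha` shifts the choice toward the slower but more accurate option (for example, edge offloading), while a small `alpha` favors the fast on-device NPU.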

### Improving Quality of Experience by Adaptive Video Streaming with Super-Resolution

Yinjie Zhang (Peking University, China); Yuanxing Zhang (School of EECS, Peking University, China); Yi Wu, Yu Tao and Kaigui Bian (Peking University, China); Pan Zhou (Huazhong University of Science and Technology, China); Lingyang Song (Peking University, China); Hu Tuo (IQIYI Science & Technology Co., Ltd., China)

Given today's high-speed mobile Internet access, audiences expect much higher video quality than before. Video service providers have deployed dynamic video bitrate adaptation services to meet such demands. However, legacy bitrate adaptation techniques depend heavily on estimating dynamic bandwidth, and they fail to integrate video quality enhancement techniques or to consider the heterogeneous computing capabilities of client devices, leading to low quality of experience (QoE) for users. In this paper, we present a super-resolution based adaptive video streaming (SRAVS) framework, which applies a Reinforcement Learning (RL) model to integrate the video super-resolution (VSR) technique with the video streaming strategy. The VSR technique allows clients to download low-bitrate video segments and reconstruct and enhance them into high-quality segments, making the system less dependent on estimating dynamic bandwidth. The RL model considers both playback statistics and the distinguishing features related to client-side computing capabilities. Trace-driven emulations over real-world videos and bandwidth traces verify that SRAVS can significantly improve QoE compared to state-of-the-art video streaming strategies, with or without VSR techniques.
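Adaptive streaming work commonly scores a session with a linear QoE metric: summed per-segment quality minus a rebuffering penalty and a penalty on quality switches. This is a minimal sketch assuming that form with hypothetical penalty weights; the abstract does not give SRAVS's exact reward, which would also have to reflect the quality gained through client-side super-resolution. `linear_qoe` and its parameters are illustrative names.

```python
from typing import Sequence


def linear_qoe(qualities: Sequence[float], rebuffer_s: Sequence[float],
               rebuffer_penalty: float = 4.0, smooth_penalty: float = 1.0) -> float:
    """Session QoE = sum of per-segment quality
                     - rebuffer_penalty * total stall time
                     - smooth_penalty * sum of |quality change| between segments.

    `qualities` holds the quality actually shown per segment (for a VSR client,
    the quality after enhancement); `rebuffer_s` holds the stall time in seconds
    incurred while fetching each segment. Both penalty weights are hypothetical.
    """
    if len(qualities) != len(rebuffer_s):
        raise ValueError("one rebuffer entry per segment is required")
    quality = sum(qualities)
    stall = rebuffer_penalty * sum(rebuffer_s)
    switches = smooth_penalty * sum(abs(b - a) for a, b in zip(qualities, qualities[1:]))
    return quality - stall - switches
```

Under such a metric, fetching a lower bitrate and enhancing it locally pays off when it avoids stalls without adding large quality swings.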

### Stick: A Harmonious Fusion of Buffer-based and Learning-based Approach for Adaptive Streaming

Tianchi Huang (Tsinghua University, China); Chao Zhou (Beijing Kuaishou Technology Co., Ltd, China); Rui-Xiao Zhang, Chenglei Wu, Xin Yao and Lifeng Sun (Tsinghua University, China)

Existing off-the-shelf buffer-based approaches use a simple yet effective buffer-bound to control the adaptive bitrate (ABR) streaming system. Nevertheless, such approaches with standard parameters fail to provide high quality of experience (QoE) video streaming under all considered network conditions. Meanwhile, the state-of-the-art learning-based ABR approach Pensieve outperforms existing schemes but is impractical to deploy. Therefore, how to harmoniously fuse buffer-based and learning-based approaches has become a key challenge for further enhancing ABR methods. In this paper, we propose *Stick*, an ABR algorithm that fuses the deep learning method and the traditional buffer-based method. Stick uses deep reinforcement learning (DRL) to train a neural network that outputs the *buffer-bound* controlling the buffer-based approach, so as to maximize the QoE metric under different parameters. Trace-driven emulation shows that Stick outperforms Pensieve by 9.41% with an overhead reduction of 88%. Moreover, to further reduce computational costs while preserving performance, we propose Trigger, a lightweight neural network that *determines* whether the buffer-bound should be adjusted. Experimental results show that Stick+Trigger rivals or outperforms existing schemes in average QoE by 1.7%-28% and significantly reduces Stick's computational overhead by 24%-61%. Extensive real-world evaluation also demonstrates the superiority of Stick over existing state-of-the-art approaches.
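The role of the buffer-bound can be seen in a classic buffer-based rule. This is a minimal sketch in the style of buffer-based adaptation: below a reservoir the player picks the lowest bitrate, at or above the buffer-bound it picks the highest, and in between it maps buffer occupancy linearly onto the bitrate ladder. The ladder, the reservoir, and this exact mapping are our assumptions. In Stick, the buffer-bound would be the value emitted by the DRL policy, and Trigger would decide when to recompute it; neither is modeled here.

```python
from typing import Sequence


def buffer_based_bitrate(buffer_s: float, ladder_kbps: Sequence[int],
                         reservoir_s: float, buffer_bound_s: float) -> int:
    """Pick a bitrate from `ladder_kbps` (sorted ascending) from buffer occupancy.

    `buffer_bound_s` is the knob a learned policy could set: a smaller bound
    reaches the top bitrate with less buffered video (more aggressive).
    """
    if buffer_bound_s <= reservoir_s:
        raise ValueError("buffer_bound_s must exceed reservoir_s")
    if buffer_s <= reservoir_s:
        return ladder_kbps[0]
    if buffer_s >= buffer_bound_s:
        return ladder_kbps[-1]
    # Linear map of the buffer position between reservoir and bound onto ladder indices.
    frac = (buffer_s - reservoir_s) / (buffer_bound_s - reservoir_s)
    index = int(frac * (len(ladder_kbps) - 1))
    return ladder_kbps[index]
```

With a hypothetical ladder of `[300, 750, 1200, 1850, 2850, 4300]` kbps and a 5 s reservoir, 12.5 s of buffer selects 1200 kbps under a 20 s buffer-bound but 4300 kbps under a 10 s bound, which is why tuning this single value per network condition matters.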

### Streaming 360° Videos using Super-resolution

Mallesham Dasari (Stony Brook University, USA); Arani Bhattacharya (KTH Royal Institute of Technology, Sweden); Santiago Vargas, Pranjal Sahu, Aruna Balasubramanian and Samir R. Das (Stony Brook University, USA)

360° videos provide an immersive experience to users but require considerably more bandwidth to stream than regular videos. State-of-the-art 360° video streaming systems use viewport prediction to reduce bandwidth requirements: they predict which part of the video the user will view and fetch only that content. However, viewport prediction is error-prone, resulting in poor user QoE. We design PARSEC, a 360° video streaming system that reduces bandwidth requirements while improving video quality. PARSEC trades bandwidth for more client compute to achieve its goals. PARSEC uses a compression technique based on super-resolution, where the video is significantly compressed at the server and the client runs a deep learning model to enhance the video to a much higher quality. PARSEC addresses a set of challenges associated with using super-resolution for 360° video streaming: large deep learning models, high inference latency, and variance in the quality of the enhanced videos. To this end, PARSEC trains small micro-models over shorter video segments and then combines traditional video encoding with super-resolution techniques to overcome these challenges. We evaluate PARSEC on a real WiFi network, over a broadband network trace released by the FCC, and over a 4G/LTE network trace.

###### Session Chair

Zhenhua Li (Tsinghua University)

Session 9-D

## Privacy II

Jul 9 Thu, 2:00 PM — 3:30 PM EDT

### Analysis, Modeling, and Implementation of Publisher-side Ad Request Filtering

Liang Lv (Tsinghua, China); Ke Xu (Tsinghua University, China); Haiyang Wang (University of Minnesota at Duluth, USA); Meng Shen (Beijing Institute of Technology, China); Yi Zhao (Tsinghua University, China); Minghui Li, Guanhui Geng and Zhichao Liu (Baidu, China)

Online advertising has been a great driving force for the Internet industry. To maintain steady growth in advertising revenue, advertisement (ad) publishers have made great efforts to increase impressions as well as the conversion rate. However, we notice that the results of these efforts are not as good as expected. Specifically, to show more ads to consumers, publishers have to waste a significant amount of server resources processing ad requests that do not result in consumer clicks. Moreover, the growing number of ads also degrades consumers' browsing experience.

In this paper, we explore the opportunity to improve publishers' overall utility by handling only a selected subset of requests on ad servers. In particular, we propose Win2, a publisher-side proactive ad request filtering solution. Upon receiving an ad request, Win2 estimates the probability that the consumer will click if the ad is served. The request is served if the click probability is above a dynamic threshold; otherwise, it is filtered to reduce the publisher's resource cost and improve the consumer experience. We implement Win2 in Baidu's large-scale ad serving system, and the evaluation results confirm its effectiveness.
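The serve-or-filter decision can be sketched with a threshold that adapts to recent traffic. This is a minimal sketch under loud assumptions: the click probability comes from some upstream model (here just a number), and the dynamic threshold is assumed to be the quantile of recently predicted probabilities that lets the server handle roughly a target fraction of requests, serving requests at or above it. The abstract does not describe Win2's actual threshold rule, so `ServeFilter`, `serve_fraction`, and `window` are hypothetical.

```python
from collections import deque
from typing import Deque


class ServeFilter:
    """Serve an ad request only if its predicted click probability reaches a
    threshold recomputed from a sliding window of recent predictions.

    The quantile-based threshold is an assumption made for this sketch.
    """

    def __init__(self, serve_fraction: float, window: int = 1000) -> None:
        if not 0.0 < serve_fraction <= 1.0:
            raise ValueError("serve_fraction must be in (0, 1]")
        self.serve_fraction = serve_fraction
        self.recent: Deque[float] = deque(maxlen=window)

    def threshold(self) -> float:
        """Probability at or above which about `serve_fraction` of recent requests fall."""
        if not self.recent:
            return 0.0  # no history yet: serve everything
        ranked = sorted(self.recent)
        cut = int(len(ranked) * (1.0 - self.serve_fraction))
        return ranked[min(cut, len(ranked) - 1)]

    def should_serve(self, p_click: float) -> bool:
        t = self.threshold()  # threshold from history before this request
        self.recent.append(p_click)
        return p_click >= t
```

Re-sorting the window per request keeps the sketch short; a production filter would maintain the quantile incrementally.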

### Differentially Private Range Counting in Planar Graphs for Spatial Sensing

Abhirup Ghosh (Imperial College London, United Kingdom (Great Britain)); Jiaxin Ding (Shanghai Jiao Tong University, China); Rik Sarkar (University of Edinburgh, United Kingdom (Great Britain)); Jie Gao (Rutgers University, USA)

This paper considers the problem of privately reporting counts of events recorded by devices in different regions of the plane. Unlike previous range query methods, our approach is not limited to rectangular ranges. We devise novel hierarchical data structures to answer queries over arbitrary planar graphs. This construction relies on balanced planar separators to represent shortest paths using $O(\log n)$ canonical paths. Precomputed sums along these canonical paths allow efficient computation of 1D counting range queries along any shortest path. We use differential forms together with the 1D mechanism to answer 2D queries in which a range is a union of faces in the planar graph. The methods are designed so that range queries can be answered with a differential privacy guarantee for any single event, with only poly-logarithmic error. They also allow private range queries to be performed in a distributed setup. Experimental results confirm that the methods are efficient and accurate on real data.
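The 1D building block, answering a range count from a few precomputed, noise-protected partial sums, can be illustrated with the standard hierarchical (binary-tree) mechanism over a path of n cells. This is a minimal sketch, assuming one event changes one cell count by 1, so each of the log2(n)+1 tree levels receives Laplace noise with scale `levels / epsilon`. It is not the paper's planar-separator construction, and `HierarchicalCounter` is a hypothetical name.

```python
import random
from typing import List, Optional


def laplace(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two Exp(1/scale) draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


class HierarchicalCounter:
    """Noisy dyadic partial sums over a 1D sequence of per-cell event counts."""

    def __init__(self, counts: List[int], epsilon: float,
                 seed: Optional[int] = None) -> None:
        size = 1
        while size < len(counts):
            size *= 2
        levels = size.bit_length()  # log2(size) + 1 levels for a power of two
        # One event changes one node per level by 1, so the L1 sensitivity of
        # the whole tree is `levels`; Laplace(levels / epsilon) per node gives
        # epsilon-differential privacy for a single event.
        scale = levels / epsilon
        rng = random.Random(seed)
        exact = [float(c) for c in counts] + [0.0] * (size - len(counts))
        # tree[h][j] holds the noisy sum of cells [j * 2**h, (j + 1) * 2**h).
        self.tree: List[List[float]] = []
        for _ in range(levels):
            self.tree.append([x + laplace(scale, rng) for x in exact])
            exact = [exact[i] + exact[i + 1] for i in range(0, len(exact) - 1, 2)]

    def query(self, lo: int, hi: int) -> float:
        """Noisy count of cells lo..hi-1, using at most two nodes per level."""
        total = 0.0
        h = 0
        while lo < hi:
            if lo & 1:  # lo is a right child: take it and move right
                total += self.tree[h][lo]
                lo += 1
            if hi & 1:  # hi - 1 is a left child: take it and move left
                hi -= 1
                total += self.tree[h][hi]
            lo //= 2
            hi //= 2
            h += 1
        return total
```

Because all noise is drawn once at construction, answering many queries is post-processing and consumes no extra privacy budget, and each answer sums O(log n) noisy nodes.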

### Message Type Identification of Binary Network Protocols using Continuous Segment Similarity

Stephan Kleber, Rens Wouter van der Heijden and Frank Kargl (Ulm University, Germany)

Protocol reverse engineering based on traffic traces infers the behavior of unknown network protocols by analyzing observable network messages. To correctly deduce message semantics or analyze behavior, accurate message type identification is an essential first step. However, identifying message types is particularly difficult for binary protocols, whose structural features are hidden in their densely packed data representation. In this paper, we leverage the intrinsic structural features of binary protocols and propose an accurate method for discriminating message types. Our approach uses a continuous similarity measure that compares feature vectors whose elements correspond to the fields in a message, rather than to discrete byte values. This enables better recognition of structural patterns, which remain hidden when only exact value matches are considered. We combine Hirschberg alignment with DBSCAN as the clustering algorithm to yield a novel inference mechanism. Through novel autoconfiguration schemes, we do not require the manually configured parameters that earlier approaches need for the analysis of an unknown protocol. Our evaluation shows that our approach has considerable advantages over previous approaches in both the quality of message type identification and execution performance.
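The advantage of a continuous measure over exact value matching shows up on two field values that differ by one unit. This is a minimal sketch, assuming segments are byte strings of equal length and using a normalized Canberra distance as the continuous dissimilarity; the abstract does not name the measure, so this choice and both function names are our assumptions. The Hirschberg alignment of segment sequences and the DBSCAN clustering built on top of such a measure are omitted.

```python
def exact_dissimilarity(a: bytes, b: bytes) -> float:
    """Fraction of byte positions whose values differ (discrete matching)."""
    if len(a) != len(b):
        raise ValueError("segments must have equal length")
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def canberra_dissimilarity(a: bytes, b: bytes) -> float:
    """Normalized Canberra distance: mean of |x - y| / (x + y) over byte positions."""
    if len(a) != len(b):
        raise ValueError("segments must have equal length")
    if not a:
        return 0.0
    total = 0.0
    for x, y in zip(a, b):
        if x + y:  # positions where both bytes are zero contribute nothing
            total += abs(x - y) / (x + y)
    return total / len(a)
```

For two-byte length fields, `b"\x00\xc8"` (200) versus `b"\x00\xc9"` (201) and versus `b"\x00\x01"` (1) are equally different under exact matching (0.5 each), while the continuous measure rates the first pair as nearly identical and the second as far apart.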

### Search Me in the Dark: Privacy-preserving Boolean Range Query over Encrypted Spatial Data

Xiangyu Wang and Jianfeng Ma (Xidian University, China); Ximeng Liu (Fuzhou University, China); Robert Deng (Singapore Management University, Singapore); Yinbin Miao, Dan Zhu and Zhuoran Ma (Xidian University, China)

With the increasing popularity of geo-positioning technologies and the mobile Internet, spatial keyword data services have attracted growing interest from both industry and academia in recent years. Meanwhile, massive amounts of data are increasingly being outsourced to the cloud in encrypted form to enjoy the advantages of cloud computing without compromising data privacy. Most existing works focus on privacy-preserving schemes for either spatial or keyword queries, and they cannot be directly applied to the spatial keyword query problem over encrypted data. In this paper, we study for the first time the challenging problem of Privacy-preserving Boolean Range Query (PBRQ) over encrypted spatial databases. In particular, we propose two novel PBRQ schemes. First, we present a scheme with linear search complexity based on the space-filling curve code and Symmetric-key Hidden Vector Encryption (SHVE). Then, we use tree structures to achieve faster-than-linear search complexity. Thorough security analysis shows that data security and query privacy can be guaranteed during the query process. Experimental results using real-world datasets show that the proposed schemes are efficient and feasible for practical applications, and are at least 70x faster than existing techniques in the literature.
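A space-filling curve code turns a 2D location into a single integer whose binary prefixes identify nested grid cells, which lets spatial containment be expressed as keyword-like prefix matches. This is a minimal sketch using the Z-order (Morton) curve on integer grid coordinates; the abstract does not say which curve the schemes use, and the SHVE encryption layer is not modeled. `morton_encode` and `cell_prefix` are illustrative names.

```python
def morton_encode(x: int, y: int, bits: int) -> int:
    """Z-order code: bit 2i holds bit i of x and bit 2i+1 holds bit i of y."""
    if not (0 <= x < 2 ** bits and 0 <= y < 2 ** bits):
        raise ValueError("coordinates must fit in `bits` bits")
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def cell_prefix(x: int, y: int, bits: int, depth: int) -> str:
    """Leading 2*depth bits of the code: the quadtree cell of (x, y) at `depth`.

    All points inside the same cell share this prefix, so a grid-aligned
    region can be described by a set of prefixes.
    """
    code = format(morton_encode(x, y, bits), f"0{2 * bits}b")
    return code[: 2 * depth]
```

On a 4x4 grid (`bits=2`), the points (2, 3) and (3, 2) both fall in the top-right quadrant and share the depth-1 prefix `"11"`, whereas (0, 0) has prefix `"00"`; a query range can then typically be covered by a small set of such prefixes.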

###### Session Chair

Yaling Yang (Virginia Tech)

Session 10-D

## Network Intelligence VI

Jul 9 Thu, 4:00 PM — 5:30 PM EDT

### An Incentive Mechanism Design for Efficient Edge Learning by Deep Reinforcement Learning Approach

Yufeng Zhan (The Hong Kong Polytechnic University, China); Jiang Zhang (University of Southern California, USA)

Emerging technologies and applications have generated large amounts of data at the network edge. Due to bandwidth, storage, and privacy concerns, it is often impractical to move the collected data to the cloud. With the rapid development of edge computing and distributed machine learning (ML), edge-based ML, known as federated learning, has emerged to overcome the shortcomings of cloud-based ML. Existing works mainly focus on designing efficient learning algorithms; few focus on designing incentive mechanisms that account for heterogeneous edge nodes (ENs) and uncertain network bandwidth. Incentive mechanisms affect several tradeoffs: (i) between computation and communication latencies, and thus (ii) between edge learning time and payment. We fill this gap by designing an incentive mechanism that captures the tradeoff between latency and payment. Because of network dynamics and privacy protection, we propose a deep reinforcement learning (DRL)-based solution that automatically learns the best pricing strategy. To the best of our knowledge, this is the first work that applies advances in DRL to design incentive mechanisms for edge learning. We evaluate the incentive mechanism using trace-driven experiments, and the results demonstrate the superiority of our approach over the baselines.

### Intelligent Video Caching at Network Edge: A Multi-Agent Deep Reinforcement Learning Approach

Fangxin Wang (Simon Fraser University, Canada); Feng Wang (University of Mississippi, USA); Jiangchuan Liu and Ryan Shea (Simon Fraser University, Canada); Lifeng Sun (Tsinghua University, China)

Today's explosively growing Internet video traffic and viewers' ever-increasing quality of experience (QoE) demands for video streaming put tremendous pressure on the backbone network. Mobile edge caching provides a promising alternative by pushing video content closer to users at the network edge rather than at remote CDN servers. However, our large-scale trace analysis shows that the edge caching environment is much more complicated, with massively dynamic and diverse request patterns, so existing rule-based and model-based caching solutions may not fit such environments well. Our trace analysis also shows that the request similarity among neighboring edges can be highly dynamic and diverse, which can easily compromise the benefits of traditional cooperative caching designed mostly for CDN environments. In this paper, we propose MacoCache, an intelligent edge caching framework carefully designed to handle the massively diversified and distributed caching environment and to minimize both content access latency and traffic cost. Specifically, MacoCache leverages a multi-agent deep reinforcement learning (MADRL) based solution, where each edge adaptively learns its own best policy in conjunction with other edges for intelligent caching. Evaluation driven by real traces further demonstrates its superiority.

### Network-Aware Optimization of Distributed Learning for Fog Computing

Yuwei Tu (Zoomi Inc., USA); Yichen Ruan and Satyavrat Wagle (Carnegie Mellon University, USA); Christopher G. Brinton (Purdue University & Zoomi Inc., USA); Carlee Joe-Wong (Carnegie Mellon University, USA)

Fog computing holds the promise of scaling machine learning tasks to network-generated datasets by distributing processing across connected devices. Key challenges to doing so, however, are heterogeneity in devices' compute resources and topology constraints on which devices can communicate. We are the first to address these challenges by developing a network-aware distributed learning optimization methodology in which devices process data for a task locally and send their learned parameters to a server for aggregation at certain time intervals. In particular, unlike traditional federated learning frameworks, our method enables devices to offload their data processing tasks, with these decisions determined through a convex data transfer optimization problem that trades off the costs associated with devices processing, offloading, or discarding data points. Using this model, we analytically characterize the optimal data transfer solution for different fog network topologies, showing, for example, that the value of a device offloading is approximately linear in the range of computing costs in the network. Our subsequent experiments on both synthetic datasets and real-world datasets we collected confirm that our algorithms substantially improve network resource utilization without sacrificing the accuracy of the learned model.

### SurveilEdge: Real-time Video Query based on Collaborative Cloud-Edge Deep Learning

Shibo Wang and Shusen Yang (Xi'an Jiaotong University, China); Cong Zhao (Imperial College London, United Kingdom (Great Britain))

Large volumes of surveillance videos are continuously generated by ubiquitous cameras, creating demand for real-time queries that return, with low latency and low bandwidth cost, the video frames containing objects of certain classes. We present SurveilEdge, a collaborative Cloud-Edge system for real-time queries of large-scale surveillance video streams. Specifically, we develop a convolutional neural network (CNN) training scheme based on fine-tuning, and an intelligent task allocator with task scheduling and parameter adjustment algorithms. We implement SurveilEdge on a prototype with multiple edge devices and a public cloud, and conduct extensive experiments on real-world surveillance video datasets. Experimental results demonstrate that the Cloud-Edge collaborative SurveilEdge reduces the average latency and bandwidth cost by up to 81.7% and 86.2% (100% and 200%), respectively, compared to traditional cloud-based (edge-based) solutions. Meanwhile, SurveilEdge balances the computing load effectively and significantly reduces the variance of per-frame query latencies.

###### Session Chair

Onur Altintas (Toyota Motor North America, R&D InfoTech Labs)