Professor, Department of Computer Science, and Associate Director, Institute of Computational and Theoretical Studies
Hong Kong Baptist University
The explosion of digital data created by mobile sensors, social media, surveillance, medical imaging, smart grids and the like, combined with new tools for analyzing it all, has brought us into the Big Data era. We face great challenges: how do we deal with more data than we can actually understand and absorb, and how do we make efficient use of such a huge volume of data? From both scientific and practical perspectives, research on "Data Science" goes beyond the contents of Big Data. Data Science can be generally regarded as an interdisciplinary field that uses mathematics, statistics, databases, data mining, high-performance computing, knowledge management and virtualization to discover knowledge from data. It should have its own scientific contents, such as axioms, laws and rules, which are fundamentally important for experts in different fields to explore their own interests from data. A blockchain is a secure, shared and distributed ledger that facilitates the recording and tracking of resources without the need for a centralized trusted authority. The technology is scalable and robust, and all participant nodes provide resources in a fair manner, which alleviates many-to-one traffic flow bottlenecks.
The "International Symposium/Workshop on Dataology and Data Science" has been a platform for researchers in data-related fields and practitioners from industry and government to share their ideas, research results and experiences in the study of data. From 2010 to 2013, it was held annually in China, where more than 300 scholars and industrial professionals from Australia, Canada, China, Japan, the UK and the USA attended.
Since 2014, this platform has continued as the annual International Conference on Data Science (ICDS) in order to further expand the preliminary findings and exchanges on Data Science. Previous ICDS conferences were held in Beijing, China (ICDS 2014), Sydney, Australia (ICDS 2015), Xi'an, China (ICDS 2016), Shanghai, China (ICDS 2017), Beijing, China (ICDS 2018) and Ningbo, China (ICDS 2019). ICDS 2020 will be held in Chengdu, China on December 26-27, 2020. Its theme will be "Advancement of Data Science and Blockchain". The main topics include, but are not limited to, the following:
We will invite well-known international scholars and professionals in various related fields, in both the natural and social sciences, to join us at this conference to advance Data Science and to fully explore Data Science methodologies from different research perspectives.
Title: Objective-Domain Dual Decomposition: An Effective Approach to Optimizing Partially Differentiable Objective Functions
Abstract:
This paper addresses a class of optimization problems in which either part of the objective function is differentiable while the rest is nondifferentiable or the objective function is differentiable in only part of the domain. Accordingly, we propose a dual-decomposition-based approach that includes both objective decomposition and domain decomposition. In the former, the original objective function is decomposed into several relatively simple subobjectives to isolate the nondifferentiable part of the objective function, and the problem is consequently formulated as a multiobjective optimization problem (MOP). In the latter decomposition, we decompose the domain into two subdomains, that is, the differentiable and nondifferentiable domains, to isolate the nondifferentiable domain of the nondifferentiable subobjective. Subsequently, the problem can be optimized with different schemes in the different subdomains. We propose a population-based optimization algorithm, called the simulated water-stream algorithm (SWA), for solving this MOP. The SWA is inspired by the natural phenomenon of water streams moving toward a basin, which is analogous to the process of searching for the minimal solutions of an optimization problem. The proposed SWA combines the deterministic search and heuristic search in a single framework. Experiments show that the SWA yields promising results compared with its existing counterparts.
Title: Workload Scheduling in Data Centers with Performance Guarantee
Abstract:
Driven by the booming demands of applications, advanced computing in cloud data centers is evolving into a major paradigm of high-performance computing for data processing and analysis. A data center is composed of a massive number of servers connected by an interconnection network, and multiple geographically dispersed data centers are connected by a dedicated center network of ultra-high bandwidth. Access to data centers from office and personal computing devices is provided through an edge network in a cloud environment that supports ubiquitous on-demand submission of client jobs in addition to data collection, local processing and outsourcing. Workload scheduling in cloud data centers is critical for improving the service capability of the data centers in terms of reducing operation cost and increasing profit. This talk addresses workload scheduling in cloud data centers for minimizing operation cost and maximizing profit at the data center and server levels, respectively. I will first give an overview of some recent developments in high-performance computing in data centers. Then, I will discuss workload scheduling at the data center level with a focus on minimizing energy cost, which is the dominating factor in a data center's operation cost, and show our recent work on workload scheduling with performance guarantee in data centers with multi-source energy supply under a given green degree constraint (carbon emission cap) for environmental protection. Next, I will move down to the server level and present our work on solving the bounded flexible scheduling problem with performance guarantee to schedule workloads with bounded deadlines and parallelism degrees on a given set of data center servers. Finally, I will conclude the talk by showing some of our ongoing projects and future work in this direction.
Title: Interactive Deep Metric Learning
Abstract:
Embedding-based data mining transforms raw data into useful information that is easy for downstream tasks, such as classification, predictive analysis and clustering, to consume. The embedding function was traditionally dominated by various pattern mining algorithms and has recently been driven by deep learning-based embedding techniques. In this talk, I will briefly introduce our recent data mining practices in the application domain of big healthcare data, specifically Interactive Deep Metric Learning.
Title: Cost Effective Data Placement in the Cloud for Efficient Data Access of Online Social Networks
Abstract:
Online social networks are organised around users who have certain expectations of their network provider, such as low-latency access to both their own data and their friends’ data, which are often very large (e.g. videos and pictures). Replication of data can be used to meet these requirements, and geo-distributed cloud services with virtually unlimited capabilities are suitable for large-scale data storage. However, social network service providers often have limited capital and cannot store every piece of data everywhere to minimise users’ data access latency. Therefore, it is crucial to optimise data placement so as to fulfil users’ acceptable latency requirements at minimum cost to social network providers. In this seminar, we address key problems including how to find the optimal number of replicas, how to optimally place the datasets and how to distribute the requests to different datacentres.
Title: Highest algorithm for linear program
Abstract:
The idea of the talk is based on Wang's cone-cutting theory, which yields a group of special techniques. By combining the highest principle with those algorithms, we expect to build strongly polynomial algorithms.
Title: Is NP=P? A Polynomial-time solution for finite graph isomorphism
Abstract:
This talk will introduce a polynomial-time solution for finite graph isomorphism. It aims to provide a solution to one of the seven Millennium Prize Problems: NP versus P. Three new methods of representing a graph, as a vertex adjacency matrix, an edge adjacency matrix and a triple tuple, are proposed. A duality of edge and vertex and a reflexivity between the vertex adjacency matrix and the edge adjacency matrix are first introduced to present the core idea. Beyond this, the mathematical proof is based on an equivalence between permutation and bijection. Because only addition and multiplication operations satisfy the commutative law, we propose a permutation theorem to check quickly whether one of two sets of arrays is a permutation of the other. The permutation theorem is proved using Integer Factorization Theory, the Pythagorean Triples Theorem and the Fundamental Theorem of Arithmetic. For each of two n-ary arrays, the linear and squared sums of the elements are calculated to produce the results.
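As a rough illustration of the last step mentioned in the abstract, the following Python sketch computes the linear and squared sums of two integer arrays and compares them. The function names are hypothetical and this is not the speaker's implementation or theorem; the exact multiset comparison with `collections.Counter` is included only as a reference check, because in general matching sums are a necessary condition for one array to be a permutation of another.

```python
from collections import Counter


def sum_signature(values):
    """Return (linear sum, squared sum) of an integer array."""
    return sum(values), sum(v * v for v in values)


def same_signature(a, b):
    """Cheap screen: equal length, equal linear sum, equal squared sum."""
    return len(a) == len(b) and sum_signature(a) == sum_signature(b)


def is_permutation(a, b):
    """Reference check: compare the two arrays as multisets."""
    return Counter(a) == Counter(b)


# Hypothetical example arrays, chosen only for illustration.
a = [3, 1, 2, 2]
b = [2, 3, 2, 1]
print(sum_signature(a), sum_signature(b))
print(same_signature(a, b), is_permutation(a, b))
```

Arrays whose signatures differ cannot be permutations of each other, so the sums work as an inexpensive first filter before any exact comparison.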
Title: Broad Learning: A New Perspective on Mining Big Data
Abstract:
In the era of big data, abundant data are available across many different data sources in various formats. “Broad Learning” is a new type of learning task, which focuses on fusing multiple large-scale information sources of diverse varieties together and carrying out synergistic data mining tasks across these fused sources in one unified analytic. Great challenges exist in “Broad Learning” for the effective fusion of relevant knowledge across different data sources, which depends not only on the relatedness of these data sources, but also on the target application problem. In this talk we examine how to fuse heterogeneous information to improve mining effectiveness over various applications, including social networks, recommendation, malware detection, etc.
Title: COVID-19 – Lessons learnt from COVID-19 and the new normal as I see it
Abstract:
Indeed, pandemics are silent killers. As one author described it, these viruses are tiny, primitive forms of life, invisible to the naked eye, which have the world under their control. Humans are no longer the masters of the world. The virus has the world in its grip and we all struggle to survive.
However, plagues, major outbreaks and pandemics are of all times and have probably killed more people than all previous wars together. We often remember wars, not pandemics. Hence, we have forgotten to be prepared for pandemics; governments lack a ready plan for the next epidemic. We now see that the US, India, Brazil, Russia and Argentina have topped the 1 million mark of positive cases, with many other countries soon to follow in their steps. These figures surely underreport the reality, and second waves show that we are far from controlling the virus.
Title: How to deal with COVID-19 by using Data Analysis
Abstract:
To determine the right timing for resuming work and life, the talk first provides a retrospective analysis of COVID-19 to gain an in-depth understanding of age-specific contact-based disease transmission. This is then followed by a promising analysis of different work resumption plans to assess not only the respective economic implications of the plans but, most importantly, the associated disease transmission risks. The key to the method of COVID-19 transmission pattern characterization lies in modeling the interactions among people. Specifically, this talk considers four representative settings of social contacts that may cause the disease to spread: (1) households; (2) schools; (3) workplaces; and (4) public places. It develops a computational method to measure the contact intensity between different age groups in those social settings. With such an in-depth characterization of social contact-based transmission, it is possible to analyze and explain the ins and outs of the COVID-19 outbreak, including the past and future risks, intervention effectiveness, and corresponding risks of restoring social activities.
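A minimal sketch of how setting-specific contacts between age groups might be combined, assuming (hypothetically) three age groups and made-up contact rates; the talk's actual method, data and parameters are not reproduced here. Each of the four settings has its own age-by-age contact matrix, and a resumption plan is modeled as a scaling factor per setting.

```python
# Hypothetical age groups and contact matrices: average daily contacts that
# a person in the row group has with people in the column group.
AGE_GROUPS = ["0-17", "18-64", "65+"]

CONTACTS = {
    "households": [[1.0, 1.5, 0.2], [1.5, 1.0, 0.3], [0.2, 0.3, 0.8]],
    "schools": [[6.0, 0.5, 0.0], [0.5, 0.2, 0.0], [0.0, 0.0, 0.0]],
    "workplaces": [[0.0, 0.1, 0.0], [0.1, 5.0, 0.1], [0.0, 0.1, 0.1]],
    "public places": [[1.0, 1.0, 0.5], [1.0, 2.0, 0.5], [0.5, 0.5, 1.0]],
}


def combined_contacts(contacts, plan):
    """Sum the setting matrices, each scaled by the plan's factor.

    Settings missing from the plan are treated as fully open (factor 1.0).
    """
    n = len(AGE_GROUPS)
    total = [[0.0] * n for _ in range(n)]
    for setting, matrix in contacts.items():
        scale = plan.get(setting, 1.0)
        for i in range(n):
            for j in range(n):
                total[i][j] += scale * matrix[i][j]
    return total


def contacts_per_person(matrix):
    """Row sums: total daily contacts for a person in each age group."""
    return [sum(row) for row in matrix]


# Two hypothetical plans: fraction of normal activity allowed per setting.
lockdown = {"households": 1.0, "schools": 0.0,
            "workplaces": 0.1, "public places": 0.2}
partial = {"households": 1.0, "schools": 0.0,
           "workplaces": 0.5, "public places": 0.5}

for name, plan in [("lockdown", lockdown), ("partial resumption", partial)]:
    per_person = contacts_per_person(combined_contacts(CONTACTS, plan))
    print(name, dict(zip(AGE_GROUPS, (round(x, 2) for x in per_person))))
```

Comparing the row sums under different plans gives a crude picture of how reopening a setting changes contact intensity per age group; turning that into transmission risk would require epidemiological parameters that the abstract does not give.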
Title: Threats and Defenses in Data Security Games
Abstract:
One of the main threats to data security is the Advanced Persistent Threat (APT) attack. An APT attacker is a stealthy threat actor that gains unauthorized access to a computer network and remains undetected for an extended period, so as to access and corrupt data without authorization throughout the data lifecycle. An APT attack has five stages: reconnaissance, establishing a foothold, lateral movement, exfiltration and post-exfiltration. In this talk, we discuss the use of game theory-based deception technology to defend against APT attacks. After a brief introduction to data security and its major threats, we focus on two case studies. The first case study is a countermeasure against reconnaissance, where we introduce differential privacy into a deception game. With differential privacy in place, the attacker cannot deduce the real configuration of each system. The second case study is a countermeasure against lateral movement, where we develop an effective repair strategy for an organization using differential game theory. Our findings help to better understand and effectively defend against APT. The talk is based on the following two recently published papers in our group:


REGISTRATION FEE
Please state your paper ID (the submission number in the EasyChair system) when making payment. Contact: icds.conference@gmail.com
| Type | Fee |
|---|---|
| Regular Participants | USD 250 / RMB 1500 |
| Extra pages (papers are limited to 8 pages; authors may add at most 2 extra pages per paper) | USD 100 / RMB 650 per extra page |

| Field | Details |
|---|---|
| Account Name | Big Networks PTY LTD |
| BSB | 063240 |
| Account Number | 10749948 |
| Bank Name | Commonwealth Bank, Australia |
| SWIFT CODE | CTBAAU2S |
| Bank Address | 1091 Mt Alexander Rd, Essendon North VIC 3041 |
| Company Address | 326/367 Burwood Road, Hawthorn, VIC 3122, Australia |