# Purpose of this website

This website presents research results, interests, and digital supplements to existing works of Alexander Motzek, PhD candidate at the Institute of Information Systems, University of Lübeck. Research results and interests include the areas of probabilistic graphical models, artificial intelligence, cyber security, mission impact assessment, and computer vision.

This website further features details on and supplements to my PhD thesis entitled "Indirect Causes, Dependencies and Causality in Bayesian Networks".

For my personal website, please go to www.motzek.org.

# Indirect Causes, Dependencies and Causality in Bayesian Networks

Modeling causal dependencies in complex or time-dependent domains often demands cyclic dependencies. Such cycles arise from local points of view on dependencies where no singular causality is identifiable, i.e., the roles of causes and effects are not *universally* identifiable. Modeling causation instead of correlation is of utmost importance, which is why Bayesian networks are frequently used to reason under uncertainty. Bayesian networks are probabilistic graphical models and allow for a causal modeling approach with locally specifiable and interpretable parameters, but they are not defined for cyclic graph structures. If Bayesian networks are to be used for modeling uncertainties, cycles are commonly eliminated over time using dynamic Bayesian networks. However, we show that eliminating cycles over time eliminates the anticipation of indirect influences as well, and enforces an infinitesimal resolution of time. Without a "causal design," i.e., without representing direct and indirect causes appropriately, such networks return spurious results.

In particular, the main novel contributions of this thesis can be summarized as follows. By considering specific properties of local conditional probability distributions, we show that a novel form of probabilistic graphical models rapidly adapts itself to a specific context at every timestep and, by that, correctly anticipates indirect influences under an unrestricted time granularity, even if cyclic dependencies arise.

# PhD Thesis: Digital Editions

If you prefer reading on a screen, want to search the text, or want to view all figures in more detail, you can download the following PDFs, which represent my thesis in its final published state.

In the printer-friendly version, figures containing large amounts of transparent layers have been flattened and rendered at 600 dpi. Moreover, a slightly brighter portrait has been used in the curriculum vitae.

I successfully defended my thesis on October 19th, 2016. Here are my slides.

# Publications

To provide easy access to the publications created as part of my thesis, I provide all PDFs of published and accepted papers here.

- Indirect Causes in Dynamic Bayesian Networks Revisited
In: IJCAI 2015: 24th International Joint Conference on Artificial Intelligence, Buenos Aires, Argentina, pp. 703-709,
*main author*. Slides Poster
- Exploiting Innocuousness in Bayesian Networks
In: AI 2015: 28th Australasian Joint Conference on Artificial Intelligence, Canberra, ACT, Australia, pp. 411-423, doi: 10.1007/978-3-319-26350-2_36,
*main author*. Slides
- Context- and Bias-Free Probabilistic Mission Impact Assessment. Computers & Security, ISSN 0167-4048, Volume 65, pp. 166-186, submission date: April 1, 2016, revision date: August 3, 2016, revised: October 10, 2016, accepted: November 11, 2016, doi: 10.1016/j.cose.2016.11.005,
*in press*. *main author*.
- Probabilistic Mission Impact Assessment based on Widespread Local Events
In: NATO IST-128 Workshop: Assessing Mission Impact of Cyberattacks, NATO IST-128, Istanbul, Turkey, pp. 16-22,
*main author*. Detailed Slides Simple Slides
- Semantic Normalization and Merging of Business Dependency Models
In: CBI 2016: 18th IEEE Conference on Business Informatics, Paris, France, pp. 7-15, doi: 10.1109/CBI.2016.10,
*main author*. Slides
- Selection of Mitigation Actions Based on Financial and Operational Impact Assessments
In: ARES 2016: 11th International Conference on Availability, Reliability and Security, Salzburg, Austria, pp. 137-146, doi: 10.1109/ARES.2016.3,
*second main author*.
- Probabilistic Mission Defense and Assurance. In: NATO IST-148: Symposium on Cyber Defence Situation Awareness, STO-MP-IST-148, Sofia, Bulgaria, pp. 4-1 -- 4-18, doi: 10.14339/STO-MP-IST-148,
*main author*. Slides
- Formalizing Agents' Beliefs for Cyber-Security Defense Strategy Planning
In: CISIS 2015: 8th International Conference on Computational Intelligence in Security for Information Systems, Burgos, Spain, pp. 15-25, doi: 10.1007/978-3-319-19713-5_2,
*second main author*. Slides
- Towards an Automated and Dynamic Risk Management Response System. In NORDSEC 2016: 21st Nordic Conference on Secure IT Systems, Oulu, Finland, November 2-4, 2016, pp. 37-53, doi: 10.1007/978-3-319-47560-8_3,
*coauthor*.

The following journal articles and papers have been submitted and are currently under review. As these are not yet published, I provide their PDFs here for reference. Note that these articles are not referenced in the bibliography of my thesis.

- Learning to Anticipate Indirect Causes in Dynamic Bayesian Networks. Submitted to the Journal of Bayesian Analysis, submission date: August 19, 2016, submission number: BA1608-008.
*main author*. Digital Supplement
- Indirect Causes in Dynamic Bayesian Networks Revisited. Submitted to JAIR, Journal of Artificial Intelligence Research, ISSN 1076-9757, submission date: March 7, 2016, feedback received: August 3, 2016, revised: October 18, 2016.
*main author*.

# Applications for Exact and Approximate Inference

To validate the theoretical complexity analysis of exact and approximate inference, derived in Chapter 2, all algorithms have been implemented in plain C to perform inference in the taintedness domain. The application generates random instances of the taintedness domain for N employees, measures the computation time for solving filtering and smoothing problems using exact inference, and compares it to the inference accuracy and computation time of approximate inference. As the taintedness domain is represented by a dense intra-timeslice ADBN, this application can be used for general inference in dense intra-timeslice ADBNs.

In the following, I provide all applications as source code and compiled binaries to validate my obtained results. Further, I provide all data from which the figures were generated. For every binary, I provide sample outputs of the application for ad-hoc reference.

As exact inference is computationally expensive and has large memory requirements for storing results, i.e., joint probability distributions, inference is implemented in plain C, without object orientation, and is based on various bit-maps and bit-flags involving various bit-shifting and bit-masking operations. In order to understand the implementation, it is crucial to understand the employed bit-encoding:

In order to encode observations, joint probability distributions, and conditional probability distributions, an encoding of an instantiation of all random variables in a timeslice is required. An instantiation of a timeslice is encoded as an unsigned long for N<6, and as an unsigned long long for N>=6. On the employed Windows machine, these are 32 bits for N<6 and 64 bits for N>=6. Every bit represents one Boolean random variable, in the following ordering:

    instantiation = [XXXXXX][A23A13A03][A32A12A02][A31A21A01][A30A20A10]
                    [allX]  [AsOfX3]   [AsOfX2]   [AsOfX1]   [AsOfX0]
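As a minimal sketch of one plausible reading of this layout (the helper names and the exact index formulas are my own assumptions, not taken from the thesis code): for N employees, the lowest N*(N-1) bits hold the activators, grouped per target variable, followed by the N state variables.

```c
/* Hypothetical helpers illustrating the bit layout above for N Boolean
   state variables X0..X(N-1). Within the group [AsOfXj], the activators
   A_ij are ordered by source index i (skipping i == j). */
static int bit_of_activator(int i, int j, int n) {
    /* Bit position of activator A_ij inside the [AsOfXj] group. */
    return j * (n - 1) + (i > j ? i - 1 : i);
}

static int bit_of_state(int k, int n) {
    /* State variable Xk sits above all N*(N-1) activator bits. */
    return n * (n - 1) + k;
}
```

For N=4 this places A10 at bit 0, A30 at bit 2, A03 at bit 9, and X0 at bit 12, matching the ordering sketched above.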

Likewise, every conditional probability distribution P(X|\vec{Z}) is encoded by an enumerated random variable identifier identifying X, and a bit vector representing \vec{Z}, where \vec{Z} is encoded similarly as

    dep = [otherX][activatorsOfX][Xpreviously]

Similarly, observations of a timeslice are encoded as bit-masking vectors. A "true"-observation of the i-th random variable (according to the enumeration of an instantiation) is encoded in a bit-masking vector "evidenceTrue" by a logical 1 on the i-th bit of evidenceTrue. Likewise, a "false"-observation of the i-th random variable is encoded in a bit-masking vector "evidenceFalse" by a logical 0 on the i-th bit. Therefore, every instantiation *inst* conforms with the made observations *evidenceTrue* and *evidenceFalse* if

    (inst == (inst | evidenceTrue)) && (inst == (inst & evidenceFalse))
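This check can be written as a small helper (a sketch; the function name is hypothetical and not taken from the thesis code). Note that evidenceFalse starts as an all-ones mask from which "false"-observed bits are cleared:

```c
/* Check whether instantiation `inst` agrees with the observations:
   evidenceTrue  has a 1-bit for every variable observed true;
   evidenceFalse is all ones except a 0-bit for every variable
   observed false. Returns 1 iff inst conforms with both masks. */
static int conforms(unsigned long long inst,
                    unsigned long long evidenceTrue,
                    unsigned long long evidenceFalse) {
    return (inst == (inst | evidenceTrue)) &&
           (inst == (inst & evidenceFalse));
}
```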

Note: Exact inference in dense intra-timeslice ADBNs is intractable for N>5, and approximate inference techniques are required. To perform approximate inference in dense ADBNs with more than 64 random variables, the type definition used to represent an instantiation must be widened to contain an appropriate number of bits.
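One way such a widened type could look, assuming a fixed-size array of 64-bit words (a sketch with hypothetical names, not the thesis implementation):

```c
#include <stdint.h>

/* Sketch: representing an instantiation of more than 64 random
   variables as an array of 64-bit words. */
#define NVARS  100
#define NWORDS ((NVARS + 63) / 64)

typedef struct { uint64_t w[NWORDS]; } inst_t;

/* Read the i-th bit of an instantiation. */
static int inst_get(const inst_t *x, int i) {
    return (int)((x->w[i >> 6] >> (i & 63)) & 1u);
}

/* Set the i-th bit of an instantiation. */
static void inst_set(inst_t *x, int i) {
    x->w[i >> 6] |= (uint64_t)1 << (i & 63);
}
```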

Different compilation flags and parameters are used to generate the evaluations shown in Figures 2.3-2.8. The following sections describe these configurations in order to recreate and validate the obtained results. I provide binaries for x64 Windows machines, compiled using Visual Studio 2015, and the complete Visual Studio solution:

# Figures 2.3-2.6 (Exact Inference, N=4)

    //#define ONLYPARTICLE 1
    #define N 4

Using argument 0, this application evaluates exact inference as in Figures 2.3-2.5. It creates 50 experiments, in which online filtering and offline smoothing problems are solved over 40 timeslices in the taintedness domain with four employees (16 random variables). Each experiment creates one tab-separated file containing various measurements of the inference process.

Using argument 1, this application evaluates approximate inference against exact inference in terms of accuracy and computation time. Every execution creates 10 experiments, each evaluated over 250 timeslices. It evaluates SIS and SIR, scaling from 1 000 to 1 000 000 samples. Four employees are evaluated in the taintedness domain. Each experiment creates a tab-separated file for each number of samples and each approximation technique. Each file contains various measurements of the exact and approximate inference process.
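To illustrate what distinguishes SIR from SIS, the resampling step can be sketched as follows (a sketch under my own assumptions; the thesis implementation may use a different resampling scheme). Systematic resampling is one common variant: it draws n ancestor indices with probability proportional to particle weight, using a single random offset.

```c
#include <stdlib.h>

/* Sketch of the resampling step in SIR (sampling importance resampling):
   given n (unnormalized) particle weights w, fill idx[] with n ancestor
   indices drawn proportionally to weight, via systematic resampling. */
void systematic_resample(const double *w, int n, int *idx) {
    double total = 0.0;
    for (int i = 0; i < n; i++) total += w[i];
    double step = total / n;
    /* One random offset in (0, step); the n pointers are evenly spaced. */
    double u = (((double)rand() + 0.5) / ((double)RAND_MAX + 1.0)) * step;
    double cum = w[0];
    int i = 0;
    for (int j = 0; j < n; j++) {
        double target = u + j * step;
        /* Advance through the cumulative weights until target is covered. */
        while (cum < target && i < n - 1) { cum += w[++i]; }
        idx[j] = i;
    }
}
```

Particles with zero weight are never selected, while heavy particles are duplicated, which counters the weight degeneracy of plain SIS.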

# Figure 2.8 (Exact Inference, N=5)

    //#define ONLYPARTICLE 1
    #define N 5

Only argument 1 (approximate inference) is supported. It evaluates approximate inference using SIR with 1 000 000 and 10 000 000 samples against exact inference (filtering), and performs 10 experiments. Warning: every experiment takes about 2 hours. Creates the same file structure as above.

# Figure 2.7 (Approximate Inference, 9 to 49 random variables)

    #define ONLYPARTICLE 1
    //define desired number of employees
    #define N 3 //or 4,5,6,7

Only argument 1 (approximate inference) is supported. This evaluation tests the computational complexity in terms of the number of random variables using SIR. The accuracy is not compared to exact inference, as exact inference would take about 6 days for 36 random variables. Every execution generates 10 experiments. In every experiment, 50 online filtering problems (50 timeslices) are solved. Creates the same file structure as above, but measurements on exact inference are left blank.

# Applications for Learning ADBNs

To validate the algorithms for learning ADBNs derived in Chapter 3, the learning algorithm has been implemented in plain C. The application creates instances of the taintedness domain with N employees (as in the previous application) and generates datasets over long periods of time. Subsequently, instantiations are removed from these datasets to create incomplete datasets with hidden variables. Then, various forms of ADBNs are learned from these datasets. Every learned model is compared to the original model based on the difference of parameters and its ability to achieve the same inference results.

The implementation follows the same bit-vector and bit-masking encodings as previously. Various compilation settings are required to evaluate different situations of incomplete data and various ADBN models. The following sections discuss these settings to obtain the results shown in Figures 3.1-3.3. Again, I provide compiled x64 Windows binaries obtained from the following Visual Studio solution:

# Figure 3.1 (Learning from complete datasets)

    //General settings
    #ifdef EXPERIMENT_FULLY_OBSERVED
    #define N 4
    #define defMaxLearnRounds 1
    #define OBSERVE_TRUE_ACTIVATOR_FOR_LEARNING 1
    //#define ACTIVATORMAP 0.5
    #define tMaxLearnData 50000
    #endif

In this setting, 50 000 datapoints (timeslices) are generated for learning. The complete dataset is used for learning different forms of ADBNs, discussed in the following settings. Four employees are simulated in the taintedness domain. Every learned model is evaluated against the original model over 200 timeslices. One expects a learned model to come to the same conclusions, i.e., similar filtering and smoothing distributions, as the original model. This difference/similarity is measured by the Hellinger distance: a Hellinger distance of 0 exists between two probability distributions that assign exactly the same probabilities to all instantiations, and the maximum Hellinger distance of 1 exists between two distributions where one distribution assigns zero probability to all instantiations deemed possible by the other distribution.

Each experiment creates four tab-separated files: the convergence of the learned parameters of each model towards the original parameters (two files), an evaluation of filtering results over multiple timeslices for each model (combined in one file), and an evaluation of smoothing results over multiple timeslices for each model (combined in one file). Note that the convergence results contain only one line of measurements (multiple learn-rounds are not required for complete datasets, as no missing values must be restored).

    #define ENFORCE_ACTIVATOR_FOR_DIAGONAL 1
    //#define USE_TREE_STRUCTURED_DIAGONAL 1
    #define EXPERIMENT_FULLY_OBSERVED 1

Learns a cyclic intra-timeslice ADBN and a diagonal inter-timeslice ADBN from complete datasets.

    #define ENFORCE_ACTIVATOR_FOR_DIAGONAL 1
    #define USE_TREE_STRUCTURED_DIAGONAL 1
    #define EXPERIMENT_FULLY_OBSERVED 1

Learns a cyclic intra-timeslice ADBN and a tree-structured ADBN from complete datasets. The tree-structured ADBN is implemented by a modification to the method that handles conditional probability distributions for every random variable.

    //#define ENFORCE_ACTIVATOR_FOR_DIAGONAL 1
    //#define USE_TREE_STRUCTURED_DIAGONAL 1
    #define EXPERIMENT_FULLY_OBSERVED 1

Learns a cyclic intra-timeslice ADBN and a diagonal inter-timeslice ADBN from complete datasets, but the diagonal model is not bound to activator constraints. As discussed in the thesis, no significant differences exist between an activator-bound and an activator-unbound diagonal model, as indirect influences are completely ignored.

    #define ENFORCE_ACTIVATOR_FOR_DIAGONAL 1
    //#define USE_TREE_STRUCTURED_DIAGONAL 1
    #define EXPERIMENT_FULLY_OBSERVED 1
    #define defMaxLearnRounds 0

For reference, this setting generates results for completely random CPDs. CPDs are randomly instantiated at first and would consecutively be replaced by the learned CPD parameters; here, no learning is performed and the CPDs are left in their initial random instantiation.

# Figures 3.2 & 3.3 (Learning from incomplete datasets)

    //GENERAL SETTINGS!
    #define ENFORCE_ACTIVATOR_FOR_DIAGONAL 1
    //#define USE_TREE_STRUCTURED_DIAGONAL 1

Figures 3.2 and 3.3 show evaluations of the learning procedure on incomplete datasets. To generate these evaluations, the above-mentioned global flags are required. In every experiment, each generated dataset is manipulated and certain instantiations are excluded. The following settings evaluate different forms of exclusions, as discussed in the thesis. For these experiments, 10 000 datapoints are used, obtained from multiple instances of the taintedness domain with three employees.

When dealing with incomplete datasets, multiple learn-rounds are required, during which the learned parameters converge to the original parameters. As previously, every learned model is evaluated against the original model over 200 timeslices, and one expects a learned model to come to the same conclusions, i.e., similar filtering and smoothing distributions, as the original model. This difference/similarity is again measured by the Hellinger distance.
For Figure 3.2, plots of convergences with fewer than 100 learn-rounds have been continued at their latest value, i.e., they might have converged further but formed a significant plateau.
For Figure 3.3, EXPERIMENT_HIDDEN_XA is performed with 40 learn-rounds to achieve a similar number of experiments in the same computation time as the other experiments.

    #define EXPERIMENT_FULLY_OBSERVED 1
    #ifdef EXPERIMENT_FULLY_OBSERVED
    #define N 3
    #define defMaxLearnRounds 1
    #define OBSERVE_TRUE_ACTIVATOR_FOR_LEARNING 1
    //#define ACTIVATORMAP 0.5
    #define tMaxLearnData 10000
    #endif

For a lower-bound reference, this setting evaluates learning from a complete dataset. Note that only one learn-round is required. Each execution performs 50 experiments. Note that N and tMaxLearnData have changed from the previous experiments on complete data!

    #define EXPERIMENT_FULLY_OBSERVED 1
    #define DO_NOT_LEARN 1

For an upper-bound reference, this setting generates results for a reference plot of completely random CPDs. This means that the difference of multiple randomly generated CPDs towards the original CPDs is measured, and inference results based on these random CPDs are compared to the original models. Each execution performs 50 experiments.

    #define EXPERIMENT_HIDDEN_X 1

In this setting, state variables (X) are randomly excluded from the learning set. 25 learn-rounds are performed in each experiment. Each execution performs 25 experiments.

    #define EXPERIMENT_HIDDEN_A 1
    #define defMaxLearnRounds 25

In this setting, activator random variables are randomly excluded from the learning set. This means that the structure is partially unknown and must be restored. Each execution generates 10 experiments and uses 25 learn-rounds.

    #define EXPERIMENT_HIDDEN_A 1
    #define defMaxLearnRounds 50

Same as above, but with 50 learn-rounds. No significant difference is evident.

    #define EXPERIMENT_HIDDEN_XA 1
    #define defMaxLearnRounds 100

This setting is a combination of the settings above. State variables (X) and activator random variables are randomly excluded from each instantiation, and only regularity-assuring instantiations remain. Note that tMaxLearnData was not increased, i.e., still 10 000 timeslices are used for learning, from which instantiations are excluded. Effectively, significantly fewer individual datapoints remain for learning; hence the lower accuracy evident in Figure 3.3. In this setting, 100 learn-rounds are used, whose results are shown in Figure 3.2. Each execution generates 10 experiments.

    #define EXPERIMENT_HIDDEN_XA 1
    #define defMaxLearnRounds 40

Same as above, but with 40 learn-rounds. Used for the inference accuracy in Figure 3.3, to achieve a computation time for performing multiple experiments similar to that of the other experiments.