Discriminative models for multi-instance problems with tree-structure

Bibliographic details
Published in:	arXiv.org (Mar 7, 2017), p. n/a
Main author:	Pevny, Tomas
Other authors:	Somol, Petr
Published:	Cornell University Library, arXiv.org
Subjects:	Computers; Labels; Neural networks; Communications traffic; Classifiers; Telemetry; Firewalls; Traffic models; Construction costs; Windows (intervals); Pattern recognition; Computer simulation; Training; Proxy client servers
Links:	Citation/Abstract; Full text outside of ProQuest

MARC


LEADER	00000nab a2200000uu 4500
001	2074246944
003	UK-CbPIL
022			|a 2331-8422
035			|a 2074246944
045	0		|b d20170307
100	1		|a Pevny, Tomas
245	1		|a Discriminative models for multi-instance problems with tree-structure
260			|b Cornell University Library, arXiv.org |c Mar 7, 2017
513			|a Working Paper
520	3		|a Modeling network traffic is gaining importance in order to counter modern threats of ever-increasing sophistication. It is, however, surprisingly difficult and costly to construct reliable classifiers on top of telemetry data due to the variety and complexity of signals that no human can manage to interpret in full. Obtaining training data with a sufficiently large and variable body of labels can thus be seen as a prohibitive problem. The goal of this work is to detect infected computers by observing their HTTP(S) traffic collected from network sensors, which are typically proxy servers or network firewalls, while relying on only minimal human input in the model training phase. We propose a discriminative model that makes decisions based on all of a computer's traffic observed during a predefined time window (5 minutes in our case). The model is trained on traffic samples collected over equally sized time windows for a large number of computers, where the only labels needed are human verdicts about the computer as a whole (presumed infected vs. presumed clean). As part of training, the model itself recognizes discriminative patterns in traffic targeted to individual servers and constructs the final high-level classifier on top of them. We show that the classifier performs with very high precision, while the learned traffic patterns can be interpreted as Indicators of Compromise. We implement the discriminative model as a neural network with a special structure reflecting two stacked multi-instance problems. The main advantages of the proposed configuration include not only improved accuracy and the ability to learn from gross labels, but also automatic learning of server types (together with their detectors) which are typically visited by infected computers.
653			|a Computers
653			|a Labels
653			|a Neural networks
653			|a Communications traffic
653			|a Classifiers
653			|a Telemetry
653			|a Firewalls
653			|a Traffic models
653			|a Construction costs
653			|a Windows (intervals)
653			|a Pattern recognition
653			|a Computer simulation
653			|a Training
653			|a Proxy client servers
700	1		|a Somol, Petr
773	0		|t arXiv.org |g (Mar 7, 2017), p. n/a
786	0		|d ProQuest |t Engineering Database
856	4	1	|3 Citation/Abstract |u https://www.proquest.com/docview/2074246944/abstract/embedded/6A8EOT78XXH2IG52?source=fedsrch
856	4	0	|3 Full text outside of ProQuest |u http://arxiv.org/abs/1703.02868
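
The abstract above describes the model as a neural network whose structure reflects two stacked multi-instance problems: a computer is a bag of servers, and each server is a bag of HTTP(S) requests seen in the time window. The following is a minimal, hypothetical sketch of that forward pass in plain Python, not the authors' code. It assumes that each request is already encoded as a fixed-length numeric feature vector, that each level uses one ReLU layer, that mean pooling is the aggregation function, and that the weights are random and untrained. All names (`TwoLevelMIL`, `dense_relu`, `mean_pool`, `server_vector`) are illustrative, and training against the computer-level verdicts (presumed infected vs. presumed clean) is omitted.

```python
import math
import random


def dense_relu(x, weights, bias):
    """One fully connected layer followed by ReLU, applied to a single vector."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def mean_pool(vectors):
    """Element-wise mean over a non-empty bag of equally long vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def random_layer(n_in, n_out, rng):
    """Untrained weights and zero bias (assumption: a real model would learn these)."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class TwoLevelMIL:
    """Hypothetical two-level multi-instance classifier: requests -> servers -> computer."""

    def __init__(self, n_features, n_hidden, seed=0):
        rng = random.Random(seed)
        self.request_layer = random_layer(n_features, n_hidden, rng)
        self.server_layer = random_layer(n_hidden, n_hidden, rng)
        self.out_weights = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
        self.out_bias = 0.0

    def server_vector(self, requests):
        # Level 1: embed every request to this server, pool the bag, then transform.
        embedded = [dense_relu(r, *self.request_layer) for r in requests]
        return dense_relu(mean_pool(embedded), *self.server_layer)

    def predict(self, computer):
        """Score in (0, 1) for one computer given as a list of servers,
        each server being a list of request feature vectors."""
        # Level 2: pool the bag of server vectors into one computer vector.
        pooled = mean_pool([self.server_vector(s) for s in computer])
        logit = sum(w * h for w, h in zip(self.out_weights, pooled)) + self.out_bias
        return sigmoid(logit)


# Usage: one computer that contacted two servers within the time window.
computer = [
    [[0.1, 0.2, 0.3], [0.0, 1.0, 0.5]],  # server A: two requests
    [[0.9, 0.1, 0.0]],                   # server B: one request
]
model = TwoLevelMIL(n_features=3, n_hidden=4)
score = model.predict(computer)
print(f"presumed-infected score: {score:.3f}")
```

Mean pooling makes the output independent of the order and number of requests and servers, so bags of any size map to a single computer-level score. That score is the only quantity compared against the human verdict, which is what lets the model learn from labels on whole computers rather than on individual requests.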