    2023 International Conference on Intelligent Systems for Communication, IoT and Security (ICISCoIS) | 979-8-3503-3583-5/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICISCOIS56541.2023.10100379

Data Anomaly Detection in Wireless Sensor Networks using β-Variational Autoencoder

Arul Jothi S
Assistant Professor, PSG College Of Technology, Coimbatore, India
(Affiliated to Anna University, Chennai)
saj.cse@psgtech.ac.in

Harini S
Student, PSG College Of Technology, Coimbatore, India
(Affiliated to Anna University, Chennai)
19z317@psgtech.ac.in

Nivedha K
Student, PSG College Of Technology, Coimbatore, India
(Affiliated to Anna University, Chennai)
19z336@psgtech.ac.in

Selva Keerthana B G
Student, PSG College Of Technology, Coimbatore, India
(Affiliated to Anna University, Chennai)
19z346@psgtech.ac.in

Gokul R
Student, PSG College Of Technology, Coimbatore, India
(Affiliated to Anna University, Chennai)
19z314@psgtech.ac.in

Jayasree B S
Student, PSG College Of Technology, Coimbatore, India
(Affiliated to Anna University, Chennai)
19z322@psgtech.ac.in

Abstract— Anomaly detection is the process of identifying data instances that drastically deviate from the majority of data instances. Anomaly detection is a key challenge for ensuring security in Wireless Sensor Networks, and detecting such anomalous data is required to reduce false alarms. The data generated by wireless sensor networks exhibits several imbalances; the term imbalance refers to an uneven distribution of data across classes, which severely affects the performance of traditional classifiers. Data imbalance is a major challenge for machine learning models, and the proposed model addresses it using a deep learning technique: the beta variational autoencoder, in which a parameter β weights the KL divergence term of the Variational Autoencoder (VAE)'s loss function. Introducing the parameter β yields a disentangled representation of the data. The proposed VAE model uses a multivariate normal distribution instead of a univariate normal distribution.

Keywords—Anomaly Detection, Data imbalance, VAE.

I. INTRODUCTION
Anomaly detection is a key consideration in the development and deployment of machine learning and deep learning algorithms. Data instances or observations that differ from the majority of data instances are considered anomalies, and finding those data points is the process known as anomaly detection. Anomaly detection (AD) is also known as outlier or novelty detection. Because outliers can badly skew the overall result of an analysis, and because their behavior may be precisely what is of interest, it is crucial to recognize and deal with them when evaluating data.
AD is a major challenge in Wireless Sensor Networks, where it is needed to reduce false alarms [12]. Sensors generate sensory data and continuously monitor physical factors including temperature, vibration, and motion. A sensor node may act as a data router and a data originator simultaneously, while a sink acts as the processing center and gathers data from the sensors. To share data, the base station of a Wireless Sensor Network (WSN) system connects to the Internet.
The Variational Autoencoder (VAE) is an extension of the autoencoder. Like an autoencoder, it consists of an encoder and a decoder network. A VAE learns to rebuild the original data by sampling from a distribution based on a mapping from an input. Normal data samples have a minimal reconstruction error, while abnormal data samples have a substantial reconstruction error; the reconstruction probability, as in [2], is employed as an anomaly score. The VAE model is trained by minimizing the difference between the model's estimated distribution and the data's true distribution. This difference is assessed using the Kullback-Leibler divergence, which measures the distance between two distributions by calculating how much information is lost when one distribution is used to represent the other.
β-VAEs learn a disentangled representation of a data distribution; that is, a single unit in the latent code is responsive to only a single generating factor. The benefit of a disentangled representation is that the model is straightforward to generalize and interpret. For the loss function, beta divergence is employed instead of KL divergence, as in [1]. The purpose of adding this hyperparameter is to maximize the likelihood of generating the true dataset while keeping the divergence between the real and estimated distributions small, under a bound ε.
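To make the KL term concrete: for a diagonal Gaussian N(μ, σ²) measured against the standard normal prior N(0, I), the divergence has a closed form. The NumPy sketch below illustrates it (the function name is ours, not from the paper):

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ),
    summed over the latent dimensions."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

# The divergence is zero exactly when the encoder matches the prior...
print(kl_to_standard_normal(np.zeros(2), np.zeros(2)))  # 0.0
# ...and grows as the mean or variance drifts away from it.
print(kl_to_standard_normal(np.array([1.0, 0.0]), np.zeros(2)))  # 0.5
```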
II. LITERATURE SURVEY
The authors in [4] proposed an anomaly detection methodology for wireless sensor networks using an ensemble random forest. Decision tree, naïve Bayes and kNN were the base learners of the ensemble. The test results show that better performance can be obtained using multiple learners in the ensemble.
An anomaly detection scheme using Incremental Principal Component Analysis and a one-class Support Vector Machine (OCSVM) was published in [6]. This study focuses on building a lightweight anomaly detection system that uses one-class learning schemes and dimension-reduction concepts to produce trustworthy data gathering while consuming less energy. Due to its strengths in categorizing unlabeled data, the OCSVM is utilised as the anomaly detection technique, while a hyper-ellipsoid variance model detects multivariate data.
The author in [7] focuses on attack detection and proposes a model for intrusion detection that is compatible with WSN features. This approach is based on the online Passive-Aggressive classifier and the information gain ratio. A dataset from a wireless sensor network detection system (WSN-DS) was used for the investigation. The suggested model, ID-GOPA, achieves a 96% detection rate when deciding whether the network is operating normally or is vulnerable to attacks of any kind.

Authorized licensed use limited to: Center for Science Technology and Information (CESTI). Downloaded on March 05,2025 at 02:22:14 UTC from IEEE Xplore. Restrictions apply.
A model based on Inverse Weight Clustering (IWC) and a C5.0 decision tree was proposed by the author in [8]. The IWC and C5.0 decision tree algorithms are employed in this study to create a model that can distinguish between abnormal and typical activity in a wireless sensor network. IWC is used to form groups and assign labels to them, and a C5.0 classifier is then used to train and test the model. The University of North Carolina at Greensboro (UNCG), the Intel Berkeley Research Lab (IBRL), and the Bharatpur Airport datasets were used in the experiment. The findings demonstrate that the technique with the highest accuracy rate for detecting anomalies on IBRL is IWC+C5.0.
[9] investigated the use of autoencoders to improve prediction performance for imbalanced binary classification problems. The authors consider breast cancer detection as an application domain; this is an imbalanced classification problem. One objective of the paper is to investigate deep autoencoders' capacity to recognise patterns in the benign and malignant classes. Second, they propose and contrast two autoencoder-based classification models for the detection of breast cancer.
On the basis of an autoencoder architecture, a neural model for the detection of abnormal behaviour is developed in [10]. A variational autoencoder is compared to this solution to examine the improvements that can be made. To validate the proposal, the well-known UK-DALE dataset is employed.
The author in [11] compared three distinct pre-processing techniques for imbalanced classification data. Three imbalanced classification datasets with various class imbalances are subjected to a Variational Autoencoder, Random Under-Sampling Boosting (RUSBoost), and a mixture of the two. The hybrid technique performs poorly for moderately imbalanced data and best for extremely imbalanced data; when total classification performance is examined, both VAE and RUSBoost display better classification results.
In [13], Online Locally Weighted Projection Regression (OLWPR) is used, where predictions are made by non-parametric local functions that each use only a subset of the data, keeping the processing complexity minimal. OLWPR achieves an 86 percent detection rate and a low 16% error rate.
The author in [16] presented a variance-weighted multi-headed autoencoder classification model that is well suited to high-dimensional, highly imbalanced data. In addition to using weighting and sampling techniques to deal with the extreme imbalance, the model predicts many outputs at the same time by combining output multi-task weighting and supervised representation learning.
The paper [18] proposes a time-series-based data detection approach, addressing the issues that sampled sensor values change significantly in hard conditions and that event detection results become erroneous as the number of faulty nodes in a WSN grows.


A one-class principal component classifier was suggested in study [19]. In this work, a cluster-based distributed anomaly detection method for WSNs was developed. The model makes use of the spatial correlation of sensing data in a small area (i.e., a cluster) to increase the efficiency and efficacy of detection. The proposed approach aims to overcome the limitations of existing detection algorithms by making efficient use of the limited resources of sensors.
The author in [20] put forward a model that uses correlations between several physiological data variables and hybrid Convolutional Long Short-Term Memory (ConvLSTM) approaches to identify both straightforward point anomalies and contextual anomalies in the massive amount of WBAN data. Experimental analyses showed that the suggested model reported better results than both CNN and LSTM independently.
III. DATASETS
A. IBRL (Intel Berkeley Research Laboratory) Dataset
The dataset contains data gathered from fifty-four sensors deployed in the Intel Berkeley Research Laboratory between February 28th and April 5th, 2004. Every thirty-one seconds, Mica2Dot sensors equipped with weather boards gathered time-stamped topological data alongside measurements of humidity, temperature, light, and voltage. The data was gathered using the TinyOS-based TinyDB in-network query processing technology.

    Fig 1.Deployment of Sensors in IBRL

    Schema:

    Fig 2. IBRL Dataset Schema

- Epoch: The data was compiled with an epoch period of around 31 seconds, resulting in the collection of 65,000 epochs and approximately 2.3 million readings.
- Moteid: At the same moment, two readings with the same epoch number were produced by different motes. Sensors are numbered from 1 to 54 as mote IDs.
- Temperature: collected in degrees Celsius.
- Humidity: measured as temperature-corrected relative humidity on a scale of 0 to 100.
- Light: measured in lux.
- Voltage: measured in volts, ranging from 2 to 3; it stayed relatively steady throughout the sensors' lifetimes. Temperature and voltage variations are highly interrelated.


B. ISSNIP (The Intelligent Sensors, Sensor Networks & Information Processing) Dataset:

The Intelligent Sensors, Sensor Networks, and Information Processing (ISSNIP) collection contains real-world humidity-temperature sensor data acquired using TelosB motes in single-hop WSNs. This dataset has controlled anomalies, and they are all present. There are four sensor nodes in total: two indoor and two outdoor. The data comprises temperature and humidity readings taken every 5 seconds for 6 hours. To generate anomalies, a hot-water kettle is utilized.

Schema:

Fig 3. ISSNIP Dataset Schema

- MoteID: At the same moment, two readings with the same epoch number were generated by different motes.
- Humidity: temperature-corrected relative humidity.
- Temperature: in degrees Celsius.
- Label: anomalies are labeled as 1 and normal data is labeled as 0.

IV. VARIATIONAL AUTOENCODER
A VAE makes the encoder produce a probability distribution rather than a single output value in the bottleneck layer. Unlike plain autoencoders, variational autoencoders represent the samples in the dataset statistically in latent space. The phrase "variational autoencoder" refers to an autoencoder whose training is regularized to avoid overfitting and to guarantee that the latent space has the desired characteristics that allow generative processes.
The use of normal encoding distributions enables the encoder to be trained to return the Gaussian mean and covariance matrix. The encoder's generated distributions must also be close to a standard normal distribution. The proposed model uses a multivariate normal distribution since the anomalies are detected in multivariate time-series datasets.
The "reconstruction term" (on the final layer) aims to make the encoding-decoding scheme as performant as possible, whereas the "regularization term" (on the latent layer) tends to normalize the organization of the latent space by bringing the encoder's distributions close to a standard normal distribution. As a result, when training a VAE, the loss function that is minimized is made up of these two terms. The regularization term is given as the Kullback-Leibler divergence between the returned distribution and a standard Gaussian.

Mathematics behind the VAE:
For each data point, the following two-step generative process is assumed:
1. A latent representation z is sampled from the prior distribution p(z).
2. x is drawn from the conditional likelihood distribution p(x|z).
The distribution p(z|x), in contrast to p(x|z), defines the "probabilistic encoder," which characterises the distribution of the encoded variable given the decoded one; it naturally complements the "probabilistic decoder" p(x|z), which describes the distribution of the decoded variable given the encoded one. The Bayes theorem connects the posterior p(z|x), likelihood p(x|z), and prior p(z) as represented in (1), with the prior and likelihood assumed as in (2) and (3):

    p(z|x) = p(x|z) p(z) / p(x) = p(x|z) p(z) / ∫ p(x|u) p(u) du    (1)

    p(z) = N(0, I)    (2)

    p(x|z) = N(f(z), cI),  f ∈ F, c > 0    (3)

Variational inference formulation:
Variational inference (VI) is a statistical method for approximating complex distributions. Here, q_x(z) is a Gaussian distribution that approximates p(z|x), and g and h are two functions of the parameter x that define its mean and covariance, as expressed in (4):

    q_x(z) ≡ N(g(x), h(x)),  g ∈ G, h ∈ H    (4)

The optimum approximation is discovered by minimizing the Kullback-Leibler divergence between the approximation and the target p(z|x), by optimizing the functions g and h (really, their parameters) as in (5):

    (g*, h*) = argmin_{(g,h) ∈ G×H} KL( q_x(z), p(z|x) )
             = argmax_{(g,h) ∈ G×H} ( E_{z~q_x}[ −||x − f(z)||² / (2c) ] − KL( q_x(z), p(z) ) )    (5)

The function f is selected so that, when z is sampled from q*_x(z), the expected log-likelihood of x given z is maximized, as shown in (6):

    f* = argmax_{f ∈ F} E_{z~q*_x}[ log p(x|z) ]
       = argmax_{f ∈ F} E_{z~q*_x}[ −||x − f(z)||² / (2c) ]    (6)

Gathering all the pieces together, we are looking for the optimal f*, g* and h* as shown in (7), such that

    (f*, g*, h*) = argmax_{(f,g,h) ∈ F×G×H} ( E_{z~q_x}[ −||x − f(z)||² / (2c) ] − KL( q_x(z), p(z) ) )    (7)

Gradient descent through the random sampling that occurs halfway through the architecture is made possible by a simple trick known as the reparameterization trick, which exploits the fact that if z is a random variable following a Gaussian distribution with mean g(x) and covariance h(x), it can be expressed as z = g(x) + h(x)^{1/2} ζ, where ζ ~ N(0, I).
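A quick NumPy check of that identity, using a diagonal h(x) and illustrative values: samples built from a standard normal ζ recover the requested mean and covariance, while the dependence on g(x) and h(x) stays deterministic (and hence differentiable in a real model):

```python
import numpy as np

rng = np.random.default_rng(0)
g_x = np.array([1.0, -2.0])    # encoder mean g(x) (illustrative)
h_x = np.array([0.25, 4.0])    # diagonal covariance h(x) (illustrative)

# Reparameterization: z = g(x) + h(x)^(1/2) * zeta, with zeta ~ N(0, I).
zeta = rng.standard_normal((100_000, 2))
z = g_x + np.sqrt(h_x) * zeta

# Empirical moments match the requested Gaussian.
print(z.mean(axis=0))  # close to [ 1.0, -2.0]
print(z.var(axis=0))   # close to [ 0.25, 4.0]
```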
β-Variational Autoencoder:
β-VAEs learn a disentangled representation of a data distribution; that is, a single unit in the latent code is responsive to only a single generating factor. If each latent variable is sensitive to only one feature/property of the dataset while being largely invariant to the others, the representation is said to be disentangled. The benefit of using a disentangled representation is that the model is straightforward to generalize and interpret [14]. β-VAE weights the KL divergence in the loss function by the factor β [5]:

    Loss = L(x, x̂) + β Σ_j KL( q_j(z|x) || p(z) )    (8)
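A minimal NumPy sketch of the loss in (8), assuming mean-squared reconstruction error and a diagonal-Gaussian encoder; the function and argument names are illustrative, not from the paper's code:

```python
import numpy as np

def beta_vae_loss(x, x_hat, mu, log_var, beta=0.2):
    # Reconstruction term L(x, x_hat): mean squared error.
    recon = np.mean((x - x_hat) ** 2)
    # KL( q(z|x) || N(0, I) ) in closed form, summed over latent
    # dimensions and averaged over the batch.
    kl = 0.5 * np.mean(np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=1))
    # beta = 0.2 matches the 0.2 * KL weighting used in the proposed model.
    return recon + beta * kl

x = np.ones((4, 3))
loss = beta_vae_loss(x, x_hat=x, mu=np.zeros((4, 2)), log_var=np.zeros((4, 2)))
print(loss)  # 0.0: perfect reconstruction and an encoder equal to the prior
```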

V. PROPOSED SOLUTION

STEPS:
1. Data Preprocessing:
The data is initially loaded into a pandas dataframe. The date, time, moteid, reading-number, and label columns are deleted from the dataframe, duplicate rows are then deleted, and finally rows with NaN values are eliminated.
The data is split into training and testing sets. A certain percentage of outliers is introduced into the testing dataset, and the data is scaled. The training and testing numpy arrays are transformed into tensors using a data builder, and a data loader is employed to iterate through the data and manage batches.
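Step 1 can be sketched as follows, on a hypothetical miniature stand-in for the IBRL dataframe (column values are illustrative; the real data also carries the reading-number and label columns, which are dropped the same way):

```python
import numpy as np
import pandas as pd

# Hypothetical miniature stand-in for the IBRL dataframe (illustrative values).
df = pd.DataFrame({
    "date": ["2004-02-28"] * 6,
    "time": ["00:00:01"] * 6,
    "epoch": [1, 1, 2, 2, 3, 3],
    "moteid": [1, 2, 1, 2, 1, 2],
    "temperature": [19.9, 19.9, 20.1, np.nan, 20.3, 20.3],
    "humidity": [37.0, 37.0, 38.1, 38.5, 39.0, 39.0],
    "light": [45.0, 45.0, 45.1, 45.2, 45.3, 45.3],
    "voltage": [2.69, 2.69, 2.68, 2.67, 2.66, 2.66],
})

# Drop non-feature columns, then duplicate rows, then rows with NaN values.
features = (df.drop(columns=["date", "time", "moteid"])
              .drop_duplicates()
              .dropna())

# 70/30 train/test split; outliers would then be injected into the test
# portion and both sets scaled before being batched by a data loader.
split = int(0.7 * len(features))
train = features.iloc[:split].to_numpy()
test = features.iloc[split:].to_numpy()
print(train.shape, test.shape)  # (2, 5) (1, 5) on this toy dataframe
```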
2. VAE model specification:

TABLE I. ENCODER AND DECODER NETWORK

    Encoder                        Decoder
    Hidden layer   Neurons         Hidden layer   Neurons
    1              50              1              12
    2              12              2              50

    Latent space dimension = 2
    Activation function    = ReLU
    Loss function          = Reconstruction error + 0.2 * KL divergence
    Distribution           = Multivariate Normal
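The shapes in Table I can be sketched as an untrained NumPy forward pass; the weights are random placeholders, and the 4-feature input dimension is our assumption (matching the retained IBRL features), not stated in the table:

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda a: np.maximum(a, 0.0)

def layer(n_in, n_out):
    # Small random weights; a real model would learn these by training.
    return rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out)

in_dim = 4  # e.g. temperature, humidity, light, voltage (assumption)

# Encoder: input -> 50 -> 12 -> (mu, log_var) in a 2-D latent space.
W1, b1 = layer(in_dim, 50)
W2, b2 = layer(50, 12)
Wmu, bmu = layer(12, 2)
Wlv, blv = layer(12, 2)

# Decoder: 2 -> 12 -> 50 -> input reconstruction.
W3, b3 = layer(2, 12)
W4, b4 = layer(12, 50)
W5, b5 = layer(50, in_dim)

def forward(x):
    h = relu(relu(x @ W1 + b1) @ W2 + b2)
    mu, log_var = h @ Wmu + bmu, h @ Wlv + blv
    # Reparameterization trick: z = mu + sigma * epsilon.
    z = mu + np.exp(0.5 * log_var) * rng.standard_normal(mu.shape)
    x_hat = relu(relu(z @ W3 + b3) @ W4 + b4) @ W5 + b5
    return x_hat, mu, log_var

x = rng.standard_normal((8, in_dim))
x_hat, mu, log_var = forward(x)
print(x_hat.shape, mu.shape)  # (8, 4) (8, 2)
```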

3. Training:

Fig 4. Steps in the model

The training dataset consists of 70% of the dataset and does not contain anomalies [3]. A train function is defined in which, for each batch in the training set, encoding, reparameterization and decoding are performed. The latent-space distribution is chosen to be a multivariate normal distribution. The loss is calculated as in (8) and the average loss of each batch is appended to a list. The loss is backpropagated during each epoch. The threshold for anomaly classification is set as the maximum of the average losses of the batches, and the threshold at the end of all epochs is passed to the test phase.

4. Testing:
For testing, a testing dataset (30% of the original dataset) with different abnormality ranges was used. A test function is defined in which, for each batch in the testing set, encoding, reparameterization and decoding are performed. The anomaly-classification threshold obtained from training is passed as a parameter for testing. The loss is calculated for each sample, and the samples whose loss is greater than the threshold are classified as anomalous data.

A. Algorithm

1. Preprocess the data.
2. Split the data into training and testing sets.
3. Define the Variational Autoencoder class.
4. Training:
   For each batch:
       Run the model to get the reconstructed values, mean and variance.
       Calculate the loss for the batch and update the parameters (weights and biases).
       Set the threshold to the maximum of the per-batch average losses.
5. Repeat step 4 for 50 epochs.
6. Testing (threshold):
   Run the model to get the reconstructed values, mean and variance.
   Calculate the loss for each sample.
   If the loss > threshold, increment the outlier count.
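The thresholding logic of steps 4-6 can be sketched end to end. The per-sample loss below is a placeholder (a plain mean squared value) rather than the trained β-VAE loss of (8), and the data is synthetic, so this shows only the mechanics of the threshold:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_losses(batch):
    # Placeholder per-sample loss; a real run would evaluate the
    # trained beta-VAE loss (8) for each sample.
    return np.mean(batch ** 2, axis=1)

# "Training": normal data only, as 10 batches of 60 samples.
train_batches = rng.normal(0.0, 1.0, (10, 60, 4))

# Threshold = maximum over batches of the average batch loss.
threshold = max(sample_losses(b).mean() for b in train_batches)

# "Testing": mostly normal samples plus a few injected outliers.
test = np.vstack([rng.normal(0.0, 1.0, (95, 4)),
                  rng.normal(8.0, 1.0, (5, 4))])
flags = sample_losses(test) > threshold
print(int(flags.sum()), "samples flagged as anomalous")
```

Because injected outliers sit far from the training distribution, their placeholder loss vastly exceeds the threshold, so all of them are flagged (some borderline normal samples may be flagged too, mirroring the false positives in Tables II and III).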
VI. RESULT ANALYSIS
Train/Test Loss:
The β-variational autoencoder's loss function is composed of two terms.
Reconstruction Loss:
The difference between the input representation and the output representation is known as the reconstruction error or reconstruction loss (the error between the input vector and the output vector). In the β-variational autoencoder model, the reconstruction error is the mean squared loss.
Regularization Term:
The regularization term is expressed as the Kullback-Leibler divergence between the returned distribution and a standard Gaussian. Minimizing the KL divergence here means optimizing the probability distribution parameters (μ and σ) to closely resemble those of the target distribution. In the β-VAE, only a β fraction of the KL divergence is included in the train/test loss, as expressed in (8).
Since the β-variational autoencoder is trained on normal data, its loss is larger when it attempts to reconstruct anomalous data than when it attempts to reconstruct normal data. The maximum of the loss function in the training phase is therefore taken as the threshold for classification in the testing phase.

    Fig 6. Train Loss for IBRL

Fig 8 and Fig 9 give the false positive analysis for the ISSNIP and IBRL datasets respectively.
TABLE II. FALSE POSITIVE RATE FOR ISSNIP

    Outliers introduced    Anomalies introduced    Anomalies predicted    False positive
    into testing dataset   in testing dataset      by the model           rate
    150                    4.41%                   4.17%                  0.24
    300                    8.82%                   8.85%                  0.03
    450                    13.24%                  13.3%                  0.06
    600                    17.65%                  15.89%                 1.76

Fig 7. False Positive Analysis for ISSNIP (actual vs. predicted anomaly percentages)

TABLE III. FALSE POSITIVE RATE FOR IBRL

    Outliers introduced    Anomalies introduced    Anomalies predicted    False positive
    into testing dataset   in testing dataset      by the model           rate
    25,000                 5.28                    6.05                   0.77
    50,000                 10.03                   11.84                  1.81
    75,000                 15.04                   15.63                  0.59

Fig 8. False Positive Analysis for IBRL (actual vs. predicted anomaly percentages)

Table II and Table III show the percentage of anomalies introduced into the testing dataset, the percentage of data points predicted as anomalies by the model, and the difference between them for the ISSNIP and IBRL datasets respectively. The training loss for the ISSNIP and IBRL datasets decreases as the number of epochs increases.

Figure 10 and Figure 11 show the difference between actual and predicted anomalies for the ISSNIP and IBRL datasets respectively.

Fig 9. False Positive Rate for ISSNIP (false positive rate vs. number of anomalies)

Fig 5. Train Loss for ISSNIP


Fig 10. False Positive Rate for IBRL (false positive rate vs. number of anomalies)

The results show that β-variational autoencoders using a multivariate normal distribution were able to detect anomalies in imbalanced datasets [15][17].

CONCLUSION
We presented an anomaly detection approach for wireless sensor networks using β-variational autoencoders. Normal sensor values from the IBRL and ISSNIP datasets are given as input to the model for training, where we set the threshold to be the maximum of the average train losses. Testing is done by introducing outliers into the dataset and classifying a test sample as an anomaly if its loss, as in (8), falls above the threshold value. The proposed β-variational autoencoder uses only 20% of the KL divergence while calculating the loss function and makes use of a multivariate normal distribution. In the future, if the threshold that we calculate from the training loss is not efficient enough, we can check whether the threshold can be fixed by taking the mean or standard deviation. Further, the goal is to design a mechanism to see whether we can replace the outliers with statistical measures.
REFERENCES

[1] Haleh Akrami, Anand A. Joshi, Jian Li, Sergül Aydöre, Richard M. Leahy (2022), "A robust variational autoencoder using beta divergence", Volume 238, DOI: 10.1016/j.knosys.2021.107886.
[2] Touseef Iqbal, Shaima Qureshi (2022), "Reconstruction probability-based anomaly detection using variational auto-encoders", DOI: 10.1080/1206212X.2022.2143026.
[3] Walaa Gouda, Sidra Tahir, Saad Alanazi, Maram Almufareh, Ghadah Alwakid (2022), "Unsupervised Outlier Detection in IOT Using Deep VAE", DOI: 10.3390/s22176617.
[4] Priyajit Biswas, Tuhina Samanta (2021), "Anomaly detection using ensemble random forest in wireless sensor network".
[5] Miroslav Fil, Munib Mesinovic, Matthew Morris, Jonas Wildberger (2021), "β-VAE Reproducibility: Challenges and Extensions".
[6] Nurfazrina M. Zamry, Anazida Zainal, Murad A. Rassam, Eman H. Alkhammash, Fuad A. Ghaleb, Faisal Saeed (2021), "Lightweight Anomaly Detection Scheme Using Incremental Principal Component Analysis and Support Vector Machine", DOI: 10.3390/s21238017.
[7] Samir Ifzarne, Hiba Tabbaa, Imad Hafidi, Nidal Lamghari (2021), "Anomaly Detection using Machine Learning Techniques in Wireless Sensor Networks", DOI: 10.1088/1742-6596/1743/1/012021.
[8] Pramod Kumar Chaudhary, Arun Kumar Timalsina (2021), "Anomaly Detection in Wireless Sensor Network using Inverse Weight Clustering and C5.0 Decision Tree", Volume 7.
[9] Vlad-Ioan Tomescu, Gabriela Czibula, Ştefan Niţică (2021), "A study on using deep autoencoders for imbalanced binary classification", Volume 192.
[10] Daniel Gonzalez, Miguel A. Patricio, Antonio Berlanga, Jose M. Molina (2020), "Variational autoencoders for anomaly detection in the behaviour of the elderly using electricity consumption data", DOI: 10.1111/exsy.12744.
[11] Jesper Ludvigsen, Patrik Andersson (2020), "Handling Imbalanced Data Classification With Variational Autoencoding And Random Under-Sampling Boosting".
[12] Ahmed Muqdad Alnasrallah, Zahraa Radhi Waad, Atyaf Jarullah Yaseen (2020), "An Improved Unsupervised Anomaly Detection for Wireless Sensor Network using Machine Learning", Volume 63, No. 6.
[13] I. Gethzi Ahila Poornima, B. Paramasivan (2020), "Anomaly detection in wireless sensor network using machine learning algorithm", DOI: 10.1016/j.comcom.2020.01.005, Volume 151.
[14] Adrian Alan Pol, Victor Berger, Gianluca Cerminara, Cecile Germain, Maurizio Pierini (2020), "Anomaly Detection With Conditional Variational Autoencoders", DOI: 10.48550/arXiv.2010.05531.
[15] Harshita Patel, Dharmendra Singh Rajput, G Thippa Reddy, Celestine Iwendi, Ali Kashif Bashir, Ohyun Jo (2020), "A review on classification of imbalanced data for wireless sensor networks", DOI: 10.1177/1550147720916404, Volume 16(4).
[16] Chao Zhang, Sthitie Bom (2021), "Auto-encoder based Model for High-dimensional Imbalanced Industrial Data".
[17] Justin M. Johnson, Taghi M. Khoshgoftaar (2019), "Survey on deep learning with class imbalance", DOI: 10.1186/s40537-019-0192-5.
[18] Yan Li (2019), "Anomaly Detection in Wireless Sensor Networks Based on Time Factor", DOI: 10.3233/JIFS-179298.
[19] Murad A. Rassam, Mohd Aizaini Maarof, Anazida Zainal (2018), "A distributed anomaly detection model for wireless sensor networks based on the one-class principal component classifier", International Journal of Sensor Networks 27(3):200, DOI: 10.1504/IJSNET.2018.093126.
[20] Albatul Albattah, Murad A. Rassam (2022), "A Correlation-Based Anomaly Detection Model for Wireless Body Area Networks Using Convolutional Long Short-Term Memory Neural Network", DOI: 10.3390/s22051951.
