CS858 F19 Paper #15 Reviews and Comments
===========================================================================
Paper #15 Neural Cleanse: Identifying and Mitigating Backdoor Attacks in
Neural Networks


Review #15A
===========================================================================
* Reviewer: Laura Graves <laura.graves@uwaterloo.ca>
* Updated: 4 Oct 2019 2:40:22pm EDT

Paper summary
-------------
The authors present methods for defending against backdoor attacks - both detection and two forms of model patching.

The authors show that they can detect the presence of a backdoor in a trained model. Further, they introduce a method for reverse-engineering backdoor attack triggers from these models. Next, they show how these reverse-engineered triggers can be used to patch models so that they are robust against the backdoors, and they present a filter for detecting inputs that contain the attack trigger.

Problems in Understanding
-------------------------
In figure 10, removing 5-10% of neurons actually slightly increases the classification accuracy (from roughly 73% to 78%, by my visual estimate). I couldn't find where the authors discuss this, if they do at all, and I don't understand why this would be the case. Is it because the neurons that are most sensitive to the trigger are the most corrupted by the poisoned training, and therefore hurt clean accuracy? Some insight would be nice!

Strengths
---------
The authors present 3 mitigation methods and test them against a variety of state-of-the-art attacks, which is wonderful.

The intuition in figure 2 may be an oversimplification, but it gave me an excellent sense of why they believe backdoor-trained models are so susceptible to the trigger.

The anomaly measure they use is a real strength. In the examples shown in figure 3 it is clear that this metric is useful for detection; it was a simple but very good choice.

The concept behind the model unlearning process is intuitive and effective (most of the time, at least). This was the technique, more than any other, where it felt like "wow, I could have thought of that" (but, of course, didn't), which is often a sign of a very clever technique.

Weaknesses
----------
I really liked this paper but I had a number of small complaints:

The foundation for the intuitive explanation of these pocket backdoor spaces is that the decision boundaries are smooth and well-generalized, and additionally that each classification region is one generally homogeneous region. The intuition behind the success of adversarial examples is often that exactly these things are not true - what's the disconnect here?

I don't like that the only metric they use for the magnitude of the trigger is $\ell_1$ distance - why isn't the delta value considered? We've seen adversarial attacks succeed by perturbing a single pixel (https://arxiv.org/pdf/1710.08864.pdf), so why are we using a metric that so poorly estimates how much the image has to be perturbed? I find it very hard to believe that this is a good metric in any kind of general case, even if it did work well here.

The authors claim that models can be attacked without any significant performance drop ("Such a backdoor does not affect the model's normal behavior on clean inputs without the trigger"), but we can pretty clearly see in table 2 that infected model accuracy is significantly lower than clean model accuracy (95.69% vs 98.31%, respectively). That's an increase in error rate from 1.69% to 4.31% - a huge jump for any model that needs high accuracy. I feel that understating this difference is a problem. The authors do a similar thing in VI.B, where they talk about pruning a model with only a 5% accuracy drop - 5% is still quite significant!

Next, and my biggest problem with the paper: we've seen two different kinds of attacks (trigger-driven and neuron-driven) and two different patching techniques, but neither defense is good against both attacks! At the end of the day, we've only learned that whichever defense technique we use, a savvy attacker has a way around it. This is similar to some of the earlier defense papers against adversarial attacks, which showed that a specific defense was effective against a specific attack but didn't help the general case at all. Why didn't the authors consider a combination of neuron pruning and unlearning to hopefully provide a robust defense?

Finally, I feel the approach to backdoor attacks they present may be non-optimal. To show this, I want to use an example. Let's say I'm designing a model for ACME Co, and I want to put in a malicious backdoor. I could train a model using a specifically designed trigger (and potentially lose model accuracy, and possibly my job if I can't present a state-of-the-art performing model) that could be easily detected by its $\ell_1$ norm, or I could train a model completely and then search, within a very limited $\ell_1$ ball, for an adversarial overlay that causes the desired misclassification. We know these attacks are possible, we know that we can optimize them over a trained model within a very restrictive $\ell_1$ bound, and we know they're difficult for humans to detect (which may not matter, but it's a bonus). What is the argument against simply attacking in this way? I may just not be getting it, but I can't figure it out.

Opportunities for future research
---------------------------------
Can an attack like the example I gave be done? If we generate a perturbation overlay in the same $\Delta*M$ form that the paper proposes, keeping within a small $\ell_1$ bound, can these defenses identify it and mitigate it?
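To make the question concrete, here is a rough sketch of the kind of post-hoc search I have in mind against an already-trained (clean) model, using the same (1-m)*x + m*Delta overlay form. This is only a sketch under assumptions of mine: the PyTorch setup, the L1 budget, the penalty weight, and the function name are illustrative choices, not anything from the paper.

    import torch
    import torch.nn.functional as F

    def find_overlay(model, images, target, l1_budget=50.0, steps=500, lr=0.1):
        # Search a trained model for a small-L1 overlay (mask, delta) that
        # pushes a batch of clean images toward `target`. Assumes `images`
        # are float tensors in [0, 1].
        model.eval()
        for p in model.parameters():
            p.requires_grad_(False)
        _, c, h, w = images.shape
        dev = images.device
        mask_logit = torch.zeros(1, 1, h, w, device=dev, requires_grad=True)
        delta_logit = torch.zeros(1, c, h, w, device=dev, requires_grad=True)
        opt = torch.optim.Adam([mask_logit, delta_logit], lr=lr)
        y = torch.full((images.shape[0],), target, dtype=torch.long, device=dev)

        for _ in range(steps):
            m = torch.sigmoid(mask_logit)          # mask values in [0, 1]
            d = torch.sigmoid(delta_logit)         # pattern in valid pixel range
            overlaid = (1 - m) * images + m * d    # the Delta*M overlay form
            loss = F.cross_entropy(model(overlaid), y)
            loss = loss + 0.01 * F.relu(m.abs().sum() - l1_budget)  # soft L1 ball
            opt.zero_grad()
            loss.backward()
            opt.step()
        return torch.sigmoid(mask_logit).detach(), torch.sigmoid(delta_logit).detach()

If a defense built around small-$\ell_1$ reverse-engineered triggers also flags an overlay found this way, that would strengthen the paper's claims; if not, the attack surface is wider than the evaluation covers.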
Review #15B
===========================================================================
* Reviewer: Matthew David Rafuse <mrafuse@uwaterloo.ca>
* Updated: 5 Oct 2019 9:32:45pm EDT

Paper summary
-------------
In this paper, the authors present a detection system for backdoor attacks on DNNs, as well as mitigation techniques. They show the efficacy of their implementations via extensive testing against two backdoor methods from previous works. They also develop and evaluate a number of adaptive attacks against their defense and identify key weaknesses/limitations of the defense.

Strengths
---------
1. The paper considers a number of adaptive attacks tailored to the developed defense, discovers a number of limitations of the defense, and proposes modifications that improve the defense's efficacy against these adaptive attacks.

2. The authors propose a very interesting optimization for their defense, shaving 75% of the computation time off the process. It was intuitive and creative.

3. The paper contains a number of graphics that are very useful for illustrating its points. In particular, figures 3 and 4 showed the difference between an infected label and clean labels quite well.

4. The forms of mitigation they describe complement each other nicely - there is little reason not to use both in unison, as mentioned in the paper.

Weaknesses
----------
1. The defense erodes classification accuracy, which makes it a non-starter for many deployments. The effect is bounded, but still fairly large (>2%).

2. The threat model for the attack is pretty permissive, and I wonder if the attack is even practical in a real setting. I'm also not sure how this is any different from any other type of data poisoning. The attack seems to be less useful than a normal adversarial example attack. This leads me to question whether the defense is worth it in most cases, as the mitigation techniques generally led to a decrease in classification accuracy.

3. The defense is restricted to the vision domain.

4. The authors seem to think that the attack is difficult for humans to see, but I was able to spot every trigger in the paper. This ties in with weakness 2.

5. No source code is provided with the paper.

Opportunities for future research
---------------------------------
Further evaluation of partial backdoors.

It would be interesting to use their detection algorithm to devise the smallest possible trigger for the backdoor attack.



Review #15C
===========================================================================
* Reviewer: Sung-Shine Lee <s469lee@uwaterloo.ca>
* Updated: 6 Oct 2019 5:32:32pm EDT

Paper summary
-------------
The authors propose a general approach to detecting backdoor attacks in DNNs. The method builds on the intuition that, in an infected model, the trigger needed to cause misclassification into the target label is usually significantly smaller than the perturbation needed to reach a clean label. The authors present an optimization technique to find such triggers and use the recovered trigger to identify the original trigger that was embedded in the model. Finally, a few mitigation methods are proposed, including filtering, neuron pruning, and unlearning. The efficacy of the detection and mitigation methods is analyzed; furthermore, the authors discuss advanced backdoors and analyze how well the proposed methods handle them.

Strengths
---------
- The presentation of the idea follows how one would naturally arrive at the scheme and is therefore very easy to pick up: the authors first present the intuition, then a mathematical observation, followed by the method.
- The authors were very candid when discussing advanced backdoors and pointed out the weaknesses of the proposed method.
- The spread of the experimental results is presented, and from it we can see interesting phenomena, e.g., in Fig. 4 there is a huge spread for uninfected models, whereas there is nearly no spread for infected models.

Weaknesses
----------
- While the authors claim the detection technique is tried on a variety of datasets, the datasets are all in the image domain and therefore do not offer great diversity.
- As the authors present, the proposed identification method doesn't work very well when the attacker inserts multiple, independent backdoored labels.
- Some of the parameters lack a formal definition and rationale. For example, the authors state that neuron activation is considered to be "similar" if the top 1% of neurons are also activated by reverse-engineered triggers, but not by clean inputs. While the description gives an intuitive idea, it would benefit from a formal and more rigorous definition (see the sketch after this list).
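As one example of what such a definition could look like: compare the sets of most-activated neurons in a chosen layer under reverse-engineered triggers versus clean inputs. The sketch below is my own guess at a formalization - the layer choice, the interpretation of the 1% cutoff, and the function names are assumptions, not the paper's definition.

    import torch

    def top_fraction(acts, frac=0.01):
        # Indices of the top `frac` of neurons by mean activation over a batch.
        # `acts` is a (batch, neurons) tensor taken from some chosen layer.
        mean_act = acts.mean(dim=0)
        k = max(1, int(frac * mean_act.numel()))
        return set(torch.topk(mean_act, k).indices.tolist())

    def trigger_specific_fraction(acts_triggered, acts_clean, frac=0.01):
        # Fraction of the trigger's top-1% neurons that are NOT also among the
        # clean inputs' top-1% neurons -- one possible reading of the criterion.
        top_trig = top_fraction(acts_triggered, frac)
        top_clean = top_fraction(acts_clean, frac)
        return len(top_trig - top_clean) / len(top_trig)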
Opportunities for future research
---------------------------------
- As there are techniques that use backdoors as watermarks, how can we prove that a backdoor is not just a watermark? Or, going further, can a malicious party hide a malicious backdoor behind a watermarking technique? (Effectively, this means embedding two different backdoors and hoping the detector picks up only the benign one.)
- How can we identify the existence of a backdoor if there are multiple backdoors with different triggers?
- When identification does not work, does it make sense to mitigate backdoors by running the mitigation algorithm preemptively whenever one receives a model? What is the effect of doing this?



Review #15D
===========================================================================
* Reviewer: Vineel Nagisetty <vnagisetty@uwaterloo.ca>
* Updated: 6 Oct 2019 6:16:17pm EDT

Paper summary
-------------
In this paper, Wang et al. develop a system of defences for backdoor attacks in neural networks. Previous work has shown how backdoor attacks can be used to trick a neural network into mislabeling an input as a specific label. Here, the authors introduce ways to detect if a neural network is compromised by these backdoor attacks. They further introduce ways to “sanitize” a compromised neural network by pruning and unlearning. Finally, the authors conduct extensive experiments and show the results of their defences.

Problems in Understanding
-------------------------
How exactly is the concise trigger found? Does the algorithm look at all possible perturbations of m pixels?

Strengths
---------
1. The threat model is very precise and the defences introduced are intuitive and sound.

2. The intuition provided is excellent. Most of the questions I had while reading this paper were answered by the authors in the subsequent section. The intuition behind the difference between reverse-engineered triggers and original triggers was very interesting.

3. The authors considered advanced attacks that could break their defences and tried them out in Section VII – saving anyone who wants to continue this research a lot of time.

Weaknesses
----------
I enjoyed this paper and couldn't find many weaknesses in it.

1. While the algorithm for patching a compromised neural network is interesting, retraining the model seems to be the best way to go – especially since the authors show that if there are multiple triggers for backdoor attacks, we would have to patch that many times.

Opportunities for future research
---------------------------------
As the authors state, it may be possible to increase the size of a trigger while making it less perceptible to humans. For this, we could use the idea of finding adversarial perturbations. A simple idea may be to use a subset of the training data and find the average perturbation needed for each pixel to convert those images to a specific label. In effect this would be an extension of the “source-label specific (partial) backdoor”.



Review #15E
===========================================================================
* Reviewer: Nils Hendrik Lukas <nlukas@uwaterloo.ca>
* Updated: 6 Oct 2019 6:40:07pm EDT

Paper summary
-------------
The authors propose a framework for finding and removing backdoors in a given neural network. A main idea is to check for shortcuts in the decision space, which indicate that a backdoor exists. In doing so, the authors assume that a backdoor is confined to a small space. They overlay a 2D rectangle on the input image and try to reverse-engineer the backdoor on that rectangle by checking whether all inputs can be pushed to a changed label by some crafted content. The authors try this out for every distinct pair of labels and are thus able to reconstruct a backdoor quite successfully. A first consideration for removing the backdoor is neuron pruning, which removes the neurons most significant in the classification of an adversarial image. As this approach has a large impact on the model's accuracy, the authors also present an unlearning approach that retrains the model on the same input images with the reverse-engineered backdoor in place, but this time associated with the correct label (a rough sketch of this step follows below). This has only a marginal impact on classification accuracy and sometimes even improves it significantly.
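For concreteness, a minimal sketch of how I understand the unlearning pass: stamp a fraction of clean training images with the reverse-engineered trigger while keeping their correct labels, then fine-tune. The stamping ratio, epoch count, learning rate, and function name below are illustrative assumptions of mine, not the authors' exact recipe.

    import torch
    import torch.nn.functional as F

    def unlearn_backdoor(model, loader, mask, delta, stamp_frac=0.2,
                         epochs=1, lr=1e-4):
        # Fine-tune `model` so the reverse-engineered trigger (mask, delta)
        # no longer flips predictions: stamped images keep their TRUE labels.
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        model.train()
        for _ in range(epochs):
            for x, y in loader:
                x = x.clone()
                n_stamp = int(stamp_frac * x.shape[0])
                # apply the recovered trigger to part of the batch, but leave
                # the ground-truth labels untouched
                x[:n_stamp] = (1 - mask) * x[:n_stamp] + mask * delta
                loss = F.cross_entropy(model(x), y)
                opt.zero_grad()
                loss.backward()
                opt.step()
        return model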
Strengths
---------
+ The authors actually define a threat model and model the abilities of the defender.

+ Very few assumptions about the defender, which makes this approach remarkably practicable for many use cases, including watermark removal attacks.

+ Elaborate experimentation with lots of detail, including descriptions of some defenses that did not work as well as expected.

+ The approach works astonishingly well for reconstructing simple triggers.

Weaknesses
----------
- The approach explicitly only considers 2D masks where the input is overwritten, rather than 3D masks that are only additive.

- Only considers "static" triggers, i.e. a white box or text/symbol at the bottom of the image. Such a white box is arguably easy to recover, but a non-static trigger might evade their approach completely.

- No discussion of the relationship to adversarial examples. The authors state they have a patching technique that, even for false positives, removes backdoors "without affecting model's classification performance". Later, in table 4, this no longer holds for one case (a 3.6% drop for GTSRB).

- The authors do a poor job of explaining why some models suddenly perform significantly better when patching is employed. For Trojan Square, the accuracy increase from unlearning is an incredible 8.4%! It seems the attacker already significantly changed the model just by embedding their backdoor, which should itself be detectable (e.g., by training the model at multiple parties).

Opportunities for future research
---------------------------------
Extend this work to more sophisticated backdoors that are non-static.

Alternatively, evaluate how an attacker can defend against this kind of detection/removal by carefully crafting hard-to-recover backdoors.



Review #15F
===========================================================================
* Reviewer: Karthik Ramesh <k6ramesh@uwaterloo.ca>
* Updated: 6 Oct 2019 7:49:41pm EDT

Paper summary
-------------
Prior work has shown that it is possible for an adversary to affect the training/tuning process of a neural network model such that there exists a backdoor of sorts into the network, which gets activated only on certain inputs while the model maintains good accuracy on non-backdoor inputs. This paper proposes techniques to identify and reconstruct backdoors that have been trained into the network. The authors consider two types of backdoor attacks from previous research and conduct extensive experiments to show that their detection techniques work well. In addition, they also propose techniques to mitigate the backdoor attacks.

Strengths
---------
— I really liked how well written this paper was. It gave sufficient background knowledge without overwhelming the reader while still managing to fit the various pieces together.

— There was a heavy emphasis on evaluating their techniques on various datasets and attacks.

— They even tried to counter their own proposed defences by examining them in the context of advanced attacks.

Weaknesses
----------
— Their experiments show that neuron pruning and unlearning each work for either BadNets or Trojan models, but not both. They do not have a generalized defence mechanism that is able to counter both.

— Does increasing the number of training epochs in the unlearning phase reduce the attack success rate on BadNets? Unlearning should have reduced the attack success rate of BadNets as well, since it fine-tunes the entire model. Is the number of epochs a factor in the low reduction in attack success rate?

— They have only explored the visual domain.

Opportunities for future research
---------------------------------
— Ensemble the unlearning and neuron pruning to come up with a generalized defence.

— Explore the effect of partial backdoors (section VII-E of the paper) on this defence mechanism.



Review #15G
===========================================================================
* Reviewer: Nivasini Ananthakrishnan <nanantha@uwaterloo.ca>
* Updated: 6 Oct 2019 9:05:44pm EDT

Paper summary
-------------
The paper describes a model of backdoor attacks. In this model, a backdoor attack is one in which adding a trigger to any input causes the input to be assigned a certain label. The paper proposes an algorithm to detect such backdoor attacks and discusses ways of mitigating them. The paper then experimentally shows that the algorithm successfully detects some real-world backdoor attacks.

Strengths
---------
The attack model is clearly stated, and it captures some existing backdoor attacks, which are used in the evaluation. The paper acknowledges some limitations, such as high computation costs, and discusses some ways to address them.

Weaknesses
----------
Some of the observations don't seem to have sufficient justification. It is not clear when some of the assumptions on the attack model are used. For example, it is not clear why they assume that the majority of labels are unaffected. The modelling of triggers seems quite restrictive.

Opportunities for future research
---------------------------------
Can we detect more general backdoor attacks? We could relax the requirement that all inputs must be assigned the same label when the trigger is added. The trigger could be generalized to depend on some properties of the input.



Review #15H
===========================================================================
* Reviewer: Rasoul Akhavan Mahdavi <r5akhava@uwaterloo.ca>
* Updated: 6 Oct 2019 9:23:53pm EDT

Paper summary
-------------
The nature of neural networks prevents us from understanding the exact internals of the models. This opens up space for backdoors: certain triggers that have been maliciously inserted into a model to misclassify inputs. This paper presents an introduction to identifying these backdoors, with supporting intuition. With this intuition at hand, the paper presents methods to detect, identify, and mitigate these backdoors. Two previous attack methods are used as a benchmark in the experiments to check the effectiveness of these defenses, which prove to be effective in many cases, although they have limitations in some situations.

Problems in Understanding
-------------------------
I didn't understand what the anomaly index in section V-B is, and how it was calculated.
What is the L1 norm of an "output"??
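My best guess at section V-B: the anomaly index appears to be a median-absolute-deviation (MAD) based outlier score computed over the $\ell_1$ norms of the reverse-engineered triggers (one norm per candidate target label), not over model outputs. A sketch of that reading; the 1.4826 consistency constant and the ~2 threshold are the standard convention I am assuming, not details quoted from the paper.

    import numpy as np

    def anomaly_index(l1_norms):
        # MAD-based outlier score for each label's reverse-engineered trigger.
        # `l1_norms` holds one L1 norm per output label. Scores above ~2 would
        # flag that label as a likely backdoor target (abnormally small norm).
        l1_norms = np.asarray(l1_norms, dtype=float)
        med = np.median(l1_norms)
        mad = 1.4826 * np.median(np.abs(l1_norms - med))  # consistency constant
        return np.abs(l1_norms - med) / mad

    # toy usage: label 2 needs a much smaller trigger than the rest
    print(anomaly_index([48.0, 52.0, 3.0, 50.0, 47.0, 51.0]))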
Strengths
---------
- Unlike similar work, this paper puts identification and mitigation together, which is good because identifying an attack is just as hard as mitigating it, and mitigating an attack only on a model already known to be vulnerable is incomplete research in some sense.
- The attack methods that are used are accompanied by intuitions on how the attacks work and what part of the ML pipeline they affect. This not only helps in mitigating these attacks but also gives a general defense against the classes of attacks that obey the same high-level intuition.
- The effect on accuracy is almost always considered, which is of great importance because a defense that sacrifices too much utility is useless.

Weaknesses
----------
- The intuition behind the phenomenon of backdoors is reasonable, but not easily extendable or generalizable. What they propose might not be the only explanation for the existence of backdoors, and if we simply design a backdoor attack that doesn't obey that intuition, the defense is broken.
- The paper has only taken two types of attacks into account, which, although these do represent different styles of attack, limits how well the experiments generalize to other types of attacks.
- Also, the gap in the performance of the defenses between BadNets and Trojans on some of the metrics suggests that the detection and identification mechanism may work better on attacks similar to BadNets. This exacerbates the weakness of studying only two attacks.
- In the mitigation techniques that examine neuron activations, only the second-to-last layer was examined, not any other layers. Also, the only countermeasure was deactivating (pruning) neurons, but the effectiveness of this method relies heavily on whether or not the model was trained with dropout. Deactivating neurons is most likely to be ineffective in models trained with dropout, because they have been hardened against it.
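For reference, the style of pruning being discussed looks roughly like the sketch below: rank the chosen layer's neurons by how much more they fire on triggered inputs than on clean ones and silence the top fraction. The hook-based masking, the assumption that the layer outputs a (batch, features) tensor, and the function name are my own illustrative choices, not the paper's exact procedure.

    import torch

    def prune_trigger_neurons(model, layer, clean_x, triggered_x, prune_frac=0.1):
        # Zero out the fraction of `layer`'s output neurons whose mean activation
        # rises the most on triggered inputs vs. clean inputs. Assumes `layer`
        # produces a (batch, features) tensor, e.g. a penultimate FC layer.
        acts = {}
        rec = layer.register_forward_hook(lambda m, i, o: acts.update(out=o.detach()))
        with torch.no_grad():
            model(clean_x)
            clean_act = acts["out"].mean(dim=0)
            model(triggered_x)
            trig_act = acts["out"].mean(dim=0)
        rec.remove()

        gap = trig_act - clean_act                       # trigger-sensitive neurons
        k = max(1, int(prune_frac * gap.numel()))
        pruned = torch.topk(gap, k).indices
        keep = torch.ones_like(gap)
        keep[pruned] = 0.0

        # mask those neurons on every subsequent forward pass
        layer.register_forward_hook(lambda m, i, o: o * keep)
        return pruned

One would then re-check clean accuracy and attack success rate after each increment of `prune_frac`, which is where the dropout concern above would show up.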
Opportunities for future research
---------------------------------
- The paper proposes its defenses against backdoors created by real attacks, but can we use these same methods to extract "natural" backdoors: specific triggers that the learning algorithm has picked up accidentally, that can cause misclassification, and that are probably the result of a bias in the gathered data (for example, all legitimate faces in our dataset happened to wear glasses)? If this is possible, it could be used to identify certain faults in the dataset: the reversed trigger would show us how our dataset is unbalanced, biased, or contains unwanted features in the examples of a certain class.
- The amount of computation for unlearning is still high. Is there any way to reduce the necessary computation? If the source of the problem is identified, can't we speed up the relearning process (maybe by increasing the learning rate) so that less computation is needed?



Review #15I
===========================================================================
* Reviewer: Lucas Napoleão Coelho <lcoelho@uwaterloo.ca>
* Updated: 6 Oct 2019 9:28:38pm EDT

Paper summary
-------------
The paper presents a system to detect backdoors in neural network models for image classification by identifying the minimal pixel perturbation needed to cause misclassification into each class and performing outlier detection to find the abnormally small ones, which should indicate a backdoor. The authors conduct experiments with two different backdoor injection algorithms and propose two mitigation techniques to remove the backdoor while preserving the classification functionality.

Strengths
---------
The diversity of datasets allows them to point out, at different moments, how a far larger number of classes affects backdoor performance and their system.

Their work, and especially the proposed backdoor identification, is quite novel, even though some problems remain to be addressed.

The authors are very aware of the limitations of their approach. The first weaknesses that came to me as I read the paper were all acknowledged in its later sections, and the authors included a few possible attacks that exploit these weaknesses and showed how they could be accounted for.

Weaknesses
----------
Figure 8 is unnecessarily hard to read; why do they not use AUC?

The fact that they achieve better accuracy after unlearning in some cases is concerning, because having an overgeneralized model may mask the impact of the mitigation techniques on the classification task.

They do not evaluate how much data is required to use the unlearning mitigation. They specifically motivate mitigation techniques that do not require training data (as opposed to retraining, which may not be a realistic scenario), and then use a fraction of the training data to perform unlearning. They briefly comment on the computation cost of the technique but ignore this other aspect.

Opportunities for future research
---------------------------------
As the authors point out, there can still be many counter-measures that avoid their detection system, and, currently, their approach is limited to image recognition tasks.



Review #15J
===========================================================================
* Reviewer: Viet Hung Pham <hvpham@uwaterloo.ca>
* Updated: 6 Oct 2019 9:40:28pm EDT

Paper summary
-------------
The paper proposes a new technique that detects models which have been infected by backdoor attacks. The proposed technique first attempts to reverse-engineer the trigger patterns and then detects the abnormally low L1 norm of the trigger pattern corresponding to a potential target label. The technique also patches the infected model using neuron pruning and unlearning. The evaluation shows that the proposed techniques can detect BadNets and Trojan attacks on 4 datasets. The paper also discusses various advanced attacks that could potentially reduce the effectiveness of detection, and shows with some small experiments that the technique still performs well in some cases.
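For reference, my reading of the reverse-engineering step is roughly the optimization below; the notation and the weighting term are reconstructed from the description above and may not match the paper's exact formulation.

    % For each candidate target label y_t, find the smallest overlay that
    % flips a set of clean inputs x to y_t; labels whose optimal mask norm
    % is an abnormally small outlier are flagged as likely backdoor targets.
    \begin{aligned}
    A(x, m, \Delta) &= (1 - m) \odot x + m \odot \Delta \\
    \min_{m,\,\Delta} \;\; & \sum_{x \in X} \ell\big(y_t,\, f(A(x, m, \Delta))\big) \;+\; \lambda\, \lVert m \rVert_1
    \end{aligned}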
Problems in Understanding
-------------------------
The paper is overall easy to understand.

Strengths
---------
+ The paper overall is easy to understand, and the explanation of the observations is clear and intuitive.
+ The formulation of the attack is concrete, so the proposed technique can perform optimization to estimate the reversed triggers.
+ The discussion of the effectiveness of the technique against advanced backdoors is interesting, as the paper addresses the various threats to validity with experiments and results. This is much better than simply discussing the threats without concrete results.

Weaknesses
----------
- The way the paper presents some technical details while the evaluation results are being discussed is interesting; however, it makes the technical details harder to track, and readers could miss some details because they are buried in the evaluation section.

- The technique makes a big assumption that inputs already belonging to the target label are not affected by the trigger. This is a big flaw, as the attacker could intentionally make triggers for the target label to throw off this defense.

- The proposed technique optimizes a loss function that creates a “minimum” trigger that affects all inputs of a given label. This trigger might not be optimal, and even if it is, it would only be optimal with respect to the test data.

- The proposed method uses a continuous mask while all of the attacks use binary masks. To complete the evaluation, the defense should also be tested against attacks that use continuous masks.

- The paper assumes the L1 norms of the reversed triggers follow a normal distribution for outlier detection. It would be interesting to show that in practice the distribution is close to normal, and to compare against other simple techniques that use the standard deviation or something similar.

- Is the threat model proposed by the paper realistic? Adversarial example attacks have been proven to be successful against black-box models without much cost. Adversarial example attacks can also be targeted, so the adversary can decide the target label. They are also much more effective since they are crafted per input, which makes them much more difficult to detect and defend against. The paper should discuss the pros and cons of the “Trojan” attack vs. adversarial attacks.

Opportunities for future research
---------------------------------
+ The proposed unlearning can change all of the weights; a more efficient approach might be to modify only the weights of the neurons that are active when triggers are present. This way the cost of retraining might be even smaller, as only a fraction of the weights is changed. It would be interesting to evaluate the tradeoff between the number of changed weights and the effectiveness of the unlearning process.

+ The outlier detection is shown to only work when there is a single infected label, but the reversed triggers are still recovered. This means that on a clean network, the L1 norms of these triggers could be used to "estimate" (to a certain extent) the robustness of that network against targeted adversarial example attacks.
Review #15K
===========================================================================
* Reviewer: John Abraham Premkumar <jpremkum@uwaterloo.ca>
* Updated: 6 Oct 2019 10:40:31pm EDT

Paper summary
-------------
In this paper, B. Wang et al. tackle a problem that DNNs may suffer from: backdoor attacks. These are attacks in which an attacker can implant behavior that causes misclassifications whenever a "trigger" is added to an otherwise normal input.

Their work is on creating a system to detect and mitigate a variety of backdoor attacks. They start by defining backdoor attacks, the attack model under which they operate, and the assumptions for the defense system. They provide an intuition as to how these backdoor attacks operate.

They then describe their system for detecting backdoors using an optimization scheme. They also propose a method to reduce complexity when using models that have a large number of labels. Once they identify potential backdoors, they can choose to mitigate them in one of a couple of ways, including filtering out adversarial inputs, or patching the model by removing neurons or through unlearning (which seems analogous to adversarial training).

They test their detection and mitigation against a couple of types of backdoor attacks that implant backdoors in different ways (namely BadNets and Trojan backdoors). They also devise more advanced backdoor attacks to test against their system.

They conclude by summarizing their results, and go on to say that one challenge is to adapt their system to non-vision domains.

Problems in Understanding
-------------------------
I had trouble following the explanation of the overlap in labels for the low-cost process, and the visual/figure for it.

Strengths
---------
1: They address the case where there are a large number of labels by trimming the optimization process for finding triggers.

2: Their work is (as claimed by the authors) the first to create robust ways to detect and mitigate backdoor attacks on DNNs.

3: They provide a good intuition for how the backdoor attacks work, along with visuals.

4: They not only test their method's robustness against BadNets and Trojan backdoors, but also try to create more advanced backdoors and experiment against these as well.

Weaknesses
----------
1: Uses the term ‘adversarial input’ to describe inputs with triggers, which can be confused with ‘adversarial example’, whose definition (in other works) is different.

2: No solid reasoning/empirical evidence for the backdoor intuition (moving decision boundaries). Note: I understand it is only meant as intuition.

3: They say “unlearning clearly provides the best mitigation performance compared to alternatives.” even though it does not work for BadNets models.

4: Quote: “Furthermore, to evade detection, the amount of perturbation should be small. Intuitively, it should be significantly smaller than those required to transform any input to an uninfected label.” They seem to make this claim ‘intuitively’ without much significant reasoning.

Opportunities for future research
---------------------------------
How does this perform on datasets that have much larger resolutions? Would the optimization for finding triggers take much longer?

Could classification accuracy on converged models be used as an early warning system to detect backdoored models?
Review #15L
===========================================================================
* Reviewer: Iman Akbari <iakbariazirani@uwaterloo.ca>
* Updated: 6 Oct 2019 10:43:38pm EDT

Paper summary
-------------
This paper explores backdoor attacks on neural networks, where a certain secret “signal” is used to induce a certain output in the model. For instance, the attacker might contaminate the dataset to force an image classification model into returning a certain output $c_i$ whenever a certain sticker is added to an image. Although running such an attack is pretty straightforward, detecting it may be extremely hard due to the black-box nature of neural nets.

The authors present a novel approach for detecting, reverse-engineering, and mitigating these backdoors. The proposed threat identification method does not assume access to the compromised dataset. Instead, it relies on the notion of the minimal “trigger” necessary for causing misclassification from other outputs into the target “infected” label.

Strengths
---------
- Good flow of the paper.
- Providing a comprehensive “intuition” before explaining the idea in terms of maths was a nice touch. I wish to see that in more papers.
- The identification phase doesn't assume access to the compromised dataset, and it is a pretty simple yet novel approach.

Weaknesses
----------
- In security, usually when a system is compromised, they just burn it. It is unlikely that “patching” a model known to be infected would have much appeal, and not enough justification is given in I-B.
- In the evaluations, a fixed “sticker” is used for the attacks, which is quite simplistic.

Opportunities for future research
---------------------------------
- The authors argue that these attacks add “a new dimension” to the classification problem. Most likely, based on what Szegedy et al. (2013) described, this new dimension can simply mean a high activation of $\langle\phi(x),C\rangle$ for a particular vector C in one of the middle layers. Hence, trying the proposed defence approach with more complicated stickers could provide more insight into how effective the proposed optimization is.
- Explore other approaches where gradients of the model are used. There might be a way to train an ML model to generate the trigger, and it would be interesting to see whether that could keep up with this approach's performance.
Review #15M
===========================================================================
* Reviewer: Tosca Lechner <tlechner@uwaterloo.ca>
* Updated: 6 Oct 2019 10:50:32pm EDT

Paper summary
-------------
This paper proposes a defense against backdoor attacks on neural networks (i.e., attacks in which the training data is manipulated so that evaluation inputs containing certain triggers are misclassified in a targeted way). Their algorithm identifies possible triggers by finding the smallest number of pixels one would need to change to force the data to be misclassified into each label. Then, the outliers of this set with a small pixel count are identified as likely actual triggers. Their defense then consists of
- reverse engineering the triggers, detecting them in incoming data, and rejecting the data where a trigger is present (this is the less computationally expensive strategy)
- unlearning the trigger (this is the more computationally expensive strategy)

They evaluate their defense on several image recognition datasets and for two classes of attack models (BadNets and Trojan models). They find that the first defense strategy works very well for BadNets but has only mediocre results for Trojan attacks. However, they find that their second defense works well on Trojan models.

Problems in Understanding
-------------------------
Do triggers always have to have a low pixel count? Is it possible to think of other triggers that consist of easy patterns but with a higher pixel count? (Or is there some fundamental reason why small pixel size is the only thing to optimize a trigger for?)

Strengths
---------
- Interesting problem: The problem of backdoors seems quite relevant, and I liked how it was recognized and treated differently from normal poisoning attacks.

- Intuitive explanation: I find that they give good intuition when explaining their defense strategy.

- Two defense strategies: I liked how they propose two different strategies for dealing with backdoor attacks: one was detecting triggers, which was not always as successful but was computationally easier, and the other was unlearning the effect, which was more successful but more computationally expensive. I liked that they also evaluated the easier strategy, acknowledging possible real-world time constraints that such a strategy might satisfy.

- First backdoor defense paper (?)

- Considered several attack strategies and several image datasets.

Weaknesses
----------
- Only considered image datasets.
- Only considered two possible attack strategies in their evaluation; they did not investigate attack strategies designed to overcome their defense. Overall, their defense seems very motivated by one particular backdoor strategy they had in mind. It's unclear how this generalizes.
- No theoretical guarantees for the defense.
- The outlier detection only works if the triggers are indeed outliers. This does not work if there are too many triggers. They argue that there would likely be a "maximum capacity" for backdoors. I did not find this argument very convincing. It might be true, but they did not elaborate on it or provide any analysis or intuitive justification for why they think that.
- Their discussion of other possible attacks could have been more elaborate.

Opportunities for future research
---------------------------------
Come up with a more in-depth and rigorous theoretical analysis of this defense strategy (or of possibly better ones).

Come up with better attack models that are designed to overcome this defense.