How likely is it that PRISM will uncover a terrorist?

While the world is still busy wondering where on earth Edward Snowden has disappeared to, and while the US tries to salvage its outsourced PRISM system for collecting private data (along with others that have not yet been disclosed), it is worth asking how effective such expensive systems for detecting potential bad guys really are. Three weeks ago Corey Chivers offered a good textbook illustration of the system's effectiveness, using some simple Bayesian inference. The probability of finding a “real terrorist” among all the falsely identified potential terrorists is fairly small. Despite whole companies of top statisticians and top-notch software for “mining” big data, it is still like looking for a needle in a haystack.

For those of you who skipped maths class back when there was still time for it, I recommend consulting Wikipedia on Bayesian inference first in order to follow Chivers's explanation. The essence, or beauty, of Bayesian inference is that it starts from a prior, subjective estimate of the probability of an event, confronts that estimate with data, and so gradually revises it. The closer we are to the event, that is, the more information we have, the closer our estimate will be to the actual probability.
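To make this gradual revision concrete, here is a minimal Python sketch. The numbers are purely hypothetical and not from Chivers: a prior belief of 1% in some hypothesis, and a piece of evidence that shows up 90% of the time when the hypothesis is true and 20% of the time when it is false. The sketch also assumes that the observations are independent of one another, which real data rarely guarantees.

```python
def update(prior, p_evidence_if_true, p_evidence_if_false):
    """One step of Bayes' rule: return the belief after seeing the evidence once."""
    numerator = p_evidence_if_true * prior
    return numerator / (numerator + p_evidence_if_false * (1 - prior))


# Hypothetical numbers, chosen only for illustration.
belief = 0.01          # prior (subjective) probability of the hypothesis
beliefs = [belief]

# Observe the same kind of evidence five times (assumed independent),
# feeding each posterior back in as the next prior.
for _ in range(5):
    belief = update(belief, p_evidence_if_true=0.9, p_evidence_if_false=0.2)
    beliefs.append(belief)

for step, b in enumerate(beliefs):
    print(f"after {step} observations: belief = {b:.3f}")
```

Each pass uses the previous posterior as the new prior, so the estimate drifts away from the initial guess as the evidence accumulates.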

Chivers:

Some may argue that there is a necessary trade-off between civil liberties and public safety, and that others should just quit their whining. Let’s take a look at this proposition (not the whining part). Specifically, let’s ask: how much benefit, in terms of thwarted would-be attacks, does this level of surveillance confer?

Let’s start by recognizing that terrorism is extremely rare. So the probability that an individual under surveillance (and now everyone is under surveillance) is also a terrorist is also extremely low. Let’s also assume that the neck-beards at the NSA are fairly clever, if exceptionally creepy. We assume that they have devised an algorithm that can detect ‘terrorist communications’ (as opposed to, for instance, pizza orders) with 99% accuracy.

P(+ | bad guy) = 0.99

A job well done, and Murica lives to fight another day. Well, not quite. What we really want to know is: what is the probability that they’ve found a bad guy, given that they’ve gotten a hit on their screen? Or,

P(bad guy | +) = ??

Which is quite a different question altogether. To figure this out, we need a bit more information. Recall that bad guys (specifically terrorists) are extremely rare, say on the order of one in a million (this is a wild overestimate with the true rate being much lower, of course – but let’s not let that stop us). So,

P(bad guy) = 1/1,000,000

Further, let’s say that the spooks have a pretty good algorithm that only comes up falsely positive (i.e. when the person under surveillance is a good guy) one in one hundred times.

P(+ | good guy) = 0.01

And now we have all that we need. Apply a little special Bayes sauce:

P(bad guy | +) = P(+ | bad guy) P(bad guy) / [ P(+ | bad guy) P(bad guy) + P(+ | good guy) P(good guy) ]

and we get:

P(bad guy | +) = 1/10,102

That is, for every positive (the NSA calls these ‘reports’) there is only a 1 in 10,102 chance (using our rough assumptions) that they’ve found a real bad guy.
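Chivers's arithmetic is easy to check. The short Python sketch below plugs his three numbers into Bayes' rule, using exact fractions to avoid rounding. The function and variable names are mine, as is the population of one million used for the head count at the end. That figure is only a convenient illustration, not an estimate of how many people are actually under surveillance.

```python
from fractions import Fraction


def posterior(prior, p_pos_given_bad, p_pos_given_good):
    """P(bad guy | +) from Bayes' rule; P(good guy) is taken as 1 - prior."""
    numerator = p_pos_given_bad * prior
    evidence = numerator + p_pos_given_good * (1 - prior)
    return numerator / evidence


# The three assumptions from the quoted text.
P_BAD = Fraction(1, 1_000_000)        # P(bad guy)
P_POS_GIVEN_BAD = Fraction(99, 100)   # P(+ | bad guy)
P_POS_GIVEN_GOOD = Fraction(1, 100)   # P(+ | good guy)

p = posterior(P_BAD, P_POS_GIVEN_BAD, P_POS_GIVEN_GOOD)
print(f"P(bad guy | +) = 1 in {float(1 / p):,.0f}")

# The same result as a head count in a hypothetical population.
POPULATION = 1_000_000  # illustrative assumption, not from the source
true_hits = POPULATION * P_BAD * P_POS_GIVEN_BAD
false_alarms = POPULATION * (1 - P_BAD) * P_POS_GIVEN_GOOD
print(f"expected true hits:    {float(true_hits):.2f}")
print(f"expected false alarms: {float(false_alarms):,.2f}")
```

The head count shows where the result comes from: the good guys outnumber the bad guys so heavily that even a 1% false positive rate produces vastly more false alarms than true hits.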

In other words, under these rough assumptions, the probability that any given “positive” (an individual flagged as a terrorism suspect) is an actual terrorist is about 1 in 10,000. Put differently, the American programme for monitoring private communications will cast suspicion on many innocent individuals, drawn from an even larger crowd of people whose private communications are monitored by government-paid private companies. The obvious question is whether, at such a low probability of actually identifying a “real terrorist”, it makes sense from a human rights standpoint, and is economically efficient, to spend so much taxpayer money on intruding into individuals' private communications.
