Tuesday 17 April 2012

Privacy, Dataflow and Nissenbaum ... formalisation?

I read the article by Alexis Madrigal of The Atlantic about Helen Nissenbaum's approach to privacy. It is good to see someone talking about the sharing of information as being good for privacy. Maybe this is one of the rare instances where the notion of privacy has been liberated from being all about hiding your data and protecting the "consumer", and has become, in my opinion, about how data flows.

To quote from the article, which gives a good example:
This may sound simple, but it actually leads to different analyses of current privacy dilemmas and may suggest better ways of dealing with data on the Internet. A quick example: remember the hubbub over Google Street View in Europe? Germans, in particular, objected to the photo-taking cars. Many people, using the standard privacy paradigm, were like, "What's the problem? You're standing out in the street? It's public!" But Nissenbaum argues that the reason some people were upset is that reciprocity was a key part of the informational arrangement. If I'm out in the street, I can see who can see me, and know what's happening. If Google's car buzzes by, I haven't agreed to that encounter. Ergo, privacy violation.

The first thing here is that Nissenbaum gets us past privacy as a binary thing - it's private or public, where private means hidden. Nissenbaum instead promotes the idea that what matters is how we perceive the data flow, rather than whether something is private or public; again quoting from the article:

Nissenbaum argues that the real problem "is the inappropriateness of the flow of information due to the mediation of technology." In her scheme, there are senders and receivers of messages, who communicate different types of information with very specific expectations of how it will be used. Privacy violations occur not when too much data accumulates or people can't direct it, but when one of the receivers or transmission principles change. The key academic term is "context-relative informational norms." Bust a norm and people get upset.

For a while I've been working on formalising architectures, ontologies, taxonomies and so on for privacy (privacy engineering) - the common factor in all of these is the data-flow. Actually, I think some of this is quite simple when thought of in this manner. First, we construct a simple data-flow model:

[Figure: a simple data-flow model - some information I flowing from a sender A to a receiver B]

Aside: this is quite informal and the following just sketches out a line of thinking rather than being a definition.

Some information I flows from A to B. For this information I we can extract a number of aspects: sensitivity, information type, amount of identity, etc. We can also ask, for this particular interaction (A,I,B), whether the information I is relevant to the particular set of transactions or services that B provides. If B requires a set of information H to fulfil its contract with A, then I<=H in this case, which allows A to supply less but should discourage B from asking for more.
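As a minimal sketch of this check - the names, items and the set-containment reading of I<=H below are all my own assumptions, not a definition - this could look like:

def flow_is_relevant(supplied: set, required: set) -> bool:
    # True when everything supplied (I) is actually required (H), ie I <= H
    return supplied <= required

H = {"name", "address", "payment_card"}   # what B needs to fulfil the contract
I = {"name", "payment_card"}              # what A actually sends

print(flow_is_relevant(I, H))                         # True: A may supply less
print(flow_is_relevant(I | {"browsing_history"}, H))  # False: more than B needs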

We can also look at other factors when making that decision: the longevity of information in B, the ownership of the information once passed to B and, importantly, whether B passes this information on - we come to this latter point later. Ultimately we can assign a weight to this data-flow; what form of metric this should take I don't have a good idea about at the moment, but let's call it a, ie: a(I) is some measure of the 'amount of information' weighted by the various aspects and classifications. The above I<=H should then be rewritten as a(I)<=a(H), which better takes account of the weightings of the information classifications.
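To make the shape of a concrete - the metric itself is an open question, so the per-item weights below are pure assumptions - one could simply sum weighted contributions:

# a(I): weighted 'amount of information' in a flow; weights are illustrative
ASPECT_WEIGHTS = {
    "name":             1.0,
    "address":          2.0,
    "browsing_history": 3.0,
    "payment_card":     5.0,
}

def a(information: set) -> float:
    return sum(ASPECT_WEIGHTS.get(item, 1.0) for item in information)

I = {"name", "payment_card"}
H = {"name", "address", "payment_card"}
assert a(I) <= a(H)   # the rewritten rule: a(I) <= a(H)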

We can continue this through a number of other flows and introduce a typing or taxonomic structure for the nodes:

[Figure: a data-flow graph from a user to typed nodes - B a Bank, C an on-line shop, D a news site, and a further node E]

If B is a bank then the amount of information required tends to be high; if C is an on-line shop then this tends to be lower, and so on. Such a rule might be:

forall u:User, b:Bank, c:OnlineShop, d:NewsSite |
    a( u-->b ) >= a( u-->c ) and
    a( u-->c ) >= a( u-->d )
    ...
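As a hypothetical instance of this ordering - all of the weights here are invented:

# a(u-->x) for a single user u; values assumed purely for illustration
flow_weight = {
    "Bank":       9.0,
    "OnlineShop": 4.5,
    "NewsSite":   1.0,
}

assert flow_weight["Bank"] >= flow_weight["OnlineShop"] >= flow_weight["NewsSite"]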

For each node we can describe the expectation in terms of this metric, ie: a(b) where b is the Bank node from above; from this we get a node-level version of the earlier rule:

forall u:User, b:Bank |
    a( u-->b ) <= a(b)

Now, our weighting function a deals with particular instances, whereas we have stated that there are expectations at the level of types, so let's introduce a new function that computes a range for a given type; for example, r(Bank) returns a range [ r_min, r_max ]. Then for a particular instance of Bank we get:

forall b:Bank |
      r_min(Bank) <= a(b) <= r_max(Bank)
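A sketch of r as a table of per-type ranges - all of the ranges below are assumed values, not measurements - together with the corresponding check:

# r(T): an assumed range [r_min, r_max] of expected flow weights per type
TYPE_RANGES = {
    "Bank":       (6.0, 12.0),
    "OnlineShop": (2.0,  6.0),
    "NewsSite":   (0.0,  2.0),
}

def within_norms(node_type: str, weight: float) -> bool:
    # r_min(T) <= a(instance) <= r_max(T)
    r_min, r_max = TYPE_RANGES[node_type]
    return r_min <= weight <= r_max

print(within_norms("Bank", 9.0))      # True: within expectations for a Bank
print(within_norms("NewsSite", 9.0))  # False: "busting a norm" for its type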

If a given instance, for example e in the above data-flow, requires something outside the range for its type, then we are "busting a norm" for that particular type. Following on from the above rules:


forall u:User, b:Bank |
      r_min(Bank) <= a(b) <= r_max(Bank)
         and
      a( u-->b ) <= a(b)
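Put together, a sketch of a norm-busting detector - expectations, ranges and flow weights all assumed - might be:

def flow_acceptable(type_range, instance_expectation, flow_weight):
    r_min, r_max = type_range
    within_type = r_min <= instance_expectation <= r_max   # r_min <= a(b) <= r_max
    within_instance = flow_weight <= instance_expectation  # a(u-->b) <= a(b)
    return within_type and within_instance

print(flow_acceptable((6.0, 12.0), 9.0, 7.5))   # True
print(flow_acceptable((6.0, 12.0), 9.0, 11.0))  # False: flow exceeds a(b)
print(flow_acceptable((6.0, 12.0), 13.0, 7.5))  # False: instance busts its type's norm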


The next thing is to look at the next level in the data-flow graph: to where do B, C, D and E send their information, how much, and how do these data-flows affect the first? I guess there's a very interesting feedback loop there. A few other things spring to mind as well: do we see a power law operating over the weighting of the data-flows? Does it matter where and how much data flows?
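As a very rough sketch of that second level - the onward graph and every weight below are invented - one could sum each node's onward flows and compare them with what flowed in:

incoming = {"B": 9.0, "C": 4.5, "D": 1.0, "E": 2.0}   # a(u-->x), assumed
onward = {                                            # a(x-->y), assumed
    "B": {"credit_agency": 3.0},
    "C": {"ad_network": 4.0, "analytics": 2.5},
    "D": {"ad_network": 0.5},
    "E": {},
}

for node, flows in onward.items():
    total_out = sum(flows.values())
    print(node, total_out, total_out > incoming[node])  # passing on more than came in?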

Introduce a temporal dimension and plot the above over time, and you get a picture of the change in norms and consumer expectations.

Getting back to Nissenbaum's thesis - that the expectation of privacy over data-flows is the key, not whether the data flows at all - I think we could reasonably model this.
