The Role of Data Trusts in Smart Cities

During World War II, fighter planes would return from battle with bullet holes. The Allies mapped the areas most commonly hit by enemy fire and sought to strengthen those parts of the planes to reduce the number shot down. A mathematician, Abraham Wald, pointed out that there was another way to look at the data: perhaps certain areas of the returning planes showed no bullet holes because planes hit in those areas did not return. This insight led to the armor being reinforced on the parts of the plane where there were no bullet holes. The story behind the data is arguably more important than the data itself. Or, more precisely, the reason we are missing certain pieces of data may be more meaningful than the data we have. Wald’s insight applies equally to privacy: rather than constantly focusing on regulation and theory to solve privacy issues, a different perspective should be taken. This paper looks at data trusts, from my own point of view, as the best solution to individual privacy concerns, and at how the data trust may be applied universally through the use of blockchain technology.

What Is a Data Trust?

The term data trust has been given diverse, open and often conflicting definitions. In general, it is the means, through technology and legal tools, to enable data sharing and improve the technologies that rely on the sharing of such data. At its core is privacy law, and the developments that underscore the right to protect one’s personal information. The “basic idea of a data trust is a virtual place where data is made available to share”[1]. Although the definition is broad and contested, there is a general consensus that a data trust is effective data governance and management for the users of the technology, exercised between civic authorities, the industry and an external body operated by academics who are experts in the field of privacy and security. It is a stewardship by those who have nothing to gain (knowledge experts) and those who have everything to gain (the platform) to oversee, control and deliver according to what is best for individual users of technology involving personal data.

A data trust is a paradigm that facilitates data sharing by forcing data controllers to be transparent about how data is shared and reused.

The language of “trust” has been a source of confusion and contestation. Jack Balkin[2], an early father of the data trust, upholds the definition in law: a “trust” is a very specific kind of legal arrangement in which a trustee manages trust property on behalf of the trust beneficiaries. The legal trust gives power to the people by enhancing individual control over personal information[3]. A fiduciary trust is attractive when thinking about how to ensure responsible decision-making, as the trustees, being the knowledge experts, are bound by a fiduciary obligation to their beneficiaries.

If one were to develop a mission statement for the knowledge experts in a data trust, it would revolve around them being relevant, reliable, and built on integrity, with emphasis that the knowledge experts and the data trust as a whole must be credible, transparent and, foremost, have the capacity to execute.

Concerns

It is important to establish that data trusts do not solve the privacy concerns inherent in the technology ecosystem. Rather, they are an integral part of the delivery mechanism that balances innovation with the rights of users and service providers. The data trust is a critical part of innovation and of the reliance on data from every individual, and it must be built in from the beginning of any part of the technology ecosystem. A data trust is the quarterback for the two primary privacy concerns, among many, in the technology ecosystem: informed consent and the ability to opt out.

The difficulty with obtaining informed consent in a technology ecosystem is that it is simply impossible; to date, no academic or entity has been able to show otherwise. Although there are hopes that technology itself may one day resolve the issue, that remains a vision in the distance, out of our current line of sight. When a user enters a publicly accessible space, use of the space is deemed consent, with a further argument over whether personal information also requires informed consent in the public realm. Canadian privacy statutes require consent for the collection of personal information that relates to an identifiable individual. Transaction data involves user consent and is based upon contractual terms-of-use agreements between users and typically for-profit entities. People grant consent for parties to collect, store, and use their data when they download apps or sign up for services. This is not viable in the technology ecosystem, as it is generally understood that people do not read or fully understand the often long and complex contractual agreements. Just because people have clicked the ubiquitous “I agree” does not mean that they appreciate the terms or comprehend how their data may be used. The professed challenge is simple: how does one get the users of the technology to execute consent forms in public spaces? Second, why should consent be required in public spaces like parks and publicly accessible private spaces like stores? Does the entry of the user not satisfy the requirement of consent? After all, the collection of non-identifiable data in public spaces or publicly accessible private spaces does not require a person’s consent. However, even where personal information is collected, such as through cameras, Sidewalk Labs’ position in Toronto was not to restrict the collection of identifiable data but instead to de-identify it. Put simply, anonymization of data has proven unlikely to succeed in protecting privacy, as de-anonymization is relatively easy with about three pieces of identifiable information about the user[4].
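To make the re-identification risk concrete, the following is a minimal sketch that counts how many records in a small, entirely made-up dataset become unique once a few quasi-identifiers are combined. The field names (postal_prefix, birth_year, gender), the records and the function name are hypothetical and chosen only for illustration; they are not drawn from any cited study.

```python
from collections import Counter

# Hypothetical "de-identified" records: names removed, quasi-identifiers kept.
RECORDS = [
    {"postal_prefix": "M5V", "birth_year": 1985, "gender": "F"},
    {"postal_prefix": "M5V", "birth_year": 1985, "gender": "M"},
    {"postal_prefix": "M5V", "birth_year": 1990, "gender": "F"},
    {"postal_prefix": "M4K", "birth_year": 1985, "gender": "F"},
    {"postal_prefix": "M4K", "birth_year": 1990, "gender": "M"},
    {"postal_prefix": "M4K", "birth_year": 1990, "gender": "M"},
]


def unique_fraction(records, fields):
    """Fraction of records whose combination of `fields` appears exactly once."""
    keys = [tuple(record[field] for field in fields) for record in records]
    counts = Counter(keys)
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(records)


if __name__ == "__main__":
    for fields in (["postal_prefix"],
                   ["postal_prefix", "birth_year"],
                   ["postal_prefix", "birth_year", "gender"]):
        print(fields, round(unique_fraction(RECORDS, fields), 2))
```

In this toy data no record is unique on one attribute, but most become unique once three are combined, which is the intuition behind the claim that a handful of data points can single out an individual.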

This leads to the second issue: how is it fair or legal that a user cannot access the benefits of the innovation if they do not consent? This is a difficult challenge. People may decide not to use a certain app or service, or choose an equivalent service that offers different data collection practices. However, when data collection is tied to locations or infrastructure like transit, there is no way to opt out of data collection short of leaving the area in question. People who decline to have their data captured may have to consider certain technology, and certain services, off limits. A related problem is the blanket nature of consent.

A third concern, relating less to privacy than to the data trust itself, is who owns this information, and what the data includes: for example, facial recognition, weight and height measurements, BMI? Will the sensors interact with people’s mobile phones, capture purchases, note where selfies are taken and assess each person’s purchasing potential? For residents of the development, will their daily doings be monitored, and their smart home Internet of Things (IoT) devices connected to the larger IoT infrastructure outside? There is no standard answer; it is determined project by project. From the perspective of this paper, the question remains integral to the development story of any data trust. The solutions discussed in the following paragraphs attempt to narrow the possible outcomes.

Blockchain

Data trusts do not guarantee that users’ privacy concerns become transparent, let alone addressed. A data trust is reliant on the governance mechanism holding the data. In essence, the trust is governed by a charter created by the trust’s settlor, whose rules can be made to prioritize the interests of the innovator and developer of the technology. The trust is run by a board, which means a party that holds more seats gains significant control. The ideal way to avoid this involves appointing data stewards from both the public and private sectors, along with regulation.

Sylvie Delacroix and Neil Lawrence[5], the originators of this bottom-up approach, liken data trusts to pension funds, saying they should be tightly regulated and able to provide different services to designated groups. Data subjects choose to pool the rights they have over their personal data within the legal framework of the trust. In their proposal, the data subjects tend to be both the settlors and the beneficiaries of the trust. The trustees are compelled to manage the subjects’ data according to the terms of the trust, and they have a fiduciary responsibility towards the data subjects as the beneficiaries of the trust. The authors envision an ecosystem of trusts arising out of a mix of publicly and privately funded initiatives, each with different constitutional terms, allowing data subjects to choose among different approaches to data governance. A successful trust would be in control of more data and be able to deliver more benefit to data subjects. Balkin[6] also suggested this approach, with a not-for-profit entity holding the majority of the data on the terms most beneficial to, and chosen by, the beneficiaries of the data.

The concept of data stewards is no doubt a great one, but one issue raised is who the beneficiaries are. Is it the users of the technology? The developer of the technology? Or both? In the Toronto Sidewalk Labs[7] development, the general agreement was to develop an “independent urban data trust”. Where the concept failed was on the question of who would be the beneficiary. Sidewalk Labs’ civic data trust appeared to be a legal entity, and it faced questions about the vagueness of its conception of trusts and about how the trust would operate under Canadian law. Sidewalk Labs failed to specify whom it considered the likely beneficiaries of the trust, stating only that the general public would be the beneficiaries. Under Canadian law, the general public cannot be a beneficiary of a trust[8]. As well, it is highly problematic for the entity that would be regulated by the trust to propose the structure, operation, and regulatory power of the regulator. Certainly, it is self-serving for Sidewalk Labs to submit that its projects should be first in line for consideration by a neophyte regulator that it proposed.

A significant challenge for the proposed data trust is the trustees’ role in creating and enforcing rules regarding data collection, storage, protection, and use, including commercialization. Depending on how the regulatory body is structured and its legal authority, the data trustees, whether public or private actors, could have considerable regulatory power. Details are lacking on the trust’s roles beyond granting approval for and overseeing data collection, access, and use. Balkin[9] suggests that a trust should have the authority to audit and investigate data collection and use. This would give data trustees significant power, similar to that of a government regulator, so how a data trust is structured and who operates it are important questions[10].

Blockchain contains several features that resolve the aforementioned concerns of a data trust. Bitcoin, as the earliest technical application of blockchain technology, has attracted widespread attention because of its decentralized, unalterable, and traceable transaction characteristics. From a data perspective, blockchain technology is essentially a distributed database that collectively maintains and stores all historical transaction data in a decentralized and trustless way. The distributed ledger maintained by a blockchain supports only query and addition; it does not support modification or deletion[11]. The use of a hash-linked list and a Merkle tree structure ensures that no node can illegally tamper with the ledger[12]. Hyperledger Fabric is the representative enterprise-level open-source blockchain, and it has proposed many schemes for permission control and privacy protection[13].
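The append-only, tamper-evident property described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not how Bitcoin or Hyperledger Fabric is implemented: the class and function names (Ledger, merkle_root, sha256_hex) are hypothetical, there is no network or consensus, and the Merkle tree simply duplicates the last node on odd-sized levels.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def merkle_root(transactions):
    """Merkle root over transaction strings (last node duplicated on odd levels)."""
    if not transactions:
        return sha256_hex(b"")
    level = [sha256_hex(tx.encode()) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [sha256_hex((level[i] + level[i + 1]).encode())
                 for i in range(0, len(level), 2)]
    return level[0]


class Ledger:
    """Append-only chain: blocks can be added and read, never edited in place."""

    def __init__(self):
        self.blocks = []

    @staticmethod
    def _header_hash(index, prev_hash, transactions):
        header = {"index": index, "prev_hash": prev_hash,
                  "merkle_root": merkle_root(transactions)}
        return sha256_hex(json.dumps(header, sort_keys=True).encode())

    def append(self, transactions):
        prev_hash = self.blocks[-1]["hash"] if self.blocks else "0" * 64
        block = {"prev_hash": prev_hash,
                 "transactions": list(transactions),
                 "hash": self._header_hash(len(self.blocks), prev_hash, transactions)}
        self.blocks.append(block)
        return block

    def is_valid(self):
        """Recompute every hash; any edit to an earlier block breaks the chain."""
        prev_hash = "0" * 64
        for index, block in enumerate(self.blocks):
            expected = self._header_hash(index, prev_hash, block["transactions"])
            if block["prev_hash"] != prev_hash or block["hash"] != expected:
                return False
            prev_hash = block["hash"]
        return True
```

Because each block's hash covers both the previous block's hash and the Merkle root of its own transactions, changing any recorded transaction invalidates that block and every block after it.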

The features of blockchain that make it more efficient and reliable are security, scalability, immutability, and anonymity. Blockchain technology is beneficial due to its decentralized, transparent, and secure nature. For example, a trusted data acquisition model for power systems has been proposed in conjunction with blockchain technology[14], in which the blockchain ensures the authenticity of the underlying equipment state parameters of the power grid. To protect the private information in power consumption data, a blockchain-based privacy data and identity protection scheme has also been proposed, in which group membership data is recorded in a private blockchain and, by using pseudonyms, the user’s private identity within the group is hidden[15].

Zyskind et al.[16] believe blockchain in a data trust protects against several common privacy issues. First, Data Ownership: users own and control their personal data. The ledger structure, or blockchain, recognizes the users as the owners of the data, along with the trustees of the data trust, ensuring that guests are provided with delegated permissions. Second, Data Transparency and Auditability: each user has complete transparency over what data is being collected about her and how it is accessed. Third, Fine-grained Access Control. One major concern mentioned previously is that users are required to grant a set of permissions upon sign-up. These permissions are granted indefinitely and the only way to alter the agreement is by opting out. Instead, the theory of data trusts and blockchain working hand in hand is that at any given time the user may alter the set of permissions and revoke access to previously collected data. One application of this mechanism would be to improve the existing permissions dialog in mobile applications. While the user interface is likely to remain the same, the access-control policies would be securely stored on a blockchain, where only the user is allowed to change them. This is an integral solution to universal data trusts, resolving a number of concerns in privacy law[17].

Three Theories of Combining Blockchain and a Data Trust

There are three methods of applying blockchain to a data trust, and each will be reviewed. The first is the Zyskind et al.[18] design, built on the premise that the blockchain is itself the data trust, controlled by a few trustees that hold voting rights on the mechanism. The blockchain foundation relies on anonymization of the material, with the trustees approving, when relevant and required, the disclosure of any personal information or matters that relate to an individual’s privacy. Users in the system normally remain (pseudo) anonymous, with the option to store service profiles on the blockchain and verify their identity. The blockchain accepts two new types of transactions: Taccess, used for access control management; and Tdata, for data storage and retrieval[19]. As the user signs up for the first time, a new shared (user, service) identity is generated and sent, along with the associated permissions, to the blockchain in a Taccess transaction[20]. Data collected on the phone is encrypted using a shared encryption key and sent to the blockchain in a Tdata transaction, which subsequently routes it to an off-blockchain key-value store, while retaining only a pointer to the data on the public ledger[21]. Both the service and the user can now query the data using a Tdata transaction with the pointer (key) associated with it. The blockchain then verifies that the digital signature belongs to either the user or the service. For the service, its permissions to access the data are checked as well. Finally, the user can change the permissions granted to a service at any time by issuing a Taccess transaction with a new set of permissions, including revoking access to previously stored data[22]. Developing a web-based (or mobile) dashboard that allows an overview of one’s data and the ability to change permissions is fairly trivial and is similar to developing centralized wallets.
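The flow of Taccess and Tdata transactions described above can be illustrated with a small in-memory model. This is a sketch under heavy simplifying assumptions, not the authors' implementation: identities are plain strings rather than verified signatures, the payload is assumed to be already encrypted, the pointer is taken to be a SHA-256 hash of the payload, and every class and method name (ToyDataTrustChain, t_access, t_data_store, t_data_read) is hypothetical.

```python
import hashlib


class ToyDataTrustChain:
    """In-memory stand-in for the ledger plus the off-chain key-value store."""

    def __init__(self):
        self.ledger = []      # public, append-only list of transactions
        self.policies = {}    # (user, service) -> set of permitted data types
        self.off_chain = {}   # pointer -> encrypted payload

    def t_access(self, sender, user, service, permissions):
        """Taccess: only the user may set or change a service's permissions."""
        if sender != user:
            raise PermissionError("only the user can change permissions")
        self.policies[(user, service)] = set(permissions)
        self.ledger.append(("Taccess", user, service, tuple(sorted(permissions))))

    def t_data_store(self, sender, user, service, data_type, ciphertext):
        """Tdata (write): keep the payload off-chain, record only a pointer."""
        self._check(sender, user, service, data_type)
        pointer = hashlib.sha256(ciphertext).hexdigest()
        self.off_chain[pointer] = ciphertext
        self.ledger.append(("Tdata", user, service, data_type, pointer))
        return pointer

    def t_data_read(self, sender, user, service, data_type, pointer):
        """Tdata (read): return the payload if the sender is allowed to see it."""
        self._check(sender, user, service, data_type)
        return self.off_chain[pointer]

    def _check(self, sender, user, service, data_type):
        if sender == user:
            return  # the user always has access to their own data
        allowed = self.policies.get((user, service), set())
        if sender == service and data_type in allowed:
            return
        raise PermissionError(f"{sender} may not access {data_type}")


if __name__ == "__main__":
    chain = ToyDataTrustChain()
    chain.t_access("alice", "alice", "maps_app", {"location"})
    ptr = chain.t_data_store("maps_app", "alice", "maps_app", "location", b"ciphertext-1")
    print(chain.t_data_read("maps_app", "alice", "maps_app", "location", ptr))
    chain.t_access("alice", "alice", "maps_app", set())  # revoke everything
```

After the final call, a further read by the hypothetical maps_app raises PermissionError, while the user can still retrieve her own data, which mirrors the revocation step in the description.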

A very beneficial aspect of this blockchain method is the use of public keys, where every user can generate as many pseudo-identities as she desires in order to increase privacy. For a data trust, the authors identify an enhanced form of identity, the compound identity: a shared identity for two or more parties, where some parties (at least one) own the identity (owners), and the rest have restricted access to it (guests)[23]. The identity is composed of signing key-pairs for the owner and guest, as well as a symmetric key used to encrypt (and decrypt) the data, so that the data is protected from all other players in the system. The authors deliver an intricate formula for the compound identity; to summarize, they use the blockchain sequence of timestamped transactions, along with the nodes of the data trustees and user, to deliver the service provider’s requested data in an encrypted fashion with just the right amount of information. The technology further follows the service provider’s use of the data for as long as the provider continues to have access to it. This is beneficial because, once the service provider no longer requires the data, the data trustees have the ability to trace the data and delete it from the service provider’s servers. This is accomplished through the use of secure Multiparty Computation (MPC) to securely evaluate any function. The authors provide an intricate example[24]:

A city holds an election and wishes to allow online secret voting. It develops a mobile application for voting which makes use of our system, now augmented with the proposed MPC capabilities. After the online elections take place, the city subsequently submits their back-end code to aggregate the results. The network selects a subset of nodes at random and an interpreter transforms the code into a secure MPC protocol. Finally, the results are stored on the public ledger, where they are safe against tampering. As a result, no one learns what the individual votes were, but everyone can see the results of the elections. Bitcoin, or blockchains in general, assumes all nodes are equally untrusted and that their proportion in the collective decision-making process is solely based on their computational resources (known as the Proof-of-Work algorithm)[25]. In other words, for every node n, trust_n ∝ resources(n) (probabilistically) decides the node’s weight in votes.
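The idea that results can be computed without anyone seeing an individual vote can be illustrated with additive secret sharing, one simple building block of MPC. The passage does not say which MPC protocol is used, so this is an assumed stand-in rather than the authors' protocol; the function names, the modulus and the three-node default are all hypothetical.

```python
import secrets

# Modulus for the shares; it must exceed the number of voters. 2**31 - 1 is prime.
PRIME = 2_147_483_647


def share(value, n_nodes):
    """Split `value` into n_nodes random shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_nodes - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def tally(votes, n_nodes=3):
    """Count 'yes' votes (1) without any single node seeing an individual vote."""
    node_totals = [0] * n_nodes
    for vote in votes:
        # Each node receives one random-looking share of every vote.
        for node, piece in enumerate(share(vote, n_nodes)):
            node_totals[node] = (node_totals[node] + piece) % PRIME
    # Nodes publish only their running totals; summing them reveals the result.
    return sum(node_totals) % PRIME


if __name__ == "__main__":
    print(tally([1, 0, 1, 1, 0]))
```

Each node only ever holds values that look random on their own, yet the published totals add up to the true count, which could then be recorded on the ledger.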

The real value of the authors’ blockchain solution is the relative ease with which data trusts could develop and implement such technology. Of the three solutions discussed herein, this method is the simplest and the most practical. However, a criticism of the authors’ technology is that it leads to sybil attacks, excessive energy consumption and high latency. Intuitively, Proof-of-Work reasons that nodes which pour significant resources into the system are less likely to cheat[26]. Using similar reasoning, one could define a new dynamic measure of trust that is based on node behavior, such that good actors that follow the protocol are rewarded. Specifically, one could set the trust of each node as the expected value of it behaving well in the future. Equivalently, since this is a binary random variable, the expected value is simply the probability. A simple way to approximate this probability is by counting the number of good and bad actions a node takes, then using the sigmoid function to squash it into a probability. The authors provide a complex formula to articulate this probability and deal with the aforementioned criticisms. From my point of view, the benefits of the blockchain method far outweigh the cons and criticisms. Blockchain as a whole has the potential to retain control of sensitive data, or data in general, with both the user and the trustees of the data trust continuously delivering or removing aspects of the data as they see fit, and with a continuous “eye” on the location and the use of the data. This is by far the best solution to data for privacy.
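The behavior-based trust measure described above, counting good and bad actions and squashing the difference with a sigmoid, can be sketched as follows. The function name and the scaling parameter alpha are assumptions added for illustration; the authors' exact formula is not reproduced here.

```python
import math


def node_trust(good_actions, bad_actions, alpha=1.0):
    """Squash (good - bad) into the range 0..1 with the logistic sigmoid.

    `alpha` is an assumed scaling factor that controls how quickly trust
    moves towards 0 or 1 as the counts diverge.
    """
    x = alpha * (good_actions - bad_actions)
    # Numerically stable form: avoids overflow when x is very negative.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


if __name__ == "__main__":
    for good, bad in [(0, 0), (5, 1), (1, 5)]:
        print(good, bad, round(node_trust(good, bad), 3))
```

A node with no history starts at 0.5, consistently good behavior pushes the value towards 1, and consistently bad behavior pushes it towards 0, so a node's voting weight can follow its track record rather than its computational resources.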

An alternative solution, continuing on the benefit of blockchain in data trusts, is the Zhang et al.[27] formula. Distinctly different from the Zyskind et al. method, it relies on multiple layers of control, whereas Zyskind et al. use one layer through a continuous flow. The Zhang et al. formula also does not deliver the same ease of use as the Zyskind formula, and it is inherently complex to understand, let alone deliver. Yet it shows that blockchain is the best solution for the delivery of a universal data trust. In summary, the formula targets a typical multi-party collaborative data sharing scenario. Each party possesses some data resources locally. Through data sharing, the scattered data can be processed together to create greater value. For this scenario, the formula provides a data sharing architecture based on permissioned blockchain and federated learning[28]. The architecture consists of three layers, which are, from bottom to top, the user layer, the federated learning (FL) layer and the permissioned blockchain layer. The user layer is the significant difference: where Zyskind et al.[29] focused on anonymized nodes or pseudonyms, Zhang et al. provide that each provider of the data and user of the data has access to the FL layer, which is considered a resource pool. This access is only available once the individual has been granted permission through the blockchain. Permission is granted through the blockchain by the use of smart contracts and a review carried out by artificial intelligence. After permission is granted, data may be abstracted and used at leisure. The Zyskind et al. formula provides significantly more control, in that each service is allowed only the data for which it has been granted permission; the key difference between the two is that Zyskind et al. allow the trustees or the nodes to remove the data from the service. Zhang et al. lack this ability, yet one may argue the formula makes up for it with the blockchain being inherently smarter through the use of artificial intelligence, as artificial intelligence itself would decipher and track the data more efficiently and effectively than a human trustee in a data trust.
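The layered idea, in which parties keep their data locally and may contribute to a shared pool only after permission is granted, can be sketched with a permission-gated federated average. This is a hedged illustration: the passage does not specify the aggregation rule, so sample-weighted averaging (in the style of the standard federated averaging technique) is assumed, the "model" is reduced to a single number, the `permitted` set merely stands in for approvals recorded on the permissioned blockchain, and all names and party labels are hypothetical.

```python
def local_update(samples):
    """Each party fits a trivial 'model' locally: the mean of its own samples."""
    return sum(samples) / len(samples), len(samples)


def federated_average(parties, permitted):
    """Combine local updates from permitted parties, weighted by sample count.

    `parties` maps a party name to its private samples; only the locally
    computed update, never the raw samples, enters the aggregate.
    """
    total_weight = 0
    weighted_sum = 0.0
    for name, samples in parties.items():
        if name not in permitted:
            continue  # no on-chain permission: excluded from the shared pool
        model, n_samples = local_update(samples)
        weighted_sum += model * n_samples
        total_weight += n_samples
    if total_weight == 0:
        raise ValueError("no permitted party contributed an update")
    return weighted_sum / total_weight


if __name__ == "__main__":
    parties = {"hospital": [1.0, 3.0], "transit": [5.0], "retailer": [100.0]}
    print(federated_average(parties, permitted={"hospital", "transit"}))
```

In this toy run the hypothetical retailer holds data but has not been approved, so its update never reaches the pool.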

A further alternative is the Ericsson approach[30], which is more primitive in nature than the previous formulas yet provides a good example that blockchain is the solution to data trusts, which are in turn the solution to the privacy concerns mentioned previously. Ericsson provides a concept called “ID brokering”[31]: a decentralized system based on a concept that creates trust relations between digital identities and the systems that handle them. The system capitalizes on the strength of blockchains to express and manage trust relations in industrywide solutions and creates a unified mechanism for ID management across underlying heterogeneous ID technologies. ID brokering makes it easy to establish encrypted and trusted connectivity for IoT devices that are on the move, or for personal devices that are carried across different administrative network domains. For example, by allowing device IDs to act as digital passports and registering the (non-sensitive) passport IDs of devices when booking a trip, the networks the devices pass through (including airports, hotels and conference facilities) can use their own trusted IDs to grant secure internet access without manual authentication. The ID brokering concept is based on three key aspects[32]:

1. the self-sovereignty of ID domains, where devices are provisioned with any secure ID technology deemed appropriate, and where the ID secret is securely stored in a TEE.

2. authentication utilizes the trust relation expressed in a blockchain-based backend, where instantaneous access rights for specific devices in specific networks are managed.

3. the blockchain backend enables the system to reach a shared consensus on a global scale, as no single party is the main controller or beneficiary of the system. Each domain owner has full sovereignty over their domain, and the shared context of the blockchain enables a domain to interact and to grant and revoke access dynamically.
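The three aspects above can be pictured with a small registry in which each domain owner alone grants and revokes network access for device IDs. This is only a sketch of the idea, not Ericsson's design: the shared blockchain backend is replaced by an in-memory log, there is no cryptographic authentication or TEE, and all names (IdBroker, register_domain, grant, revoke, may_connect) are hypothetical.

```python
class IdBroker:
    """Shared registry of (network domain, device ID) access grants."""

    def __init__(self):
        self.domain_owners = {}  # domain -> owner
        self.grants = set()      # (domain, device_id) pairs currently allowed
        self.log = []            # append-only history of grant/revoke events

    def register_domain(self, domain, owner):
        if domain in self.domain_owners:
            raise ValueError(f"{domain} is already registered")
        self.domain_owners[domain] = owner

    def grant(self, caller, domain, device_id):
        self._require_owner(caller, domain)
        self.grants.add((domain, device_id))
        self.log.append(("grant", domain, device_id))

    def revoke(self, caller, domain, device_id):
        self._require_owner(caller, domain)
        self.grants.discard((domain, device_id))
        self.log.append(("revoke", domain, device_id))

    def may_connect(self, domain, device_id):
        return (domain, device_id) in self.grants

    def _require_owner(self, caller, domain):
        # Each domain is sovereign: only its registered owner may change access.
        if domain not in self.domain_owners or self.domain_owners[domain] != caller:
            raise PermissionError(f"{caller} does not own {domain}")


if __name__ == "__main__":
    broker = IdBroker()
    broker.register_domain("airport-wifi", "airport")
    broker.grant("airport", "airport-wifi", "device-123")
    print(broker.may_connect("airport-wifi", "device-123"))
```

In the travel example from the text, the hypothetical airport domain would grant a registered device ID access on arrival and revoke it afterwards, with every change visible in the shared log.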

The three examples herein show that blockchain technology resolves the privacy concerns in the use of a universal data trust to handle an individual’s data. The Zyskind formula is simply the best for its accuracy, control and ease of use. If one were to combine the three, adding the artificial intelligence aspect of Zhang et al. and the enhanced identity control of ID brokering from Ericsson, the result would be a formidable force for data trusts and user privacy.

Conclusion

In order for the technology ecosystem to succeed, a trust ecosystem must be established. Technology developers equate technology trust with privacy and cybersecurity. Distilling trust into these two elements oversimplifies the challenges involved and leads to inadequate solution approaches. The data trust, regulated by a neutral third party, the data steward, is a counterbalance to these challenges.

At a recent congressional antitrust hearing in the United States, four major platform companies publicly recognized the use of surveillance technologies, market manipulation, and forceful acquisitions to dominate the data economy. The single most important lesson from these revelations is that companies that trade in personal data cannot be trusted to store and manage it. Decoupling personal information from the platforms’ infrastructure would be a decisive step toward curbing their monopoly power. This can be done through data trusts along with blockchain technology. Combined, they ensure that the premises of privacy are protected, preserved and, most of all, monitored for the efficiency of technology and the benefits thereof, creating a universal standard for the use of data and its delivery mechanism through the data trust.

References

Asma Khatoon, P. V. (2019). Blockchain in Energy Efficiency: Potential Applications and Benefits. Energies, 12(17).

Balkin, J. M. (2016). Information Fiduciaries and the First Amendment. 49 U.C. Davis L. Rev 1183.

Bashir, I. (2017). Mastering Blockchain. Packt Publishing.

Braun, T., Fung, B., Iqbal, F., & Shah, B. (2018). Security and Privacy Challenges in Smart Cities. Cities Society, 39, 499.

Brussels, E. C. (2020, February 19). White Paper On Artificial Intelligence - A European approach to excellence and trust. 65 Final.

Chan, B. (2019, March 13). Smart City Trust - Think Beyond Cybersecurity and Privacy. Strategy of Things. Retrieved from strategyofthings.io/smart-city-trust

Daniel Bergstrom, B. S. (2019, April 4). BLOCKCHAINS AND ONLINE TRUST CHARTING THE FUTURE OF INNOVATION. Blockchains and Online Trust, 3.

Dawson, A. H. (2018, October 15). An Update on Data Governance for Sidewalk Toronto. Sidewalk Labs Blog. Retrieved 09 30, 2021, from https://www.sidewalklabs.com/blog/an-update-on-data-governance-for-sidewalk-toronto/

Edwards, N. J. (2020). Who Trusts in the Smart City? Transparency, Governance and the Internet of Things. Data and Policy, 2: e11.

Fan Zhang, S. G. (August 2015). Federated Learning Meets Blockchain: State Channel based Distributed Data Sharing Trust Supervision Mechanism. Journal of Latex Class Files, 14.

Guy Zyskind, O. N. (n.d.). Decentralizing Privacy: Using Blockchain to Protect Personal Data. iapp.

Jayachandran, P. (2017). The difference between public and private blockchain. Blockchain: Background and Policy Issues. Congressional Research Service.

Lawrence, S. D. (n.d.). Disturbing the "One Size Fits All" Approach to Data Governance: Bottom-up Data Trusts. International Data Privacy Law, 9(4), 236-252.

Nissenbaum, H. (2004). Privacy as Contextual Integrity. Washington Law Review, 101-139. Retrieved from https://crypto.stanford.edu/portia/papers/RevnissenbaumDTP31.pdf

O'Hara, K. (2019, February 23). Data Trusts: Ethics, Architecture and Governance for Trustworthy Data Stewardship. Web Science Institute White Papers. Retrieved from http://dx.doi.org/10.5258/SOTON/WSI-WP001

S. N. Grinyaeva, R. A. (2018). On the Creation of a Universal Protected Trusted Digital Asset (Token). Automatic Documentation and Mathematical Linguistics, 52(5), pp. 265–273.

Tuch, A. F. (December 31, 2020). A General Defense of Information Fiduciaries. Harvard Law School Forum on Corporate Governance.

Weigl, M. E. (2017). Overview of the Twenty One Year Rule. Fasken Martineau.

 

 


[1] (O'Hara, 2019)

[2] (Balkin, 2016)

[3] (Lawrence)

[4] (Nissenbaum, 2004)

[5] (Lawrence)

[6] (Balkin, 2016)

[7] (Dawson, 2018)

[8] (Weigl, 2017)

[9] (Balkin, 2016)

[10] (Tuch, December 31, 2020)

[11] (Asma Khatoon, 2019)

[12] (Bashir, 2017)

[13] (Bashir, 2017)

[14] (Bashir, 2017)

[15] (Jayachandran, 2017)

[16] (Guy Zyskind)

[17] (Guy Zyskind)

[18] (Guy Zyskind)

[19] (Guy Zyskind)

[20] (Guy Zyskind)

[21] (Guy Zyskind)

[22] (Guy Zyskind)

[23] (Guy Zyskind)

[24] (Guy Zyskind)

[25] (Guy Zyskind)

[26] (Guy Zyskind)

[27] (Fan Zhang, August 2015)

[28] (Fan Zhang, August 2015)

[29] (Guy Zyskind)

[30] (Daniel Bergstrom, 2019)

[31] (Daniel Bergstrom, 2019)

[32] (Daniel Bergstrom, 2019)

John Sedrak

John Sedrak is a world-renowned lawyer, known for his work in privacy law, who holds several Master of Laws degrees. He joined Aether in 2022 as Associate Counsel and quickly rose to become General Counsel, Associate Director. John has been working extensively in Blockchain, Privacy and Cybersecurity, specializing in Smart Cities. John may be scheduled for in-house workshops and masterclasses, which we are told he enjoys very much.
