Blockchain Ethics

Class 12 Reading Responses [Blockchain and AI]

Wow, I absolutely loved this week's readings, especially since we have been talking about data privacy in my HKS classes. I really do think federated learning and encrypted computation have a place in helping us move toward treating data as a more equitable public asset. If privacy can be fully achieved, we can more safely make datasets available to both the public and private sectors for mass use. Governments could more effectively deploy open-source data projects built on these privacy technologies, which could eventually unlock information across several industries, including healthcare, where it could lead to lives saved.

Differential privacy, which offers mathematical guarantees around privacy preservation, still seems nebulous to me: can something really produce guarantees? It's also not clear to me how differential privacy gets applied in practice. If we can properly institute ways to input clean data, then blockchain-based machine learning marketplaces could go a long way toward ensuring data privacy and increasing AI intelligence via private machine learning, which allows training on sensitive private data without revealing it.

Decentralized machine learning marketplaces could dismantle the data monopolies of the current tech giants: they standardize and commoditize the main source of value creation on the internet by shifting value from data to algorithms. Other benefits include garnering top models through economic incentives, democratizing powerful machine learning, and accelerating access to an open marketplace for data. I thought the most interesting benefit was how web search gets inverted, with products competing to find their user. Overall, if these marketplaces do come to fruition, I'm still a bit confused about the need for a meta-model as outlined in this article: https://medium.com/@FEhrsam/blockchain-based-machine-learning-marketplaces-cb2d4dae2c17
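A quick aside on those "guarantees," since the word bothered me too and I looked it up: differential privacy's guarantee is a precise statement about a randomized algorithm, not a vague promise. My own summary of the standard definition (textbook material, not from the readings): a mechanism M is ε-differentially private if, for any two databases D and D′ that differ in a single person's record, and for any set of outputs S,

```latex
\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[M(D') \in S]
```

In words: the distribution over outputs changes by at most a factor of e^ε whether or not any one individual's data is included, and this holds no matter what auxiliary information an attacker brings. How you actually achieve the bound (typically by adding calibrated random noise to query answers) is a separate, practical question.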
Machine learning marketplaces do carry risks, though, chiefly the garbage-in, garbage-out problem around the quality of data that the machine learning algorithm is building upon. I see AI ethics as more pressing than blockchain ethics, since AI would be an input into blockchain systems, and there are still many risks around algorithmic bias and data inputs. From a policy perspective I would start with AI first and propose some of the following recs:

  1. First, we must continue strong R&D investment to drive technological breakthroughs, economic competitiveness, and national security. A number of tools exist here, including direct government funding of research facilities, universities, and private-sector research, as well as tax incentives. Government funding spurs spinoffs and spillover effects for society.
  2. Second, it is vital to ensure the U.S. drives the development of technical standards for the deployment and adoption of AI technologies. We should institute proper governance checks by having a community of stakeholders verify the framing of problems, the collection of data, and the preparation of data for AI algorithms.
  3. Third, as we think about the jobs of the future and homeland security, I have a plan to make sure we train current and future generations of American citizens with the skills to develop and apply AI techniques, through mandatory education and workforce modules. Technology jargon shouldn't be a bunch of laughable buzzwords but rather something we truly understand!
  4. Fourth, the U.S. must foster public trust in AI technologies and ensure that their usage protects the civil liberties and privacy of the American people. We must target the issue of data privacy and ownership, which will require cooperation with our innovation arms (such as the Defense Advanced Research Projects Agency) and the private sector.
  5. Finally, we must promote an international environment that supports AI community standards and open markets for our industries. This includes working with other countries to implement open data initiatives and democratize access to the internet.

Hi all,

I am a sad boi since this will be the last collection of readings for this class… Anyways, privacy. Privacy is important in these contexts (ML and blockchain) because, as seen in the Medium article by Dropout Labs, there are numerous exploits that jeopardize users' financial stability, and the potential for pervasive manipulation stems from the nature of the data that both ML and blockchain engage with. It is important to develop advanced AI securely, especially since models built in limited environments lead to calamities like the Flash Crash of 2010.

Generally speaking, the most common method for securing data today is end-to-end encryption, but the main vulnerability is the output and wherever the data gets displayed. Moreover, this doesn't exactly work when ML needs access to sensitive data, as in medical imaging. As a result, people have looked toward federated learning (distributed datasets that don't allow direct access), secure computation protocols (homomorphic encryption, zero-knowledge proofs, and multi-party computation), and differential privacy (algorithms that remove the presence of any individual from the data and don't make assumptions about auxiliary data).

I believe blockchain technology promotes privacy in its very cryptographic nature as a data structure, but added functionality such as smart contracts poses certain vulnerabilities, as we saw in the exploitation of the DAO. However, consider private ML (trained on private, sensitive data) running on blockchain-based marketplaces (as outlined in the Medium article on blockchain-based ML marketplaces) that grow through network effects. The potential for AI to get continuously smarter in this automated, closed loop of incentivization (especially in the case of a curation market) means that models only get better, preventing further exploits, as value shifts from data to algorithms and breaks up the centralized power of tech giants. And as smart contracts get written with better back-end functionality that integrates secure practices, I believe blockchain will have a net positive impact on security.
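For anyone who, like me, wanted to see what federated learning actually looks like mechanically, here is a minimal sketch of federated averaging in Python. Everything here is a toy of my own making (the linear model, the simulated clients, the function names); real systems layer secure aggregation, encryption, or differential privacy on top, but the key property is that raw data never leaves a client:

```python
import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One client's local training: plain gradient descent on MSE loss."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
        w -= lr * grad
    return w

def federated_average(updates, sizes):
    """Server-side aggregation: average client weights by local dataset size."""
    total = sum(sizes)
    return sum(w * (n / total) for w, n in zip(updates, sizes))

# Simulated private datasets held by three clients -- never pooled centrally.
true_w = np.array([1.0, -2.0, 0.5])
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 3))
    y = X @ true_w + rng.normal(scale=0.1, size=50)
    clients.append((X, y))

global_w = np.zeros(3)
for _ in range(10):  # each round: broadcast weights, train locally, aggregate
    updates = [local_update(global_w, X, y) for X, y in clients]
    global_w = federated_average(updates, [len(y) for _, y in clients])

print(global_w)  # converges toward true_w; no client ever shared raw records
```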

In regards to the Zcash bug, I think it's hard to blame anyone except maybe the designer of the parameter setup algorithm called BCTV14. As far as we know, no one took advantage of the bug, and the four Zcash individuals who knew of its existence chose not to incite panic, quietly slipping an update to their ZKP generator into the Sapling upgrade. They took measures to clean up the mess during the eight months before the update, including removing the paper that contained the origins of the bug, and they created and maintained a cover story until the public disclosure. It's possible that someone could have exploited the bug, but according to the same individuals who discovered it, certain Sprout nodes still held above-zero value, so exploitation seems unlikely. Then again, given the large window of time, bad play could have occurred. While I respect not inciting panic, I am not sure I agree with making the decision on behalf of all the affected companies. By not disclosing the bug to them, Zcash didn't give them the chance to consider how they would approach fixing it on their own platforms. Then again, it's a hard call: depending on how those other companies handled the situation, Zcash itself could have been affected. They did what was best for them, and whether that was right, I feel like I can't comment. I agree with Neha in the Fortune article that we need formal channels with public encryption keys, but in the case of inherited codebases, with whom are the 'security contacts' affiliated?

Closing remarks:
-I didn't see many articles mention secure enclaves as a potential means of secure computation, and I wonder why. Does it have to do with implementation?
-We talk about deanonymized data and then differential privacy. How exactly are they different, and what about re-identification attacks like the one demonstrated on the Netflix Prize dataset?
-I agree with Natalya: I am not sure of the exact necessity of a meta-model, nor how one is even constructed (see the sketch after these remarks for one guess). In the same vein, how do you determine the smartness contributed to the meta-model, and the same goes for data? It is a very common phenomenon in ML for nets to produce results without our knowing the exact logic behind the vast number of nodes (this is the reason trees and random forests are still used today in certain cases). Another side note regarding this model: what about the formatting of the crowd-sourced data? If the formatting is not native to the platform, I see a potential risk of tampering or of internal storage of the data in the hands of a third-party formatting entity.
-Also, the first article, from Zuckerberg, was pretty good until it got suspicious when he said to 'cache non-sensitive data'. Who decides? What is 'sensitive'? I thought they couldn't see lol.
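P.S. on the meta-model confusion: Ehrsam's article doesn't pin down a construction, so this is pure speculation on my part, but one plausible reading is a stacked ensemble in which the marketplace weights each submitted model by its measured performance on held-out validation data, and those same weights could drive token payouts (one possible answer to the "smartness contributed" question). A toy Python sketch, with every model and name hypothetical:

```python
import numpy as np

def meta_predict(models, weights, X):
    """Weighted vote over the submitted models' predictions."""
    votes = np.array([m(X) for m in models])        # shape: (n_models, n)
    return (weights[:, None] * votes).sum(axis=0) > 0.5

# Hypothetical submitted models -- black boxes from the marketplace's view.
models = [
    lambda X: (X[:, 0] > 0).astype(float),
    lambda X: (X[:, 1] > 0).astype(float),
    lambda X: np.ones(len(X)),                      # a deliberately weak model
]

# Held-out validation set the submitters never saw, used to score contributions.
rng = np.random.default_rng(1)
X_val = rng.normal(size=(200, 2))
y_val = ((X_val[:, 0] + X_val[:, 1]) > 0).astype(float)

acc = np.array([(m(X_val) == y_val).mean() for m in models])
weights = acc / acc.sum()  # normalized "smartness" weights; could set payouts

print(weights)  # the weak model earns a correspondingly small weight
print((meta_predict(models, weights, X_val) == y_val).mean())
```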

Anyways, as always it has been a pleasure and I will miss this forum.

Sincerely,

The-Ripp3r

Hi All,
Seconding @natalya-thakur here that I was delighted by this week's readings! I find privacy fascinating and hadn't previously been aware of the overlaps between blockchain and machine learning within the privacy space. I found this overlap particularly interesting because my thesis work involves using ML (GANs) to generate synthetic data learned from real (mobile trajectory) data, producing a privacy mask on top of the real data while maintaining the underlying data distributions.
I also second @The-Ripp3r in my sadness that this is the last week of readings - wow did the semester fly by!

In response to @The-Ripp3r's question about deanonymized data versus differential privacy:

They are a bit orthogonal. De-identification (deanonymized data) usually means removing explicitly identifying information, such as a meaningful user ID, name, or address, and replacing it with a pseudonymized ID. What re-identification attacks have shown is that this crude form of pseudonymization is not enough: since people and their actions or preferences are so unique (e.g., their set of movie preferences), it does not take much anecdotal knowledge about a person to reconnect a pseudonymized ID back to that person. Once that ID is reconnected to the identity, you can continue to follow them throughout a database with that ID. Differential privacy, by contrast, is about adding random noise to databases, within a given parameter, so that aggregate information doesn't change much but the ability to tell whether an individual is in the database does. Differential privacy tends to pull more directly from the crypto literature.
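To make the noise-adding concrete, here is a minimal sketch of the classic Laplace mechanism on a count query (a toy example of my own, not from the readings). A count has sensitivity 1, since one person joining or leaving changes the answer by at most 1, so noise drawn at scale 1/ε suffices:

```python
import numpy as np

rng = np.random.default_rng(42)

def dp_count(values, predicate, epsilon):
    """Count query with Laplace noise calibrated to sensitivity 1."""
    true_count = sum(predicate(v) for v in values)
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

# Hypothetical database; any one person changes the true count by at most 1.
ages = [34, 29, 41, 52, 47, 38, 61, 25]

print(dp_count(ages, lambda a: a > 40, epsilon=0.5))
# The noisy answer is still useful in aggregate (true count is 4), but the
# output distribution shifts by at most a factor of e^epsilon when any one
# record is added or removed -- membership is blurred, not the statistics.
```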
Happy to discuss more in class!

In response to the readings, I’d like to focus on the blogpost by Zuckerberg.
On my first, naive reading, the stated commitment to privacy via end-to-end encryption and reduced permanence seemed like concessions on Facebook's part, for the good of society's privacy as well as Facebook's PR. After further reading, this "commitment to privacy" started to look like a lucky framing for what may simply be best for Facebook from a business point of view.

On reducing permanence: storing and maintaining data is expensive. As Facebook scales, it is sensible for them to avoid storing less valuable information, or information that is not "archived".

On end-to-end encryption: Facebook and other companies in the communications business are put in a difficult position when governments attempt to compel them to forfeit private customer data. There is no good outcome in this tension between the demands of government and user privacy: either Facebook loses customer trust, or it loses money battling governments in court. It's much less costly for Facebook to avoid the situation entirely by using end-to-end encryption and telling both governments and users that there is nothing it can do to release private user messages.