Author Archives: Louis Potok

Pluribus Skepticism

Is Facebook’s new poker AI really the best in the world?

Facebook released a paper and blog post about a new AI called Pluribus that can beat human pros. The paper title (in Science!) calls it “superhuman”, and the popular media is using words like “unbeatable”.

But I think this is overblown.

If you look at the confidence intervals in the FB blog post above, you’ll see that while Pluribus was definitely better against the human pros on average, Linus Loeliger “was down 0.5 bb/100 (standard error of 1.0 bb/100).” The post also mentions that “Loeliger is considered by many to be the best player in the world at six-player no-limit Hold’em cash games.” Given that prior, and the data, I’d assign something like a 65-75% probability that Pluribus is actually better than Loeliger. That’s certainly impressive. But it’s not “superhuman”.
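
For a rough sense of where that number comes from, here is a back-of-the-envelope calculation (mine, not the paper's): treating the reported result as a normal estimate with a flat prior puts the probability that Pluribus is genuinely better at about 69%, and a prior that Loeliger really is the best in the world would pull that somewhat lower.

```python
from scipy.stats import norm

# Reported: Loeliger was down 0.5 bb/100 against Pluribus, with SE 1.0 bb/100.
observed_winrate = -0.5  # Loeliger's win rate vs Pluribus, in bb/100
standard_error = 1.0     # bb/100

# Normal approximation, flat prior: P(Pluribus is better) is the probability
# that Loeliger's true win rate is below zero.
p_pluribus_better = norm.cdf((0 - observed_winrate) / standard_error)
print(round(p_pluribus_better, 2))  # ~0.69
```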

I don’t know enough about poker or the AIVAT technique they used for variance reduction to get much deeper into this. How do people quantify the skill differences among the pros these days?

I’m also a bit skeptical about the compensation scheme that was adopted – if the human players were compensated for anything other than the exact inverse of the outcome metric they’re using, I’d find that shady – but the paper didn’t include those details.

Thoughts?

Defensive Randomization

Machine learning is common and its use is growing. As time goes on, most of the options you face in your life will be chosen by opaque algorithms that are optimizing for corporate profits. For example, the prices you see will be the highest prices at which you’ll still buy, based on an enormous amount of data about you and your past decisions.

To counter these tendencies, I expect people to begin adopting “defensive randomization”: introducing noise into their decision-making and forcing corporate algorithms to experiment more broadly with the options they present. You could do this with a simple coin flip, or introduce your own bots that make random (or targeted exploratory) decisions on your behalf. For example, you could have a bot log in to your Netflix account and search for a bunch of movies that are far away from Netflix’s recommendations for you.
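
As a toy illustration (my own sketch, not any existing tool), such a bot’s decision rule might look something like the following; the option names and the exploration probability are placeholders:

```python
import random

def defensive_choice(recommended, all_options, explore_prob=0.3):
    """Sometimes ignore the platform's recommendations and pick something
    far from them, so the algorithm only ever sees noisy preferences."""
    if random.random() < explore_prob:
        off_menu = [opt for opt in all_options if opt not in recommended]
        if off_menu:
            return random.choice(off_menu)
    return random.choice(recommended)

# Example: choosing a movie, occasionally deliberately off-recommendation.
catalog = ["Movie A", "Movie B", "Movie C", "Movie D", "Movie E"]
recommended = ["Movie A", "Movie B"]
print(defensive_choice(recommended, catalog))
```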

One possible future is for these bots to share data between themselves — a guerilla network of computation that is reverse-engineering corporate algorithms and feeding them the information that will make your life more humane.

[mildly inspired by Maximilian Kasy’s Politics of Machine Learning]

Surveillance Valley

Just read Yasha Levine’s Surveillance Valley. There was a lot more new information than I was expecting but also a lot of “guilt by association” arguments and some interpretations I found a bit sketchy. Curious if anyone else has read it and what they thought. The book has two main sections.

First: the proto-history of the internet in ARPA was tied closely to concrete surveillance use cases. We usually tell the ARPANET story as that of an independent research arm within ARPA, but he shows that this is something of a myth – from the very beginning the intelligence community was using it to build linked databases of domestic surveillance (e.g., their dossiers on Vietnam War protestors). This surveillance use was recognized by the anti-war left at the time – there were large protests at MIT and Harvard against these projects. This has largely dropped out of our collective memory.

Second, and more interesting: the recent wave of anti-surveillance feeling, and the way it has coalesced around Tor and Signal. The ultimate puzzle he is trying to unravel is: “privacy activists claim that Tor and Signal break the surveillance power of governments and large internet corporations. So why do those institutions support those tools and advocate their widespread use?” Specifically, the US government is a major funder of both, through a variety of entities such as the OTF (https://en.wikipedia.org/wiki/Open_Technology_Fund) and the Broadcasting Board of Governors (https://en.wikipedia.org/wiki/U.S._Agency_for_Global_Media). He spends much less time discussing the large tech companies, but treats them by and large as collaborators with government surveillance, and makes that case pretty strongly and well.

(He also spends a lot of time in this section detailing how his previous investigations into these issues led to him being harassed online by privacy activists.)

His answer has three main components.

Answer 1: technical reasons. Tor was created as a DARPA project for spy communication – but the developers quickly realized that they would need lots of non-spy activity on Tor for the spy activity to blend into the background, which is why they opened it up and continue to fund it and advocate for it.

Answer 2: influence. The funding relationship allows the government to exert influence on these organizations: to get advance notice of vulnerabilities and roadmaps, to shape their direction, and to steer them away from things that are actively dangerous to their handlers. Somewhere in here is the possibility of backdoors, which I can’t really assess the evidence for. Part of this explanation is that by supporting a highly visible but secretly defanged privacy movement, they reduce the pressure that might otherwise cause trouble for them.

Answer 3: the use of these systems as a tool to destabilize enemy regimes – the USG funds privacy training for political activists across the world and advises them to use Tor and Signal. This is not exactly hidden – the OTF’s Wikipedia page cites its mission statement as wanting to “support projects that develop open and accessible technologies to circumvent censorship and surveillance, and thus promote human rights and open societies”. The extent of the activities we’re supporting likely goes deeper – we’re not above a little violent regime change – but this goal is out in the open.

There are a lot of interesting issues raised here, and the facts in this book are painstakingly documented. But ultimately I wonder if he’s seeking too consistent an explanation, in the vein of conspiracy theorists who need a simple causal pattern to explain a wide variety of events. He seems to think that “Google” and “the US government” are monolithic entities with a single volition, whose actions must be somehow consistent – this is of course not the way these institutions work, especially when it comes to the intelligence community. The story he tells (especially Answer 2) complicates and punctures the self-aggrandizing, radical-aesthetic narrative in the privacy community. But I don’t think this is as big a puzzle as he makes it out to be.

A Pattern Language

I’ve been seeing people recommend A Pattern Language (amazon, very large pdf) here and there for a few years now and finally picked it up. I’ve only begun to read it, but it is a truly remarkable work. In particular it draws a thick and complex connection between design and ethics.

(Skimming the wikipedia page of the first listed author makes me want to read much more of his work.)

This book simultaneously defines what a pattern language is, makes a case for how they should be used in design, and provides an example.

Designers of any sort (industrial designers, graphic designers, software designers, urban planners, etc.) work explicitly or implicitly from patterns that they have learned about, developed, or identified. If I own land and want to sleep indoors, I might think about the pattern “Single Family Home” and create a design based on that pattern. And we need patterns for the whole spectrum of human existence that emerges through design, from the way our highest political entities are arranged (“Independent Regions”) through cities (“Subculture Boundaries”, “Night Life”), and so on (“Looped Local Roads”, “Compost”).

How, though, do different patterns connect to each other? There’s the concept of a Pattern Library, which I’ve often seen in the tech space (example). The Library metaphor asserts that patterns should be listed and categorized. But the metaphor of a Pattern Language goes much further in exploring the rich connections between patterns, the syntax by which they can be juxtaposed, and the layers of meaning they bring to bear when used together in different ways. A library has to be deliberately constructed and maintained, usually by a single entity; unlike a language, it does not usually develop and evolve organically.
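
One way to picture the difference (my own illustration, not the book’s): a library is a flat catalog, while a language also carries the connections between patterns. The pattern names below come from the book, but the particular links shown are illustrative only.

```python
# A pattern "library": patterns listed and categorized, nothing more.
pattern_library = {
    "towns": ["Subculture Boundaries", "Night Life", "Looped Local Roads"],
    "buildings": ["A Room of One's Own", "Window Overlooking Life"],
}

# A pattern "language": each pattern also points to larger patterns it helps
# complete and smaller patterns that complete it, so patterns can be composed
# the way words combine into sentences. (Links here are illustrative, not the
# book's actual cross-references.)
pattern_language = {
    "Night Life": {
        "helps_complete": ["Magic of the City"],
        "is_completed_by": ["Street Cafe"],
    },
}
```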

Every society which is alive and whole, will have its own unique and distinct pattern language; and further, that every individual in such a society will have a unique language, shared in part, but which as a totality is unique to the mind of the person who has it. In this sense, in a healthy society there will be as many pattern languages as there are people — even though these languages are shared and similar…

The language described in this book, though, is more like Esperanto than like English. It is not the dictionary of any observed pattern language; it is a call for a new language that will lead to a new and better lived existence for humanity. Languages differ in the fluency with which they can express certain concepts, so each language comes with a value system, and creating a language is an ethical act. What kinds of patterns should feel natural to express? What should be clunky?

The language that has emerged in our society is a stunted, depraved language without humanity. We have a pattern for billboards, for surveillance cameras, for strip malls, for old age homes.

[W]e have written this book as a first step in the society-wide process by which people will gradually become conscious of their own pattern languages and work to improve them. We believe…that the languages which people have today are so brutal, and so fragmented, that most people no longer have any language to speak of at all — and what they do have is not based on human, or natural considerations. [emphasis added]

The language in this book contains, on the contrary, patterns like the following:

  • Magic of the City
  • Old People Everywhere
  • Children in the City
  • Holy Ground
  • Connected Play
  • A Room of One’s Own
  • Garden Growing Wild
  • Communal Sleeping
  • Window Overlooking Life
  • Secret Place

These are only a few with particularly obvious ethical ramifications, but every pattern and every connection expresses an ethics, and creating such a language is a lasting way to codify your ethics.

Any such set of design principles contains within it an ethics, and ethics are sometimes best expressed as design principles. In particular, I’m familiar with the conversation around data ethics. Usually when we talk about data ethics we are saying “here are the set of tools we’ve designed and built, and over there is our thinking about ethical ways to use them.” But those tools were also designed within a value system that is embedded not just in the design of the specific tool but in the whole web of existence.

In the book’s domain (the built environment), we might think about the design of a single house. What ethics are embedded in the way a house is designed? How many people is it built for, and what kinds of living arrangements? But the design of the house, broadly speaking, must also connect to the design of the broader society and its ethics: what materials are used, and what sorts of labor arrangements are assumed to be available? What is nearby, and what can we assume about the ways that neighborhood will change over time? What is the anticipated lifespan of this building, and how might its uses change in the future?

Similarly, maybe talking sensibly about data ethics requires connecting it more deeply to the patterns we use as designers, and thinking more broadly about what those patterns are that we use and the timescales and means by which they change.

We have spent years trying to formulate this language, in the hope that when a person uses it, he will be so impressed by its power, and so joyful in its use, that he will understand again, what it means to have a living language of this kind. If we only succeed in that, it is possible that each person may once again embark on the construction and development of his own language — perhaps taking the language printed in this book, as a point of departure.

Police Science

Very much enjoying Jackie Wang’s Carceral Capitalism.

Especially liked this thought in “This Is A Story About Nerds and Cops”:

Given that critics of the police associate law enforcement with the arbitrary use of force, racial domination, and the discretionary power to make decisions about who will live and who will die, the rebranding of policing in a way that foregrounds statistical impersonality and symbolically removes the agency of individual officers is a clever way to cast police activity as neutral, unbiased, and rational.

Complex actions need specialized interfaces

Yesterday I was in a room with a Bloomberg terminal. Bloomberg is specialized software used by financial professionals to navigate data and take actions. Users interact with the system through a specialized, purpose-built keyboard.

These keyboards are easy to laugh at: they look antiquated and ridiculous, kind of like toys for people who don’t know how to use real computers. But I really like them, or at least I like the underlying idea:

Specialized tools need specialized interfaces.

Keyboards are specialized text-entry devices. It’s easy to forget this because they are our main interface with computers, which are general-purpose engines. But it’s crazy to think the keyboard is the best tool for every program, for every cognitive environment we can imagine implementing in software. We knew this once, but in the name of efficiency we have forgotten it.

Good history museums remind you that history is not a linear, predetermined progression. Natural history museums, for example, are fascinating not just because we can see the apes from which we descended but because of the vastly stranger evolutionary dead ends. I feel the same way at the Computer History Museum. Looking back, there seems to have been a sort of Cambrian explosion in the late ’60s and early ’70s, when the fundamentals of our computers were beginning to settle into place but the world was still wide open. This was when the mouse was invented, along with stranger beasts like the “chorded keyboard”, where you play different letters with different combinations of keypresses.

But even this is just a text input device. The keyboard is a workhorse because we have abstracted the computer towards it – because we had keyboards before computers.

But for highly specialized cognitive work, there may be better ways to interface. No one would try to play a piano with a computer keyboard and mouse. PC gamers and flight-simulator enthusiasts use joysticks. Console gamers use specialized controllers. So why don’t we have better input devices for programming, for data analysis, for planning timelines and budgets? The Bloomberg keyboard, like the Apple Touch Bar, is a small, halting step in this direction.

Contributing to pandas

Very proud to announce today that I had a pull request merged into the pandas library. In version 0.21, pandas will have a new feature: a way to read in line-delimited JSON in small pieces, which can be useful when working with large files or streams.
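
For reference, the new API looks roughly like this (the file name and the per-chunk handling below are placeholders):

```python
import pandas as pd

# New in pandas 0.21: passing chunksize (together with lines=True) to read_json
# returns an iterator of DataFrames instead of loading the whole file at once.
reader = pd.read_json("events.jsonl", lines=True, chunksize=10_000)

for chunk in reader:   # each chunk is an ordinary DataFrame
    print(len(chunk))  # stand-in for whatever per-chunk processing you need
```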

This is a fairly small change, technically, but a big deal for me.  Pandas is one of the most commonly used tools in the data science world. When I started at TrueAccord they bought me the book on pandas (Volume 2 coming out next month!). This was my first introduction to any programming language other than Stata, an odd proprietary language that languishes on among economists and epidemiologists. Now, writing software is a core part of my career.

Relatedly, I highly recommend The Success of Open Source, in which Steven Weber outlines the varied ways in which open source communities elicit and channel cooperation, and explores the complex set of motivations that leads people to contribute to open source.

What I’ve been reading lately

Rebecca Solnit, River of Shadows. Solnit is a marvelous thinker and historian who moves smoothly between well-researched historical fact and philosophical reverie. Here she traces the life of Eadweard Muybridge, whose motion studies of animals are still familiar today. Muybridge was a first-class photographer, a true artist who also made many technical innovations. Solnit takes his collaboration with Leland Stanford as the jumping-off point for an exploration of the way technology has annihilated time and space, and develops a genealogy from those two men to the California of today, dominated by Hollywood and Silicon Valley. In her telling, these two industries named for physical places are at the center of a world that, in large part because of their doing, is increasingly disconnected from the world itself.

Mary Robison, Why Did I Ever. A few years back I made a note to myself to read this novel. I can’t recall why, or at whose urging, but I’m glad I did. Told in over 500 short fragments, it is funny and poignant. I was sad to have finished this book.

Diane Coyle, GDP: A Brief but Affectionate History. I’ve been meaning to read this for a while, but I am, so far, disappointed. GDP is the single measure that people associate with economic health and growth, to the extent that people say “the economy grew” when they mean “GDP grew”. How the economy is measured could not be more important, and Coyle lays out some of the history of how GDP developed and some of the ways in which it is flawed. This wasn’t the right level of depth for me — it took some things for granted and was disappointingly shallow elsewhere — but it seems like a good starting point for a deeper read into these ideas.

Nitt Witt Ridge

Art Beal spent 61 years building a house out of found materials at Nitt Witt Ridge in Cambria, CA. He served for a time as the town garbageman, dumping his truck directly into his own backyard and rummaging for salvageable building supplies, with which he slowly built a house in the shape of his own mind. There is now little trace of the 20 feet of landfill underneath the hill where his house rests.

Beal, born in Oakland, was a celebrated long-distance swimmer in his youth but decamped in his 20s to Cambria, 200 miles south along the California coast. He built a small house and lived in it with “Gloria”, whose life is otherwise lost to history. At some point she disappeared. He abandoned that house and began constructing his masterwork, the unfinished project of the rest of his life.

There is no place in our world for some men. Through accident of birth some men are born different and they accumulate injuries in the world as they repeatedly are rammed through holes of the wrong shape. Beal was lucky. He found a place for his energy, found a way to preserve himself in a world that has no room for difference of mind.

USAFacts, Corporate Hagiography and Historical Ignorance

This morning my circles are talking about Steve Ballmer’s new government data initiative USAFacts as reported in this NYT article.

It’s an interesting project, and I am glad that this is how Ballmer is spending his dotage! It’s a lot better than going into VC as a lot of other tech execs seem to do as they age. I wish him the best.

HOWEVER

This is not the first time someone has worked on making government data more accessible. I wish that Ballmer and the media coverage around this launch had spent any time at all discussing the many other similar initiatives and how this one fits into the ecosystem.

For example, the mission of “a comprehensive summary” is interesting and different, but represents a tradeoff compared to deep contextual understanding. Contrast with the “Scarsdale” series by Thomas Levine https://thomaslevine.com/!/scarsdale/, for example. Also, this is a classic example of the “How Standards Proliferate” process. Everyone who comes along thinks: “If only there were one canonical home for all government data!” And then you end up with 15 different portals.

Most notably, I think, USAFacts doesn’t actually make its data open; it just publishes reports. That’s a major departure from what a lot of other players are doing, and I wish there were any discussion of why they made that choice. Are there legal requirements connected to some of the data? Surely at least some of it could be open. Is it a desire to keep a “moat”? Who knows!

The tone around this launch irks me in the same way most tech coverage irks me. Ballmer is not the first to think of this, not by a long shot, and his effort to understand what was already out there seems cursory, at best. Googling “open government data” would have been a very good start.

Why was this published in NYT’s DealBook section? It’s not business reporting at all. DealBook seems to exist as a WSJ competitor so the Times can attract the crowd that just wants corporate hagiography. Related: https://twitter.com/louispotok/status/423173257110372352

If you are interested in learning more about different open datasets, this may be a good start: https://thomaslevine.com/!/open-data/better-datasets-about-open-data/

Edit: There are two comment threads on HN (1 2) about this; the discussion is pretty good so far.