A highly opinionated guide to learning about ActivityPub

by Darius Kazemi, Jun 1, 2019

This document is for programmers who take one look at activitypub.rocks, click on through to the documentation, and can't make heads or tails of it.

In other words this document is for me, one year ago.

IMPORTANT NOTE: This document does not explain ActivityPub. It explains how to learn about ActivityPub.

If you want THE ULTIMATE TL;DR, just skip to that section.

Introduction

The one big misunderstanding that a programmer can have when attempting to learn about ActivityPub is to assume that all the information you need is in the ActivityPub spec! Really, to learn about ActivityPub in such a way that it will be useful in a concrete sense, you need to be familiar with three specs: ActivityPub itself, Activity Vocabulary, and ActivityStreams 2.0.

Those two extra documents are in fact mentioned in the ActivityPub spec itself, so I'm not claiming that they are somehow hidden. But I'm a lazy, impatient reader. When I'm given a spec like ActivityPub that is ~10,000 words of spec language... well. I'm going to try and take all the shortcuts I can.

I thought I was the only programmer with this problem, but as I spoke to more and more people at conferences, I realized there are many others who, for example, look at ActivityPub and yet have no idea there exists an Activity Vocabulary spec.

Start with Activity Vocabulary

I am a concrete thinker. I don't like reading about a system for passing messages between computers if I can't have some sort of understanding of what exactly is being passed around.

Unfortunately, the ActivityPub spec is mostly all about the "publication" part and not the "what data are we sending around" part.

For this reason, I recommend that you start your journey into ActivityPub by first reading the Activity Vocabulary spec.

Reading Activity Vocabulary

This spec is in five non-appendix sections:

  1. Introduction
  2. Core Types
  3. Extended Types
  4. Properties
  5. Implementation Notes

Introduction

The intro is brief, and is your usual spec boilerplate about the definitions of MUST and SHOULD.

Core Types

"Core Types" is where you might think you should start. It's the core, right? Well, you're wrong. The first thing you're hit with is a definition of Object:

Describes an object of any kind. The Object type serves as the base type for most of the other kinds of objects defined in the Activity Vocabulary, including other Core types such as Activity, IntransitiveActivity, Collection and OrderedCollection.

The example given is:

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Object",
  "id": "http://www.test.example/object/1",
  "name": "A Simple, non-specific object"
}

Wow. That's real useful. Now I know how to define a non-specific thing with no properties.

(Aside: this "begin from first principles" thing is a problem with almost all spec writing. It is largely how specs are written and it fucking blows. The authors of these documents are just following tradition and I do not wish to paint these specs as particularly bad.)

Just skip the Core Types for now and move onto section 3, Extended Types.

Extended Types

Ah, Extended Types. Or as I like to call them: the actually-useful types. Something like 50% of the stuff you actually care about is in this section.

We are given three major kinds of "types". (Um, by the way, this isn't quite a "type" as you may know it. It's really more like a JSON data format.)

Activity Types

We are once again immediately hit over the head with this doozy of a sentence:

All Activity Types inherit the properties of the base Activity type.

While this does make sense in a highly specific way that is unique to specification writing, please just skip this sentence and all sentences like it (there is at least one other saying the same thing about Object Types).

Best to avoid the near-tautological definitions required for writing things like parsers and instead skip to the concrete stuff. Scroll down and you'll see that Activity Types include things like Create, Follow, Like, and Announce.

Hey!! This is good! This is like... stuff that users do on social networks. Now we're getting somewhere.

Browse over all the activities. Look at the "Notes" section and glance at the example JSON object. There are some interesting activity types in there! For example, there's Question, which is the activity type that Mastodon uses in its implementation of polls. And there's Travel, which is meant to indicate movement from geographic (or even conceptual) point A to point B! You might make extensive use of this if your social network supported a Facebook-style "I've checked in to New York City right now" post. There's even a View activity, which can be used for read receipts.

Anyway, read 'em.

Actor Types

An Actor is what you might think of as a user agent. It can be a person, a bot, a company, a group of people, and more. But the important thing is: an Actor is the thing that is doing an Activity.

So a normal post to Mastodon is a Person actor doing a Create activity (what is being created is defined in the next section, Object types). Or if a company sent out discount codes to its followers, that would be an Organization doing an Offer activity.
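To make that second example a bit more concrete, here is a rough sketch of what an Organization doing an Offer could look like as JSON. Everything specific in it is made up for illustration: the shop.example domain, the ids, the followers URL, and the choice to put the discount code in a Note. The spec does not prescribe any of those.

```javascript
// Hypothetical actor: an Organization (all URLs here are invented).
const shop = {
  '@context': 'https://www.w3.org/ns/activitystreams',
  'id': 'https://shop.example/u/acme',
  'type': 'Organization',
  'name': 'Acme Widgets'
};

// The Offer activity: 'actor' says who is doing it, 'object' says what is offered.
const offer = {
  '@context': 'https://www.w3.org/ns/activitystreams',
  'id': 'https://shop.example/m/offer-1',
  'type': 'Offer',
  'actor': shop.id, // the actor is referred to by its URL
  'object': {
    'type': 'Note',
    'content': 'Use code WIDGET10 for 10% off.'
  },
  'to': [`${shop.id}/followers`] // assumed location of the followers collection
};

console.log(JSON.stringify(offer, null, 2));
```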

There aren't many Actor types listed so just read them all.

Object Types

Yeah. I know. "Object" is probably the most overloaded English word in programming and we're already on our second thing called an object in this one spec. I'm sorry. Deep breaths, we'll get through this.

Anyhow. These are the meat and potatoes of what actually gets sent around these networks. Object types include Note, Article, Image, Video, and several more. So when you make a regular text post to Mastodon, that is a case where a Person will Create a Note.

Putting it all together

Here's some code from a small ActivityPub reference implementation I wrote:

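  // Excerpt: domain, name, text, follower, d (a Date), guidNote and guidCreate
  // are all defined elsewhere in the implementation.

  // The Note is the actual content of the post.
  let noteMessage = {
    'id': `https://${domain}/m/${guidNote}`,
    'type': 'Note',
    'published': d.toISOString(),
    'attributedTo': `https://${domain}/u/${name}`,
    'content': text,
    'to': ['https://www.w3.org/ns/activitystreams#Public'],
  };

  // The Create activity wraps the Note and says who created it and who it is for.
  let createMessage = {
    '@context': 'https://www.w3.org/ns/activitystreams',
    'id': `https://${domain}/m/${guidCreate}`,
    'type': 'Create',
    'actor': `https://${domain}/u/${name}`,
    'to': ['https://www.w3.org/ns/activitystreams#Public'],
    'cc': [follower],
    'object': noteMessage
  };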

I construct the Note object and the Create activity. The Note object gets embedded inside the Create activity, which indicates that we are Creat-ing a Note. What about the Person? That's referenced in the 'actor' field of the Create, and that URL is the equivalent of passing the Person by reference. Any program parsing this out could query that URL and get the JSON of the Person from there.

Properties

If you're wondering what those "id", "cc", and so on fields mean, you're in luck. Those are described in this section, which contains the other 50% of the stuff you care about.

Read this section, if only to familiarize yourself with what's here. That way you'll know to come back here when you see something like "prev" in a JSON object and wonder how that's supposed to be constructed.

Implementation Notes

This is where the concrete examples of things like "how to represent a friend request" live. Definitely worth reading this section as well, though on a first run through you really only need to read:

Reading ActivityPub

Next, read the ActivityPub spec. It's in seven major sections:

  1. Overview
  2. Conformance
  3. Objects
  4. Actors
  5. Collections
  6. Client to Server Interactions
  7. Server to Server Interactions

But first, a word about JSON-LD.

A note on JSON-LD

Starting with the ActivityPub spec, you're going to see JSON-LD mentioned a bunch. You don't have to care about this at all. For our purposes it's JSON with some specific property names in it, and where you follow links to complete the content of an object. Is this technically 100% correct? No. But if you were the kind of person who cared about the difference between JSON-LD and JSON, you'd probably be writing your own spec right now instead of reading this article.

One important thing to note about the kind of JSON stuff you'll be seeing is that you'll often see a URI embedded in a JSON object. Take the following example of an object with no URI anywhere in it:

{
  "id": 1234,
  "object": {
    "type": "Note",
    "content": "Hello world."
  }
}

Other times you'll see something like:

{
  "id": 1234,
  "object": "https://example.com/5678"
}

But then if you do a GET request with the appropriate headers to "https://example.com/5678", you'll get

{
  "type": "Note",
  "content": "Hello world."
}

These two JSON things comprise the exact same information as that first JSON thing. It's just split across two HTTP requests. When you encounter a URI in one of these JSON objects, the idea is to follow the URI in order to resolve it into plain JSON and consider that the sub-object.
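Here is a minimal sketch of that "follow the URI" idea. The function name resolveObject and the fakeFetch stand-in are mine, not anything from the spec. I'm also assuming that an Accept header of application/activity+json counts as "the appropriate headers", which is a common choice but worth checking against the spec for your own use.

```javascript
// Resolve a property that may be either an embedded object or a URI string.
// fetchImpl is injectable so the sketch can be tried without a network;
// by default it uses the global fetch found in browsers and Node 18+.
async function resolveObject(value, fetchImpl = fetch) {
  if (typeof value !== 'string') {
    return value; // already embedded, nothing to fetch
  }
  const response = await fetchImpl(value, {
    headers: { 'Accept': 'application/activity+json' }
  });
  if (!response.ok) {
    throw new Error(`Could not resolve ${value}: HTTP ${response.status}`);
  }
  return response.json();
}

// A stand-in for the network that serves the Note from the example above.
async function fakeFetch(url) {
  const pages = {
    'https://example.com/5678': { type: 'Note', content: 'Hello world.' }
  };
  const body = pages[url];
  return {
    ok: body !== undefined,
    status: body ? 200 : 404,
    json: async () => body
  };
}

// Both forms end up as the same plain JSON sub-object.
resolveObject('https://example.com/5678', fakeFetch)
  .then(note => console.log(note.content)); // Hello world.
resolveObject({ type: 'Note', content: 'Hello world.' }, fakeFetch)
  .then(note => console.log(note.content)); // Hello world.
```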

Overview

Read this whole thing. You probably tried to read this once, before reading the Vocabulary spec, and if you're anything like me you were extremely confused. I promise it will be less confusing this time.

Basically it explains that you're taking these JSON things described in the Vocabulary spec and POSTing them to specific API endpoints, which process them.

Conformance

A short section that you might as well read. This is where it's made clear that this document actually describes two separate protocols: a Client to Server (C2S) protocol and a Server to Server (S2S) protocol.

In order to make your service "federated" via ActivityPub, the only protocol you really care about is the server to server protocol. S2S is how a Mastodon server can talk to a Pleroma server can talk to a Pixelfed server etc etc. Federation as we know it.

The C2S protocol is about user choice. It's a good thing, but it has nothing to do with federation. It's a standard way for user client software, like a phone app, to talk to a server. If we lived in a world where Mastodon and Pleroma and Pixelfed all used the C2S protocol, then in theory I could easily write a phone app that connects to all three of those services, and users could mix and match clients with servers to their heart's delight. A new service could pop up that I didn't even know about, but if it used C2S my phone app would be minimally compatible with it. (We do not live in this world. Most federated services do not implement C2S. But this is the vision, to my understanding.)

Objects

This will be a much easier read now that you've read over the Vocabulary document.

Specifically, sections 3, 3.1, and 3.2 are really important and discuss why, for example, if you send a message to a remote server, it should have some way of querying your server to confirm the message. I originally thought that I could get away with just sending messages into the void and not storing them in a local database, but I was wrong, because remote servers need to confirm that it's not just some random person claiming to send content from my own origin.
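The "store it so others can ask for it" idea can be sketched in a few lines. This is not how my server (or any real one) does it. It's an in-memory Map standing in for a database, and the names sentObjects, remember, and lookup are invented for this example.

```javascript
// In-memory stand-in for a database of everything this server has sent.
const sentObjects = new Map();

// Save an object under its id before delivering it anywhere.
function remember(object) {
  sentObjects.set(object.id, object);
  return object;
}

// What a GET handler for an object's URL could return when a remote
// server comes asking whether we really published it.
function lookup(id) {
  const found = sentObjects.get(id);
  return found
    ? { status: 200, body: found }
    : { status: 404, body: { error: 'Not found' } };
}

remember({
  'id': 'https://example.com/m/1234',
  'type': 'Note',
  'content': 'Hello world.'
});

console.log(lookup('https://example.com/m/1234').status); // 200
console.log(lookup('https://example.com/m/nope').status); // 404
```

The point is only that the id in a message you send has to be a URL your server can actually answer for later.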

Actors

So an ActivityPub actor is an Activity Vocabulary actor, with the mandatory addition of an inbox and an outbox, which are typically URIs that you either GET or POST to depending on what you want to do with them. For example, you can GET from a user's outbox URI and receive a JSON list of posts. Try it on my outbox from Friend Camp right now in your web browser by browsing to https://friend.camp/users/darius/outbox. You should see something like this:

{
  "@context":"https://www.w3.org/ns/activitystreams",
  "id":"https://friend.camp/users/darius/outbox",
  "type":"OrderedCollection",
  "totalItems":5688,
  "first":"https://friend.camp/users/darius/outbox?page=true",
  "last":"https://friend.camp/users/darius/outbox?min_id=0&page=true"
}

If you click through to the "last" URI, you'll get 20 of my oldest posts from Friend Camp, including this charmer.
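If you'd rather do that clicking in code, here's a small sketch of walking an outbox. It assumes each page lists its posts under "orderedItems" (or "items") and points to the following page with "next", and that "first" is a URL rather than an embedded page. The social.example data and the names collectItems and getJson are invented so the example runs without touching a real server.

```javascript
// Walk an OrderedCollection by following "first" and then each page's "next".
// getJson(url) must return a Promise of parsed JSON; maxPages is a safety cap.
async function collectItems(collectionUrl, getJson, maxPages = 3) {
  const collection = await getJson(collectionUrl);
  const items = [];
  let pageUrl = collection.first;
  for (let i = 0; pageUrl && i < maxPages; i++) {
    const page = await getJson(pageUrl);
    items.push(...(page.orderedItems || page.items || []));
    pageUrl = page.next; // undefined on the final page, which ends the loop
  }
  return items;
}

// Invented data standing in for a real server.
const fakeServer = {
  'https://social.example/u/alice/outbox': {
    type: 'OrderedCollection',
    totalItems: 3,
    first: 'https://social.example/u/alice/outbox?page=1'
  },
  'https://social.example/u/alice/outbox?page=1': {
    type: 'OrderedCollectionPage',
    orderedItems: ['post-3', 'post-2'],
    next: 'https://social.example/u/alice/outbox?page=2'
  },
  'https://social.example/u/alice/outbox?page=2': {
    type: 'OrderedCollectionPage',
    orderedItems: ['post-1']
  }
};
const getJson = async (url) => fakeServer[url];

collectItems('https://social.example/u/alice/outbox', getJson)
  .then(items => console.log(items)); // [ 'post-3', 'post-2', 'post-1' ]
```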

Anyway, read this whole section too. It's short.

Collections

Skip this for now. You'll absolutely want to come back to this later but you can probably figure out what collections are in context of their usage (hint: that outbox JSON is a type of collection; it contains paging data for iterating through collections of stuff).

But seriously just skip this for now, you don't need to muddy your brain with it.

Client to Server Interactions

Look. If you want to support a universal client ecosystem, go ahead and read this section. But if you just want to write a federated service, skip it and come back to it later.

Server to Server Interactions

Read this whole thing. This one section, combined with the Activity Vocabulary spec, forms the absolute basics of getting a federated service up and running. The Vocabulary spec tells you what to send, and this section tells you how to send it.
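As a very rough sketch of the "how to send it" half: delivery boils down to an HTTP POST of the activity JSON to the recipient's inbox URL. The function name deliver, the remote.example inbox, and the fake fetch are assumptions for illustration. Be warned that this leaves out request signing, which real-world servers generally expect, so treat it as the shape of the request and not something that will federate as-is.

```javascript
// POST an activity to a remote inbox. Resolves to true if the server
// answered with a 2xx status. fetchImpl is injectable for offline testing.
async function deliver(activity, inboxUrl, fetchImpl = fetch) {
  const response = await fetchImpl(inboxUrl, {
    method: 'POST',
    headers: {
      'Content-Type':
        'application/ld+json; profile="https://www.w3.org/ns/activitystreams"'
    },
    body: JSON.stringify(activity)
  });
  return response.ok;
}

// Fake network that just records what would have been sent.
const sent = [];
async function fakeFetch(url, options) {
  sent.push({ url, options });
  return { ok: true, status: 202 };
}

const activity = {
  '@context': 'https://www.w3.org/ns/activitystreams',
  'id': 'https://example.com/m/create-1',
  'type': 'Create',
  'actor': 'https://example.com/u/alice',
  'object': { 'type': 'Note', 'content': 'Hello world.' }
};

deliver(activity, 'https://remote.example/inbox', fakeFetch)
  .then(ok => console.log(ok, sent[0].options.method)); // true POST
```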

Among other things, it discusses the difference between an inbox and a sharedInbox. It talks about how to Follow an Actor, and how someone might Accept or Reject a follow request. It talks about Announce, which in addition to literally being for announcements, is also your standard "sharing" activity, like an RT on Twitter or a boost on Mastodon.
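To give the Follow and Accept dance a concrete shape, here's a sketch of the two JSON objects involved. The helper names makeFollow and makeAccept and all the example.com-style URLs are hypothetical. The one structural thing to notice is that the Accept carries the original Follow as its object.

```javascript
const CONTEXT = 'https://www.w3.org/ns/activitystreams';

// "followerId wants to follow targetId."
function makeFollow(id, followerId, targetId) {
  return {
    '@context': CONTEXT,
    'id': id,
    'type': 'Follow',
    'actor': followerId,
    'object': targetId
  };
}

// "The followed actor accepts that Follow." A Reject would look the same
// with 'type' set to 'Reject'.
function makeAccept(id, follow) {
  return {
    '@context': CONTEXT,
    'id': id,
    'type': 'Accept',
    'actor': follow.object, // the actor who was followed does the accepting
    'object': follow
  };
}

const follow = makeFollow(
  'https://a.example/m/follow-1',
  'https://a.example/u/alice',
  'https://b.example/u/bob'
);
const accept = makeAccept('https://b.example/m/accept-1', follow);

console.log(accept.actor);       // https://b.example/u/bob
console.log(accept.object.type); // Follow
```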

Reading ActivityStreams 2.0

Eh. Don't read it now. Skim the table of contents. There will be bits of ActivityPub that don't make sense without the ActivityStreams 2.0 spec, but those will be more advanced questions you'll have later.

THE ULTIMATE TL;DR

Go forth and make stuff

Assuming you've done all this reading, the knowledge you have right now should hopefully be enough to start hacking on sending simple messages via ActivityPub.

I recommend reading these two articles by Eugen Rochko, the creator of Mastodon:

If you're like me and you enjoy poking around at a simple implementation, I have written a bare bones JavaScript ActivityPub server using the popular Express application framework: