Import all the things!
Last year Elon acquired Twitter and began running it into the ground, so I imported my tweets here & stopped posting there. A few months later I also imported all the posts from my Mastodon account. Here are a couple of examples:
This is a static website, and the static site generator I use is home-grown. So in both cases I wrote an importer from scratch. I really half-assed it, to be honest. Neither importer handles media, so any photos in my posts are missing. I told myself I’d get back to that but I have yet to.
The other day I decided to have a go at importing my Bluesky and Threads posts as well. I started with Threads. I’ve only posted there a couple dozen times, so I figured it wouldn’t be that hard.
It turns out you can’t even download your Threads archive without also downloading your Instagram archive, and the same page made it very easy to request an archive of my Facebook posts too. The archives of all three Meta services are very similar, which I suppose shouldn’t be much of a surprise. So instead of writing a Threads importer, I wrote an importer that handles all three archives. It handles media too. Here are some examples:
- My first Facebook post is apparently just a link to my Twitter account.
- My first Instagram post is a shot of my father-out-law’s old homebrewing setup.
- My first Thread is literally just the 💯 emoji.
Having written several importers like this over the last year, I have some disconnected observations about these services’ archive formats:
Twitter
- Twitter’s archive format isn’t technically JSON; it’s a JavaScript file you’re expected to `eval()`. 🤮
- The data for a tweet doesn’t contain a URL to the tweet on Twitter’s website. It does contain the tweet ID, and you know which account tweeted it, so you can construct the URL yourself, but the format itself doesn’t include it.
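- Twitter didn’t originally have native support for at-mentions, hashtags, or other such syntax, so they had to kind of bolt on support for formatted tweet content later, and it really shows: mentions, hashtags, and links live in a separate `entities` structure that points back into the plain tweet text by character offsets. It’s super awkward to code for.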
- The per-tweet field called `full_text`, well, isn’t. It’s often truncated.
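The first two Twitter complaints can at least be worked around in a few lines. Here is a minimal sketch, assuming each archive data file (such as `tweets.js`) is a single `window.YTD.<name>.part0 = [...]` assignment, so everything after the first `=` is plain JSON and no `eval()` is needed. It also assumes the tweet ID lives in an `id_str` field and that you pass in the account’s screen name yourself. The function names and the sample data are hypothetical, made up for illustration.

```python
import json


def load_twitter_archive_js(text: str) -> list:
    """Parse a Twitter archive data file such as tweets.js without eval().

    Assumes the file is one JavaScript assignment, e.g.
    `window.YTD.tweets.part0 = [ ... ]`, so everything after the first
    "=" is plain JSON (give or take whitespace and a trailing semicolon).
    """
    _, sep, payload = text.partition("=")
    if not sep:
        raise ValueError("expected a 'window.YTD... = [...]' assignment")
    return json.loads(payload.strip().rstrip(";"))


def tweet_url(screen_name: str, tweet: dict) -> str:
    """Rebuild a tweet's URL, since the archive doesn't include one.

    Assumes the tweet ID lives in the "id_str" field.
    """
    return f"https://twitter.com/{screen_name}/status/{tweet['id_str']}"


# Hypothetical sample data in the assumed shape.
sample = 'window.YTD.tweets.part0 = [{"tweet": {"id_str": "123", "full_text": "hi"}}]\n'
items = load_twitter_archive_js(sample)
print(tweet_url("example", items[0]["tweet"]))  # https://twitter.com/example/status/123
```

Splitting on the first `=` rather than evaluating the file means a malformed or malicious archive can only produce a JSON parse error, never run code.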
Meta
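- All of their archives are JSON, but all the strings in them are incorrectly encoded: each byte of a character’s UTF-8 encoding is escaped as if it were a separate character, so “é” shows up as “Ã©”. A quick search will lead you to a snippet of code to correct this in your preferred programming language.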
- None of their archives contain URLs to the actual posts, nor do they appear to contain enough information to reconstruct URLs. So if you re-publish a post on your own site, there’s no way to link from your copy of the post back to the original. It gives you the impression that these services aren’t really part of the web at all.
- Speaking of things missing from Meta’s archive format, it doesn’t seem to contain any kind of threading information. So in the archive for Threads, a service literally named after threading posts together, you can’t tell that a post is a reply, much less what it’s replying to. It’s really frustrating.
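The usual fix for Meta’s string encoding is a Latin-1 round trip. The sketch below assumes the mis-encoding works the way it commonly does in these exports: each UTF-8 byte of a non-ASCII character is stored as its own `\u00XX` code point, so encoding the string back to Latin-1 recovers the raw bytes, which can then be decoded as UTF-8. The function name `fix_meta_text` and the sample post are hypothetical.

```python
import json


def fix_meta_text(value):
    """Recursively repair mis-encoded strings in data loaded from a Meta archive.

    Assumes each string holds UTF-8 bytes that were escaped one byte per
    character, so encoding back to Latin-1 recovers the raw bytes, which
    can then be decoded as UTF-8.
    """
    if isinstance(value, str):
        try:
            return value.encode("latin-1").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            # Not in the mis-encoded form (e.g. already correct); keep as-is.
            return value
    if isinstance(value, list):
        return [fix_meta_text(item) for item in value]
    if isinstance(value, dict):
        return {fix_meta_text(k): fix_meta_text(v) for k, v in value.items()}
    return value


# Hypothetical sample: the 💯 emoji (UTF-8 bytes F0 9F 92 AF) in the
# one-code-point-per-byte form described above.
raw = r'{"text": "\u00f0\u009f\u0092\u00af"}'
post = fix_meta_text(json.loads(raw))
print(post["text"])  # 💯
```

Catching the decode errors lets already-correct strings pass through untouched. The trade-off is that a genuinely correct string made only of Latin-1 characters that happen to form valid UTF-8 (say, someone literally typing “Ã©”) would be “fixed” by mistake, which seems rare enough to accept.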
Mastodon
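- Unlike Twitter’s and Meta’s, Mastodon’s archive format feels like it was designed by web nerds for web nerds. It’s super easy to generate a high-fidelity version of a Mastodon post on your own site, since each post carries its original URL and, if it’s a reply, the URL of the post it’s replying to.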
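For contrast, here is roughly what reading a Mastodon archive can look like. This is a sketch assuming the archive’s `outbox.json` follows the ActivityPub/ActivityStreams layout: an `OrderedCollection` whose `orderedItems` are activities, where `Create` activities wrap the post object (with `url`, `published`, `content`, `inReplyTo`, and `attachment`) and boosts appear as `Announce` activities. The `MastodonPost` class, the function name, and the sample data are hypothetical.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MastodonPost:
    url: str
    published: str
    content_html: str
    in_reply_to: Optional[str]
    attachments: list[str] = field(default_factory=list)


def load_mastodon_outbox(text: str) -> list[MastodonPost]:
    """Extract original posts from a Mastodon archive's outbox.json.

    Assumes the ActivityPub layout: an OrderedCollection whose
    "orderedItems" are activities, where "Create" activities wrap the post.
    Boosts ("Announce") and anything else are skipped.
    """
    outbox = json.loads(text)
    posts = []
    for activity in outbox.get("orderedItems", []):
        obj = activity.get("object")
        if activity.get("type") != "Create" or not isinstance(obj, dict):
            continue
        posts.append(MastodonPost(
            url=obj.get("url") or obj["id"],
            published=obj.get("published", ""),
            content_html=obj.get("content", ""),
            in_reply_to=obj.get("inReplyTo"),  # None for top-level posts
            # Assumed to be paths to media files bundled in the archive.
            attachments=[a["url"] for a in obj.get("attachment", []) if "url" in a],
        ))
    return posts


# Hypothetical sample: one original post and one boost.
sample = json.dumps({
    "type": "OrderedCollection",
    "orderedItems": [
        {"type": "Create", "object": {
            "id": "https://example.social/users/me/statuses/1",
            "url": "https://example.social/@me/1",
            "published": "2023-01-01T00:00:00Z",
            "content": "<p>Hello, web nerds</p>",
            "inReplyTo": None,
            "attachment": [],
        }},
        {"type": "Announce", "object": "https://other.example/@them/9"},
    ],
})
posts = load_mastodon_outbox(sample)
print(posts[0].url, posts[0].in_reply_to)
```

Everything the Twitter and Meta sketches had to reconstruct or repair (the post URL, reply threading, text encoding) is simply present here.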