Intro
This is a server-side view and training tool for the forthcoming Action News feed for Bluesky. Its aim is to automatically aggregate as many posts as possible relating to (progressive/liberal/leftist, currently Americentric for urgency reasons) political or political-adjacent action while trying to keep the feed as free as possible of posts which are not that.
This tool is being trained to identify posts which clearly include useful information about one or more of the following:
- Action that is currently being taken
- Action that will be happening soon (e.g. an upcoming fundraising livestream for [charity])
- Shot-in-the-arm coverage of action that has recently been taken (e.g. an article about a big protest yesterday, or someone's photos of the big crowd turnout at that event)
- Positive outcomes of recent collective action (e.g. we raised $X thanks to you all, striking workers were able to negotiate better working conditions)
- Directly relevant general information about specific actions in the works (e.g. a link to personal safety and smartphone security guidelines for protests)
Ideal action posts are ones that are:
- Specific (e.g. not just generic "eat the rich" directives)
- Timely (here's something we can do right now, this weekend, etc.)
- Collective/power of many in nature ("Here's what we're doing; come join us!" framing rather than "Here's what you can do")
Types of action can include, but certainly aren't limited to:
- Protests, marches, acts of civil disobedience
- Fundraising events
- Volunteers needed
- Union-related activities
- Boycotts and other consumer action
- Other sorts of calls to action, ideally with specific goals rather than just "write your senators"
Interface
Posts of potential interest which have been included in the current feed have a green tint; those which have been excluded have a red tint.
The Id can be referenced if you need to discuss something about a specific post.
The most important data in this view are the confidence values for each of the six possible classifications (aka tags) for this post (more on those below). The tag with the highest confidence is highlighted and represents the most-likely classification for this post. If you have an account, are logged in, and see a post whose most-likely classification is very much incorrect, you can click on a different tag label to put that post into the learning queue. This tells the classifier which tag should have had the highest confidence for that post. The system will also add/remove the post from the feed as appropriate given that feedback. Future classifications will benefit from this learning.
Tags
There are 6 different tags in use in this system. Text used in training is assigned to exactly one of these 6 tags, but when new text is being classified for the feed, the classification is probabilistic, with each tag getting a number between 0.0 and 1.0 representing how confident the model feels about classifying the text as that particular tag. Basically, the tag with the highest confidence is the one of most interest and relevance.
The tags in question and their meanings are:
- action: This post should be in the feed as it meets the aforementioned criteria.
- action-noinfo: This post should not be in the feed. It speaks about the kind of action we're interested in, but it is vague, rhetorical, meta, lacking in details, focused on the first person rather than the collective, or otherwise missing a "here's a specific thing we can do" essence. Action but with nothing actionable.
- action-notpoli: This post should not be in the feed as it contains the action elements we're looking for, but it isn't really about sociopolitical-type issues. For instance, a gofundme for a specific individual, volunteers needed for a community garden, animal shelter events, etc.
- poli-notusa: This post should not be in the feed as it's about politics but outside the scope of American politics. This is tricky because everyone's politics involve America's politics right now. I'll come back to that. Basically, if France is passing a law about what hues of grape are and aren't allowed in what French wine, that's poli-notusa.
- poli-notaction: This post should not be in the feed as it is about politics but doesn't involve any action of the sort we're looking for. People reporting that such-and-such bill just passed Congress, or venting about something stupid such-and-such mayor did, or an analysis of a particular court ruling, that's poli-notaction.
- misc: This post should not be in the feed as it is neither action nor sociopolitical in nature.
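As a rough illustration of the highest-confidence rule described in this section, here is a minimal Python sketch. The function name `top_tag` and the example confidence values are made up for illustration; how the real classifier stores or returns its confidences is not covered here.

```python
from typing import Mapping, Tuple

# The six tag names listed above.
TAGS = (
    "action",
    "action-noinfo",
    "action-notpoli",
    "poli-notusa",
    "poli-notaction",
    "misc",
)


def top_tag(confidences: Mapping[str, float]) -> Tuple[str, float]:
    """Return the (tag, confidence) pair with the highest confidence."""
    tag = max(confidences, key=lambda name: confidences[name])
    return tag, confidences[tag]


# Hypothetical confidences for a single post (values invented for this example).
example = {
    "action": 0.81,
    "action-noinfo": 0.10,
    "action-notpoli": 0.03,
    "poli-notusa": 0.01,
    "poli-notaction": 0.04,
    "misc": 0.01,
}

print(top_tag(example))  # ('action', 0.81)
```

The sketch only assumes that each tag has one number between 0.0 and 1.0; it does not assume the six numbers add up to 1.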
Figuring out what the tags should be
Tagging is a little unintuitive with this setup, as 5 of the tags are there solely to help better clarify what the action tag isn't. While action is quite particular, we can play a little fast and loose with the other tags. Very frequently a post could arguably go in either of 2 categories. For example, a post about passing a new tax law in Greece could be either poli-notaction or poli-notusa. As long as the post's highest-confidence tag is one of those two, that's all that matters. A good way to look at this is that of the non-inclusive tags, it's not about making sure they're all right per se, merely finding any that might be clearly wrong. As with the other system, if there's any uncertainty at all, just leave it be because it would likely just act as poor training data.
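One way to picture the guidance above is as a small decision rule. The sketch below is only an illustration of the reasoning, not part of the tool; `should_relabel` and its parameters are hypothetical names. The reviewer decides which tags would be acceptable for a post, and a correction is only worth making when the current highest-confidence tag falls outside that set and the reviewer has no doubts.

```python
from typing import Set


def should_relabel(current_top_tag: str,
                   acceptable_tags: Set[str],
                   reviewer_is_certain: bool) -> bool:
    """Return True only when the top tag is clearly wrong and the reviewer is sure."""
    if not reviewer_is_certain:
        # Any uncertainty at all: leave the post alone.
        return False
    return current_top_tag not in acceptable_tags


# The Greek tax law example: either of these two tags is fine.
greek_tax_post_ok = {"poli-notaction", "poli-notusa"}

print(should_relabel("poli-notusa", greek_tax_post_ok, True))   # False
print(should_relabel("action", greek_tax_post_ok, True))        # True
print(should_relabel("action", greek_tax_post_ok, False))       # False
```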
I'm having some difficulty coming up with good guidance on the poli-notusa tag. For example, there are very Canadian Canada political discussions right now involving Elon. There are very English England political discussions right now involving Trump. One could make an argument for poli-notusa since they're about actions of the people in those countries and their governments. But action might also be argued for because it's our friends in other countries in the same boat as us engaged in similar activities. Or it might be poli-notaction if it doesn't meet action criteria. My approach so far has been to mostly leave these posts' tags alone unless I see something about a particular post that really jumps up and down screaming for a specific category. But that's been exceedingly rare so far.
Sometimes you'll see a post whose highest-confidence tag is action but it's marked red and thus not in the feed. What this means is that the classification didn't pass the minimum confidence threshold, which is currently set at 0.7. In these situations, if the post has clear "yeah this is very obviously an action post" vibes, you can explicitly put it in the learning queue as action to ensure it goes into the feed and to feed it into the learning. But if it's definitely action but the wording feels a little tentative or otherwise doesn't fully knock it out of the park, it's best to just leave the confidence as-is.
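The red-but-action case can be sketched in a few lines of Python. This is an illustration under assumptions, not the tool's actual code: `in_feed` is a hypothetical name, the confidence values are invented, and treating a confidence of exactly 0.7 as passing (`>=`) is a guess, since only the threshold value itself is stated above.

```python
from typing import Mapping

MIN_CONFIDENCE = 0.7  # current threshold, as stated above


def in_feed(confidences: Mapping[str, float],
            threshold: float = MIN_CONFIDENCE) -> bool:
    """A post is in the feed if its top tag is action and it passes the threshold."""
    tag = max(confidences, key=lambda name: confidences[name])
    # Assumption: a confidence exactly equal to the threshold counts as passing.
    return tag == "action" and confidences[tag] >= threshold


# Top tag is action, but 0.55 is below 0.7, so the post shows up red.
weak_action = {"action": 0.55, "action-noinfo": 0.30, "misc": 0.15}
# Top tag is action with high confidence, so the post shows up green.
strong_action = {"action": 0.92, "action-noinfo": 0.05, "misc": 0.03}

print(in_feed(weak_action))    # False
print(in_feed(strong_action))  # True
```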
Every so often there will be posts regarding conservative fundraisers or whatever and the classifier doesn't know any better yet and classifies them as action. What I've generally been doing is putting these in the learning queue as poli-notaction.
Lastly, there are many posts that talk about community or rallying or volunteering or any of the other keywords we care about, but the text of that post in isolation offers no clue whatever about whether this is a Target boycott or a video game raid. In those cases, it will be impossible for the classifier to make any sort of meaningful classification with that text, so just mark it as misc.