Have you ever played a guessing game?

Have you ever played a guessing game against someone who is really good at guessing?

It's probably a good time to talk about this, in particular given the ever more forceful drive for technology to solve all human woes.

If you think about it, you probably hear those two little letters - A.I. - bouncing around more at the moment than anything else. For some it's the joy that robots will fix everything and finally take the responsibility from our hands so we can eat snacks and binge TV. For others it's the impending sense that Skynet is about to become self-aware.

In reality, we're just building better guessing games these days and, increasingly, the Wizard of Oz can rely on nobody pulling back the curtain because people keep moving the curtain further away from the average citizen of the Emerald City.

To give you a really good example, let's have a look at GPT-2, an artificial language model built on machine learning which can generate text. Is it magic? No.

To give you the short version - and one which is understandable to humans - a program was designed to absorb the web pages that Reddit users linked to and upvoted, turn all the words it took in into numeric tokens, and spit out what it thinks is most likely to be the next word in the sentence if you give it a starter for ten.
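To make the guessing game concrete, here is a minimal sketch of the same idea at toy scale. It is not how GPT-2 works internally (GPT-2 is a large neural network, and this is just a word-pair counter), and the training sentence and function names are invented for the illustration. The principle is the same though: take in text, count what tends to follow what, and guess the next word.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for every word, which words followed it in the training text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def guess_next(follows, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Hypothetical training text, invented for this illustration.
corpus = "the cat sat on the mat and the cat ate the fish"
model = train_bigrams(corpus)

print(guess_next(model, "the"))  # cat ("cat" follows "the" twice, more than any other word)
print(guess_next(model, "dog"))  # None (the guesser has never seen "dog")
```

Note the second result: ask about anything outside the training text and the guesser has nothing to offer.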

It's a big program. A complicated one. But it's not sentient.

There's an old expression used by analysts the world over: "If you put sh*t in, you get sh*t out," and that absolutely applies here.

There are about 52 million daily active Reddit users worldwide, and more than half of its traffic comes from the US. It's actually only the seventh most popular social media platform in America, according to recent stats. Content is heavily weighted towards conversations within replies, with just over 303 million actual posts in 2020 generating some 2 billion comments.

There's a good chance you don't even know what Reddit is, in reality, and it's heavily skewed towards Americanized internet sub-culture. It's not representative of humanity as a whole, in short, and subsequently layer upon layer of bias skews the performance of the text generator.
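The skew is easy to demonstrate at toy scale. In the sketch below, the same counting code is fed two different sets of sentences and asked the same question. Both sets are hypothetical and invented for the illustration, and the function name is ours. The code never changes, only what it was fed, and so does the answer.

```python
from collections import Counter

def most_likely_follower(sentences, word):
    """Return the word that most often follows `word` across the sentences."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for current, nxt in zip(tokens, tokens[1:]):
            if current == word:
                counts[nxt] += 1
    return counts.most_common(1)[0][0] if counts else None

# Two hypothetical corpora, invented for this illustration.
us_forum = [
    "football means quarterbacks",
    "football means touchdowns",
    "football means quarterbacks",
]
uk_terrace = [
    "football means wembley",
    "football means wembley",
    "football means penalties",
]

print(most_likely_follower(us_forum, "means"))    # quarterbacks
print(most_likely_follower(uk_terrace, "means"))  # wembley
```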

It does reflect the human race in one way though, in that it doesn't have imagination beyond the constraints of its cultural points of reference. Its lack of exposure to diversity - and the learned behavioural impact of that - is perhaps what makes it most useful as a teaching tool.

It is, nonetheless, a good guessing game.

Though we don't have a use case for it currently (it doesn't serve a purpose in any of the analytical products we design or operate), we wanted to show you what can be done with it and why it's important sometimes for someone to play Toto.

We used the instructions in this article to implement GPT-2 in just 8 lines of code, tweaked the settings a bit, and then fed it a few sentences of consolidated Q ramblings from a research project we are currently running, away from the over-monitored social media platforms where very little is happening. We asked it to give us back its best few paragraphs.

(For the sake of clarity, because lots of articles say they have when they haven't: we haven't trained the model. It came already trained, and that's what we are interested in. Some articles seem to have confused training the model with providing input text or tweaking the generation settings, and that's not the same thing.)

The sentences we gave the machine were:

"The Trump campaign is dropping a truth bomb to stop the steal. Department of Justice is executing warrants on the Obama compound. Hunter Biden implicated in China fraud. Clintons sex trafficking, save the children! In God We Trust. The COVID vaccine is eugenics, they are killing us."

Try after try, after try, what the machine gave us back was an expert-level "If I Were A Reddit User" computerised guess:

  1. It is true, that the United States of America has not been 'anti-science', as this is not at all the point where he claimed that such things were 'anti-science' at all. But this is not at all what many people who've attended the rallies or are watching at home say – these are 'common sense things'. In fact if you look at Hillary's website you can see that the only other link to the conspiracy theories is the word 'climate change'. When you look at her website here are pictures of both her and Obama as well as a short YouTube video that's linked under "climate change": this video, called a "crippling YouTube earthquake", shows the earthquake and aftermath of "Climate Change". You could easily watch it all.
  2. I want to note that the number two item with all this controversy is also quite misleading. And at this point it doesn't matter what the press is talking about: they've all jumped the gun.
  3. First let's go into how Obama works. He does things like run his country like a dog. He even gets out of the closet and speaks English like everyone else except China. But after being forced to resign under the House of Representatives Obama continues to be Secretary of Defense.
  4. Obama's friends claim: "He is a good boy! He is great at things! People in Africa, Iran and Russia will appreciate you. He is extremely knowledgeable! Obama does it like a girl will do it!" Actually Obama hasn't done anything like that. He's just been a little bit more humble and kind and gentle.
  5. When China is having a "super storm". It's the worst that has ever happened in American history, and its only chance at survival is not to make a major power collapse and take on the United States, or even the rest of the civilized world, with it. That's what the president is doing with this disaster, which means the United States – and China – will be the first to take over the world in just 20 or 30 years.

It's not a million miles away from a Speak and Spell, in that it accepts an input and approximates what an output should be, based on what it has been taught to do.

To vary things, we picked a random sentence from today's Guardian live blog. This is a great example of what happens when you move a step farther back from the internet sub-culture mirror which is at the core of the system.

The sentences we gave the machine were:

"More than 60,000 fans will be allowed to attend the Euro 2020 semi-finals and final at Wembley, the government has announced. That will take attendance to seventy-five percent of stadium capacity. This means these matches, on 6, 7 and 11 July, will attract the largest crowds for a sporting event in the UK in more than 15 months. Announcing the decision, the culture department said: All ticket holders will need to follow a number of strict entry requirements including having a negative Covid-19 test or proof of full vaccination - two doses received."

As the old saying goes, when you take a fish out of water it can't ride a bicycle:

  1. All ticket holders will need to follow a number of strict entry requirements including having a negative Covid-19 test or proof of full vaccination - two doses received while taking a course of anti-retroviral medication to prevent diarrhoea-related illness.
  2. All visitors also must bring both medical and home-grown cards with them to join the celebrations, it added. "The new measures help to minimise disruption at the ground," added the Culture Department. "To accommodate an increasing number of visitors who want a taste of the Euros and other events, the authorities created new ways of ensuring extra security as well." "Covid is a great value for travelling and it will bring home the passion of Londoners for the Olympics."
  3. Rugby League spokesman Craig Pugh said: "It seems that by playing this event in front of a community or audience in the UK, it can be a very special experience and will hopefully allow us to create a sense of connection with our players and fans that will give them the feeling that at home there are always teams home to prove themselves." Football has always been played in the London Borough of Camden which is our heritage."
  4. The Games will take place in English National Park, Essex on 6 and 7 July. An announcement of the venue's future will be made after discussions on Monday.

We are a long way from Skynet becoming self-aware.

As an opportunity to create innovative learning, this technology really does hold very tangible potential to show a whole generation of children why cultural diversity is important - and how a lack of it can constrain your ability to think critically, communicate effectively, or explore the world with an open and inquisitive mind.

Those kids, the ones that we teach this lesson, they'll make the real magic long after we're gone.


Try It Out:

For those interested in trying it out, you need to install the torch and transformers packages. The code below will then do the job so you can experiment:

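import torch  # PyTorch must be installed for return_tensors='pt'
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Download the pre-trained GPT-2 tokenizer and model (no training happens here)
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

# Turn the seed text into numeric tokens
sequence = "**your seed text**"
inputs = tokenizer.encode(sequence, return_tensors='pt')

# Sample a continuation; max_length counts seed plus output, and 1024 tokens is GPT-2's limit
outputs = model.generate(inputs, max_length=1024, do_sample=True, temperature=1.1, top_k=50, pad_token_id=tokenizer.eos_token_id)

# Turn the tokens back into words and show the result
text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(text)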
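If you are wondering what the temperature and top_k settings actually do to the guess, the sketch below shows the idea with three made-up candidate words. It is a simplified illustration, not the library's internal code: the scores and the function name are hypothetical. In short, top_k throws away all but the k best-scoring candidates, and temperature controls how strongly the machine favours its top guess over the rest.

```python
import math
import random

def next_word_probabilities(scores, temperature=1.0, top_k=None):
    """Turn raw scores (word -> number) into probabilities for the next word."""
    # Keep only the k highest-scoring candidates, if top_k is set.
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]
    # Divide by temperature, then apply a softmax so the results sum to 1.
    scaled = [score / temperature for _, score in ranked]
    peak = max(scaled)  # subtracting the peak keeps exp() numerically stable
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return {word: w / total for (word, _), w in zip(ranked, weights)}

# Hypothetical raw scores for three candidate next words.
scores = {"final": 2.0, "match": 1.0, "earthquake": 0.0}

cool = next_word_probabilities(scores, temperature=0.5)           # plays it safe
warm = next_word_probabilities(scores, temperature=1.1)           # takes more risks
top2 = next_word_probabilities(scores, temperature=1.1, top_k=2)  # "earthquake" is cut

# Sampling means rolling weighted dice over the surviving candidates.
word = random.choices(list(warm), weights=list(warm.values()), k=1)[0]
print(cool, warm, top2, word, sep="\n")
```

A lower temperature piles more of the probability onto the favourite, while a higher one spreads it out, which is why a setting like 1.1 produces slightly wilder text than the default of 1.0.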

