TL;DR
Skip to My Awful Implementation for what I did
Scroll to Three to read about what I think could be done with it
GROUNDWORK
What is GPT 3.5?
It’s an LLM created by OpenAI.
Aren’t those the ChatGPT guys?
Yeah.
What’s an LLM?
LLM = Large Language Model
What’s this GPT thing?
GPT = Generative Pre-trained Transformer. No, I’m not going to go into what a transformer is. Yes, that’s a Wikipedia link.
Is this Artificial Intelligence (AI)?
In a word: no.
In several words: I guess it depends on what you mean by Artificial Intelligence. Optics play a huge role in how humans interpret things. To our monkey brains it certainly walks and talks like a duck.
If it’s not AI, what is it?
Again, it’s a large language model. Here’s a breakdown of what it does:
Tokenize your sentence (it’s called a “prompt” in ML parlance)
Predict what should come next
What’s “tokenize?”
Getting WAY off track here. It basically breaks your input up into smaller pieces. This helps the model determine context, which in turn helps prediction.
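To make “smaller pieces” concrete, here’s a toy sketch of byte-pair encoding (BPE), the family of algorithms GPT’s tokenizer belongs to. The real tokenizer uses a huge pretrained merge table (OpenAI publishes it via their tiktoken library); this version just fuses the most frequent adjacent pair a few times, purely for illustration:

```python
from collections import Counter

# Toy byte-pair encoding. Real GPT tokenizers work the same way in spirit,
# but with a fixed, pretrained table of merges instead of counting on the fly.

def most_frequent_pair(tokens):
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0] if pairs else None

def merge_pair(tokens, pair):
    # Replace every adjacent occurrence of `pair` with one fused token.
    out, i = [], 0
    while i < len(tokens):
        if i < len(tokens) - 1 and (tokens[i], tokens[i + 1]) == pair:
            out.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

def toy_bpe(text, num_merges=3):
    tokens = list(text)  # start from single characters
    for _ in range(num_merges):
        pair = most_frequent_pair(tokens)
        if pair is None:
            break
        tokens = merge_pair(tokens, pair)
    return tokens
```

Run `toy_bpe("banana")` and you’ll watch common character pairs fuse into bigger chunks; the model then predicts over those chunk IDs, not individual letters.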
If you’re REALLY interested, go look up a guy named Andrew Ng and absorb everything he says. Start with ML. Move to neural networks. Then go look up papers on arXiv to see what you can do with them.
What about DeepMind? Watson?
Actually, it’s interesting you bring up Watson. That was a veeeeery specialized model that took veeeeery specific input, banged it against a pre-trained neural network (NN), took the highest prediction score, spat out the results, then they massaged an answer.
It’s a different game in that I don’t think it’s generative (I could be wrong).
DeepMind, on the other hand, is a reinforcement learning model. Again, it bangs very specific input against a NN, spits out the results, throws the new state back into the NN, rinse and repeat until the loop is terminated by a win/loss state.
Different “game” (see what I did there?).
Why do you use AI in this article?
Because it’s a linguistic convenience.
ONE
KABOOM
ChatGPT came onto the scene and caused a hullabaloo. Punch some text into the web form and get a really smart response that seems to draw upon all human knowledge up until September of 2021. It could also seemingly converse and remember your prior interactions (hence the prefix “chat”).
It certainly wasn’t the first “chatbot” but it was definitely the coolest one. Indeed, it wasn’t even the first LLM. Ever used a translation app? Well, there you go. Those have been around for quite a while.
To be frank, I really don’t know what set it apart. Right place? Right time? Accessibility?
Doesn’t really matter. What matters is it exploded in popularity.
ADOPTION AND EXPLOITATION
OK, so maybe it wasn’t a function of right place/time as much as it was putting it into the hands of the public at zero cost.
Look, if you channel the collective force of Reddit and 4chan into a specific task, they’ll rally and stream like a firehose at whatever target they’re pointed toward. This has the effect of exploiting something so thoroughly and holistically that whatever genius idea you’d had about it turns out to be the equivalent of a hallucination from a brain-damaged chihuahua.
They generated code with it. They generated pages-long essays that they turned in at university. They generated legal documents (hell, I’ve even done that). They even managed to prompt-engineer their way past the carefully designed guard-rails with DAN (Do Anything Now), a persona that would make the cast of the Third Reich look like hippies in comparison.
When OpenAI created an API into this GPT world and you didn’t have to copy/paste from a web page, we hit an inflection point.
NOW WHAT?
The number of tools and implementations erupted like lava and ash from Mount Tambora. All of these neato things popped up that people were doing with it.
I experimented with some but the tools mostly sucked or were gimmicks. I wasn’t interested in jailbreaking it to have it feed me garbage; we have Alex Jones for that. I don’t code at all except for the occasional Jupyter Notebook; StackOverflow was more than enough to answer any of my forehead-slap-worthy stupid questions. It’s neat and fun that it can cook up Twitter posts but … who cares?
No way am I going to install a bunch of Chrome extensions or pay for specialized services that don’t add any value. Gimmicks, the lot of them. Developers jumping on the AI bandwagon to scrape— what, popularity? Exposure? A way to data mine?
Pass.
AGENTS
My first exposure to these was a web app called AgentGPT. While the original ChatGPT product didn’t have access to the Internet, somehow these devs were able to get it to do so to the point where you could connect some services, give it a real-world task, and it would run through all of the steps to make it happen and deliver you a finished product.
Whoa.
Give it the task of creating a web app that used GPT behind-the-scenes to write daily articles on trending news and it would do it. No, literally. It would purchase the domain name for you, write the HTML to the server, test its implementation, create the SQL database, scrape data from popular web pages, and invoke OpenAI’s API to vomit some really solid articles.
Give it the task of setting up an offshore bank account and you could literally end up with a shell business in Nevis ready for addition to the Paradise Papers leak.
Magic!
Of course, spend some time watching it attempt to accomplish work and you’ll find this implementation sorely, sorely lacking.
But the idea is good. This is what I’m looking for. This is worthy of the AI hype. This can truly change the way we do work.
It wasn’t until I started to take apart how agents worked that I got my lightbulb moment.
TWO
SOMETHING USEFUL
First off, and what I think is the key to all of this: how the hell did they manage to get ChatGPT to access the wider Internet?
Short answer: they didn’t.
PIECES AND PARTS
GPT 3.5 Turbo
One of the many OpenAI products is a GPT LLM which is on iteration number 3.5 (with 4 arriving right around the corner), and I guess we have the “turbo” suffix because it … goes faster? Not even going to stop and look it up. Ultimately, these are “models,” which are basically artificial neural networks (ANNs) that have been tuned (more accurately, “trained”) on a bunch of data; the resulting tensor “weights” (numbers, basically) are frozen, saved, and voilà, you’ve got your model.1
One-shot and Memory
These models, themselves, are stateless. They’re one-shot. You throw in some garbage, you get out some garbage, and that’s it. It’s got no “memory” of before or after the prompt.
ChatGPT was a way to incorporate “memory” into your session. It accomplishes this by sending in aaaaaall of your prior inputs along with the new prompt. Each new prompt is basically everything you’ve written during your whole session sent back to the one-shot model.
Therefore, ChatGPT per se is not a model— it’s a webapp with the built-in functionality of sending back the session. If you want to make your own “chat” app using their API you’d have to replicate this yourself.2
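Here’s a sketch of that send-everything-back trick, with a stand-in function in place of the real API call (so none of this is OpenAI’s actual code):

```python
# A toy "chat" wrapper around a one-shot model. `echo_model` stands in for
# the real API call (e.g. OpenAI's chat completions endpoint); the point is
# that "memory" is just the client re-sending everything, every turn.

def echo_model(messages):
    # Pretend model: its reply reflects how much context it received.
    return f"(a reply that saw {len(messages)} messages of context)"

class ChatSession:
    def __init__(self, model):
        self.model = model
        self.history = []  # grows every turn; real apps trim it to fit the context window

    def send(self, prompt):
        self.history.append({"role": "user", "content": prompt})
        reply = self.model(self.history)  # the one-shot call gets the FULL history
        self.history.append({"role": "assistant", "content": reply})
        return reply
```

The second `send()` hands the model three messages, not one. The model itself never remembers anything; the wrapper does.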
Internet
Time to answer the original question: the GPT model was sent the raw HTML from the Internet in a prompt.
Behind the scenes in a completely external program, the agent works by combining a prompt with dynamic data, sending it to GPT, parsing the results, then sending those results in with another prompt.
Chaining
What I described above is called “chaining.” This is how I’m going to get something personally useful at an infinitesimally small scale by bastardizing agents. It’s the equivalent of taking apart a car so I can use the wire to hold up the blinds in my bedroom.
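Boiled down to a dozen lines, a chain looks like this. The `fake_gpt` stub stands in for the real model call; the shape of the loop is the point:

```python
# Sketch of chaining: parse each model reply and splice it into the next
# prompt. `fake_gpt` is a stand-in for the real API call.

def fake_gpt(prompt):
    # Pretend model: "answers" with the last word of the prompt.
    return prompt.split()[-1]

def run_chain(templates, seed):
    """Each template has a {data} slot; one step's output feeds the next."""
    data = seed
    for template in templates:
        prompt = template.format(data=data)
        data = fake_gpt(prompt)  # send to the model, then treat the reply as data
    return data

steps = [
    "Extract the main topic from this text: {data}",
    "Pick a one-word category for: {data}",
]
```

Swap `fake_gpt` for a real API call and `steps` for prompts that fetch, parse, and act, and you have the skeleton of every agent out there.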
MY AWFUL IMPLEMENTATION
It took understanding all of the above for me to do my little project. I had a simple problem: my bookmarks are a hodge-podge of things I’ve saved during my years of browsing. I have a mutated and crappy organization system where I half-ass give categories to things.
To reorganize all of this by hand in the UI of my bookmark manager suuuuuucks. It sucks a bowling ball through a hose pipe. It’s a waste of my time. At the same time, having well-organized bookmarks would help my productivity because it would be easier for data-retrieval and research.
Plus, if I actually make an app and it goes anywhere (which it won’t— ergo, it won’t), I can have a cool tagline on ProductHunt like, “AI Bookmark Manager” or something, ensuring that I, too, participate in the hype. YESSSS.
Long story short, here’s what I did:
Download my bookmarks as CSV
Send them in to GPT with a massaged prompt
What I ended up with were really nicely categorized bookmarks with re-written titles that I can convert from JSON to … whatever. The only code that I wrote was interacting with the OpenAI API using the Python SDK. I didn’t even have to manually do REST calls.
The prompt ended up being this:
Given the following chunk of CSV of a list of bookmarks,
extract the title, description, and URL.
If you are unable to determine the URL, leave the field blank.
Rewrite both the description and title to be more concise.
For each bookmark, determine a one-word category into which the bookmark should fall.
Add this field to output as category.
Emit extracted data fields as JSON. The JSON must be valid. Output ONLY JSON.
If there is any extra text, write it to a field called extra.
CSV chunk: {text}
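The glue around that prompt is small. Here’s a sketch of what the chunk-and-send step might look like; the chunk size is my guess, not the author’s, and the API call is left as a comment since it needs a network connection and an API key (it uses the GPT-3.5-era OpenAI Python SDK call, `ChatCompletion.create`):

```python
import csv
import io

# Sketch of the workflow: split the CSV into batches small enough to fit in
# a prompt, then format each batch into the prompt from the article.

PROMPT = """Given the following chunk of CSV of a list of bookmarks,
extract the title, description, and URL.
If you are unable to determine the URL, leave the field blank.
Rewrite both the description and title to be more concise.
For each bookmark, determine a one-word category into which the bookmark should fall.
Add this field to output as category.
Emit extracted data fields as JSON. The JSON must be valid. Output ONLY JSON.
If there is any extra text, write it to a field called extra.
CSV chunk: {text}"""

def chunk_rows(rows, size=50):
    """Split bookmark rows into prompt-sized batches (size is a guess)."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]

def build_prompt(chunk):
    buf = io.StringIO()
    csv.writer(buf).writerows(chunk)
    return PROMPT.format(text=buf.getvalue())

# The send step, via the OpenAI Python SDK of the era (commented out so the
# sketch runs offline):
#
#   response = openai.ChatCompletion.create(
#       model="gpt-3.5-turbo",
#       messages=[{"role": "user", "content": build_prompt(chunk)}],
#   )
#   bookmarks = json.loads(response["choices"][0]["message"]["content"])
```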
To contrast, if I had to write this as a stand-alone program:
Read in a CSV
Go over each row
Use something to clean up the titles
Aggregate all of the titles to somehow come up with keywords
Clean up the keywords to come up with categories
… bunch more other stuff …
Ugh, I’m bored even writing it. No offense if you like programming and love this kind of low-level stuff but I fucking hate it.
THREE
And that’s the story of my lightbulb moment. I’m admittedly dumb and increasingly non-technical but this technology MUST have some cool application if we can squeeze it into the framework and limitations outlined above.
With GPT doing the heavy lifting, it boils down to prompt engineering. Sure, there’s a little coding here and there to do things like:
Read and write to file or DB
Download web pages
Send/receive to the GPT model
Access external APIs
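“Download web pages” hides one wrinkle: raw HTML is mostly tags, and tokens cost money, so you strip and truncate before prompting. A stdlib-only sketch (the character cap is a crude stand-in for a real token budget):

```python
from html.parser import HTMLParser

# Stdlib-only page cleaner: keep the text, drop the tags, cap the length.
# A real version would also skip <script> and <style> contents.

class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        if data.strip():
            self.parts.append(data.strip())

def page_to_prompt_text(raw_html, max_chars=3000):
    parser = TextExtractor()
    parser.feed(raw_html)
    return " ".join(parser.parts)[:max_chars]
```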
SOLVE EVERYDAY PROBLEMS
There’s definitely some easy application here and you’re probably not thinking of it. In fact, you’re probably thinking (like most people), “Oh, hey, have ChatGPT think up the next $1mm idea for me.”
If you’ve read any of this article you’ll realize it doesn’t work like that. Hell, I, too, am a dreamer. Everyone’s got an idea, right? What separates the people who ship from those who think up the ideas, write them down, then go back to playing Diablo IV is execution and implementation.
While there is no magic to it, the barrier to implementation has been lowered by orders-of-magnitude. Even better, the cost of the technology is non-prohibitive.
Cheap hosting (even some free tiers)
Free/low-cost DDoS protection
Easy-to-admin edge servers
A wealth of freely-available APIs
Strong and vibrant open source
Ubiquitous data lakes
GIMME SOME IDEAS
At your respective jobs you’ve got problems. Let’s start there. Let’s start small and solve simple things. If it’s already automated, let’s put some AI stank on it. If it’s a matter of interpreting data, let’s see what GPT can come up with.
1. Massive over-simplification.
2. https://platform.openai.com/docs/guides/gpt/chat-completions-api