
LLMs


I decided to try out some LLMs and run them locally.

In machine learning, "LLM" stands for "Large Language Model." Large language models, like GPT-3, are advanced natural language processing models designed to understand and generate human-like text. They are characterized by their size, having been trained on vast amounts of text data to learn language patterns and produce coherent, contextually relevant text.

These large language models have found applications in a wide range of natural language processing tasks, including text generation, translation, sentiment analysis, question-answering, and more. They have significantly advanced the state of the art in various language-related tasks and have been employed in numerous real-world applications.

The smaller the model, the faster the response times, and the less accurate the output.

I've been downloading and experimenting with models from Hugging Face.
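To put that size/speed trade-off in numbers, here is a rough comparison harness. This is my own sketch rather than part of the original experiments; it assumes the same transformers conversational pipeline used in the scripts below.

import time
from transformers import pipeline, Conversation

# Hypothetical comparison harness: time the model load and a single
# reply for each checkpoint used later in this post.
for model_name in ["facebook/blenderbot_small-90M", "facebook/blenderbot-400M-distill"]:
    start = time.time()
    converse = pipeline("conversational", model=model_name)
    load_time = time.time() - start

    start = time.time()
    result = converse(Conversation("Hello, do you like cats?"))
    reply_time = time.time() - start

    print(f"{model_name}: load {load_time:.1f}s, reply {reply_time:.1f}s")
    print(f"  -> {result.generated_responses[-1]}")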

facebook/blenderbot_small-90M

sample conversation

While this model is pretty fast, it only remembers the question it was just asked (the script below starts a fresh Conversation on every turn), and it gets stuck in a depression?

Model loaded. Time taken: 1.5901751518249512 seconds
You: Hello
Bot: Hello, how are you today? I just got back from a long day at the office.
Response received. Time taken: 1.2084238529205322 seconds
You: Do you like cats?
Bot: Yes, I love cats. They're my favorite animal. Do you have any pets?
Response received. Time taken: 0.8049731254577637 seconds
You: I had a few. No more now.
Bot: I'm sorry to hear that. I hope you're able to get over it.
Response received. Time taken: 0.7861971855163574 seconds
You: and you?
Bot: I'm not sure what to do. I don't know what I'll do.
Response received. Time taken: 0.9878411293029785 seconds
You: that's ok. Maybe we can watch the stars together
Bot: that's a good idea. I'm not sure if I'll be able to do that though.
Response received. Time taken: 0.9207460880279541 seconds
You: why is that?
Bot: I don't know. I just feel like I'm missing out on so much.
Response received. Time taken: 0.7966330051422119 seconds
You: You seem depressed
Bot: I'm not depressed, I just don't know what to do with myself anymore.
Response received. Time taken: 1.2453749179840088 seconds
You: I see. Well do you have any ideas at all?
Bot: I don't have any ideas. I'm not sure what I want to do.
Response received. Time taken: 1.0198848247528076 seconds

python code

import time
from transformers import pipeline, Conversation

model_name = "facebook/blenderbot_small-90M"

start = time.time()
converse = pipeline("conversational", model=model_name)
print(f"Model loaded. Time taken: {time.time() - start} seconds")


while True:
    conversation = Conversation()  # notice it starts fresh again on every question
    user_input = input("You: ")
    if user_input.lower() == "quit":
        break
    start = time.time()
    conversation.add_user_input(user_input)
    response = converse([conversation])
    print(f"Bot: {response.generated_responses[-1]}")
    print(f"Response received. Time taken: {time.time() - start} seconds")

facebook/blenderbot-400M-distill

sample conversation

This model can remember context, but it can't sustain a conversation much longer than this before crashing.

Model loaded. Time taken: 3.416785955429077 seconds
You: hello
Bot: Hello! How are you doing today? I just got back from a walk with my dog.
Response received. Time taken: 3.566622018814087 seconds
You: do you like cats?
Bot: I love cats! I have two of them. What about you? Do you have any pets?
Response received. Time taken: 2.9144949913024902 seconds
You: I had a few. And you?
Bot: I have a dog and a cat. What kind of pets do you have? I also have a parrot.
Response received. Time taken: 3.3773550987243652 seconds
You: I love parrots. I had one as a kid.
Bot: I've had mine for about 10 years now. He's my best friend. What do you do for fun?
Response received. Time taken: 3.328809976577759 seconds
You: I like having coffees at cafes with friends.

python code

import time
from transformers import pipeline, Conversation

model_name = "facebook/blenderbot-400M-distill"

start = time.time()
converse = pipeline("conversational", model=model_name)
print(f"Model loaded. Time taken: {time.time() - start} seconds")

conversation = Conversation()

while True:
    user_input = input("You: ")
    if user_input.lower() == "quit":
        break
    start = time.time()
    conversation.add_user_input(user_input)
    response = converse([conversation])
    print(f"Bot: {response.generated_responses[-1]}")
    print(f"Response received. Time taken: {time.time() - start} seconds")

Conclusion

The hosted solutions are much more accurate and interesting to chat with. If you just want to play with something that works really well, just use one of the hosted services.

Hosting models locally certainly increases the privacy and control you have. But if you don't have a very powerful machine, it's not yet time to run LLMs locally.
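For what it's worth, if you do have a CUDA-capable GPU, the same scripts can be pointed at it, which should help with response times. A small sketch, assuming PyTorch with CUDA is installed:

import torch
from transformers import pipeline

# Run the model on the first GPU when one is available; device=-1
# keeps everything on the CPU.
device = 0 if torch.cuda.is_available() else -1
converse = pipeline("conversational",
                    model="facebook/blenderbot-400M-distill",
                    device=device)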