A colleague told me about Ollama, which allows you to get LLMs working on a local machine. I was so excited about this that I downloaded the orca-mini model. Due to terrible hotel wifi I used my mobile internet and blew out the limit on that. Oops.
Anyway, it is very easy to get Ollama working. Just download and install the software, then run ollama run llama2 (or ollama run orca-mini for the model I used). It has a simple REST interface:
curl -X POST http://localhost:11434/api/generate -d '{
"model": "orca-mini",
"prompt":"tell me a joke"
}'
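If you want to hit that endpoint from plain Java, something along these lines should work with the built-in java.net.http client. This is a minimal sketch: the "stream": false flag asks Ollama to return one complete JSON object rather than the default stream of partial responses.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class OllamaRestExample
{
    public static void main( String[] args ) throws Exception
    {
        // Same request as the curl example; "stream": false asks Ollama for
        // a single JSON object instead of a stream of partial responses.
        String body = "{ \"model\": \"orca-mini\", \"prompt\": \"tell me a joke\", \"stream\": false }";

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:11434/api/generate"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        // The reply is JSON; the generated text is in its "response" field.
        HttpResponse<String> reply = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(reply.body());
    }
}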
It was easy enough to get this working with LangChain4J, although the APIs were not quite the same as for the OpenAI models.
import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.ollama.OllamaChatModel;

public class App
{
    public static void main( String[] args )
    {
        String modelName = "orca-mini";
        String localUrl = "http://localhost:11434/";

        // Point the LangChain4J Ollama integration at the local server
        ChatLanguageModel model =
                OllamaChatModel.builder().baseUrl(localUrl).modelName(modelName).build();

        // The String overload of generate() returns just the model's reply text
        String answer = model.generate("tell me a joke about space");
        System.out.println("Reply\n" + answer);
    }
}
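LangChain4J also has a richer overload of generate(): passing a UserMessage returns a Response<AiMessage> that carries the reply plus metadata such as token usage. Here is a minimal sketch, assuming the same builder as above; the exact API has shifted a little between LangChain4J releases, so check the version on your classpath.

import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.ollama.OllamaChatModel;
import dev.langchain4j.model.output.Response;

public class AppWithResponse
{
    public static void main( String[] args )
    {
        ChatLanguageModel model = OllamaChatModel.builder()
                .baseUrl("http://localhost:11434/")
                .modelName("orca-mini")
                .build();

        // This overload returns the reply wrapped with metadata
        Response<AiMessage> response = model.generate(UserMessage.from("tell me a joke about space"));
        System.out.println("Reply\n" + response.content().text());

        // Token usage may be null for some providers/versions
        System.out.println("Token usage: " + response.tokenUsage());
    }
}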
While these local models are less powerful than the OpenAI ones, they seem fairly decent on a first examination. They are also a much cheaper way to work with an LLM, and I am going to use this to set up a simple RAG (retrieval augmented generation) example in LangChain4J.
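For reference, here is a rough sketch of the shape that RAG example might take, using the in-memory embedding store that ships with LangChain4J and Ollama for both embeddings and chat. This is an assumption-laden outline rather than the finished example: it assumes orca-mini can serve embeddings through Ollama's embeddings endpoint (a dedicated embedding model may do better), and the store/retrieval API differs a little between LangChain4J versions.

import java.util.List;
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.model.ollama.OllamaChatModel;
import dev.langchain4j.model.ollama.OllamaEmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingMatch;
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;

public class SimpleRag
{
    public static void main( String[] args )
    {
        String localUrl = "http://localhost:11434/";

        // Ollama can also produce embeddings, wrapped by OllamaEmbeddingModel
        EmbeddingModel embeddingModel = OllamaEmbeddingModel.builder()
                .baseUrl(localUrl)
                .modelName("orca-mini")
                .build();

        ChatLanguageModel chatModel = OllamaChatModel.builder()
                .baseUrl(localUrl)
                .modelName("orca-mini")
                .build();

        // Index a few toy "documents" in an in-memory vector store
        EmbeddingStore<TextSegment> store = new InMemoryEmbeddingStore<>();
        List<String> facts = List.of(
                "Ollama runs large language models on a local machine.",
                "LangChain4J is a Java library for building LLM applications.");
        for (String fact : facts) {
            TextSegment segment = TextSegment.from(fact);
            Embedding embedding = embeddingModel.embed(segment).content();
            store.add(embedding, segment);
        }

        // Retrieve the segments most relevant to the question...
        String question = "What does Ollama do?";
        Embedding questionEmbedding = embeddingModel.embed(question).content();
        List<EmbeddingMatch<TextSegment>> matches = store.findRelevant(questionEmbedding, 2);

        // ...and stuff them into the prompt as context for the chat model
        StringBuilder prompt = new StringBuilder("Answer using only this context:\n");
        matches.forEach(m -> prompt.append(m.embedded().text()).append("\n"));
        prompt.append("Question: ").append(question);

        System.out.println(chatModel.generate(prompt.toString()));
    }
}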