To implement Retrieval-Augmented Generation (RAG) using Go, PostgreSQL, and Pinecone, you need to combine a vector database (Pinecone) for storing and retrieving embeddings with a language model for generating responses. Below is a step-by-step guide to achieve this. Note that while Go is your primary language, some components (like embedding generation) may require integration with external APIs or libraries, potentially using Python for convenience, as Pinecone and RAG workflows are more commonly supported in Python-based frameworks like LangChain.

Prerequisites

  • Go: Ensure you have Go installed and configured.
  • PostgreSQL: Your data is already stored here, with the pgvector extension for vector operations (if needed for hybrid approaches).
  • Pinecone Account: Sign up at pinecone.io, obtain an API key, and set up an index.
  • Embedding Model: Access to an embedding model (e.g., OpenAI, Hugging Face, or Amazon Bedrock) to convert text into vectors.
  • LLM: Access to a large language model (e.g., OpenAI, Llama, or Gemini) for response generation.

Step-by-Step Implementation

1. Set Up Pinecone

  • Create a Pinecone Index:
    • Log in to Pinecone and create a new index (e.g., rag-index).
    • Set the dimension based on your embedding model (e.g., 1536 for OpenAI’s text-embedding-ada-002).
    • Choose a similarity metric (e.g., cosine for most text embeddings).
    • Use the serverless option for simplicity unless you need specific pod configurations.
  • Obtain API Key: Note your Pinecone API key and environment (e.g., us-east-1).

2. Extract and Preprocess Data from PostgreSQL

  • Query Data: Use a Go PostgreSQL driver like jackc/pgx to fetch relevant text data from your PostgreSQL database.
    package main
    
    import (
        "context"
        "github.com/jackc/pgx/v5"
        "log"
    )
    
    func fetchData() ([]string, error) {
        conn, err := pgx.Connect(context.Background(), "postgresql://user:pass@localhost:5432/dbname")
        if err != nil {
            return nil, err
        }
        defer conn.Close(context.Background())
    
        rows, err := conn.Query(context.Background(), "SELECT content FROM documents")
        if err != nil {
            return nil, err
        }
        defer rows.Close()
    
        var contents []string
        for rows.Next() {
            var content string
            if err := rows.Scan(&content); err != nil {
                return nil, err
            }
            contents = append(contents, content)
        }
        return contents, rows.Err()
    }
    
  • Chunk Data: Split large text into smaller chunks (e.g., 500-1000 characters) to fit within embedding model limits and improve retrieval relevance. You can implement a simple chunking function in Go:
    func chunkText(text string, maxLen int) []string {
        var chunks []string
        for len(text) > maxLen {
            chunks = append(chunks, text[:maxLen])
            text = text[maxLen:]
        }
        if len(text) > 0 {
            chunks = append(chunks, text)
        }
        return chunks
    }
    

3. Generate Embeddings

  • Choose an Embedding Model: Use an external service like OpenAI or Hugging Face to generate embeddings. Since Go doesn’t have native support for most embedding models, you can:
    • Call an API: Use HTTP requests to interact with an embedding API.
    • Use Python for Embedding: Run a Python script to generate embeddings and store them in Pinecone, then integrate with your Go application.
  • Example with OpenAI API (via HTTP in Go):
    package main
    
    import (
        "bytes"
        "encoding/json"
        "net/http"
    )
    
    type EmbeddingRequest struct {
        Input string `json:"input"`
        Model string `json:"model"`
    }
    
    type EmbeddingResponse struct {
        Data []struct {
            Embedding []float32 `json:"embedding"`
        } `json:"data"`
    }
    
    func getEmbedding(text string, apiKey string) ([]float32, error) {
        reqBody, _ := json.Marshal(EmbeddingRequest{
            Input: text,
            Model: "text-embedding-ada-002",
        })
        req, _ := http.NewRequest("POST", "https://api.openai.com/v1/embeddings", bytes.NewBuffer(reqBody))
        req.Header.Set("Authorization", "Bearer "+apiKey)
        req.Header.Set("Content-Type", "application/json")
    
        client := &http.Client{}
        resp, err := client.Do(req)
        if err != nil {
            return nil, err
        }
        defer resp.Body.Close()
    
        var embResp EmbeddingResponse
        if err := json.NewDecoder(resp.Body).Decode(&embResp); err != nil {
            return nil, err
        }
        return embResp.Data[0].Embedding, nil
    }
    
  • Batch Process: For each chunk, generate an embedding and store it with metadata (e.g., document ID, chunk ID).

4. Store Embeddings in Pinecone

  • Upsert to Pinecone: Use Pinecone’s Go SDK (pinecone-io/go-pinecone) to upsert embeddings. If the SDK is unavailable or limited, you can use HTTP requests to Pinecone’s gRPC-based API.
    package main
    
    import (
        "bytes"
        "encoding/json"
        "net/http"
    )
    
    type UpsertRequest struct {
        Vectors []struct {
            ID     string    `json:"id"`
            Values []float32 `json:"values"`
            Metadata map[string]interface{} `json:"metadata"`
        } `json:"vectors"`
        Namespace string `json:"namespace"`
    }
    
    func upsertToPinecone(vectors []struct {
        ID     string
        Values []float32
        Metadata map[string]interface{}
    }, apiKey, indexName, namespace string) error {
        reqBody, _ := json.Marshal(UpsertRequest{
            Vectors:   vectors,
            Namespace: namespace,
        })
        req, _ := http.NewRequest("POST", "https://"+indexName+".svc.pinecone.io/vectors/upsert", bytes.NewBuffer(reqBody))
        req.Header.Set("Api-Key", apiKey)
        req.Header.Set("Content-Type", "application/json")
    
        client := &http.Client{}
        resp, err := client.Do(req)
        if err != nil {
            return err
        }
        defer resp.Body.Close()
        return nil
    }
    
  • Organize Data: Use namespaces in Pinecone to separate different datasets or tenants (e.g., golang-docs). Store metadata like the original text or document ID for retrieval.

5. Retrieve Relevant Chunks

  • Query Embedding: Convert the user’s query into an embedding using the same embedding model.
  • Similarity Search: Query Pinecone to find the top-N most similar vectors.
    type QueryRequest struct {
        Vector         []float32 `json:"vector"`
        TopK           int       `json:"topK"`
        Namespace      string    `json:"namespace"`
        IncludeMetadata bool      `json:"includeMetadata"`
    }
    
    type QueryResponse struct {
        Matches []struct {
            ID       string                 `json:"id"`
            Score    float32                `json:"score"`
            Metadata map[string]interface{} `json:"metadata"`
        } `json:"matches"`
    }
    
    func queryPinecone(queryVector []float32, apiKey, indexName, namespace string, topK int) ([]struct {
        ID       string
        Score    float32
        Metadata map[string]interface{}
    }, error) {
        reqBody, _ := json.Marshal(QueryRequest{
            Vector:         queryVector,
            TopK:           topK,
            Namespace:      namespace,
            IncludeMetadata: true,
        })
        req, _ := http.NewRequest("POST", "https://"+indexName+".svc.pinecone.io/vectors/query", bytes.NewBuffer(reqBody))
        req.Header.Set("Api-Key", apiKey)
        req.Header.Set("Content-Type", "application/json")
    
        client := &http.Client{}
        resp, err := client.Do(req)
        if err != nil {
            return nil, err
        }
        defer resp.Body.Close()
    
        var queryResp QueryResponse
        if err := json.NewDecoder(resp.Body).Decode(&queryResp); err != nil {
            return nil, err
        }
        return queryResp.Matches, nil
    }
    

6. Generate Response with LLM

  • Prepare Context: Combine the retrieved chunks’ text (from Pinecone metadata or PostgreSQL) with the user’s query.
  • Call LLM: Use an LLM API (e.g., OpenAI, Llama via Ollama, or Gemini) to generate a response. Example with OpenAI:
    type ChatRequest struct {
        Model    string    `json:"model"`
        Messages []Message `json:"messages"`
    }
    
    type Message struct {
        Role    string `json:"role"`
        Content string `json:"content"`
    }
    
    type ChatResponse struct {
        Choices []struct {
            Message Message `json:"message"`
        } `json:"choices"`
    }
    
    func generateResponse(context, query, apiKey string) (string, error) {
        prompt := "Context: " + context + "\n\nQuery: " + query + "\nAnswer:"
        reqBody, _ := json.Marshal(ChatRequest{
            Model: "gpt-4",
            Messages: []Message{
                {Role: "user", Content: prompt},
            },
        })
        req, _ := http.NewRequest("POST", "https://api.openai.com/v1/chat/completions", bytes.NewBuffer(reqBody))
        req.Header.Set("Authorization", "Bearer "+apiKey)
        req.Header.Set("Content-Type", "application/json")
    
        client := &http.Client{}
        resp, err := client.Do(req)
        if err != nil {
            return "", err
        }
        defer resp.Body.Close()
    
        var chatResp ChatResponse
        if err := json.NewDecoder(resp.Body).Decode(&chatResp); err != nil {
            return "", err
        }
        return chatResp.Choices[0].Message.Content, nil
    }
    

7. Integrate into a RAG Workflow

  • Main Function: Combine the steps into a cohesive pipeline.
    func main() {
        pineconeAPIKey := "your-pinecone-api-key"
        openaiAPIKey := "your-openai-api-key"
        indexName := "rag-index"
        namespace := "golang-docs"
    
        // Fetch and chunk data
        contents, err := fetchData()
        if err != nil {
            log.Fatal(err)
        }
        var vectors []struct {
            ID       string
            Values   []float32
            Metadata map[string]interface{}
        }
        for i, content := range contents {
            chunks := chunkText(content, 500)
            for j, chunk := range chunks {
                embedding, err := getEmbedding(chunk, openaiAPIKey)
                if err != nil {
                    log.Fatal(err)
                }
                vectors = append(vectors, struct {
                    ID       string
                    Values   []float32
                    Metadata map[string]interface{}
                }{
                    ID:     fmt.Sprintf("doc%d-chunk%d", i, j),
                    Values: embedding,
                    Metadata: map[string]interface{}{
                        "text": chunk,
                    },
                })
            }
        }
    
        // Upsert to Pinecone
        if err := upsertToPinecone(vectors, pineconeAPIKey, indexName, namespace); err != nil {
            log.Fatal(err)
        }
    
        // Example query
        query := "How to use PostgreSQL with Go?"
        queryEmbedding, err := getEmbedding(query, openaiAPIKey)
        if err != nil {
            log.Fatal(err)
        }
        matches, err := queryPinecone(queryEmbedding, pineconeAPIKey, indexName, namespace, 5)
        if err != nil {
            log.Fatal(err)
        }
    
        // Combine context
        var context strings.Builder
        for _, match := range matches {
            context.WriteString(match.Metadata["text"].(string) + "\n")
        }
    
        // Generate response
        response, err := generateResponse(context.String(), query, openaiAPIKey)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println("Response:", response)
    }
    

8. Optimize and Scale

  • Hybrid Search: Combine Pinecone’s vector search with PostgreSQL’s keyword search for better relevance. Use pgvector for local vector storage if latency or cost is a concern.
  • Caching: Cache frequent queries in PostgreSQL or an in-memory store like Redis.
  • Batching: Upsert embeddings in batches to reduce API calls to Pinecone.
  • Guardrails: Add logic to filter irrelevant retrieved chunks before passing to the LLM.

Notes

  • Python for Simplicity: If embedding generation or Pinecone integration becomes complex, consider using Python with LangChain for the indexing and retrieval steps, then expose an API for your Go application to consume. LangChain supports Pinecone and simplifies RAG workflows.
  • Local LLM: For cost savings, use a local LLM like Llama3 via Ollama, as shown in some Go-based RAG demos.
  • Pinecone Advantages: Pinecone’s serverless architecture scales well and reduces infrastructure management compared to self-hosted vector databases.
  • Error Handling: Add robust error handling and logging in production code.
  • Security: Secure API keys using environment variables or a secrets manager.

Resources

This approach leverages Go for data extraction and API interactions, Pinecone for scalable vector storage, and an external LLM for generation, creating a robust RAG system tailored to your tech stack.

Comments