Run Your Own IPFS Search Engine With Lens

Lens is another of our open-source IPFS tools under the Temporal umbrella. It lets you take content from IPFS and index it so that it can be searched later. Currently Lens can index the following MIME types:

  • text/*

  • image/*

  • application/pdf

The one requirement is that all of your data must exist on IPFS and be discoverable by the running Lens instance. In the future we may add support for other distributed networks, such as DAT or SWARM. To interact with Lens, we provide a simple but robust gRPC API that supports both simple and complex queries.

How Does Indexing Work

We chain together a few different methods of analyzing data. When given a PDF, we first attempt to extract images and text from its pages. The text is fed into bleve, which can handle both simple and complex search queries. The images are analyzed too, using a combination of Tesseract for optical character recognition (OCR) to extract searchable text, and TensorFlow for rudimentary image classification. For other MIME types such as image/*, we perform the same Tesseract and image classification analysis that we apply to images extracted from PDFs. For MIME types like text/*, we feed the text directly into bleve.
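
The routing described above can be sketched in Go as a function that maps a content type to an ordered list of analysis stages. This is a minimal illustration under our own assumptions, not Lens's actual code: the stage names and the pipelineFor function are hypothetical, and we assume that OCR text and classification labels end up in the bleve index alongside any extracted text.

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

// Hypothetical stage labels for the analysis steps described above;
// these are not identifiers from the Lens codebase.
const (
	StagePDF      = "pdf-extract" // split PDF pages into text and images
	StageOCR      = "tesseract"   // optical character recognition
	StageClassify = "tensorflow"  // rudimentary image classification
	StageBleve    = "bleve"       // full-text indexing
)

// pipelineFor returns the ordered analysis stages for a content type,
// or an error if the type is not one Lens indexes.
func pipelineFor(contentType string) ([]string, error) {
	// Strip parameters such as "; charset=utf-8" and lower-case the type.
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		return nil, fmt.Errorf("bad content type %q: %w", contentType, err)
	}
	switch {
	case strings.HasPrefix(mediaType, "text/"):
		// Text goes straight into the search index.
		return []string{StageBleve}, nil
	case strings.HasPrefix(mediaType, "image/"):
		// Assumption: OCR text and classification labels are then indexed.
		return []string{StageOCR, StageClassify, StageBleve}, nil
	case mediaType == "application/pdf":
		// PDFs are split first; page text and images follow the paths above.
		return []string{StagePDF, StageOCR, StageClassify, StageBleve}, nil
	default:
		return nil, fmt.Errorf("unsupported mime type %q", mediaType)
	}
}

func main() {
	for _, ct := range []string{"text/plain; charset=utf-8", "image/png", "application/pdf", "video/mp4"} {
		stages, err := pipelineFor(ct)
		fmt.Println(ct, stages, err)
	}
}
```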

How Does Searching Work

At the most basic level, searching consists of submitting a query, which can range from a single word like "blockchain" to a phrase like "blockchain data storage". We also support more complex queries, such as filtering by specific tags, categories, MIME types, and more; however, these are entirely optional.

The response to your query is an array of documents, each containing the IPFS hash of the matching content, the content's MIME type, and a score indicating how relevant the content is to your query.
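
Since each result carries a relevance score, a client that merges or re-displays results may want to order them itself. The sketch below assumes a hypothetical local SearchResult type (the field names are ours, not the generated protobuf names) and simply sorts by score, highest first; the hashes in main are placeholders.

```go
package main

import (
	"fmt"
	"sort"
)

// SearchResult is a hypothetical local model of one matched document.
type SearchResult struct {
	Hash     string  // IPFS hash of the matching content
	MimeType string  // MIME type of the content
	Score    float64 // relevance to the query; higher is more relevant
}

// rankByScore returns a copy of results ordered from most to least relevant,
// keeping the original order for equal scores.
func rankByScore(results []SearchResult) []SearchResult {
	ranked := append([]SearchResult(nil), results...)
	sort.SliceStable(ranked, func(i, j int) bool {
		return ranked[i].Score > ranked[j].Score
	})
	return ranked
}

func main() {
	// Placeholder hashes and scores for illustration only.
	results := []SearchResult{
		{Hash: "QmPlaceholderA", MimeType: "application/pdf", Score: 0.42},
		{Hash: "QmPlaceholderB", MimeType: "text/plain", Score: 1.37},
		{Hash: "QmPlaceholderC", MimeType: "image/png", Score: 0.90},
	}
	for _, r := range rankByScore(results) {
		fmt.Printf("%.2f %s %s\n", r.Score, r.MimeType, r.Hash)
	}
}
```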

Installing Lens

There are a few different ways to install Lens, the simplest being our prebuilt Lens Docker image. By default, the Docker image starts the gRPC server listening on 0.0.0.0:9998, without any encryption, and with a gRPC authentication key of blahblahblah. The container also needs a connection to an IPFS HTTP API, which defaults to 127.0.0.1:5001. To install the Docker image, run the following command:

$> docker pull rtradetech/lens:latest

Alternatively, for a more hands-off setup, we have a docker-compose file that also spins up the required IPFS node. To use it, run the following commands. They use the /tmp directory as the base directory for storing all files.

$> wget -O lens.yml https://raw.githubusercontent.com/RTradeLtd/Lens/master/lens.yml
$> LENS=latest BASE=/tmp docker-compose -f lens.yml up

Using Lens

Before we get into how to use Lens: we’ve published the existing Lens index, as seen on https://temporal.cloud/lens, to IPFS, where it can be downloaded via the CID QmZqSYDQrtWg4LHnqT6DPqa1XUr7u4oeaGcyaTiGHJY3SR. It’s 1.2GB in size and contains a variety of research papers and crypto whitepapers that I have submitted, as well as documents submitted by other users.

All indexing and searching is done via the gRPC API, for which we have published protocol buffers on GitHub. Using these, you can build a Lens API client in any language that supports protocol buffers!

For an example of how we use those protocol buffers to build the Lens client in Temporal, check out our Go example below:


package clients

import (
    "fmt"

    "github.com/RTradeLtd/config/v2"
    "github.com/RTradeLtd/grpc/dialer"
    pb "github.com/RTradeLtd/grpc/lensv2"
    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials"
)

const (
    defaultURL = "127.0.0.1:9998"
)

// LensClient is a lens client used to make requests to the Lens gRPC server
type LensClient struct {
    conn *grpc.ClientConn
    pb.LensV2Client
}

// NewLensClient is used to generate our lens client
func NewLensClient(opts config.Services) (*LensClient, error) {
    dialOpts := make([]grpc.DialOption, 0)
    if opts.Lens.TLS.CertPath != "" {
        creds, err := credentials.NewClientTLSFromFile(opts.Lens.TLS.CertPath, "")
        if err != nil {
            return nil, fmt.Errorf("could not load tls cert: %s", err)
        }
        dialOpts = append(dialOpts,
            grpc.WithTransportCredentials(creds),
            grpc.WithPerRPCCredentials(dialer.NewCredentials(opts.Lens.AuthKey, true)))
    } else {
        dialOpts = append(dialOpts,
            grpc.WithInsecure(),
            grpc.WithPerRPCCredentials(dialer.NewCredentials(opts.Lens.AuthKey, false)))
    }
    var url string
    if opts.Lens.URL == "" {
        url = defaultURL
    } else {
        url = opts.Lens.URL
    }

    conn, err := grpc.Dial(url, dialOpts...)
    if err != nil {
        return nil, err
    }
    return &LensClient{
        conn:         conn,
        LensV2Client: pb.NewLensV2Client(conn),
    }, nil
}

// Close shuts down the client's gRPC connection
func (l *LensClient) Close() { l.conn.Close() }

To actually index data once your gRPC client is up and running, all you need to do is call the Index command and let Lens do its magic! Depending on where the content is in your network, this process can take some time. Generally speaking, if the content is available locally, index analysis shouldn't take more than a minute, and usually takes about 30 seconds. When submitting data for indexing, you must provide two parameters. The first is ObjectType, which should be set to IndexReq_IPLD as defined in the protocol buffers. The second is ObjectIdentifier, which should be the IPFS hash of the content you want indexed.

Searching for data is just as simple and requires calling the Search command. The only required parameter is Query, which defines what you want to search for. Optionally, you can narrow your search results further with filters such as Hashes, to match only specific IPFS hashes, and MimeTypes, to match only specific MIME types. How long this command takes to complete depends on a variety of factors, such as the size of your index, the number of objects matched, and the speed of the disk the index resides on.
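
A client-side helper can make sure Query is always set before a Search call goes out. The SearchRequest type and newSearchRequest function below are hypothetical local models of the parameters named above, not the generated protobuf types; lower-casing the MIME type filters is our own assumption about how you might normalize them.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// SearchRequest is a hypothetical local model of the Search parameters.
type SearchRequest struct {
	Query     string   // required: a word or phrase to search for
	Hashes    []string // optional: only match these IPFS hashes
	MimeTypes []string // optional: only match these MIME types
}

// newSearchRequest enforces the one required parameter and tidies filters.
func newSearchRequest(query string, hashes, mimeTypes []string) (SearchRequest, error) {
	q := strings.TrimSpace(query)
	if q == "" {
		return SearchRequest{}, errors.New("query is required")
	}
	req := SearchRequest{Query: q, Hashes: hashes}
	for _, m := range mimeTypes {
		// MIME type names are case-insensitive; assume lower case for matching.
		req.MimeTypes = append(req.MimeTypes, strings.ToLower(strings.TrimSpace(m)))
	}
	return req, nil
}

func main() {
	req, err := newSearchRequest("blockchain data storage", nil, []string{"Application/PDF"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", req)
}
```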

Thank you, and a big shout-out to everyone contributing to IPFS and to all the great work being done by many different projects!

Join RTrade’s online community on Twitter or Telegram, and visit our website. Don’t forget to show Temporal some love on GitHub!

v2.1.0 of Temporal is out!
Highlights of the release:

  • go-ipfs v0.4.20
  • ipfs-cluster v0.10.1
  • gomod support

Temporal: a versatile, easy-to-use tool for companies with large amounts of data to secure, store, and track. The platform can be used as is, or custom-built to manage and deploy blockchain-based applications and non-blockchain data storage solutions for any enterprise.

Temporal Features:

If you don’t want to run your own Temporal installation, you can use our hosted version: a full-featured pinning service with 3GB free per month, 5 free IPNS record creations a month, 100 free pubsub messages a month, and 5 free IPFS keys.

  • Interface walk-through

  • Full Service IPFS API

  • Temporal-JS SDK: full public IPFS and IPNS usage

  • IPFS Gateway

  • I2P IPFS Gateway access

  • Installing your own Temporal

The Usages and Features section of the README.md in the GitHub repository also covers using the docker-compose file to spin up the environment.

Anything you build or use on our platform is NOT vendor locked-in. All software solutions currently available can be run on your own infrastructure simply by downloading our code from GitHub.
