Day 3 of Project Apario's Telehackathon - Agenda

During tomorrow's broadcast we are going to enhance Project Apario's search by letting you run multiple search queries together and combine them using intersection, addition, and subtraction in order to find new content to stumble into.

The design of Project Apario is driven by the assumption that the vast majority of its users will not know anything specific to search for right off the bat, which means that we have some low hanging fruit:

I have search results and search histories. On the home page, below the search results, I would like to introduce 3 columns of 6 lines of text each. The first column would be a "tag wall" of the top 30 searches, each shown with its total number of page identifiers; this means that we would be able to present popular keywords that have a lot of results in them.

The second column would be recent searches that provide insight into what was just searched for on Project Apario. Results in this list would be completed searches, which would give people the ability to discover popular searches that are taking place currently.

The third column would be a list that simply presents, from a pure text perspective, the filenames and page numbers of 6 random pages within Project Apario; kind of like StumbleInto, but text only on the home page, effectively giving you additional access to viewing and discovering records within the system.
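The three columns described above could be sketched roughly as follows. This is only an illustration: the method names, the hash shapes, and the choice of 5 tags per line (so that 30 tags fill 6 lines) are assumptions, not Project Apario's actual code.

```ruby
# Hypothetical helpers for the three home page columns.

# Column 1: "tag wall" of the top searches by total page identifiers.
# search_totals is assumed to be a Hash of keyword => total pages.
# Returns rows of [keyword, total] pairs, per_line pairs per row.
def tag_wall(search_totals, limit: 30, per_line: 5)
  top = search_totals.sort_by { |_keyword, total| -total }.first(limit)
  top.each_slice(per_line).to_a
end

# Column 2: most recently completed searches, newest first.
# completed is assumed to be an Array of { keyword:, finished_at: } hashes.
def recent_searches(completed, limit: 6)
  completed.sort_by { |s| s[:finished_at] }.last(limit).reverse.map { |s| s[:keyword] }
end

# Column 3: filenames and page numbers of random pages, as plain text.
# pages is assumed to be an Array of { filename:, page: } hashes.
def random_pages(pages, count: 6)
  pages.sample(count).map { |p| "#{p[:filename]} p.#{p[:page]}" }
end
```

Each helper returns plain Ruby data, so the home page template only has to print it line by line.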


Another set of functionality that is critical at this point to work on is the following:

  1. When you access a search result page and a result is loaded to the screen, modify the URL without performing a redirection, such that when a user copies the URL they are actually getting a unique URL to the specific page that they are seeing in the search result. This is big big big since it will play a factor in the social media outreach component of finding things in the system.

  2. In the search results, add a button that dynamically looks up and displays a QR Code for the result so it can be shared and/or printed. If you see a really good document, capture that QR Code and get involved with it and learn with it.

  3. The sidebar of search results is very simple at the moment and will be changed. I am thinking of introducing 3 tabs of information, plus an icon-only tab for the QR Code, that would be responsible for displaying:

Tab 1: Metadata, tags, and snippets (title: Metadata)
Tab 2: Contributions (title: Contributions)
Tab 3: Full Text Results (title: Text)
Tab 4: QR Code (Icon only)


During tomorrow's telehackathon, you are encouraged to watch and observe what is happening behind the scenes. You are being granted a window into a unique type of transparency into my workflow process and how I create software.

Tomorrow's agenda will include completing intersection search results so we can do queries like:

top secret and kennedy (intersection)
or
top secret or confidential (addition)
or
top secret not confidential (subtraction)
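As a rough illustration of the three query types above, each keyword's results can be treated as a set of page identifiers and combined with Ruby's set operators. The page identifiers here are made-up sample values, and this is a sketch of the idea rather than the backend's actual implementation.

```ruby
require "set"

# Made-up page identifiers returned by each single-keyword search.
top_secret   = Set[1, 2, 3, 4]
kennedy      = Set[3, 4, 5]
confidential = Set[2, 4, 6]

# top secret and kennedy (intersection): pages found by both searches
intersection = top_secret & kennedy

# top secret or confidential (addition): pages found by either search
addition = top_secret | confidential

# top secret not confidential (subtraction): pages found by the first
# search that the second search did not find
subtraction = top_secret - confidential
```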

These kinds of results need to be coupled with the following:

What happens when a search keyword is presented and we don't have a hot search that was recently performed to reflect against? Well, we need to kick off that search job and utilize the browser to send notifications of those results.
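One way to picture that decision is sketched below. Everything here is hypothetical: the in-memory hash standing in for the store of hot searches, the array standing in for the job queue, and the one-hour freshness window are all assumptions made for illustration.

```ruby
# Hypothetical lookup: reuse a recent ("hot") search if one exists,
# otherwise enqueue a search job and return no results yet.
# hot_searches: Hash of keyword => { finished_at: Time, page_ids: Array }
# job_queue:    Array of keywords waiting for the backend worker
# max_age:      assumed freshness window in seconds (not from the source)
def lookup_or_enqueue(keyword, hot_searches, job_queue, now: Time.now, max_age: 3600)
  entry = hot_searches[keyword]
  if entry && (now - entry[:finished_at]) <= max_age
    { status: :hot, page_ids: entry[:page_ids] }
  else
    # Avoid queueing the same keyword twice while a job is pending.
    job_queue << keyword unless job_queue.include?(keyword)
    { status: :enqueued, page_ids: [] }
  end
end
```

When the status is `:enqueued`, the browser would wait for notifications from the worker instead of rendering results immediately.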

Currently there is low hanging fruit that can be harvested from Project Apario's backend worker that would reduce the overall memory utilization of the system: reducing the number of variables being utilized inside of loops, and reducing the number of web socket push notifications being sent to the notification system.

Currently when a search is being performed, as results are processed on a per page basis, each page is pushed into the result as it is found; instead what I think we should consider doing is the following:

  1. The size of the loop should be reduced from 99,999 results down to 999 results. This significantly reduces the memory footprint but increases the total number of batch loops that are processed. We go from performing 7 loops up to 612 loops; BUT we're going from potentially sending literally TENS OF THOUSANDS of push notifications to your device or computer down to sending just 612.

  2. When the loop begins, the subset of results is collected and then processed accordingly; the results should be stored in an OpenStruct again.

  3. Ultimately what this will mean is that results will come into your browser slightly slower than before, though it shouldn't be too noticeable, and your browser will use far less memory. During today's livestream we did a search for "assassin" on Project Apario and found over 66K results; however, my browser stopped handling new notifications after it received the 37,000th one. Let's put that into perspective: when you're on YouTube watching the livestream of Project Apario's telehackathon, a few hundred (at most) comments will come into your browser during the show. Your computer doesn't struggle with the volume of messages it's receiving.

The flip side of that is when you're on Project Apario and you're on a super fast machine like the one I have and your search stops updating the progress bar because your browser ran out of memory, it's time to reduce the volume of transmissions from 37,000 down to 612.
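The batching idea above can be sketched as follows: scan the pages in slices of 999, collect the matches for each slice in an OpenStruct, and send one notification per slice instead of one per matching page. The method names and the fields on the OpenStruct are hypothetical, and the total of 611,388 pages is only an assumed figure chosen so that the loop counts come out to the 7 and 612 mentioned above.

```ruby
require "ostruct"

# Number of batch loops needed to cover total pages (integer ceiling).
def batch_count(total, batch_size = 999)
  (total + batch_size - 1) / batch_size
end

# Scan page_ids in batches and build ONE notification per batch.
# The block decides whether a page matches the search.
def process_in_batches(page_ids, batch_size: 999, &matcher)
  notifications = []
  page_ids.each_slice(batch_size).with_index(1) do |batch, number|
    matches = batch.select { |id| matcher.call(id) }
    # Hypothetical payload; a real worker would push this over the web socket.
    notifications << OpenStruct.new(batch: number, scanned: batch.size, matches: matches)
  end
  notifications
end

assumed_total = 611_388                 # assumption, not a figure from the post
old_loops = batch_count(assumed_total, 99_999)
new_loops = batch_count(assumed_total)  # also the number of notifications sent
```

With this shape, the number of notifications is bounded by the number of batches rather than by the number of matching pages.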

What does this REALLY mean?

Well, suppose you run a search using keywords that have never been searched before:

california and newsom

Project Apario, with this complex search, will go out and perform a search on those two queries simultaneously, and in the event that your browser receives 37,000 notifications PER RESULT, don't be surprised if the page stops loading for you. Instead, we'd now theoretically be able to handle a complex query with at least 60 combinations of keyword searches at once. The backend systems of Project Apario will enqueue those and they may take a long time to complete, which is why Project Apario limits advanced search queries to 9 keywords max.
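A minimal sketch of splitting a complex query into its keywords and enforcing the 9 keyword limit might look like this. The method name and the returned hash are hypothetical, and the splitting is deliberately naive: any standalone "and", "or", or "not" is treated as an operator. The figure of 60 looks consistent with 37,000 / 612 (about 60), though the post does not spell out that calculation.

```ruby
# Hypothetical parser: "top secret and kennedy" becomes
# { keywords: ["top secret", "kennedy"], operators: ["and"] }.
def parse_query(query, max_keywords: 9)
  # split keeps the captured operator between the keyword phrases
  tokens = query.strip.split(/\s+(and|or|not)\s+/i)
  pairs = tokens.each_slice(2).to_a
  keywords = pairs.map { |keyword, _operator| keyword }
  operators = pairs.map { |_keyword, operator| operator }.compact.map(&:downcase)

  if keywords.size > max_keywords
    raise ArgumentError, "advanced search is limited to #{max_keywords} keywords"
  end

  { keywords: keywords, operators: operators }
end

parsed = parse_query("california and newsom")
```

Each keyword in the result would then be looked up or enqueued as its own search before the results are combined.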