
A smart search engine that helps visitors easily find architectural projects


About BNA & Archy

The BNA, also known as the Royal Institute of Dutch Architects, connects around 1,100 architectural firms. Driven by the need for sustainable architecture, BNA promotes creative entrepreneurship while strengthening the position of architects and the industry. As part of its web services, BNA offers a search platform called Archy for private and professional users. Archy connects users looking for architectural inspiration with innovative architectural firms. With Archy, users can find an architectural project based on their search terms, and agencies do not have to use a separate portal to keep their content up to date: all project information on Archy is pulled automatically from agency websites. As a search platform, Archy gets smarter with each use.


The Goal

  • Build a search tool (Archy) that helps customers and professionals find architectural firms and project pages based on keywords.
  • Keep the tool updated in real time so architects' most recent project information is always available, with new projects added and old projects removed automatically.
  • Enable Archy to pull content from registered agency websites so that it stays up to date.

The Challenge

The search tool was originally built with a static project page filter in which:

        1. one model was trained in a single manual run,
        2. one set of predictions was made for all websites in that same manual run, and
        3. those same predictions were then used for the project page filter indefinitely.

As a result, web pages were never updated with new project page predictions from the model, and project pages went missing from the search tool. Additionally, the search tool backend offered two separate search options:

  1. a web-based search that looks through a database of crawled websites, and
  2. a CRM database search that looks through an archive.

Having two separate search options confused the average user and made it difficult to surface relevant results.

The Process

To get Archy to where it needed to be, our team at Crystalloids:

  1. Added more training data by hand-labeling an extra 2,000 observations, which we combined with the original dataset from 2019. This gave us ~4,000 observations for the model to train on.
  2. Rebuilt the model using both labeled (~4,000) and unlabeled (~150k) observations, which boosted project page accuracy from 84% to 90%. To accomplish this, the team used semi-supervised learning with a hand-constructed training process (see the sketch after this list).
  3. Created a data workflow and a model training pipeline to automate the project page filter process.
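
The exact training procedure is specific to this project, but the semi-supervised idea can be illustrated with a minimal self-training (pseudo-labeling) sketch. The bucket paths, column names (page_text, is_project_page), and the use of scikit-learn's SelfTrainingClassifier are illustrative assumptions, not the production setup, which ran as a custom TensorFlow job on Vertex AI.

```python
# Minimal self-training sketch for the project page classifier.
# Paths and column names are hypothetical; the production model was a
# custom TensorFlow job on Vertex AI with a hand-constructed training loop.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.semi_supervised import SelfTrainingClassifier

# ~4,000 hand-labeled pages and ~150k unlabeled pages (reading gs:// paths
# with pandas requires the gcsfs package).
labeled = pd.read_csv("gs://example-bucket/labeled_pages.csv")
unlabeled = pd.read_csv("gs://example-bucket/unlabeled_pages.csv")

# scikit-learn marks unlabeled samples with the label -1.
unlabeled["is_project_page"] = -1
data = pd.concat([labeled, unlabeled], ignore_index=True)

# Vectorize the page text, then let the self-training wrapper pseudo-label
# confidently predicted unlabeled pages and fold them back into training.
model = Pipeline([
    ("tfidf", TfidfVectorizer(max_features=50_000)),
    ("clf", SelfTrainingClassifier(LogisticRegression(max_iter=1000), threshold=0.9)),
])
model.fit(data["page_text"], data["is_project_page"])
```

In practice the confidence threshold and base classifier would be tuned; the point is only that the ~150k unlabeled pages contribute to the fit alongside the hand-labeled ones.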

The data pipeline workflow, which is set to trigger once a month:

    1. Launches a Cloud Function, written in Python, that fetches the latest website data from Archy in BigQuery and stores the results in Cloud Storage. This updates existing websites, adds new ones, and removes old ones.
    2. Launches a Cloud Function (also written in Python) that starts a custom Vertex AI job to transform the ‘raw’ website data into a format the model understands.
    3. Launches another Python Cloud Function that starts a custom Vertex AI job, feeding the ‘processed’ website data into the latest trained model to predict whether or not each website is a project page (a sketch of this step follows the list).
    4. Launches a Cloud Function that starts a Dataflow job to ingest the ‘scored’ website data into Datastore.
    5. A continuously running website crawler then fetches these predictions from Datastore and updates the search engine API to incorporate the predictions and filters.
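
To make step 3 concrete, here is a minimal sketch of a Python Cloud Function that launches a custom Vertex AI job. The project ID, region, bucket, and scoring container image are placeholder assumptions; the actual function names, job specs, and Cloud Workflows wiring are not shown in this write-up.

```python
# Sketch of a Cloud Function that kicks off a custom Vertex AI job
# (step 3 above). Project, region, bucket and image names are placeholders.
import functions_framework
from google.cloud import aiplatform

PROJECT = "example-project"         # assumption
REGION = "europe-west4"             # assumption
BUCKET = "gs://example-archy-data"  # assumption

@functions_framework.http
def score_websites(request):
    """Start a Vertex AI custom job that scores the processed website data."""
    aiplatform.init(project=PROJECT, location=REGION, staging_bucket=BUCKET)

    job = aiplatform.CustomJob(
        display_name="archy-score-project-pages",
        worker_pool_specs=[{
            "machine_spec": {"machine_type": "n1-standard-4"},
            "replica_count": 1,
            "container_spec": {
                # Placeholder image that reads processed pages from GCS,
                # loads the latest model and writes the 'scored' data back.
                "image_uri": f"{REGION}-docker.pkg.dev/{PROJECT}/archy/scorer:latest",
                "args": [f"--input={BUCKET}/processed/", f"--output={BUCKET}/scored/"],
            },
        }],
    )
    job.submit()  # fire and forget; the workflow can poll for completion
    return {"job": job.resource_name}, 200
```

The same pattern, a small Python function handing the heavy lifting to a Vertex AI or Dataflow job, applies to the other steps, with Cloud Workflows and Cloud Scheduler providing the monthly orchestration.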

The train model workflow, which is set to trigger every quarter:

  1. Launches a Cloud Function written in Python that fetches the ‘processed’ website data and combines it with the hand-labeled observations stored in GCS.
  2. Launches another Cloud Function that starts a custom Vertex AI job to train a fresh model on this combined data. Once training is done, it is saved as the latest model in GCS (a sketch follows this list).
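
As a sketch of what the quarterly training entry point could look like: the bucket layout and the plain supervised fit are simplifying assumptions (the self-training sketch above would slot in at the model definition), and the real job runs as a custom Vertex AI job.

```python
# Sketch of the quarterly training entry point (step 2 above).
# Bucket and object names are placeholders, and a plain supervised fit is
# shown for brevity in place of the semi-supervised training process.
import joblib
import pandas as pd
from google.cloud import storage
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

BUCKET = "example-archy-data"  # assumption

def train_and_publish():
    # 'Processed' website data combined with the hand-labeled observations,
    # prepared by the first Cloud Function in this workflow.
    data = pd.read_csv(f"gs://{BUCKET}/training/combined.csv")

    # Train a fresh model every quarter, as described above.
    model = Pipeline([
        ("tfidf", TfidfVectorizer(max_features=50_000)),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    model.fit(data["page_text"], data["is_project_page"])

    # Save it as the 'latest' model in GCS.
    joblib.dump(model, "/tmp/model.joblib")
    bucket = storage.Client().bucket(BUCKET)
    bucket.blob("models/latest/model.joblib").upload_from_filename("/tmp/model.joblib")

if __name__ == "__main__":
    train_and_publish()
```

Because the monthly scoring workflow always loads the model from the same ‘latest’ location, each quarterly retrain is picked up automatically by the next scoring run.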

The Result

  1. An automated project page filter process that updates new and old websites with predictions from a continuously retrained model.

  2. A backend that returns results from both web-based and CRM-based searches. This backend was written in Java and tested and deployed via the search engine API.

Technology Used

  • Google Cloud Storage
  • Google BigQuery
  • Cloud Functions
  • Cloud Datastore
  • Cloud Dataflow
  • Vertex AI / TensorFlow
  • Cloud Workflows
  • Cloud Scheduler
  • Compute Engine
  • Search Engine API
