Favicons R-apped

rstats
data-wrangling
browsers
favicons
history
Author

@hrbrmstr

Published

December 18, 2024

This is a companion page to this Daily Drop post, but it’s pretty self-contained. We’re going to have some fun with favicons from sites you visited this year.

The Plan

We’re going to grab the browser history database, and favicon database, limit the favicons to sites you visited this year, and then wrap them all up.

Getting This Year’s Favicons

We’ll be using the {tidyverse}, {RSQLite}, and {magick} packages:

library(tidyverse)
library(RSQLite)
library(magick)
Linking to ImageMagick 6.9.12.93
Enabled features: cairo, fontconfig, freetype, heic, lcms, pango, raw, rsvg, webp
Disabled features: fftw, ghostscript, x11

Next, we’ll make copies of the favicons and history databases (it’s not a great idea to work with the original databases). See the Daily Drop post for more details on where to find these databases.

file.copy("~/Library/Application Support/Arc/User Data/Default/Favicons", "~/Data/Favicons")
[1] FALSE
file.copy("~/Library/Application Support/Arc/User Data/Default/History", "~/Data/History")
[1] FALSE

Now we’ll connect to the databases and get the favicons and history data. First the URLs from this year:

con_h <- dbConnect(RSQLite::SQLite(), "~/Data/History")

visits <- tbl(con_h, "visits")
urls <- tbl(con_h, "urls")

visits |> 
  left_join(
    urls, 
    by = c("url"="id")
  ) |> 
  filter(
    datetime(visit_time/1000000-11644473600, 'unixepoch') >= '2024-01-01'
  ) |>
  select(
    id,
    url = url.y,
    visit_time, # in case you want this info
    last_visit_time # in case you want this info
  ) |> 
  collect() -> visits_2024

Next, the favicons:

con_f <- dbConnect(RSQLite::SQLite(), "~/Data/Favicons")

favicons <- tbl(con_f, "favicons")
icon_mapping <- tbl(con_f, "icon_mapping")
favicon_bitmaps <- tbl(con_f, "favicon_bitmaps")

favicon_bitmaps |> 
  left_join(
    favicons, 
    by = c("icon_id"="id")
  ) |> 
  left_join(
    icon_mapping,
    c("icon_id")
  ) |> 
  filter(
    page_url %in% visits_2024$url,
    width == 32
  ) |> 
  distinct(
    image_data
  ) |> 
  pull(
    image_data
  ) |> 
  map(
    magick::image_read
  ) -> icons

dbDisconnect(con_h)
dbDisconnect(con_f)

Now we’re ready to have some fun!

Boring Favicons

Let’s just see what we have collected:

n <- ceiling(sqrt(length(icons)))

icon_list <- image_join(icons)

image_montage(
  image = icon_list, 
  tile = sprintf("%1$dx%1$d", n), 
  geometry="32x32+0+0", 
  bg = "white"
) -> icons_montage

icons_montage

See? Boring!

Clustered Favicons

We can make the icon montage a bit more interesting by clustering the favicons. We’ll convert each to greyscale, yank the numeric bitmap values, and then do some very basic clustering on them.

The “6” is the number of clusters we’ll make, and it’s for a reason you’ll see in a bit.

extract_features <- function(img) {

  image_convert(img, colorspace = "gray") |> 
    image_data() |> 
    as.numeric() -> gray_pixels
  
  mean_val <- mean(gray_pixels)
  sd_val <- sd(gray_pixels)
  
  c(gray_pixels, mean_val, sd_val)
}

feature_matrix <- t(sapply(icon_list, extract_features))

scaled_features <- scale(feature_matrix)

k <- 6
clusters <- kmeans(scaled_features, centers = k)

icon_clusters <- split(icon_list, clusters$cluster)

icon_clusters_joined <- image_join(icon_clusters)

image_montage(
  image = icon_clusters_joined, 
  tile = sprintf("%1$dx%1$d", n), 
  geometry="32x32+0+0", 
  bg = "white"
) -> clusters_montage

clusters_montage

Favicons Wrapped

We’re going to wrap up our favicons into a message. To do that, we’ll need to make some letter masks, and then fill them in each of them with as many icons as we can from each cluster.

You may need to/can absolutely use a different font. Impact is just beefy enough to tile icons in.

create_letter_mask <- function(letter, width = 250, height = 500) {
  
  image_blank(
    width = width, 
    height = height, 
    color = "white"
  ) |> 
    image_annotate(
      text = letter, 
      size = 450, 
      gravity = "center", 
      color = "black", 
      font = "Impact"
    ) |> 
    image_threshold(
      type = "black", 
      threshold = "99%"
    )
  
}

Our seekrit message is the best practical programming language in the world (Smalltalk is the bestest tho), so let’s make a mask for each letter:

r_mask <- create_letter_mask("R")
s1_mask <- create_letter_mask("S")
t1_mask <- create_letter_mask("t")
a_mask <- create_letter_mask("a")
t2_mask <- create_letter_mask("t")
s2_mask <- create_letter_mask("s")

Now we need a way to fill them. You can use a different icon size (this shrinks the icons down to 8x8 pixels), but you’ll also have fewer icons to work with for each letter if you do.

fill_letter <- function(mask, icons, icon_size = 8) {
  
  width <- image_info(mask)$width
  height <- image_info(mask)$height
  
  canvas <- image_blank(width, height, "white")
  
  cols <- width %/% icon_size
  rows <- height %/% icon_size
  
  for (i in 1:rows) {
    for (j in 1:cols) {
      
      x <- (j-1) * icon_size
      y <- (i-1) * icon_size
      
      pixel <- image_crop(mask, geometry = sprintf("1x1+%d+%d", x, y))
      
      if (image_data(pixel)[1] < 0.5) {  # If pixel is dark (part of letter)
        
        sample(icons, 1) |> 
          image_scale(
            sprintf("%dx%d", icon_size, icon_size)
          ) |> 
          image_composite(
            image = canvas,
            composite_image = _,
            offset = sprintf("+%d+%d", x, y)
          ) -> canvas
        
      }
    }
  }
  
  return(canvas)
  
}

r_filled <- fill_letter(r_mask, icon_clusters[[1]])
s1_filled <- fill_letter(s1_mask, icon_clusters[[2]])
t1_filled <- fill_letter(t1_mask, icon_clusters[[3]])
a_filled <- fill_letter(a_mask, icon_clusters[[4]])
t2_filled <- fill_letter(t2_mask, icon_clusters[[5]])
s2_filled <- fill_letter(s2_mask, icon_clusters[[6]])

Now we just need to stitch them together:

rstats <- image_append(c(r_filled, s1_filled, t1_filled, a_filled, t2_filled, s2_filled))

rstats

That’s A Wrap!

You can adjust the cluster count to have more icon sets to make larger messages with and use other clustering methods to associated the icons according to other characteristics (or, even use the source URLs to help with the clustering).