A layered object store design in Elixir (Part IV)

The API layer

Part I introduces the overall design of our object store. In this post we focus on the API layer. The layers so far were concerned only with storing the input file together with some file-format-specific transforms (like thumbnails). The API layer is where we store per-file system and user metadata. This metadata can be used to support application-specific business logic and security policies.

This layer will depend on all per-file-format modules: ImageStore, VideoStore, etc. We will use Postgres for storing per-file metadata, so we also depend on the postgrex package. A typical API layer will also expose a GraphQL interface, which forms the core of application-specific business logic. I am not going to include an example GraphQL interface here, but absinthe would be my preferred way of doing it.

Exactly what metadata is stored for files is very application dependent, but for now, let's assume the following structure for the files SQL table:

CREATE TABLE files (
    id TEXT NOT NULL PRIMARY KEY,
    owner_id TEXT,
    format TEXT,
    size INT,
    created_at TIMESTAMPTZ,
    thumb_file_id TEXT,
    type_data JSONB,
    tags TEXT[]
);

In this table, the type_data field holds file-format-specific metadata: for example, for images it will store { width, height }, for videos { width, height, duration }, for documents { num_pages }, and so on.

A typical application will have many other SQL tables referring to files by their file.id. For example, it may have a user SQL table with a profile_pic field storing file.id of the user’s profile image.

The API layer/application will typically have many sub-modules, providing interface to your business logic. For our purpose of describing a layered object store, we will only focus on the Api.File submodule (defined in lib/file.ex). Here is the interface for the Api.File module:

  • def put_file(owner_id, mime, input_path, tags): Stores file at @input_path on disk and records its meta-data in the files SQL table we defined above. Returns {:ok, file_id} on success, {:error, reason} on failure.
  • def get_file(user_id, file_id): Gets contents of file with id @file_id. The @user_id parameter can be used to enforce policies like: “only the file owner can access a file”. Your application may omit this parameter if not needed. The function returns binary data on success, {:error, reason} on failure.
  • def get_file_meta(user_id, file_id): Gets all per-file metadata for file with id @file_id. The @user_id parameter can be used to check against the file’s owner_id if your security policy so requires.
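
As an illustration of the "only the file owner can access a file" policy mentioned above, here is a minimal sketch of an ownership check. The OwnerPolicySketch module and its authorize/2 function are hypothetical names, not part of the Api.File module described in this post; the sketch assumes the file is any map or struct with an owner_id key.

```elixir
defmodule OwnerPolicySketch do
  # Hypothetical helper, not part of the original Api.File module.
  # `file` is any map or struct with an :owner_id key (e.g. %Api.File{}).

  # Using the same variable twice in the function head means this clause
  # only matches when user_id equals the file's owner_id.
  def authorize(user_id, %{owner_id: user_id}) when is_binary(user_id), do: :ok

  # Everyone else (including a missing user id) is rejected.
  def authorize(_user_id, _file), do: {:error, "Access denied"}
end
```

A function like get_file could call such a check first and only read the file contents when it returns :ok.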

With the module interface defined, let's move to its implementation, starting with module creation:

mix new --sup api

This layer requires a supervision tree with Postgrex processes as children like this:

lib/application.ex:

defmodule Api.Application do
  # See https://hexdocs.pm/elixir/Application.html
  # for more information on OTP Applications
  @moduledoc false

  use Application

  def start(_type, _args) do
    # List all child processes to be supervised
    children = [
      # Starts a worker by calling: Api.Worker.start_link(arg)
      # {Api.Worker, arg}

      {Postgrex, hostname: "localhost", username: "postgres", database: "mydb", name: DB}
    ]

    # See https://hexdocs.pm/elixir/Supervisor.html
    # for other strategies and supported options
    opts = [strategy: :one_for_one, name: Api.Supervisor]
    Supervisor.start_link(children, opts)
  end
end

Constants and Structs

defmodule Api.File do
  @type file_id() :: String.t()
  @type user_id() :: String.t()

  defstruct [:id, :owner_id, :format, :size, :created_at, :thumb_file_id, :type_data, :tags]
end

This struct parallels our files SQL table.

Put File

  @doc """
    Stores file at @input_path on disk and records its meta-data.

    Returns `{:ok, file_id}` on success, {:error, reason} on failure.
  """
  def put_file(owner_id, mime, input_path, tags) do
    case file_type(mime) do
      :image ->
        put_image(owner_id, input_path, tags)

      :video ->
        put_video(owner_id, input_path, tags)

      :doc ->
        put_doc(owner_id, input_path, tags)

      _ ->
        IO.puts("put_file: Unknown file type")
        {:error, "Unknown file type"}
    end
  end

Let's see how one of the file-format specific functions is implemented:

  # Stores image together with its thumbnail.
  # Returns:
  #  On success: {:ok, file_id} where file_id corresponds to the fullsize image.
  #  On error: raises (e.g. a MatchError if ImageStore does not return {:ok, image})
  defp put_image(owner_id, input_path, tags) do
    {:ok, image} = ImageStore.put_image_lossy(datadir(), input_path)

    %Api.File{
      id: image.file_id,
      owner_id: owner_id,
      format: ImageStore.format(),
      size: File.stat!(FileStore.get_file_path(datadir(), image.file_id)).size,
      created_at: DateTime.utc_now(),
      thumb_file_id: image.thumb_file_id,
      type_data: %{
        width: image.width,
        height: image.height
      },
      tags: tags
    }
    |> record_file()
  end

It simply passes on the input file to the ImageStore module and on success, records its metadata with record_file(). Similarly, put_video forwards to VideoStore.put_video, and put_doc forwards to DocStore.put_doc followed by record_file.

Next, let's look at the record_file function, which records metadata for files successfully stored in our object store.

  @spec record_file(%Api.File{}) :: {:ok, file_id()}
  defp record_file(f) do
    Postgrex.query!(
      DB,
      "DELETE FROM files WHERE id=$1",
      [f.id]
    )

    Postgrex.query!(
      DB,
      """
      INSERT INTO files (id, owner_id, format, size, created_at, thumb_file_id, type_data, tags)
      VALUES ($1, $2, $3, $4, $5, $6, $7, $8)
      """,
      [f.id, f.owner_id, f.format, f.size, f.created_at, f.thumb_file_id, f.type_data, f.tags]
    )

    {:ok, f.id}
  end

We first delete any existing row with the input file’s ID to allow for cases where we are updating some field(s) of an existing file. Note that the two statements are not wrapped in a transaction, so a failure between them would leave the old row deleted. We could use ecto here to make it look nicer, but I won’t add this dependency unless the rest of the module really needs such an abstraction. In my projects, I have always used Postgrex directly since assembling the required SQL string seems simple enough. We just have to be careful not to introduce SQL injection vulnerabilities, by making sure we pass untrusted user input as SQL parameters ($1, $2, …).
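
One way to avoid the separate DELETE is a single-statement upsert using Postgres's INSERT ... ON CONFLICT. The following is only a sketch, assuming the files table defined earlier; the UpsertSketch module and its function names are hypothetical and not part of the original code. All values are still passed as SQL parameters, never interpolated into the string.

```elixir
defmodule UpsertSketch do
  # Hypothetical module: a single-statement replacement for DELETE + INSERT.
  # On an id conflict, every other column is overwritten with the new values.
  @sql """
  INSERT INTO files (id, owner_id, format, size, created_at, thumb_file_id, type_data, tags)
  VALUES ($1, $2, $3, $4, $5, $6, $7, $8)
  ON CONFLICT (id) DO UPDATE SET
    owner_id = EXCLUDED.owner_id,
    format = EXCLUDED.format,
    size = EXCLUDED.size,
    created_at = EXCLUDED.created_at,
    thumb_file_id = EXCLUDED.thumb_file_id,
    type_data = EXCLUDED.type_data,
    tags = EXCLUDED.tags
  """

  def sql, do: @sql

  # Parameters in the same order as the $1..$8 placeholders.
  # `f` is any map or struct with these keys (e.g. %Api.File{}).
  def params(f) do
    [f.id, f.owner_id, f.format, f.size, f.created_at, f.thumb_file_id, f.type_data, f.tags]
  end
end

# Usage (needs the running Postgrex connection named DB from this post):
#   Postgrex.query!(DB, UpsertSketch.sql(), UpsertSketch.params(file))
```

Because it is a single statement, the row is either fully replaced or left untouched.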

Get input file type

This is used by put_file to figure out which format-specific module to call.

  defp file_type(mime) do
    cond do
      String.starts_with?(mime, "image/") ->
        :image

      String.starts_with?(mime, "video/") ->
        :video

      mime == "application/pdf" ->
        :doc

      true ->
        :unknown
    end
  end
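
The same dispatch can also be written with binary pattern matching in function heads, which some readers may find more idiomatic. This is an alternative sketch, not the original implementation; the FileTypeSketch module name is hypothetical.

```elixir
defmodule FileTypeSketch do
  # Hypothetical alternative to the cond-based file_type/1 above.
  # "prefix" <> rest matches any binary starting with the given prefix.
  def file_type("image/" <> _subtype), do: :image
  def file_type("video/" <> _subtype), do: :video
  def file_type("application/pdf"), do: :doc

  # Anything else, including non-binary input such as nil, is unknown.
  def file_type(_other), do: :unknown
end
```

One behavioral difference: String.starts_with?/2 raises on non-binary input such as nil, whereas the catch-all clause here returns :unknown.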

Get File

  @spec get_file(user_id(), file_id()) :: binary() | {:error, String.t()}
  def get_file(_user_id, file_id) do
    # NOTE: Add any checks against user_id (say, match against file's owner_id) here.
    FileStore.get_file(datadir(), file_id)
  end

Get file metadata

File metadata is typically exposed through a GraphQL interface defined in your API module. For instance:

User(id: "foo") {
  profilePic {
    file_id
    owner
    createdAt
    size
    format
    width
    height
    thumbnail {
      file_id
    }
  }
}

The get_file_meta function may be used to satisfy GraphQL queries like the one above.

  @doc """
    Get all metadata associated with file with id @file_id.

    Returns {:ok, %Api.File{}} on success, {:ok, nil} otherwise.
  """
  @spec get_file_meta(user_id(), file_id()) :: {:ok, %Api.File{} | nil}
  def get_file_meta(_user_id, file_id) do
    # NOTE: Add any checks against user_id (say, match against file's owner_id) here.
    res =
      Postgrex.query!(
        DB,
        """
        SELECT (f.id, f.owner_id, f.format, f.size, f.created_at, f.thumb_file_id, f.type_data, f.tags)
        FROM files as f
        WHERE id=$1
        """,
        [file_id]
      )

    if res.num_rows == 1 do
      [{file_id, owner_id, format, size, created_at, thumb_file_id, type_data, tags}] =
        res.rows |> List.flatten()

      {:ok,
       %Api.File{
         id: file_id,
         owner_id: owner_id,
         format: format,
         size: size,
         created_at: created_at,
         thumb_file_id: thumb_file_id,
         type_data: type_data |> MapHelpers.trusted_map_atomize_keys(),
         tags: tags
       }}
    else
      {:ok, nil}
    end
  end
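
The MapHelpers module used above is not shown in this post. Since JSONB values come back from Postgres with string keys, a helper like trusted_map_atomize_keys presumably converts them to atom keys. Here is a minimal sketch of what such a helper might look like; the MapHelpersSketch name and its exact behavior are assumptions, not the author's implementation.

```elixir
defmodule MapHelpersSketch do
  # Hypothetical stand-in for MapHelpers.trusted_map_atomize_keys/1.
  # Expects plain JSON-decoded data (maps, lists, scalars).
  # String.to_atom/1 creates new atoms, which are never garbage collected,
  # so this must only be used on trusted data such as our own type_data.

  def atomize_keys(map) when is_map(map) do
    Map.new(map, fn {key, value} -> {to_atom(key), atomize_keys(value)} end)
  end

  def atomize_keys(list) when is_list(list), do: Enum.map(list, &atomize_keys/1)

  # Scalars (numbers, strings, booleans, nil) are returned unchanged.
  def atomize_keys(other), do: other

  defp to_atom(key) when is_binary(key), do: String.to_atom(key)
  defp to_atom(key), do: key
end
```

For untrusted input, String.to_existing_atom/1 would be the safer choice, at the cost of raising on unknown keys.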

Summary

The API layer is the place where we store per-file system and user-defined metadata. This metadata can be used to implement application-specific security policies or other things needed for your business logic (say, per-file user-specified tags). We focused on a particular submodule, Api.File, which acts as a thin wrapper around the format-specific modules (ImageStore, VideoStore, DocStore) and the FileStore. A typical API layer would also include a GraphQL schema for exposing information (including per-file metadata); get_file_meta() shows the kind of function that would back such a schema.
