Part I introduces the overall design of our object store. In this post we focus on the API layer. All layers so far were concerned only with storing the input file together with some file-format specific transforms (like thumbnails). It is at the API layer that we store per-file system and user metadata. This metadata can be used to support application specific business logic and security policies.
This layer depends on all per-file-format modules: ImageStore, VideoStore, etc. We will use Postgres for storing per-file metadata, so we also depend on the postgrex package. A typical API layer will also expose a GraphQL interface, which forms the core of application specific business logic. I am not going to include an example GraphQL interface here, but absinthe would be my preferred way of doing it.
Exactly what metadata is stored for files is very application dependent, but for now, let's assume the following files SQL table structure:
CREATE TABLE files (
  id TEXT NOT NULL PRIMARY KEY,
  owner_id TEXT,
  format TEXT,
  size INT,
  created_at TIMESTAMPTZ,
  thumb_file_id TEXT,
  type_data JSONB,
  tags TEXT[]
);
In this table, the type_data field holds file-format specific metadata: for example, for images it will store { width, height }, for videos { width, height, duration }, for documents { num_pages }, and so on.
A typical application will have many other SQL tables referring to files by their file.id. For example, it may have a user SQL table with a profile_pic field storing the file.id of the user’s profile image.
The API layer/application will typically have many sub-modules, providing an interface to your business logic. For our purpose of describing a layered object store, we will only focus on the Api.File submodule (defined in lib/file.ex). Here is the interface for the Api.File module:
def put_file(owner_id, mime, input_path, tags): Stores the file at @input_path on disk and records its metadata in the files SQL table we defined above. Returns {:ok, file_id} on success, {:error, reason} on failure.
def get_file(user_id, file_id): Gets the contents of the file with id @file_id. The @user_id parameter can be used to enforce policies like: “only the file owner can access a file”. Your application may omit this parameter if not needed. The function returns binary data on success, {:error, reason} on failure.
def get_file_meta(user_id, file_id): Gets all per-file metadata for the file with id @file_id. The @user_id parameter can be used to check against the file’s owner_id if your security policy so requires.
With the module interface defined, let's move to its implementation, starting with module creation:
mix new --sup api
This layer requires a supervision tree with the Postgrex connection process as a child, like this:
lib/application.ex:
defmodule Api.Application do
  # See https://hexdocs.pm/elixir/Application.html
  # for more information on OTP Applications
  @moduledoc false
  use Application

  def start(_type, _args) do
    # List all child processes to be supervised
    children = [
      # Starts a worker by calling: Api.Worker.start_link(arg)
      # {Api.Worker, arg}
      {Postgrex, hostname: "localhost", username: "postgres", database: "mydb", name: DB}
    ]

    # See https://hexdocs.pm/elixir/Supervisor.html
    # for other strategies and supported options
    opts = [strategy: :one_for_one, name: Api.Supervisor]
    Supervisor.start_link(children, opts)
  end
end
Constants and Structs
defmodule Api.File do
  @type file_id() :: String.t()
  @type user_id() :: String.t()

  defstruct [:id, :owner_id, :format, :size, :created_at, :thumb_file_id, :type_data, :tags]
end
This struct parallels our files SQL table.
Put File
@doc """
Stores file at @input_path on disk and records its meta-data.
Returns `{:ok, file_id}` on success, `{:error, reason}` on failure.
"""
def put_file(owner_id, mime, input_path, tags) do
  case file_type(mime) do
    :image ->
      put_image(owner_id, input_path, tags)

    :video ->
      put_video(owner_id, input_path, tags)

    :doc ->
      put_doc(owner_id, input_path, tags)

    _ ->
      IO.puts("put_file: Unknown file type")
      {:error, "Unknown file type"}
  end
end
Let's see how one of the file-format specific functions is implemented:
# Stores image together with its thumbnail.
# Returns:
# On success: {:ok, file_id} where file_id corresponds to the fullsize image.
# On error: raises (for example, a MatchError if ImageStore does not return {:ok, image}).
defp put_image(owner_id, input_path, tags) do
  {:ok, image} = ImageStore.put_image_lossy(datadir(), input_path)

  %Api.File{
    id: image.file_id,
    owner_id: owner_id,
    format: ImageStore.format(),
    size: File.stat!(FileStore.get_file_path(datadir(), image.file_id)).size,
    created_at: DateTime.utc_now(),
    thumb_file_id: image.thumb_file_id,
    type_data: %{
      width: image.width,
      height: image.height
    },
    tags: tags
  }
  |> record_file()
end
It simply passes the input file on to the ImageStore module and, on success, records its metadata with record_file(). Similarly, put_video forwards to VideoStore.put_video, and put_doc forwards to DocStore.put_doc, each followed by record_file.
Next, let's look at the record_file function, which records metadata for files successfully stored in our object store.
@spec record_file(%Api.File{}) :: {:ok, file_id()}
defp record_file(f) do
  Postgrex.query!(
    DB,
    "DELETE FROM files WHERE id=$1",
    [f.id]
  )

  Postgrex.query!(
    DB,
    """
    INSERT INTO files (id, owner_id, format, size, created_at, thumb_file_id, type_data, tags)
    VALUES ($1, $2, $3, $4, $5, $6, $7, $8)
    """,
    [f.id, f.owner_id, f.format, f.size, f.created_at, f.thumb_file_id, f.type_data, f.tags]
  )

  {:ok, f.id}
end
We first delete any row with the input file’s ID to allow for cases where we are updating some field(s) of an existing file. We could use ecto here to make it look nicer, but I won’t add this dependency unless the rest of the module really needs such an abstraction. In my projects, I have always used Postgrex directly, since assembling the required SQL string seems simple enough. We do have to be careful not to introduce SQL injection vulnerabilities, by making sure we pass untrusted user input as SQL parameters ($1, $2, …).
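To make the point about SQL parameters concrete, here is a minimal sketch. The QuerySketch module and the files_by_owner_and_tag function are hypothetical names of mine and not part of Api.File; the query itself only assumes the files table defined earlier. The SQL text is a fixed string, and anything coming from the user travels separately in the parameter list.

```elixir
defmodule QuerySketch do
  # Hypothetical helper: returns {sql, params}.
  # The SQL string is constant; untrusted input only ever appears in params.
  def files_by_owner_and_tag(owner_id, tag) do
    {"SELECT id FROM files WHERE owner_id = $1 AND $2 = ANY(tags)", [owner_id, tag]}
  end
end

# Usage, assuming the DB connection from the supervision tree is running:
#
#   {sql, params} = QuerySketch.files_by_owner_and_tag(user_id, tag)
#   Postgrex.query!(DB, sql, params)
```

Even if a tag contains quotes or SQL keywords, it is sent to Postgres as a value and never spliced into the statement.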
Get input file type
This is used by put_file to figure out which format-specific module to call.
defp file_type(mime) do
  cond do
    String.starts_with?(mime, "image/") ->
      :image

    String.starts_with?(mime, "video/") ->
      :video

    String.equivalent?(mime, "application/pdf") ->
      :doc

    true ->
      :unknown
  end
end
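The comparison above is case-sensitive and expects a bare MIME type. If your clients may send values such as "Application/PDF; charset=binary", a normalization step in front of the classification could help. The following is only a sketch of that idea: the FileTypeSketch module and its normalize function are my own hypothetical names, and the functions are public here purely so they can be tried in isolation.

```elixir
defmodule FileTypeSketch do
  # Hypothetical helper, not part of the original module: drops any
  # parameters after ";", trims whitespace and lower-cases the MIME type.
  def normalize(mime) do
    mime
    |> String.split(";", parts: 2)
    |> hd()
    |> String.trim()
    |> String.downcase()
  end

  # Same classification as file_type/1 above, applied to the normalized value.
  def file_type(mime) do
    case normalize(mime) do
      "image/" <> _ -> :image
      "video/" <> _ -> :video
      "application/pdf" -> :doc
      _ -> :unknown
    end
  end
end
```

With this in place, FileTypeSketch.file_type("image/png") classifies as an image, while an unrecognized type such as "text/plain" still falls through to :unknown.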
Get File
@spec get_file(user_id(), file_id()) :: binary() | {:error, String.t()}
def get_file(_user_id, file_id) do
  # NOTE: Add any checks against user_id (say, match against file's owner_id) here.
  FileStore.get_file(datadir(), file_id)
end
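As an illustration of the kind of check the NOTE refers to, here is a minimal sketch of an "only the owner can read" policy. The AccessSketch module, its authorize function and the error strings are hypothetical; your application's policy will likely differ.

```elixir
defmodule AccessSketch do
  # Hypothetical policy: only the owner may read a file.
  # `meta` is any map or struct with an :owner_id field, or nil when
  # no metadata row exists for the file.
  def authorize(_user_id, nil), do: {:error, "File not found"}
  # Matches only when the file's owner_id equals the requesting user_id.
  def authorize(user_id, %{owner_id: user_id}), do: :ok
  def authorize(_user_id, _meta), do: {:error, "Access denied"}
end

# Inside Api.File, get_file/2 could then look roughly like this:
#
#   def get_file(user_id, file_id) do
#     {:ok, meta} = get_file_meta(user_id, file_id)
#
#     with :ok <- AccessSketch.authorize(user_id, meta) do
#       FileStore.get_file(datadir(), file_id)
#     end
#   end
```

Because with returns the non-matching value unchanged, a failed check surfaces to the caller as the {:error, reason} tuple that get_file already promises.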
Get file metadata
File metadata is typically exposed through a GraphQL interface defined in your API module. For instance:
User(id: "foo") {
  profilePic {
    file_id
    owner
    createdAt
    size
    format
    width
    height
    thumbnail {
      file_id
    }
  }
}
The get_file_meta function may be used to satisfy GraphQL queries like the one above.
@doc """
Get all metadata associated with file with id @file_id.
Returns {:ok, %Api.File{}} on success, {:ok, nil} otherwise.
"""
@spec get_file_meta(user_id(), file_id()) :: {:ok, %Api.File{} | nil}
def get_file_meta(_user_id, file_id) do
  # NOTE: Add any checks against user_id (say, match against file's owner_id) here.
  # The parenthesized SELECT list yields a single record column,
  # which Postgrex decodes as a tuple.
  res =
    Postgrex.query!(
      DB,
      """
      SELECT (f.id, f.owner_id, f.format, f.size, f.created_at, f.thumb_file_id, f.type_data, f.tags)
      FROM files as f
      WHERE id=$1
      """,
      [file_id]
    )

  if res.num_rows == 1 do
    [{file_id, owner_id, format, size, created_at, thumb_file_id, type_data, tags}] =
      res.rows |> List.flatten()

    {:ok,
     %Api.File{
       id: file_id,
       owner_id: owner_id,
       format: format,
       size: size,
       created_at: created_at,
       thumb_file_id: thumb_file_id,
       type_data: type_data |> MapHelpers.trusted_map_atomize_keys(),
       tags: tags
     }}
  else
    {:ok, nil}
  end
end
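The MapHelpers.trusted_map_atomize_keys helper is not shown in this post. JSONB values come back from the database as maps with string keys, so a helper of that name presumably turns them back into the atom keys we used when storing type_data. A minimal sketch of what it might look like follows; the MapHelpersSketch module is my assumption, not the actual helper.

```elixir
defmodule MapHelpersSketch do
  # Assumed behaviour: convert string keys to atoms, one level deep.
  # Only for trusted input: atoms are never garbage collected, so
  # atomizing arbitrary user-supplied keys could exhaust the atom table.
  def trusted_map_atomize_keys(nil), do: nil

  def trusted_map_atomize_keys(map) when is_map(map) do
    Map.new(map, fn
      {k, v} when is_binary(k) -> {String.to_atom(k), v}
      {k, v} -> {k, v}
    end)
  end
end
```

For example, MapHelpersSketch.trusted_map_atomize_keys(%{"width" => 640, "height" => 480}) gives a map that can be read as meta.width and meta.height. Since type_data here is written only by our own put_* functions, treating its keys as trusted seems reasonable.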
Summary
The API layer is the place where we store per-file system and user-defined metadata. This metadata can be used to implement application specific security policies or other things needed for your business logic (say, per-file user specified tags). We focused on a particular submodule, Api.File, which acts as a thin wrapper around the format-specific modules (ImageStore, VideoStore, DocStore) and the FileStore. A typical API layer would include a GraphQL schema for exposing information (including per-file metadata), and we provided an example of that with the get_file_meta() implementation.