A layered object store design in Elixir (Part II)

The FileStore layer

Part I introduces the overall design of our object store. In this post, we focus on its first layer, the FileStore.

The FileStore layer is responsible for actually storing files in our object store. At this level, we are not concerned with what kind of file it is (image, video, document, or anything else), nor do we have any notion of security. We simply store the file at whatever input path we are given.

The simplest design would be to store every incoming file directly in the object store's root directory, datadir. This would quickly cause naming conflicts whenever two different files share a name. To avoid them, we can name each file after the SHA256 hash of its contents; as a side benefit, identical files map to the same name and are stored only once. However, storing millions of files in a single directory causes problems of its own, because many filesystems, local or networked, handle very large directories poorly. To work around such limitations, we will use this naming and storage scheme:

  • Calculate the SHA256 hash of the input file.
  • Encode the hash as base32.
    • Base32 (RFC 4648) uses only the 26 letters and the digits 2 through 7, and we encode in lowercase. The resulting string, which we use as the file ID, is therefore safe to use in URLs and safe to store on case-insensitive filesystems.
    • A SHA256 hash is 256 bits, and each base32 character encodes 5 bits, so without padding the encoded hash is always ceil(256 / 5) = 52 characters long (ex: wzzdipvhjndw2b73gyzmv3ngpc6kur4wl2mx66jqxtkfzip4q25q).
  • Split this string into 4-character chunks and use them as a directory hierarchy, with the last chunk as the file name (ex: wzzd/ipvh/jndw/2b73/gyzm/v3ng/pc6k/ur4w/l2mx/66jq/xtkf/zip4/q25q).
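The three steps above can be checked in isolation before we write the real module. The following is a minimal sketch that uses a hypothetical NamingSketch module, which is not part of FileStore. It derives an ID from a binary and turns the ID into a relative path with Enum.chunk_every/2:

```elixir
defmodule NamingSketch do
  # Hypothetical module illustrating the naming scheme; not part of FileStore.

  # SHA256 (256 bits) -> lowercase, unpadded base32 (5 bits per character),
  # which always yields a 52-character string.
  def file_id(binary) do
    :crypto.hash(:sha256, binary)
    |> Base.encode32(padding: false, case: :lower)
  end

  # Splits the 52-character ID into thirteen 4-character path components.
  def relative_path(file_id) do
    file_id
    |> String.graphemes()
    |> Enum.chunk_every(4)
    |> Enum.map(&Enum.join/1)
    |> Path.join()
  end
end
```

For example, NamingSketch.relative_path("wzzdipvhjndw2b73gyzmv3ngpc6kur4wl2mx66jqxtkfzip4q25q") returns "wzzd/ipvh/jndw/2b73/gyzm/v3ng/pc6k/ur4w/l2mx/66jq/xtkf/zip4/q25q".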

With the design laid out, we need to define our interfaces:

  • def put_file(datadir, input_path): Stores the file at input_path in the object store rooted at datadir. Returns {:ok, file_id} on success, {:error, reason} on failure. The file_id is the base32-encoded SHA256 hash of the file's contents.
  • def get_file(datadir, file_id): Returns the contents of the file with id file_id as a binary on success, {:error, reason} on failure.
  • def get_file_path(datadir, file_id): Returns the local path for the file with id file_id, or nil if no such file exists. This is useful when we want to process a stored file with an external program (like ImageMagick for thumbnail generation or ffmpeg for video frame extraction).
  • def delete_file(datadir, file_id): Deletes the file with id file_id. Returns :ok if successful, or {:error, reason} if an error occurs.
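The interface above could also be written down as an Elixir behaviour, so that the compiler checks the implementation against it. This is a minimal sketch with a hypothetical module name, FileStoreSpec. The return types follow the descriptions above, and get_file_path returns nil for a missing file, as its implementation does.

```elixir
defmodule FileStoreSpec do
  @moduledoc """
  Hypothetical behaviour capturing the FileStore interface.
  """

  @typedoc "Root directory of the object store."
  @type datadir :: Path.t()

  @typedoc "Lowercase, unpadded base32 encoding of a SHA256 hash (52 characters)."
  @type file_id :: String.t()

  @callback put_file(datadir(), input_path :: Path.t()) :: {:ok, file_id()} | {:error, term()}
  @callback get_file(datadir(), file_id()) :: binary() | {:error, term()}
  @callback get_file_path(datadir(), file_id()) :: Path.t() | nil
  @callback delete_file(datadir(), file_id()) :: :ok | {:error, term()}
end
```

With this in place, FileStore could declare @behaviour FileStoreSpec and mark each public function with @impl true. The compiler would then warn if a callback were missing.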

With the interfaces defined, let’s implement each of them. We start by creating a new Mix project:

mix new file_store

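Note that we don’t need a supervision tree or any stateful application: this is a pure library component with no external dependencies. All functions go in the FileStore module defined in lib/file_store.ex. The only module we use from outside Elixir's standard library is Erlang/OTP's :crypto. On Elixir 1.11 and later, the compiler warns about calls to it unless :crypto is listed under extra_applications in mix.exs.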

Put File

  @doc """
    Stores the file at file_path in the object store rooted at datadir.

    Returns {:ok, file_id} on success, {:error, reason} on failure.
  """
  def put_file(datadir, file_path) do
    # `with` passes any {:error, reason} from File.read/1 through unchanged.
    with {:ok, binary} <- File.read(file_path) do
      put_binary(datadir, binary)
    end
  end

  defp put_binary(datadir, binary) do
    # The file ID is the lowercase, unpadded base32 encoding of the SHA256 hash.
    file_id =
      :crypto.hash(:sha256, binary)
      |> Base.encode32(padding: false, case: :lower)

    path = Path.join(datadir, make_file_path(file_id))

    # Create the intermediate directories, then write the whole binary.
    # An {:error, reason} from either step is returned as-is, so write
    # failures are no longer silently ignored.
    with :ok <- File.mkdir_p(Path.dirname(path)),
         :ok <- File.write(path, binary) do
      {:ok, file_id}
    end
  end
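put_file writes directly to the final path. If the process crashes partway through a write, a truncated file could be left behind under a name that claims a hash its contents no longer match. A common remedy is to write to a temporary file in the same directory and then rename it into place. The sketch below shows this under that assumption; AtomicWriteSketch.atomic_write/2 is a hypothetical helper and not part of FileStore:

```elixir
defmodule AtomicWriteSketch do
  # Hypothetical helper: writes `binary` to `path` via a temporary file in
  # the same directory, then renames it into place. On POSIX filesystems a
  # rename within one filesystem replaces the target atomically, so readers
  # never observe a partially written object.
  def atomic_write(path, binary) do
    tmp = path <> ".tmp-" <> Integer.to_string(System.unique_integer([:positive]))

    with :ok <- File.write(tmp, binary),
         :ok <- File.rename(tmp, path) do
      :ok
    else
      {:error, reason} ->
        # Best-effort cleanup of the temporary file; its result is ignored.
        _ = File.rm(tmp)
        {:error, reason}
    end
  end
end
```

Inside put_binary, this helper would take the place of the File.write/2 call. Because file IDs are content hashes, a file that already exists at the target path holds identical content. put_binary could therefore also skip the write entirely in that case.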

We can now try out this implementation in the iex console.

iex(1)> FileStore.put_file("/home/ngupta/data", "/tmp/sample.jpg")

{:ok, "tlioxmlti6mamjb7iblpl2d3tuooyltlkvni7gymbgtbuzt4hcxa"}

Here is what the directory structure looks like after storing this file (the last component, hcxa, is the stored file itself):

/home/ngupta/data
└── tlio
    └── xmlt
        └── i6ma
            └── mjb7
                └── iblp
                    └── l2d3
                        └── tuoo
                            └── yltl
                                └── kvni
                                    └── 7gym
                                        └── bgtb
                                            └── uzt4
                                                └── hcxa

In put_binary we called make_file_path, which takes the 52-character hash string and returns this relative path. Let’s define it too:

  # Constructs a relative path from a 52-character hash string by splitting
  # it into thirteen 4-character components.
  #
  # ex: "wzzdipvhjndw2b73gyzmv3ngpc6kur4wl2mx66jqxtkfzip4q25q"
  #  -> "wzzd/ipvh/jndw/2b73/gyzm/v3ng/pc6k/ur4w/l2mx/66jq/xtkf/zip4/q25q"
  defp make_file_path(hash52) when byte_size(hash52) == 52 do
    # Base32 output is ASCII, so byte offsets equal character offsets.
    0..12
    |> Enum.map(fn i -> binary_part(hash52, i * 4, 4) end)
    |> Path.join()
  end
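The public functions pass a caller-supplied file_id straight to make_file_path. Its guard raises a FunctionClauseError for any length other than 52, rather than returning {:error, reason}, and it does not reject characters such as "/" or ".". A 52-byte string containing ".." segments could therefore resolve to a path outside datadir. One option is to validate IDs up front. The sketch below uses a hypothetical FileIdSketch.valid_file_id?/1 helper that accepts only the alphabet our encoding produces:

```elixir
defmodule FileIdSketch do
  # Hypothetical helper: true only for strings that could be our file IDs,
  # i.e. exactly 52 characters drawn from a-z and 2-7 (lowercase base32).
  def valid_file_id?(file_id) when is_binary(file_id) and byte_size(file_id) == 52 do
    file_id
    |> :binary.bin_to_list()
    |> Enum.all?(fn c -> c in ?a..?z or c in ?2..?7 end)
  end

  def valid_file_id?(_other), do: false
end
```

get_file, get_file_path, and delete_file could then return something like {:error, :invalid_file_id} when this check fails, before building a path. That error atom is an illustrative choice, not part of the interface defined above.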

Get File

  @doc """
    Returns the file contents as a binary on success, {:error, reason} on failure.

    Example file_id: "wzzdipvhjndw2b73gyzmv3ngpc6kur4wl2mx66jqxtkfzip4q25q"
  """
  def get_file(datadir, file_id) do
    path = Path.join(datadir, make_file_path(file_id))

    # File.read/1 opens, reads, and closes the file in one call, so no file
    # handle is leaked (File.open/2 without a matching File.close/1 would).
    case File.read(path) do
      {:ok, binary} -> binary
      {:error, reason} -> {:error, reason}
    end
  end
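The file ID is derived from the file's contents, so a reader can check that data returned by get_file has not been corrupted on disk. Here is a minimal sketch with a hypothetical VerifySketch.verify/2 helper. It recomputes the hash using the same scheme as put_file:

```elixir
defmodule VerifySketch do
  # Hypothetical helper: returns true when `binary` hashes to `file_id`
  # under the same scheme put_file uses (SHA256, lowercase unpadded base32).
  def verify(file_id, binary) when is_binary(binary) do
    computed =
      :crypto.hash(:sha256, binary)
      |> Base.encode32(padding: false, case: :lower)

    computed == file_id
  end

  # Anything else (for example an {:error, reason} tuple from get_file)
  # cannot be verified.
  def verify(_file_id, _other), do: false
end
```

Recomputing the hash costs a full pass over the data. This makes it a better fit for periodic scrubbing or for untrusted storage than for every read.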

Get File Path

  @doc """
    Returns the local path for the file with id file_id, or nil if it does not exist.
  """
  def get_file_path(datadir, file_id) do
    path = Path.join(datadir, make_file_path(file_id))
    if File.exists?(path), do: path, else: nil
  end
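As the interface description notes, get_file_path exists so that external programs can work on stored files. The sketch below assumes that ImageMagick is installed and that its `convert` command is on the PATH; ImageMagick 7 installations prefer the `magick` command. ThumbnailSketch.thumbnail/3 and its default size are hypothetical:

```elixir
defmodule ThumbnailSketch do
  # Hypothetical helper: shells out to ImageMagick's `convert` (assumed to
  # be installed and on the PATH) to write a thumbnail of `src_path` to
  # `dest_path`. System.cmd/3 raises if the executable cannot be found.
  def thumbnail(src_path, dest_path, size \\ "200x200") do
    args = [src_path, "-thumbnail", size, dest_path]

    case System.cmd("convert", args, stderr_to_stdout: true) do
      {_output, 0} -> :ok
      {output, status} -> {:error, {status, output}}
    end
  end
end
```

get_file_path returns nil for a missing file, so a caller would match on nil first and call thumbnail/3 only with an actual path.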

Delete File

  @doc """
    Returns :ok if successful, or {:error, reason} if an error occurs.
  """
  def delete_file(datadir, file_id) do
    path = Path.join(datadir, make_file_path(file_id))
    File.rm(path)
  end
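delete_file removes only the file itself, so up to twelve now-empty intermediate directories can remain under datadir. The following minimal sketch prunes them, using a hypothetical PruneSketch.prune_empty_dirs/2 helper. One assumption matters here: no put_file runs concurrently for an ID with the same prefix. Otherwise pruning could remove a directory that put_file has just created, and its write would then fail.

```elixir
defmodule PruneSketch do
  # Hypothetical helper, not part of FileStore: after a stored file has been
  # deleted, removes its parent directories while they are empty, stopping
  # at the first non-empty directory or at datadir itself.
  def prune_empty_dirs(datadir, deleted_path) do
    root = Path.expand(datadir)
    do_prune(Path.dirname(Path.expand(deleted_path)), root)
  end

  defp do_prune(dir, root) do
    # Only ever touch directories strictly inside datadir.
    if String.starts_with?(dir, root <> "/") do
      # File.rmdir/1 fails on a non-empty directory, which ends the walk.
      case File.rmdir(dir) do
        :ok -> do_prune(Path.dirname(dir), root)
        {:error, _reason} -> :ok
      end
    else
      :ok
    end
  end
end
```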

Summary

This completes our implementation of the FileStore module. We started by defining our requirements, considered some potential filesystem limitations, defined the required interfaces, and then implemented each of them. I have not included tests here, but you should definitely add them for all four interfaces defined above.


See also