Part I introduces the overall design of our object store. In this post we focus on its first layer, the `FileStore`.
The FileStore layer is responsible for actually storing files in our object store. At this level, we are not concerned with what kind of file it is (image, video, document, or anything else), nor do we have any notion of security. We simply store the file at whatever input path we are given.
The simplest design would be to store every incoming file directly in the object store's root directory, `datadir`. This would obviously cause naming conflicts whenever two different files share a name. To avoid this, we can name each file after its SHA256 hash value. However, storing millions of files in a single directory causes its own problems: many filesystems, local or networked, handle very large directories poorly. To overcome such filesystem limitations, we will use the following naming and storage scheme:
- Calculate the SHA256 hash of the input file.
- Encode the hash as base32.
  - Base32 (using its lowercase alphabet) consists only of the letters a-z and the digits 2-7. This guarantees that the resulting string, which we use as the file ID, can be safely used in URLs. Because it uses a single letter case, it is also safe to store on case-insensitive filesystems.
  - Each base32 character encodes 5 bits, so a 256-bit SHA256 hash always encodes to ceil(256 / 5) = 52 characters when padding is omitted (ex: `wzzdipvhjndw2b73gyzmv3ngpc6kur4wl2mx66jqxtkfzip4q25q`).
- Split this string into 4-character chunks to get the filesystem hierarchy: 12 nested directories plus the file name (ex: `wzzd/ipvh/jndw/2b73/gyzm/v3ng/pc6k/ur4w/l2mx/66jq/xtkf/zip4/q25q`).
With the design laid out, we need to define our interfaces:
- `def put_file(datadir, input_path)`: Stores the file at `input_path` in the object store rooted at `datadir`. Returns `{:ok, file_id}` on success, `{:error, reason}` on failure. We use the base32-encoded SHA256 hash as the `file_id`.
- `def get_file(datadir, file_id)`: Gets the binary data of the file with id `file_id`. Returns the binary data on success, `{:error, reason}` on failure.
- `def get_file_path(datadir, file_id)`: Returns the local path of the file with id `file_id`, or `nil` if no such file exists. This function will be useful when we want to process a stored file with some external program (like ImageMagick for thumbnail generation or ffmpeg for video frame extraction).
- `def delete_file(datadir, file_id)`: Deletes the file with id `file_id`. Returns `:ok` if successful, or `{:error, reason}` if an error occurs.
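If you want the compiler and Dialyzer to know about these contracts, one option is to write them down as a behaviour. This is a sketch under our own assumptions, not part of the original design: the module name `FileStoreBehaviour` and the `file_id` type are hypothetical, while `File.posix()` is the standard type for the error reasons returned by the `File` functions used later.

```elixir
defmodule FileStoreBehaviour do
  @moduledoc """
  Hypothetical behaviour describing the four FileStore functions.
  """

  # Hypothetical alias: a 52-character lowercase base32 SHA256 string.
  @type file_id :: String.t()

  # Stores the file at input_path under datadir and returns its ID.
  @callback put_file(datadir :: Path.t(), input_path :: Path.t()) ::
              {:ok, file_id()} | {:error, File.posix()}

  # Returns the stored bytes on success.
  @callback get_file(datadir :: Path.t(), id :: file_id()) ::
              binary() | {:error, File.posix()}

  # Returns the local path, or nil when no such file is stored.
  @callback get_file_path(datadir :: Path.t(), id :: file_id()) :: Path.t() | nil

  # Deletes the stored file.
  @callback delete_file(datadir :: Path.t(), id :: file_id()) ::
              :ok | {:error, File.posix()}
end
```

`FileStore` could then declare `@behaviour FileStoreBehaviour` and tag each public function with `@impl true`, so the compiler warns when a callback is missing or its arity changes, and Dialyzer can check the return types.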
With the interfaces defined, let’s implement each of them. We start by creating a new Mix project:
```shell
mix new file_store
```
Note that we don’t need a supervision tree or any stateful application: it’s a pure library component. It also has no external dependencies. All functions will go in the `FileStore` module defined in `lib/file_store.ex`.
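Although `:crypto` (used below for hashing) ships with Erlang/OTP rather than as a Hex package, Elixir 1.11 and later warn at compile time when a project calls into an OTP application it does not declare. A minimal fix, assuming the default `mix.exs` generated by `mix new`, is to add it to `extra_applications` (this is a config fragment, not a standalone script):

```elixir
def application do
  [
    # :crypto is part of Erlang/OTP, so it is not a Hex dependency, but
    # declaring it tells Mix (and releases) that FileStore relies on it.
    extra_applications: [:logger, :crypto]
  ]
end
```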
Put File
```elixir
@doc """
Returns {:ok, file_id} on success, {:error, reason} on failure.
"""
def put_file(datadir, file_path) do
  case File.read(file_path) do
    {:ok, binary} -> put_binary(datadir, binary)
    {:error, reason} -> {:error, reason}
  end
end

defp put_binary(datadir, binary) do
  # The file ID is the lowercase, unpadded base32 encoding of the SHA256 hash.
  file_id =
    :crypto.hash(:sha256, binary)
    |> Base.encode32(padding: false, case: :lower)

  path = Path.join(datadir, make_file_path(file_id))

  # Create the nested directories, then write the file. `with` returns the
  # first {:error, reason} it encounters, so write errors are not lost.
  with :ok <- File.mkdir_p(Path.dirname(path)),
       :ok <- File.write(path, binary) do
    {:ok, file_id}
  end
end
```
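`put_file/2` reads the whole input into memory before hashing it, which is fine for images but may hurt for large videos. A possible alternative, sketched below under our own assumptions (the module name `StreamingPut`, the 64 KiB chunk size, and copying with `File.cp/2` are illustrative choices, not the post's design), hashes the file incrementally with `:crypto.hash_init/1`, `:crypto.hash_update/2`, and `:crypto.hash_final/1`, then copies it into place. It assumes the input file does not change between hashing and copying.

```elixir
defmodule StreamingPut do
  # Hypothetical variant of put_file/2 that never holds the whole file in memory.

  @chunk_size 65_536

  def put_file(datadir, input_path) do
    with {:ok, file_id} <- file_id(input_path),
         path = Path.join(datadir, make_file_path(file_id)),
         :ok <- File.mkdir_p(Path.dirname(path)),
         :ok <- File.cp(input_path, path) do
      {:ok, file_id}
    end
  end

  # Computes the same base32 SHA256 file ID as put_file/2, chunk by chunk.
  def file_id(input_path) do
    result =
      File.open(input_path, [:read, :binary], fn io ->
        hash_chunks(io, :crypto.hash_init(:sha256))
      end)

    case result do
      {:ok, {:ok, digest}} -> {:ok, Base.encode32(digest, padding: false, case: :lower)}
      {:ok, {:error, reason}} -> {:error, reason}
      {:error, reason} -> {:error, reason}
    end
  end

  # Feeds the file to the hash context until end of file.
  defp hash_chunks(io, ctx) do
    case IO.binread(io, @chunk_size) do
      :eof -> {:ok, :crypto.hash_final(ctx)}
      {:error, reason} -> {:error, reason}
      data -> hash_chunks(io, :crypto.hash_update(ctx, data))
    end
  end

  # Same layout as make_file_path/1: 13 four-character path segments.
  defp make_file_path(hash52) when byte_size(hash52) == 52 do
    segments = for <<chunk::binary-size(4) <- hash52>>, do: chunk
    Path.join(segments)
  end
end
```

Because the ID depends only on the bytes, a file stored this way gets the same `file_id` as one stored with the in-memory `put_file/2`.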
We can now try out this implementation in the `iex` console (started with `iex -S mix` so the project is loaded):

```elixir
iex(1)> FileStore.put_file("/home/ngupta/data", "/tmp/sample.jpg")
{:ok, "tlioxmlti6mamjb7iblpl2d3tuooyltlkvni7gymbgtbuzt4hcxa"}
```
Here is what the directory structure looks like after storing this file:

```
/home/ngupta/data
└── tlio
    └── xmlt
        └── i6ma
            └── mjb7
                └── iblp
                    └── l2d3
                        └── tuoo
                            └── yltl
                                └── kvni
                                    └── 7gym
                                        └── bgtb
                                            └── uzt4
                                                └── hcxa
```
We used a function, `make_file_path`, which takes the 52-character hash string and returns this directory structure. Let’s define it too:
```elixir
# Construct a relative path from a 52-character hash string.
#
# ex: "wzzdipvhjndw2b73gyzmv3ngpc6kur4wl2mx66jqxtkfzip4q25q"
#  -> "wzzd/ipvh/jndw/2b73/gyzm/v3ng/pc6k/ur4w/l2mx/66jq/xtkf/zip4/q25q"
defp make_file_path(hash52) when byte_size(hash52) == 52 do
  # Take the string 4 bytes at a time: 13 segments in total, the first 12
  # become directories and the last becomes the file name.
  segments = for <<chunk::binary-size(4) <- hash52>>, do: chunk
  Path.join(segments)
end
```
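The `byte_size(hash52) == 52` guard makes `make_file_path/1` raise a `FunctionClauseError` for IDs of any other length, but it does not check the characters: a 52-byte string containing `/` and `..` would still be joined onto `datadir` and could point outside it. Since `get_file/2`, `get_file_path/2`, and `delete_file/2` take `file_id` from their callers, a cheap check against the lowercase base32 alphabet may be worth running first. `FileIdCheck.valid?/1` below is a hypothetical helper, not part of the original module:

```elixir
defmodule FileIdCheck do
  # Hypothetical helper: a valid file ID is exactly 52 bytes drawn from the
  # lowercase base32 alphabet (a-z and 2-7), which is what
  # Base.encode32(hash, padding: false, case: :lower) produces for SHA256.
  def valid?(<<_::binary-size(52)>> = id), do: all_base32?(id)
  def valid?(_), do: false

  # Walks the binary one byte at a time.
  defp all_base32?(<<>>), do: true
  defp all_base32?(<<c, rest::binary>>) when c in ?a..?z or c in ?2..?7, do: all_base32?(rest)
  defp all_base32?(_), do: false
end
```

For example, `get_file/2` could return `{:error, :invalid_file_id}` (an illustrative reason atom) when `FileIdCheck.valid?(file_id)` is false, instead of crashing or touching an unexpected path.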
Get File
```elixir
@doc """
Returns the file's binary data on success, {:error, reason} on failure.

Example file_id: "wzzdipvhjndw2b73gyzmv3ngpc6kur4wl2mx66jqxtkfzip4q25q"
"""
def get_file(datadir, file_id) do
  path = Path.join(datadir, make_file_path(file_id))

  # File.read/1 opens, reads, and closes the file in a single call,
  # so no file descriptor is left open.
  case File.read(path) do
    {:ok, binary} -> binary
    {:error, reason} -> {:error, reason}
  end
end
```
Get File Path
```elixir
@doc """
Returns the local path for the file with id file_id, or nil if it does not exist.
"""
def get_file_path(datadir, file_id) do
  path = Path.join(datadir, make_file_path(file_id))
  if File.exists?(path), do: path, else: nil
end
```
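As an illustration of why `get_file_path/2` exists, here is a hedged sketch of handing a stored file to ImageMagick for thumbnail generation. It assumes ImageMagick’s `convert` executable is on `PATH`; the module `ThumbnailSketch`, its functions, and the default 200-pixel size are hypothetical. `System.cmd/3` returns the command’s output together with its exit status, and the `nil` clause covers the case where `get_file_path/2` found no file.

```elixir
defmodule ThumbnailSketch do
  # Hypothetical helper that hands a stored file to ImageMagick.
  # Assumes the `convert` executable is on PATH.

  # Builds the argument list for a thumbnail that fits within size x size.
  def convert_args(src_path, dest_path, size) do
    [src_path, "-thumbnail", "#{size}x#{size}", dest_path]
  end

  # src_path is whatever FileStore.get_file_path/2 returned; nil means
  # the file ID was not found in the store.
  def make_thumbnail(src_path, dest_path, size \\ 200)

  def make_thumbnail(nil, _dest_path, _size), do: {:error, :enoent}

  def make_thumbnail(src_path, dest_path, size) do
    args = convert_args(src_path, dest_path, size)

    case System.cmd("convert", args, stderr_to_stdout: true) do
      {_output, 0} -> :ok
      {output, exit_status} -> {:error, {exit_status, output}}
    end
  end
end
```

Assuming `datadir` and `file_id` are bound, usage might look like `datadir |> FileStore.get_file_path(file_id) |> ThumbnailSketch.make_thumbnail("/tmp/thumb.jpg")`.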
Delete File
```elixir
@doc """
Returns :ok if successful, or {:error, reason} if an error occurs.
"""
def delete_file(datadir, file_id) do
  path = Path.join(datadir, make_file_path(file_id))
  File.rm(path)
end
```
Summary
This completes our implementation of the `FileStore` module. We started by defining our requirements, considered some potential (file)system limitations, defined the required interfaces, and then implemented each of them. I have not included any tests here, but you should definitely consider adding them for all four interfaces defined above.