Discussion:
[PATCH 0/2] nilfs2: improve inode allocation algorithm
Andreas Rohner
2014-10-12 10:38:21 UTC
Permalink
Hi,

The algorithm simply makes sure, that after a directory inode there are
a certain number of free slots available and the search for file inodes
is started at their parent directory.

I havn't had the time yet to do a full scale performance test of it, but
my simple preliminary tests have shown, that the allocation of inodes
takes a little bit longer and the lookup is a little bit faster. My
simple test just creates 1500 directories and after that creates 10
files in each directory.

So more testing is definetly necessary, but I wanted to get some
feedback about the design first. Is my code a step in the right
direction?

Best regards,
Andreas Rohner

Andreas Rohner (2):
nilfs2: support the allocation of whole blocks of meta data entries
nilfs2: improve inode allocation algorithm

fs/nilfs2/alloc.c | 161 ++++++++++++++++++++++++++++++++++++++++++++++++----
fs/nilfs2/alloc.h | 18 +++++-
fs/nilfs2/ifile.c | 63 ++++++++++++++++++--
fs/nilfs2/ifile.h | 6 +-
fs/nilfs2/inode.c | 6 +-
fs/nilfs2/segment.c | 5 +-
6 files changed, 235 insertions(+), 24 deletions(-)
--
2.1.2

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-***@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Andreas Rohner
2014-10-12 10:38:23 UTC
Permalink
The current inode allocation algorithm of NILFS2 does not use any
information about the previous allocation or the parent directory of the
inode. It simply searches for a free entry in the ifile, always starting
from position 0. This patch introduces an improved allocation scheme.
There are no changes to the on-disk-format necessary.

First of all the algorithm distinguishes between files and directories.

File inodes start their search at the location of their parent
directory, so that they will be allocated near their parent. If the file
inode ends up on the same block as its parent, the common task of
listing the directory contents (e.g.: "ls") will be faster. Although
there is no guarantee, that any subsequent blocks after that will end up
near the block with the directory on disk, there is still a possibility
for performance improvement, because it can be expected, that subsequent
blocks are read ahead. Also the cleaner will write out blocks in
the order of their file offset, so there is an increased likelihood that
those blocks can be read in faster.

Directory inodes are allocated aligned to block boundaries and with
a certain number of empty slots following them. The empty slots
increase the likelihood, that files can be allocated in the same block
as their parent. Furthermore the alignment improves performance, because
unaligned locations do not have to be searched. The location of the last
successful allocation of a directory is stored in memory and used as a
starting point for the next allocation. This value is periodically reset
to 0 to allow the algorithm to find previously deleted slots.

Signed-off-by: Andreas Rohner <andreas.rohner-***@public.gmane.org>
---
fs/nilfs2/ifile.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++-----
fs/nilfs2/ifile.h | 6 +++--
fs/nilfs2/inode.c | 6 +++--
fs/nilfs2/segment.c | 5 ++++-
4 files changed, 69 insertions(+), 11 deletions(-)

diff --git a/fs/nilfs2/ifile.c b/fs/nilfs2/ifile.c
index 6548c78..9fc83ce 100644
--- a/fs/nilfs2/ifile.c
+++ b/fs/nilfs2/ifile.c
@@ -33,20 +33,49 @@
* struct nilfs_ifile_info - on-memory private data of ifile
* @mi: on-memory private data of metadata file
* @palloc_cache: persistent object allocator cache of ifile
+ * @last_dir: ino of the last directory allocated
*/
struct nilfs_ifile_info {
struct nilfs_mdt_info mi;
struct nilfs_palloc_cache palloc_cache;
+ __u64 last_dir;
};

static inline struct nilfs_ifile_info *NILFS_IFILE_I(struct inode *ifile)
{
return (struct nilfs_ifile_info *)NILFS_MDT(ifile);
}
+/**
+ * nilfs_ifile_last_dir_reset - set last_dir to 0
+ * @ifile: ifile inode
+ *
+ * Description: The value of last_dir will increase with every new
+ * allocation of a directory, because it is used as the starting point of the
+ * search for a free entry in the ifile. It should be reset periodically to 0
+ * (e.g.: every segctor timeout), so that previously deleted entries can be
+ * found.
+ */
+void nilfs_ifile_last_dir_reset(struct inode *ifile)
+{
+ NILFS_IFILE_I(ifile)->last_dir = 0;
+}
+
+/**
+ * nilfs_ifile_inodes_per_block - get number of inodes in a block
+ * @ifile: ifile inode
+ *
+ * Return Value: The number of inodes that fit into a file system block.
+ */
+static inline int nilfs_ifile_inodes_per_block(struct inode *ifile)
+{
+ return (1 << ifile->i_blkbits) / sizeof(struct nilfs_inode);
+}

/**
* nilfs_ifile_create_inode - create a new disk inode
* @ifile: ifile inode
+ * @parent: inode number of the parent directory
+ * @mode: inode type of the newly created inode
* @out_ino: pointer to a variable to store inode number
* @out_bh: buffer_head contains newly allocated disk inode
*
@@ -62,17 +91,29 @@ static inline struct nilfs_ifile_info *NILFS_IFILE_I(struct inode *ifile)
*
* %-ENOSPC - No inode left.
*/
-int nilfs_ifile_create_inode(struct inode *ifile, ino_t *out_ino,
+int nilfs_ifile_create_inode(struct inode *ifile, ino_t parent,
+ umode_t mode, ino_t *out_ino,
struct buffer_head **out_bh)
{
struct nilfs_palloc_req req;
- int ret;
+ int ret, ipb = nilfs_ifile_inodes_per_block(ifile);

- req.pr_entry_nr = 0; /* 0 says find free inode from beginning of
- a group. dull code!! */
req.pr_entry_bh = NULL;

- ret = nilfs_palloc_prepare_alloc_entry(ifile, &req);
+ if (S_ISDIR(mode)) {
+ req.pr_entry_nr = NILFS_IFILE_I(ifile)->last_dir;
+ ret = __nilfs_palloc_prepare_alloc_entry(ifile, &req, ipb,
+ nilfs_palloc_entries_per_group(ifile) >> 2);
+ if (unlikely(ret == -ENOSPC)) {
+ /* fallback to normal allocation */
+ req.pr_entry_nr = 0;
+ ret = nilfs_palloc_prepare_alloc_entry(ifile, &req);
+ }
+ } else {
+ req.pr_entry_nr = parent + 1;
+ ret = nilfs_palloc_prepare_alloc_entry(ifile, &req);
+ }
+
if (!ret) {
ret = nilfs_palloc_get_entry_block(ifile, req.pr_entry_nr, 1,
&req.pr_entry_bh);
@@ -86,6 +127,13 @@ int nilfs_ifile_create_inode(struct inode *ifile, ino_t *out_ino,
nilfs_palloc_commit_alloc_entry(ifile, &req);
mark_buffer_dirty(req.pr_entry_bh);
nilfs_mdt_mark_dirty(ifile);
+ /*
+ * possible race condition/non atomic update
+ * last_dir is just a hint for the next allocation so
+ * the possible invalid values are not really harmful
+ */
+ if (S_ISDIR(mode))
+ NILFS_IFILE_I(ifile)->last_dir = req.pr_entry_nr + ipb;
*out_ino = (ino_t)req.pr_entry_nr;
*out_bh = req.pr_entry_bh;
return 0;
@@ -95,6 +143,7 @@ int nilfs_ifile_create_inode(struct inode *ifile, ino_t *out_ino,
* nilfs_ifile_delete_inode - delete a disk inode
* @ifile: ifile inode
* @ino: inode number
+ * @mode: inode type of the deleted inode
*
* Return Value: On success, 0 is returned. On error, one of the following
* negative error codes is returned.
@@ -105,7 +154,7 @@ int nilfs_ifile_create_inode(struct inode *ifile, ino_t *out_ino,
*
* %-ENOENT - The inode number @ino have not been allocated.
*/
-int nilfs_ifile_delete_inode(struct inode *ifile, ino_t ino)
+int nilfs_ifile_delete_inode(struct inode *ifile, ino_t ino, umode_t mode)
{
struct nilfs_palloc_req req = {
.pr_entry_nr = ino, .pr_entry_bh = NULL
@@ -137,6 +186,8 @@ int nilfs_ifile_delete_inode(struct inode *ifile, ino_t ino)

nilfs_palloc_commit_free_entry(ifile, &req);

+ if (S_ISDIR(mode))
+ NILFS_IFILE_I(ifile)->last_dir = req.pr_entry_nr;
return 0;
}

diff --git a/fs/nilfs2/ifile.h b/fs/nilfs2/ifile.h
index 679674d..81432a4 100644
--- a/fs/nilfs2/ifile.h
+++ b/fs/nilfs2/ifile.h
@@ -45,8 +45,10 @@ static inline void nilfs_ifile_unmap_inode(struct inode *ifile, ino_t ino,
kunmap(ibh->b_page);
}

-int nilfs_ifile_create_inode(struct inode *, ino_t *, struct buffer_head **);
-int nilfs_ifile_delete_inode(struct inode *, ino_t);
+void nilfs_ifile_last_dir_reset(struct inode *);
+int nilfs_ifile_create_inode(struct inode *, ino_t, umode_t, ino_t *,
+ struct buffer_head **);
+int nilfs_ifile_delete_inode(struct inode *, ino_t, umode_t);
int nilfs_ifile_get_inode_block(struct inode *, ino_t, struct buffer_head **);

int nilfs_ifile_count_free_inodes(struct inode *, u64 *, u64 *);
diff --git a/fs/nilfs2/inode.c b/fs/nilfs2/inode.c
index d071e7f..206d861f 100644
--- a/fs/nilfs2/inode.c
+++ b/fs/nilfs2/inode.c
@@ -370,7 +370,8 @@ struct inode *nilfs_new_inode(struct inode *dir, umode_t mode)
ii->i_state = 1 << NILFS_I_NEW;
ii->i_root = root;

- err = nilfs_ifile_create_inode(root->ifile, &ino, &ii->i_bh);
+ err = nilfs_ifile_create_inode(root->ifile, dir->i_ino, mode,
+ &ino, &ii->i_bh);
if (unlikely(err))
goto failed_ifile_create_inode;
/* reference count of i_bh inherits from nilfs_mdt_read_block() */
@@ -803,7 +804,8 @@ void nilfs_evict_inode(struct inode *inode)
nilfs_mark_inode_dirty(inode);
clear_inode(inode);

- ret = nilfs_ifile_delete_inode(ii->i_root->ifile, inode->i_ino);
+ ret = nilfs_ifile_delete_inode(ii->i_root->ifile, inode->i_ino,
+ inode->i_mode);
if (!ret)
atomic64_dec(&ii->i_root->inodes_count);

diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index a1a1916..0caffb7 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -2447,6 +2447,7 @@ static int nilfs_segctor_thread(void *arg)
{
struct nilfs_sc_info *sci = (struct nilfs_sc_info *)arg;
struct the_nilfs *nilfs = sci->sc_super->s_fs_info;
+ struct inode *ifile = sci->sc_root->ifile;
int timeout = 0;

sci->sc_timer.data = (unsigned long)current;
@@ -2510,8 +2511,10 @@ static int nilfs_segctor_thread(void *arg)
timeout = ((sci->sc_state & NILFS_SEGCTOR_COMMIT) &&
time_after_eq(jiffies, sci->sc_timer.expires));

- if (nilfs_sb_dirty(nilfs) && nilfs_sb_need_update(nilfs))
+ if (nilfs_sb_dirty(nilfs) && nilfs_sb_need_update(nilfs)) {
set_nilfs_discontinued(nilfs);
+ nilfs_ifile_last_dir_reset(ifile);
+ }
}
goto loop;
--
2.1.2

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-***@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Andreas Rohner
2014-10-12 10:38:22 UTC
Permalink
This patch generalizes the function nilfs_palloc_prepare_alloc_entry()
by adding two parameters, namely block_size and threshold.

The newly allocated entry must be at the start of a block of empty
entries of size block_size and it must be contained in a group of at
least threshold number of free entries. The old behavior of the function
can be achieved by supplying 1 and 0 respectively.

This generalization allows for more sophisticated allocation algorithms
to be build on top of it. For example with block_size an algorithm
can space out certain entries to leave room between them for following
allocations, which can achieve a better localization of entries that
belong together and are likely to be accessed at the same time in the
future. threshold on the other hand can be used to exclude groups, where
the search is not likely to find an empty continuous block of size
block_size.

Signed-off-by: Andreas Rohner <andreas.rohner-***@public.gmane.org>
---
fs/nilfs2/alloc.c | 161 ++++++++++++++++++++++++++++++++++++++++++++++++++----
fs/nilfs2/alloc.h | 18 +++++-
2 files changed, 166 insertions(+), 13 deletions(-)

diff --git a/fs/nilfs2/alloc.c b/fs/nilfs2/alloc.c
index 741fd02..fb71d90 100644
--- a/fs/nilfs2/alloc.c
+++ b/fs/nilfs2/alloc.c
@@ -331,18 +331,18 @@ void *nilfs_palloc_block_get_entry(const struct inode *inode, __u64 nr,
}

/**
- * nilfs_palloc_find_available_slot - find available slot in a group
+ * nilfs_palloc_find_available_slot_unaligned - find available slot in a group
* @inode: inode of metadata file using this allocator
* @group: group number
* @target: offset number of an entry in the group (start point)
* @bitmap: bitmap of the group
* @bsize: size in bits
*/
-static int nilfs_palloc_find_available_slot(struct inode *inode,
- unsigned long group,
- unsigned long target,
- unsigned char *bitmap,
- int bsize)
+static int nilfs_palloc_find_available_slot_unaligned(struct inode *inode,
+ unsigned long group,
+ unsigned long target,
+ unsigned char *bitmap,
+ int bsize)
{
int curr, pos, end, i;

@@ -381,6 +381,127 @@ static int nilfs_palloc_find_available_slot(struct inode *inode,
}

/**
+ * nilfs_palloc_find_available_slot_align32 - find available slot in a group
+ * @inode: inode of metadata file using this allocator
+ * @group: group number
+ * @target: offset number of an entry in the group (start point)
+ * @bitmap: bitmap of the group
+ * @bsize: size in bits
+ *
+ * Description: Finds an available aligned slot in a group. It will be
+ * aligned to 32 slots followed by 31 empty slots.
+ *
+ * Return Value: On success, the available slot is returned.
+ * On error, %-ENOSPC is returned.
+ */
+static int nilfs_palloc_find_available_slot_align32(struct inode *inode,
+ unsigned long group,
+ unsigned long target,
+ unsigned char *bitmap,
+ int bsize)
+{
+ u32 *end = (u32 *)bitmap + bsize / 32;
+ u32 *p = (u32 *)bitmap + target / 32;
+ int i, pos = target & ~31;
+
+ for (i = 0; i < bsize; i += 32, pos += 32, ++p) {
+ /* wrap around */
+ if (p == end) {
+ p = (u32 *)bitmap;
+ pos = 0;
+ }
+
+ if (!*p && !nilfs_set_bit_atomic(nilfs_mdt_bgl_lock(inode,
+ group), pos, bitmap))
+ return pos;
+ }
+
+ return -ENOSPC;
+}
+
+/**
+ * nilfs_palloc_find_available_slot_align - find available slot in a group
+ * @inode: inode of metadata file using this allocator
+ * @group: group number
+ * @target: offset number of an entry in the group (start point)
+ * @bitmap: bitmap of the group
+ * @bsize: size in bits
+ * @block_size: size of the empty block to allocate the new entry (in bits)
+ *
+ * Description: Finds an available aligned slot in a group. It will be
+ * aligned to @block_size slots followed by @block_size - 1 empty slots.
+ * @block_size must be smaller or equal to BITS_PER_LONG.
+ *
+ * Return Value: On success, the available slot is returned.
+ * On error, %-ENOSPC is returned.
+ */
+static int nilfs_palloc_find_available_slot_align(struct inode *inode,
+ unsigned long group,
+ unsigned long target,
+ unsigned char *bitmap,
+ int bsize, int block_size)
+{
+ unsigned long *end = (unsigned long *)bitmap + bsize / BITS_PER_LONG;
+ unsigned long *p = (unsigned long *)bitmap + target / BITS_PER_LONG;
+ unsigned long tmp, mask = ~(~0UL << block_size);
+ int i, j = target & (BITS_PER_LONG - 1);
+ int pos = target & ~(BITS_PER_LONG - 1);
+
+ for (i = 0; i < bsize; i += BITS_PER_LONG, pos += BITS_PER_LONG, ++p) {
+ /* wrap around */
+ if (p == end) {
+ p = (unsigned long *)bitmap;
+ pos = 0;
+ }
+
+ tmp = *p;
+ if (!tmp && !nilfs_set_bit_atomic(nilfs_mdt_bgl_lock(inode,
+ group), pos, bitmap))
+ return pos;
+
+ if (block_size == BITS_PER_LONG || tmp == ~0UL)
+ continue;
+
+ tmp = nilfs_cpu_to_leul(tmp);
+ for (; j < BITS_PER_LONG; j += block_size) {
+ if (!(tmp & (mask << j)) &&
+ !nilfs_set_bit_atomic(nilfs_mdt_bgl_lock(inode,
+ group), pos + j, bitmap))
+ return pos + j;
+ }
+ j = 0;
+ }
+
+ return -ENOSPC;
+}
+
+/**
+ * nilfs_palloc_find_available_slot - find available slot in a group
+ * @inode: inode of metadata file using this allocator
+ * @group: group number
+ * @target: offset number of an entry in the group (start point)
+ * @bitmap: bitmap of the group
+ * @bsize: size in bits
+ * @block_size: size of the empty block to allocate the new entry (in bits)
+ */
+static int nilfs_palloc_find_available_slot(struct inode *inode,
+ unsigned long group,
+ unsigned long target,
+ unsigned char *bitmap,
+ int bsize, int block_size)
+{
+ if (block_size == 1)
+ return nilfs_palloc_find_available_slot_unaligned(inode, group,
+ target, bitmap, bsize);
+ else if (block_size == 32)
+ return nilfs_palloc_find_available_slot_align32(inode, group,
+ target, bitmap, bsize);
+ else
+ return nilfs_palloc_find_available_slot_align(inode, group,
+ target, bitmap, bsize, block_size);
+}
+
+/**
* nilfs_palloc_rest_groups_in_desc_block - get the remaining number of groups
* in a group descriptor block
* @inode: inode of metadata file using this allocator
@@ -461,12 +582,29 @@ int nilfs_palloc_count_max_entries(struct inode *inode, u64 nused, u64 *nmaxp)
}

/**
- * nilfs_palloc_prepare_alloc_entry - prepare to allocate a persistent object
+ * __nilfs_palloc_prepare_alloc_entry - prepare alloc. of a persistent object
* @inode: inode of metadata file using this allocator
* @req: nilfs_palloc_req structure exchanged for the allocation
+ * @block_size: size of the empty block to allocate the new entry
+ * @threshold: allocate only in groups with more free entries than @threshold
+ *
+ * Description: Prepare the allocation of a persistent object. Groups with
+ * less or equal than @threshold free entries are skipped. The new entry is
+ * allocated at the start of an empty block of @block_size slots. @block_size
+ * must be less or equal to BITS_PER_LONG.
+ *
+ * Return Value: On success, 0 is returned and @req is correctly initialized.
+ * On error, one of the following negative error codes is returned.
+ *
+ * %-EIO - I/O error.
+ *
+ * %-ENOMEM - Insufficient amount of memory available.
+ *
+ * %-ENOSPC - No inode left.
*/
-int nilfs_palloc_prepare_alloc_entry(struct inode *inode,
- struct nilfs_palloc_req *req)
+int __nilfs_palloc_prepare_alloc_entry(struct inode *inode,
+ struct nilfs_palloc_req *req,
+ int block_size, unsigned long threshold)
{
struct buffer_head *desc_bh, *bitmap_bh;
struct nilfs_palloc_group_desc *desc;
@@ -478,6 +616,7 @@ int nilfs_palloc_prepare_alloc_entry(struct inode *inode,
unsigned long i, j;
int pos, ret;

+ block_size = (block_size > BITS_PER_LONG) ? BITS_PER_LONG : block_size;
ngroups = nilfs_palloc_groups_count(inode);
maxgroup = ngroups - 1;
group = nilfs_palloc_group(inode, req->pr_entry_nr, &group_offset);
@@ -501,7 +640,7 @@ int nilfs_palloc_prepare_alloc_entry(struct inode *inode,
maxgroup);
for (j = 0; j < n; j++, desc++, group++) {
if (nilfs_palloc_group_desc_nfrees(inode, group, desc)
- > 0) {
+ > threshold) {
ret = nilfs_palloc_get_bitmap_block(
inode, group, 1, &bitmap_bh);
if (ret < 0)
@@ -510,7 +649,7 @@ int nilfs_palloc_prepare_alloc_entry(struct inode *inode,
bitmap = bitmap_kaddr + bh_offset(bitmap_bh);
pos = nilfs_palloc_find_available_slot(
inode, group, group_offset, bitmap,
- entries_per_group);
+ entries_per_group, block_size);
if (pos >= 0) {
/* found a free entry */
nilfs_palloc_group_desc_add_entries(
diff --git a/fs/nilfs2/alloc.h b/fs/nilfs2/alloc.h
index 4bd6451..1dc8557 100644
--- a/fs/nilfs2/alloc.h
+++ b/fs/nilfs2/alloc.h
@@ -64,8 +64,16 @@ struct nilfs_palloc_req {
struct buffer_head *pr_entry_bh;
};

-int nilfs_palloc_prepare_alloc_entry(struct inode *,
- struct nilfs_palloc_req *);
+
+int __nilfs_palloc_prepare_alloc_entry(struct inode *,
+ struct nilfs_palloc_req *,
+ int, unsigned long);
+static inline int nilfs_palloc_prepare_alloc_entry(struct inode *inode,
+ struct nilfs_palloc_req *req)
+{
+ return __nilfs_palloc_prepare_alloc_entry(inode, req, 1, 0);
+}
+
void nilfs_palloc_commit_alloc_entry(struct inode *,
struct nilfs_palloc_req *);
void nilfs_palloc_abort_alloc_entry(struct inode *, struct nilfs_palloc_req *);
@@ -77,6 +85,12 @@ int nilfs_palloc_freev(struct inode *, __u64 *, size_t);
#define nilfs_set_bit_atomic ext2_set_bit_atomic
#define nilfs_clear_bit_atomic ext2_clear_bit_atomic
#define nilfs_find_next_zero_bit find_next_zero_bit_le
+#ifdef __BIG_ENDIAN
+#define nilfs_cpu_to_leul ext2_swab
+#endif
+#ifndef nilfs_cpu_to_leul
+#define nilfs_cpu_to_leul(v) (v)
+#endif

/**
* struct nilfs_bh_assoc - block offset and buffer head association
--
2.1.2

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-***@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Ryusuke Konishi
2014-10-13 14:52:59 UTC
Permalink
Hi,
Post by Andreas Rohner
Hi,
The algorithm simply makes sure, that after a directory inode there are
a certain number of free slots available and the search for file inodes
is started at their parent directory.
I havn't had the time yet to do a full scale performance test of it, but
my simple preliminary tests have shown, that the allocation of inodes
takes a little bit longer and the lookup is a little bit faster. My
simple test just creates 1500 directories and after that creates 10
files in each directory.
So more testing is definetly necessary, but I wanted to get some
feedback about the design first. Is my code a step in the right
direction?
Best regards,
Andreas Rohner
nilfs2: support the allocation of whole blocks of meta data entries
nilfs2: improve inode allocation algorithm
fs/nilfs2/alloc.c | 161 ++++++++++++++++++++++++++++++++++++++++++++++++----
fs/nilfs2/alloc.h | 18 +++++-
fs/nilfs2/ifile.c | 63 ++++++++++++++++++--
fs/nilfs2/ifile.h | 6 +-
fs/nilfs2/inode.c | 6 +-
fs/nilfs2/segment.c | 5 +-
6 files changed, 235 insertions(+), 24 deletions(-)
I don't know whether this patchset is going in the right direction.
.. we should first measure how the original naive allocator is bad in
comparison with an elaborately designed allocator like this. But, I
will add some comments anyway:

1) You must not use sizeof(struct nilfs_inode) to get inode size.
The size of on-disk inodes is variable and you have to use
NILFS_MDT(ifile)->mi_entry_size to ensure compatibility.
To get ipb (= number of inodes per block), you should use
NILFS_MDT(ifile)->mi_entries_per_block.
Please remove nilfs_ifile_inodes_per_block(). It's redundant.

2) __nilfs_palloc_prepare_alloc_entry()
The argument block_size is so confusing. Usually we use it
for the size of disk block.
Please use a proper word "alignment_size" or so.

3) nilfs_palloc_find_available_slot_align32()
This function seems to be violating endian compatibility.
The order of two 32-bit words in a 64-bit word in little endian
architectures differs from that of big endian architectures.

Having three different implementations looks too overkill to me at
this time. It should be removed unless it will make a significant
difference.

4) nilfs_cpu_to_leul()
Adding this macro is not preferable. It depends on endian.
Did you look for a generic macro which does the same thing ?

Regards,
Ryusuke Konishi
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-***@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Ryusuke Konishi
2014-10-13 15:34:56 UTC
Permalink
Post by Ryusuke Konishi
Hi,
Post by Andreas Rohner
Hi,
The algorithm simply makes sure, that after a directory inode there are
a certain number of free slots available and the search for file inodes
is started at their parent directory.
I havn't had the time yet to do a full scale performance test of it, but
my simple preliminary tests have shown, that the allocation of inodes
takes a little bit longer and the lookup is a little bit faster. My
simple test just creates 1500 directories and after that creates 10
files in each directory.
So more testing is definetly necessary, but I wanted to get some
feedback about the design first. Is my code a step in the right
direction?
Best regards,
Andreas Rohner
nilfs2: support the allocation of whole blocks of meta data entries
nilfs2: improve inode allocation algorithm
fs/nilfs2/alloc.c | 161 ++++++++++++++++++++++++++++++++++++++++++++++++----
fs/nilfs2/alloc.h | 18 +++++-
fs/nilfs2/ifile.c | 63 ++++++++++++++++++--
fs/nilfs2/ifile.h | 6 +-
fs/nilfs2/inode.c | 6 +-
fs/nilfs2/segment.c | 5 +-
6 files changed, 235 insertions(+), 24 deletions(-)
I don't know whether this patchset is going in the right direction.
.. we should first measure how the original naive allocator is bad in
comparison with an elaborately designed allocator like this. But, I
1) You must not use sizeof(struct nilfs_inode) to get inode size.
The size of on-disk inodes is variable and you have to use
NILFS_MDT(ifile)->mi_entry_size to ensure compatibility.
To get ipb (= number of inodes per block), you should use
NILFS_MDT(ifile)->mi_entries_per_block.
Please remove nilfs_ifile_inodes_per_block(). It's redundant.
2) __nilfs_palloc_prepare_alloc_entry()
The argument block_size is so confusing. Usually we use it
for the size of disk block.
Please use a proper word "alignment_size" or so.
3) nilfs_palloc_find_available_slot_align32()
This function seems to be violating endian compatibility.
The order of two 32-bit words in a 64-bit word in little endian
architectures differs from that of big endian architectures.
Sorry, the implementation looks correct (I misread earlier).
Ignore this one.

Regards,
Ryusuke Konishi
Post by Ryusuke Konishi
Having three different implementations looks too overkill to me at
this time. It should be removed unless it will make a significant
difference.
4) nilfs_cpu_to_leul()
Adding this macro is not preferable. It depends on endian.
Did you look for a generic macro which does the same thing ?
Regards,
Ryusuke Konishi
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-***@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Andreas Rohner
2014-10-13 16:18:34 UTC
Permalink
Post by Ryusuke Konishi
Hi,
Post by Andreas Rohner
Hi,
The algorithm simply makes sure, that after a directory inode there are
a certain number of free slots available and the search for file inodes
is started at their parent directory.
I havn't had the time yet to do a full scale performance test of it, but
my simple preliminary tests have shown, that the allocation of inodes
takes a little bit longer and the lookup is a little bit faster. My
simple test just creates 1500 directories and after that creates 10
files in each directory.
So more testing is definetly necessary, but I wanted to get some
feedback about the design first. Is my code a step in the right
direction?
Best regards,
Andreas Rohner
nilfs2: support the allocation of whole blocks of meta data entries
nilfs2: improve inode allocation algorithm
fs/nilfs2/alloc.c | 161 ++++++++++++++++++++++++++++++++++++++++++++++++----
fs/nilfs2/alloc.h | 18 +++++-
fs/nilfs2/ifile.c | 63 ++++++++++++++++++--
fs/nilfs2/ifile.h | 6 +-
fs/nilfs2/inode.c | 6 +-
fs/nilfs2/segment.c | 5 +-
6 files changed, 235 insertions(+), 24 deletions(-)
I don't know whether this patchset is going in the right direction.
.. we should first measure how the original naive allocator is bad in
comparison with an elaborately designed allocator like this. But, I
I think the alignment creates a lot of overhead, because every directory
uses up a whole block in the ifile. I could also create a simpler patch
that only stores the last allocated inode number in struct
nilfs_ifile_info and starts the search from there for the next
allocation. Then I can test the three versions against each other in a
large scale test.
Post by Ryusuke Konishi
1) You must not use sizeof(struct nilfs_inode) to get inode size.
The size of on-disk inodes is variable and you have to use
NILFS_MDT(ifile)->mi_entry_size to ensure compatibility.
To get ipb (= number of inodes per block), you should use
NILFS_MDT(ifile)->mi_entries_per_block.
Please remove nilfs_ifile_inodes_per_block(). It's redundant.
Agreed.
Post by Ryusuke Konishi
2) __nilfs_palloc_prepare_alloc_entry()
The argument block_size is so confusing. Usually we use it
for the size of disk block.
Please use a proper word "alignment_size" or so.
Yes that's true "alignment_size" sounds better.
Post by Ryusuke Konishi
3) nilfs_palloc_find_available_slot_align32()
This function seems to be violating endian compatibility.
The order of two 32-bit words in a 64-bit word in little endian
architectures differs from that of big endian architectures.
Having three different implementations looks too overkill to me at
this time. It should be removed unless it will make a significant
difference.
32 is the most common case (4096 block size and 128 inode size), so I
thought it makes sense to optimize for it. But it is not necessary and
it shouldn't make a big difference.
Post by Ryusuke Konishi
4) nilfs_cpu_to_leul()
Adding this macro is not preferable. It depends on endian.
Did you look for a generic macro which does the same thing ?
There are only macros for specific bit lengths, as far as I know. But
unsigned long varies for 32bit and 64bit systems. You could also
implement it like this:

#if BITS_PER_LONG == 64
#define nilfs_cpu_to_leul cpu_to_le64
#elif BITS_PER_LONG == 32
#define nilfs_cpu_to_leul cpu_to_le32
#else
#error BITS_PER_LONG not defined
#endif

Best regards,
Andreas Rohner
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-***@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Loading...