Discussion: [PATCH 0/1] improve inode allocation
Andreas Rohner
2014-10-19 17:17:10 UTC
Hi,

The following patch is a much simplified version of the one I sent
in a week ago. My benchmarks showed that the aligned allocation of
directories creates too much overhead.

As a benchmark, I used a very simple C program that creates millions of
inodes. I uploaded the source code to GitHub:

https://github.com/zeitgeist87/inodetest
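
For illustration, the core of such a benchmark could look roughly like
the sketch below. This is only a hypothetical outline: the exact
argument semantics and directory layout are defined by the program in
the repository above; the sketch simply assumes three arguments
(directories, files per directory, repetitions).

/*
 * Hypothetical sketch of an inode-creation benchmark (argument
 * meanings are assumptions, not taken from the actual repository):
 * create <dirs> directories, each filled with <files> empty files,
 * repeated <rounds> times.
 */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	long dirs, files, rounds, d, f, r;
	char path[64];
	int fd;

	if (argc < 4) {
		fprintf(stderr, "usage: %s <dirs> <files> <rounds>\n", argv[0]);
		return 1;
	}
	dirs = atol(argv[1]);
	files = atol(argv[2]);
	rounds = atol(argv[3]);

	for (r = 0; r < rounds; r++) {
		for (d = 0; d < dirs; d++) {
			/* one directory per (round, dir) pair */
			snprintf(path, sizeof(path), "d%ld-%ld", r, d);
			if (mkdir(path, 0755) && errno != EEXIST) {
				perror("mkdir");
				return 1;
			}
			/* fill it with empty files */
			for (f = 0; f < files; f++) {
				snprintf(path, sizeof(path),
					 "d%ld-%ld/f%ld", r, d, f);
				fd = open(path, O_CREAT | O_WRONLY, 0644);
				if (fd < 0) {
					perror("open");
					return 1;
				}
				close(fd);
			}
		}
	}
	return 0;
}

With the arguments 1000 1000 20 such a loop creates roughly 20 million
inodes, which matches the numbers below.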

Here are the results:

1.) One process, 20 million inodes:
a) Normal Nilfs
$ time inodetest 1000 1000 20
real 3m50.431s
user 0m5.647s
sys 2m50.760s

$ time find ./ > /dev/null
real 5m49.021s
user 0m27.917s
sys 1m14.197s

b) Improved Nilfs
$ time inodetest 1000 1000 20
real 2m31.857s
user 0m5.950s
sys 1m29.707s

$ time find ./ > /dev/null
real 5m49.060s
user 0m27.787s
sys 1m13.673s

2.) Three processes in parallel, total of 60 million inodes:
a) Normal Nilfs
$ time inodetest 1000 1000 20 &
$ time inodetest 1000 1000 20 &
$ time inodetest 1000 1000 20 &
real 20m21.914s
user 0m5.603s
sys 17m43.987s

$ time find ./ > /dev/null
real 28m10.340s
user 1m38.477s
sys 5m9.133s

b) Improved Nilfs
$ time inodetest 1000 1000 20 &
$ time inodetest 1000 1000 20 &
$ time inodetest 1000 1000 20 &
real 6m21.609s
user 0m5.970s
sys 3m8.100s

$ time find ./ > /dev/null
real 30m35.320s
user 1m40.577s
sys 5m14.580s

There is a significant improvement in runtime for both the
single-process and the multi-process case. It is also notable that the
improved version scales much better for parallel processes.
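(Computed from the wall-clock times above: 3m50.431s down to 2m31.857s
is roughly a 34% reduction for a single process, and 20m21.914s down to
6m21.609s is roughly a 69% reduction for the parallel case.)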

"find ./ > /dev/null" is virtually identical for the benchmark 1.a and
1.b, but 2.b is consitently slower by 2 minutes, which I cannot
currently explain.

I repeated the benchmarks several times and there were only tiny
variations in the results.

Best regards
Andreas Rohner

Andreas Rohner (1):
nilfs2: improve inode allocation

fs/nilfs2/ifile.c | 31 +++++++++++++++++++++++++++++--
fs/nilfs2/ifile.h | 1 +
fs/nilfs2/segment.c | 5 ++++-
3 files changed, 34 insertions(+), 3 deletions(-)
--
2.1.2

Andreas Rohner
2014-10-19 17:17:11 UTC
The current inode allocation algorithm of NILFS2 does not use any
information about the previous allocation. It simply searches for a free
entry in the ifile, always starting from position 0. This patch
introduces an improved allocation scheme.

Inodes are allocated sequentially within the ifile. The current
algorithm always starts the search for a free slot at position 0,
because it has to find slots that were freed up by previously deleted
inodes. This minimizes wasted space, but comes at a certain cost.

This patch introduces the field next_inode in the nilfs_ifile_info
structure, which stores the location of the most likely next free slot.
Whenever an inode is created or deleted, next_inode is updated
accordingly. If an inode is deleted, next_inode points to the newly
freed slot; if an inode is created, next_inode points to the slot after
the newly allocated one. Instead of starting every search for a free
slot at 0, the search is started at next_inode. This narrows the search
space considerably and avoids a lot of overhead.

For performance reasons the updates to next_inode are not protected by
locks, so race conditions, non-atomic updates and lost updates are
possible. This can lead to some empty slots being overlooked and
therefore to some wasted space. This is only temporary, however,
because next_inode is periodically reset to 0 to force a full search
starting from position 0.
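
To illustrate the idea, the following small stand-alone userspace
program mimics the behaviour described above. It is only a toy sketch
(the slot array, names and sizes are made up for the example); the
actual changes to the ifile allocator follow in the diff below.

#include <stdio.h>

#define NSLOTS 16

static unsigned char used[NSLOTS];	/* toy stand-in for the ifile bitmap */
static unsigned long next_hint;		/* analogous to next_inode */

static long alloc_slot(void)
{
	unsigned long i;

	/* start the scan at the hint instead of at 0 */
	for (i = next_hint; i < NSLOTS; i++) {
		if (!used[i]) {
			used[i] = 1;
			next_hint = i + 1;	/* slot after the new one */
			return i;
		}
	}
	return -1;	/* full; a real allocator would wrap or grow */
}

static void free_slot(unsigned long i)
{
	used[i] = 0;
	if (next_hint > i)	/* remember the newly freed slot */
		next_hint = i;
}

static void reset_hint(void)
{
	next_hint = 0;		/* periodic reset forces a full search */
}

int main(void)
{
	long a = alloc_slot();
	long b = alloc_slot();

	free_slot((unsigned long)a);
	printf("freed slot %ld, next allocation reuses slot %ld\n",
	       a, alloc_slot());
	reset_hint();
	printf("slot %ld is still in use\n", b);
	return 0;
}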

Signed-off-by: Andreas Rohner <andreas.rohner-***@public.gmane.org>
---
fs/nilfs2/ifile.c | 31 +++++++++++++++++++++++++++++--
fs/nilfs2/ifile.h | 1 +
fs/nilfs2/segment.c | 5 ++++-
3 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/fs/nilfs2/ifile.c b/fs/nilfs2/ifile.c
index 6548c78..0f27e66 100644
--- a/fs/nilfs2/ifile.c
+++ b/fs/nilfs2/ifile.c
@@ -33,10 +33,12 @@
* struct nilfs_ifile_info - on-memory private data of ifile
* @mi: on-memory private data of metadata file
* @palloc_cache: persistent object allocator cache of ifile
+ * @next_inode: ino of the next likely free entry
*/
struct nilfs_ifile_info {
struct nilfs_mdt_info mi;
struct nilfs_palloc_cache palloc_cache;
+ __u64 next_inode;
};

static inline struct nilfs_ifile_info *NILFS_IFILE_I(struct inode *ifile)
@@ -45,6 +47,26 @@ static inline struct nilfs_ifile_info *NILFS_IFILE_I(struct inode *ifile)
}

/**
+ * nilfs_ifile_next_inode_reset - set next_inode to 0
+ * @ifile: ifile inode
+ *
+ * Description: The value of next_inode will increase with every new
+ * allocation of an inode, because it is used as the starting point of the
+ * search for a free entry in the ifile. It should be reset periodically to 0
+ * (e.g. on every segctor timeout), so that previously deleted entries can be
+ * found.
+ */
+void nilfs_ifile_next_inode_reset(struct inode *ifile)
+{
+ /*
+ * Possible race condition / non-atomic update:
+ * next_inode is just a hint for the next allocation, so
+ * the possible invalid values are not really harmful.
+ */
+ NILFS_IFILE_I(ifile)->next_inode = 0;
+}
+
+/**
* nilfs_ifile_create_inode - create a new disk inode
* @ifile: ifile inode
* @out_ino: pointer to a variable to store inode number
@@ -68,9 +90,8 @@ int nilfs_ifile_create_inode(struct inode *ifile, ino_t *out_ino,
struct nilfs_palloc_req req;
int ret;

- req.pr_entry_nr = 0; /* 0 says find free inode from beginning of
- a group. dull code!! */
req.pr_entry_bh = NULL;
+ req.pr_entry_nr = NILFS_IFILE_I(ifile)->next_inode;

ret = nilfs_palloc_prepare_alloc_entry(ifile, &req);
if (!ret) {
@@ -86,6 +107,9 @@ int nilfs_ifile_create_inode(struct inode *ifile, ino_t *out_ino,
nilfs_palloc_commit_alloc_entry(ifile, &req);
mark_buffer_dirty(req.pr_entry_bh);
nilfs_mdt_mark_dirty(ifile);
+
+ /* see comment in nilfs_ifile_next_inode_reset() */
+ NILFS_IFILE_I(ifile)->next_inode = req.pr_entry_nr + 1;
*out_ino = (ino_t)req.pr_entry_nr;
*out_bh = req.pr_entry_bh;
return 0;
@@ -137,6 +161,9 @@ int nilfs_ifile_delete_inode(struct inode *ifile, ino_t ino)

nilfs_palloc_commit_free_entry(ifile, &req);

+ /* see comment in nilfs_ifile_next_inode_reset() */
+ if (NILFS_IFILE_I(ifile)->next_inode > req.pr_entry_nr)
+ NILFS_IFILE_I(ifile)->next_inode = req.pr_entry_nr;
return 0;
}

diff --git a/fs/nilfs2/ifile.h b/fs/nilfs2/ifile.h
index 679674d..36edbcc 100644
--- a/fs/nilfs2/ifile.h
+++ b/fs/nilfs2/ifile.h
@@ -45,6 +45,7 @@ static inline void nilfs_ifile_unmap_inode(struct inode *ifile, ino_t ino,
kunmap(ibh->b_page);
}

+void nilfs_ifile_next_inode_reset(struct inode *);
int nilfs_ifile_create_inode(struct inode *, ino_t *, struct buffer_head **);
int nilfs_ifile_delete_inode(struct inode *, ino_t);
int nilfs_ifile_get_inode_block(struct inode *, ino_t, struct buffer_head **);
diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index a1a1916..0777027 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -2447,6 +2447,7 @@ static int nilfs_segctor_thread(void *arg)
{
struct nilfs_sc_info *sci = (struct nilfs_sc_info *)arg;
struct the_nilfs *nilfs = sci->sc_super->s_fs_info;
+ struct inode *ifile = sci->sc_root->ifile;
int timeout = 0;

sci->sc_timer.data = (unsigned long)current;
@@ -2510,8 +2511,10 @@ static int nilfs_segctor_thread(void *arg)
timeout = ((sci->sc_state & NILFS_SEGCTOR_COMMIT) &&
time_after_eq(jiffies, sci->sc_timer.expires));

- if (nilfs_sb_dirty(nilfs) && nilfs_sb_need_update(nilfs))
+ if (nilfs_sb_dirty(nilfs) && nilfs_sb_need_update(nilfs)) {
set_nilfs_discontinued(nilfs);
+ nilfs_ifile_next_inode_reset(ifile);
+ }
}
goto loop;
--
2.1.2

Andreas Rohner
2014-10-21 18:22:04 UTC
Hi,

I extended the inode test to delete a certain number of inodes.

https://github.com/zeitgeist87/inodetest

This should make the benchmark a little more realistic, but the
results are essentially the same as before. For the following tests,
about 10% of the inodes were continuously deleted:

1.) One process, 20 million inodes, 2 million deleted:
a) Normal Nilfs
$ time inodetest 1000 1000 20 100
real 3m49.793s
user 0m6.323s
sys 2m47.947s

$ time find ./ > /dev/null
real 5m21.020s
user 0m25.440s
sys 1m6.633s

b) Improved Nilfs
$ time inodetest 1000 1000 20 100
real 2m35.011s
user 0m6.847s
sys 1m33.093s

$ time find ./ > /dev/null
real 5m18.922s
user 0m25.323s
sys 1m6.877s

2.) Three processes in parallel, 60 million inodes, 6 million deleted:
a) Normal Nilfs
$ time inodetest 1000 1000 20 100 &
$ time inodetest 1000 1000 20 100 &
$ time inodetest 1000 1000 20 100 &
real 19m18.135s
user 0m7.973s
sys 16m16.833s

$ time find ./ > /dev/null
real 29m38.577s
user 1m32.763s
sys 4m44.140s

b) Improved Nilfs
$ time inodetest 1000 1000 20 100 &
$ time inodetest 1000 1000 20 100 &
$ time inodetest 1000 1000 20 100 &
real 6m30.458s
user 0m6.697s
sys 3m10.213s

$ time find ./ > /dev/null
real 28m50.304s
user 1m30.133s
sys 4m40.770s


So the performance improved by about 32% for a single process and by
about 66% for multiple processes.
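(These figures follow from the wall-clock times above: 3m49.793s down
to 2m35.011s is a reduction of about 32%, and 19m18.135s down to
6m30.458s a reduction of about 66%.)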

All benchmarks were run on an AMD Phenom II X6 1090T processor with 6
cores and 8 GB of RAM.

Best regards
Andreas Rohner