Discussion:
[PATCH v2 0/1] nilfs2: add mount option that reduces super block writes
Andreas Rohner
2014-02-02 16:50:08 UTC
Hi,

This is an experimental patch. I am not suggesting that it be used as the
default recovery option. I had some time over the weekend to improve my
first version significantly. The primary goal of this patch is to test
how bad a linear scan of all segments really is for performance.

The patch introduces a mount option that allows the user to disable the
periodic overwrite of the super block during normal file system
operation. The super block needs to point to the latest segment, to
allow the file system to recover in case of an unclean shutdown, but
this leads to a lot of writes to this one particular block. This is
usually not a problem, but it can lead to wear leveling problems with
cheap flash based storage devices.

Instead of periodically writing to the super block, this patch only
writes at mount and umount time and performs a linear scan for the
latest segment in case a recovery is necessary.

Here are the test results for some devices:

100GB HDD:
Recovery: 45.042s
Normal Mount: 0.165s

100GB SSD:
Recovery: 0.752s
Normal Mount: 0.059s

16GB SD-Card:
Recovery: 3.833s
Normal Mount: 0.652s

16GB Micro-SD-Card:
Recovery: 4.011s
Normal Mount: 1.104s

8GB USB-Stick:
Recovery: 1.704s
Normal Mount: 0.549s

The HDD is obviously intolerably slow for this task, but the readahead
still improved its time significantly.

SSDs are really, really good at this kind of random read operation. I
measured it three times to be sure. Since I know the addresses of the
blocks in advance, I do a 64-block readahead so that the I/O queue of
the SSD is always full. That way it can read at almost full bandwidth.
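
For reference, the readahead is nothing more than a sliding window over the
segment summary blocks. This is a condensed excerpt of the scan loop from the
patch below, with the candidate bookkeeping omitted:

	/* prime the window with the first 64 segment summary blocks */
	for (segahead = 0; segahead < 64 && segahead < nsegments; ++segahead) {
		nilfs_get_segment_range(nilfs, segahead, &seg_start, &seg_end);
		__breadahead(nilfs->ns_bdev, seg_start, nilfs->ns_blocksize);
	}

	for (segnum = 0; segnum < nsegments; ++segnum, ++segahead) {
		/* keep the window 64 segments ahead of the synchronous read */
		if (segahead < nsegments) {
			nilfs_get_segment_range(nilfs, segahead,
						&seg_start, &seg_end);
			__breadahead(nilfs->ns_bdev, seg_start,
				     nilfs->ns_blocksize);
		}

		nilfs_get_segment_range(nilfs, segnum, &seg_start, &seg_end);
		/* this synchronous read now usually hits the page cache */
		bh_sum = nilfs_read_log_header(nilfs, seg_start, &sum);
		/* ... validate the header and record the candidate ... */
	}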

The SD-Cards and the USB-Stick are not particularly fast, but they are
small enough so that the recovery time is tolerable.

Best regards,
Andreas Rohner

---
v2: Add validity checks
Add history of recent segments
Add check of partial segments
Add readahead
Add fast crc checksum replacing ss_pad

Andreas Rohner (1):
nilfs2: add mount option that reduces super block writes

fs/nilfs2/recovery.c | 248 ++++++++++++++++++++++++++++++++++++++++++++++
fs/nilfs2/segbuf.c | 16 ++-
fs/nilfs2/segment.c | 3 +-
fs/nilfs2/segment.h | 1 +
fs/nilfs2/super.c | 10 +-
fs/nilfs2/the_nilfs.c | 3 +
include/linux/nilfs2_fs.h | 6 +-
7 files changed, 281 insertions(+), 6 deletions(-)
--
1.8.5.3

Andreas Rohner
2014-02-02 16:50:09 UTC
This patch introduces a mount option bad_ftl that disables the
periodic overwrites of the super block, to make the file system better
suited for bad flash memory with a bad FTL. The super block is only
written at umount time, so if there is an unclean shutdown the file
system needs to be recovered by a linear scan of all segment summary
blocks.

The linear scan is only necessary if the file system wasn't unmounted
properly, so the normal mount time is not affected.

Signed-off-by: Andreas Rohner <andreas.rohner-***@public.gmane.org>
---
fs/nilfs2/recovery.c | 248 ++++++++++++++++++++++++++++++++++++++++++++++
fs/nilfs2/segbuf.c | 16 ++-
fs/nilfs2/segment.c | 3 +-
fs/nilfs2/segment.h | 1 +
fs/nilfs2/super.c | 10 +-
fs/nilfs2/the_nilfs.c | 3 +
include/linux/nilfs2_fs.h | 6 +-
7 files changed, 281 insertions(+), 6 deletions(-)

diff --git a/fs/nilfs2/recovery.c b/fs/nilfs2/recovery.c
index ff00a0b..7f9dd39 100644
--- a/fs/nilfs2/recovery.c
+++ b/fs/nilfs2/recovery.c
@@ -55,6 +55,13 @@ struct nilfs_recovery_block {
struct list_head list;
};

+/* work structure for the log cursor search */
+struct nilfs_seg_history {
+ u64 seq;
+ sector_t seg_start;
+};
+
+#define NILFS_SEG_HISTORY_DEPTH 3

static int nilfs_warn_segment_error(int err)
{
@@ -792,6 +799,247 @@ int nilfs_salvage_orphan_logs(struct the_nilfs *nilfs,
return err;
}

+static inline int nilfs_validate_segment_summary_fast(struct the_nilfs *nilfs,
+ struct nilfs_segment_summary *sum)
+{
+ u32 crc;
+ int crc_size = sizeof(struct nilfs_segment_summary) -
+ (sizeof(sum->ss_datasum) +
+ sizeof(sum->ss_sumsum) +
+ sizeof(sum->ss_sumsum_fast) +
+ sizeof(sum->ss_cno));
+
+ if (le32_to_cpu(sum->ss_magic) != NILFS_SEGSUM_MAGIC
+ || le32_to_cpu(sum->ss_nblocks) == 0
+ || le32_to_cpu(sum->ss_nblocks) >
+ nilfs->ns_blocks_per_segment)
+ return -1;
+
+ crc = crc32_le(nilfs->ns_crc_seed,
+ (unsigned char *)sum + sizeof(sum->ss_datasum) +
+ sizeof(sum->ss_sumsum), crc_size);
+
+ if (le32_to_cpu(sum->ss_sumsum_fast) != crc)
+ return -1;
+
+ return 0;
+}
+
+static inline void nilfs_add_segment_history(struct nilfs_seg_history *history,
+ int hist_len, u64 seq, sector_t seg_start)
+{
+ int i, j;
+
+ for (i = 0; i < hist_len; ++i) {
+ if (seq > history[i].seq) {
+ for (j = hist_len - 1; j > i; --j)
+ history[j] = history[j - 1];
+
+ history[i].seq = seq;
+ history[i].seg_start = seg_start;
+ break;
+ }
+ }
+}
+
+static inline void nilfs_init_segment_history(struct nilfs_seg_history *history,
+ int hist_len, u64 seq, sector_t seg_start)
+{
+ int i;
+
+ for (i = 0; i < hist_len; ++i) {
+ history[i].seq = seq;
+ history[i].seg_start = seg_start;
+ }
+}
+
+static int nilfs_search_partial_log_cursor(struct the_nilfs *nilfs,
+ u64 seq, sector_t pseg_start, sector_t *dest)
+{
+ struct buffer_head *bh_sum = NULL;
+ struct nilfs_segment_summary *sum;
+ sector_t seg_start, seg_end;
+ int ret = -1;
+
+ nilfs_get_segment_range(nilfs,
+ nilfs_get_segnum_of_block(nilfs, pseg_start),
+ &seg_start, &seg_end);
+
+ while (pseg_start < seg_end && pseg_start >= seg_start) {
+ brelse(bh_sum);
+
+ bh_sum = nilfs_read_log_header(nilfs, pseg_start, &sum);
+ if (!bh_sum)
+ return -EIO;
+
+ if (nilfs_validate_segment_summary_fast(nilfs, sum))
+ goto out;
+
+ if (le64_to_cpu(sum->ss_seq) != seq)
+ goto out;
+
+ if (le16_to_cpu(sum->ss_flags) & NILFS_SS_SR) {
+ *dest = pseg_start;
+ ret = 0;
+ goto out;
+ }
+
+ pseg_start += le32_to_cpu(sum->ss_nblocks);
+ }
+
+out:
+ brelse(bh_sum);
+ return ret;
+}
+
+static int nilfs_search_validate_log_cursor(struct the_nilfs *nilfs,
+ sector_t seg_start, u64 seq)
+{
+ struct buffer_head *bh_sum;
+ struct nilfs_segment_summary *sum;
+ sector_t b;
+ int ret;
+
+ bh_sum = nilfs_read_log_header(nilfs, seg_start, &sum);
+ if (!bh_sum) {
+ printk(KERN_ERR "NILFS error searching for cursor.\n");
+ return -EIO;
+ }
+
+ b = seg_start;
+ while (b < seg_start + le32_to_cpu(sum->ss_nblocks))
+ __breadahead(nilfs->ns_bdev, b++, nilfs->ns_blocksize);
+
+ ret = nilfs_validate_log(nilfs, seq, bh_sum, sum);
+ if (ret) {
+ ret = -1;
+ } else {
+ /* update nilfs log cursor */
+ nilfs->ns_last_pseg = seg_start;
+ nilfs->ns_last_cno = le64_to_cpu(sum->ss_cno);
+ nilfs->ns_last_seq = seq;
+
+ nilfs->ns_prev_seq = nilfs->ns_last_seq;
+ nilfs->ns_seg_seq = nilfs->ns_last_seq;
+ nilfs->ns_segnum =
+ nilfs_get_segnum_of_block(nilfs, nilfs->ns_last_pseg);
+ nilfs->ns_cno = nilfs->ns_last_cno + 1;
+ }
+
+ brelse(bh_sum);
+ return ret;
+}
+
+/**
+ * nilfs_search_log_cursor - search the latest log cursor
+ * @nilfs: the_nilfs
+ *
+ * Description: nilfs_search_log_cursor() performs a linear scan of all full
+ * segment summary blocks and updates the cursor of the nilfs object if a more
+ * recent segment is found. The cursor is only updated if the segment is valid
+ * and there is a super root present. The goal is to quickly find the latest
+ * segment and leave the rest of the heavy lifting to the normal recovery
+ * process.
+ *
+ * Return Value: On success, 0 is returned. On error, one of the following
+ * negative error codes is returned.
+ *
+ * %-EIO - I/O error
+ */
+int nilfs_search_log_cursor(struct the_nilfs *nilfs)
+{
+ u64 seq, segnum, segahead, nsegments = nilfs->ns_nsegments;
+ struct buffer_head *bh_sum = NULL;
+ struct nilfs_segment_summary *sum;
+ struct nilfs_seg_history history[NILFS_SEG_HISTORY_DEPTH];
+ struct nilfs_seg_history history_sr[NILFS_SEG_HISTORY_DEPTH];
+ sector_t seg_start = 0, seg_end;
+ int i;
+
+ printk(KERN_WARNING "NILFS warning: searching for latest log\n");
+
+ for (segahead = 0; segahead < 64 && segahead < nsegments; ++segahead) {
+ nilfs_get_segment_range(nilfs, segahead, &seg_start, &seg_end);
+ __breadahead(nilfs->ns_bdev, seg_start, nilfs->ns_blocksize);
+ }
+
+ nilfs_init_segment_history(history, NILFS_SEG_HISTORY_DEPTH,
+ nilfs->ns_last_seq, 0);
+ nilfs_init_segment_history(history_sr, NILFS_SEG_HISTORY_DEPTH,
+ nilfs->ns_last_seq, 0);
+
+ for (segnum = 0; segnum < nsegments; ++segnum, ++segahead) {
+ brelse(bh_sum);
+
+ if (segahead < nsegments) {
+ nilfs_get_segment_range(nilfs, segahead,
+ &seg_start, &seg_end);
+ __breadahead(nilfs->ns_bdev, seg_start,
+ nilfs->ns_blocksize);
+ }
+
+ nilfs_get_segment_range(nilfs, segnum, &seg_start, &seg_end);
+
+ bh_sum = nilfs_read_log_header(nilfs, seg_start, &sum);
+ if (!bh_sum) {
+ printk(KERN_ERR "NILFS error searching for cursor.\n");
+ return -EIO;
+ }
+
+ if (nilfs_validate_segment_summary_fast(nilfs, sum))
+ continue;
+
+ seq = le64_to_cpu(sum->ss_seq);
+
+ nilfs_add_segment_history(history, NILFS_SEG_HISTORY_DEPTH,
+ seq, seg_start);
+
+ if (!(le16_to_cpu(sum->ss_flags) & NILFS_SS_SR))
+ continue;
+
+ nilfs_add_segment_history(history_sr, NILFS_SEG_HISTORY_DEPTH,
+ seq, seg_start);
+ }
+ brelse(bh_sum);
+
+ /*
+ * if last super root is too far off try to find
+ * next super root in partial segment
+ */
+ if (history_sr[0].seq + NILFS_SEG_HISTORY_DEPTH < history[0].seq) {
+ for (i = 0; i < NILFS_SEG_HISTORY_DEPTH; ++i) {
+ if (history[i].seg_start == 0 ||
+ history[i].seq <= nilfs->ns_last_seq)
+ break;
+
+ if (nilfs_search_partial_log_cursor(nilfs,
+ history[i].seq, history[i].seg_start,
+ &seg_start) == 0) {
+ nilfs_add_segment_history(history_sr,
+ NILFS_SEG_HISTORY_DEPTH,
+ history[i].seq, seg_start);
+ break;
+ }
+ }
+ }
+
+ /*
+ * try to validate one of the super root segments previously
+ * collected
+ */
+ for (i = 0; i < NILFS_SEG_HISTORY_DEPTH; ++i) {
+ if (history_sr[i].seg_start == 0 ||
+ history_sr[i].seq <= nilfs->ns_last_seq)
+ break;
+
+ if (nilfs_search_validate_log_cursor(nilfs,
+ history_sr[i].seg_start, history_sr[i].seq) == 0)
+ return 0;
+ }
+
+ return -1;
+}
+
/**
* nilfs_search_super_root - search the latest valid super root
* @nilfs: the_nilfs
diff --git a/fs/nilfs2/segbuf.c b/fs/nilfs2/segbuf.c
index dc3a9efd..692bf26 100644
--- a/fs/nilfs2/segbuf.c
+++ b/fs/nilfs2/segbuf.c
@@ -158,6 +158,9 @@ void nilfs_segbuf_fill_in_segsum(struct nilfs_segment_buffer *segbuf)
{
struct nilfs_segment_summary *raw_sum;
struct buffer_head *bh_sum;
+ struct the_nilfs *nilfs = segbuf->sb_super->s_fs_info;
+ u32 crc;
+ int size;

bh_sum = list_entry(segbuf->sb_segsum_buffers.next,
struct buffer_head, b_assoc_buffers);
@@ -172,8 +175,19 @@ void nilfs_segbuf_fill_in_segsum(struct nilfs_segment_buffer *segbuf)
raw_sum->ss_nblocks = cpu_to_le32(segbuf->sb_sum.nblocks);
raw_sum->ss_nfinfo = cpu_to_le32(segbuf->sb_sum.nfinfo);
raw_sum->ss_sumbytes = cpu_to_le32(segbuf->sb_sum.sumbytes);
- raw_sum->ss_pad = 0;
raw_sum->ss_cno = cpu_to_le64(segbuf->sb_sum.cno);
+
+ size = sizeof(struct nilfs_segment_summary) -
+ (sizeof(raw_sum->ss_datasum) +
+ sizeof(raw_sum->ss_sumsum) +
+ sizeof(raw_sum->ss_sumsum_fast) +
+ sizeof(raw_sum->ss_cno));
+
+ crc = crc32_le(nilfs->ns_crc_seed,
+ (unsigned char *)raw_sum + sizeof(raw_sum->ss_datasum) +
+ sizeof(raw_sum->ss_sumsum), size);
+
+ raw_sum->ss_sumsum_fast = cpu_to_le32(crc);
}

/*
diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index a1a1916..e8e38a9 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -2288,7 +2288,8 @@ static int nilfs_segctor_construct(struct nilfs_sc_info *sci, int mode)
if (mode != SC_FLUSH_DAT)
atomic_set(&nilfs->ns_ndirtyblks, 0);
if (test_bit(NILFS_SC_SUPER_ROOT, &sci->sc_flags) &&
- nilfs_discontinued(nilfs)) {
+ nilfs_discontinued(nilfs) &&
+ !nilfs_test_opt(nilfs, BAD_FTL)) {
down_write(&nilfs->ns_sem);
err = -EIO;
sbp = nilfs_prepare_super(sci->sc_super,
diff --git a/fs/nilfs2/segment.h b/fs/nilfs2/segment.h
index 38a1d00..ceb0ea4 100644
--- a/fs/nilfs2/segment.h
+++ b/fs/nilfs2/segment.h
@@ -237,6 +237,7 @@ void nilfs_detach_log_writer(struct super_block *sb);
/* recovery.c */
extern int nilfs_read_super_root_block(struct the_nilfs *, sector_t,
struct buffer_head **, int);
+extern int nilfs_search_log_cursor(struct the_nilfs *nilfs);
extern int nilfs_search_super_root(struct the_nilfs *,
struct nilfs_recovery_info *);
int nilfs_salvage_orphan_logs(struct the_nilfs *nilfs, struct super_block *sb,
diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
index 7ac2a12..c3374ed 100644
--- a/fs/nilfs2/super.c
+++ b/fs/nilfs2/super.c
@@ -505,7 +505,7 @@ static int nilfs_sync_fs(struct super_block *sb, int wait)
err = nilfs_construct_segment(sb);

down_write(&nilfs->ns_sem);
- if (nilfs_sb_dirty(nilfs)) {
+ if (nilfs_sb_dirty(nilfs) && !nilfs_test_opt(nilfs, BAD_FTL)) {
sbp = nilfs_prepare_super(sb, nilfs_sb_will_flip(nilfs));
if (likely(sbp)) {
nilfs_set_log_cursor(sbp[0], nilfs);
@@ -691,6 +691,8 @@ static int nilfs_show_options(struct seq_file *seq, struct dentry *dentry)
seq_puts(seq, ",norecovery");
if (nilfs_test_opt(nilfs, DISCARD))
seq_puts(seq, ",discard");
+ if (nilfs_test_opt(nilfs, BAD_FTL))
+ seq_puts(seq, ",bad_ftl");

return 0;
}
@@ -712,7 +714,7 @@ static const struct super_operations nilfs_sops = {
enum {
Opt_err_cont, Opt_err_panic, Opt_err_ro,
Opt_barrier, Opt_nobarrier, Opt_snapshot, Opt_order, Opt_norecovery,
- Opt_discard, Opt_nodiscard, Opt_err,
+ Opt_discard, Opt_nodiscard, Opt_err, Opt_bad_ftl,
};

static match_table_t tokens = {
@@ -726,6 +728,7 @@ static match_table_t tokens = {
{Opt_norecovery, "norecovery"},
{Opt_discard, "discard"},
{Opt_nodiscard, "nodiscard"},
+ {Opt_bad_ftl, "bad_ftl"},
{Opt_err, NULL}
};

@@ -787,6 +790,9 @@ static int parse_options(char *options, struct super_block *sb, int is_remount)
case Opt_nodiscard:
nilfs_clear_opt(nilfs, DISCARD);
break;
+ case Opt_bad_ftl:
+ nilfs_set_opt(nilfs, BAD_FTL);
+ break;
default:
printk(KERN_ERR
"NILFS: Unrecognized mount option \"%s\"\n", p);
diff --git a/fs/nilfs2/the_nilfs.c b/fs/nilfs2/the_nilfs.c
index 94c451c..a44bf40 100644
--- a/fs/nilfs2/the_nilfs.c
+++ b/fs/nilfs2/the_nilfs.c
@@ -217,6 +217,9 @@ int load_nilfs(struct the_nilfs *nilfs, struct super_block *sb)
int err;

if (!valid_fs) {
+ if (nilfs_test_opt(nilfs, BAD_FTL))
+ nilfs_search_log_cursor(nilfs);
+
printk(KERN_WARNING "NILFS warning: mounting unchecked fs\n");
if (s_flags & MS_RDONLY) {
printk(KERN_INFO "NILFS: INFO: recovery "
diff --git a/include/linux/nilfs2_fs.h b/include/linux/nilfs2_fs.h
index 9875576..03424d4 100644
--- a/include/linux/nilfs2_fs.h
+++ b/include/linux/nilfs2_fs.h
@@ -135,6 +135,8 @@ struct nilfs_super_root {
#define NILFS_MOUNT_NORECOVERY 0x4000 /* Disable write access during
mount-time recovery */
#define NILFS_MOUNT_DISCARD 0x8000 /* Issue DISCARD requests */
+#define NILFS_MOUNT_BAD_FTL 0x10000 /* Only write super block
+ at umount time */


/**
@@ -407,7 +409,7 @@ union nilfs_binfo {
* @ss_nblocks: number of blocks
* @ss_nfinfo: number of finfo structures
* @ss_sumbytes: total size of segment summary in bytes
- * @ss_pad: padding
+ * @ss_sumsum_fast: fast checksum of the nilfs_segment_summary header only
* @ss_cno: checkpoint number
*/
struct nilfs_segment_summary {
@@ -422,7 +424,7 @@ struct nilfs_segment_summary {
__le32 ss_nblocks;
__le32 ss_nfinfo;
__le32 ss_sumbytes;
- __le32 ss_pad;
+ __le32 ss_sumsum_fast;
__le64 ss_cno;
/* array of finfo structures */
};
--
1.8.5.3

Clemens Eisserer
2014-02-05 20:21:26 UTC
Hi Andreas,

Thanks for improving the initial patch-set.
Because I am at a seminar this week, I'll give the new patch a try as
soon as I have access to my Raspberry Pi again.

Regards and thanks again, Clemens

PS: The new results on SSDs seem very intriguing :)
Ryusuke Konishi
2014-02-11 12:31:38 UTC
Hi Andreas,
Post by Andreas Rohner
This patch introduces a mount option bad_ftl that disables the
periodic overwrites of the super block to make the file system better
suitable for bad flash memory with a bad FTL. The super block is only
written at umount time. So if there is a unclean shutdown the file
system needs to be recovered by a linear scan of all segment summary
blocks.
The linear scan is only necessary if the file system wasn't umounted
properly. So the normal mount time is not affected.
Do we really need to add the third crc to segment summary headers?
After all, we need to do a full check of a log with a super root
block to validate it.

This patch also seems to exploit the fact that headers which have the
NILFS_SS_SR flag sometimes appear at the head of segments. But this
is not guaranteed. Can this condition be eliminated?

The measurement results are very interesting (thanks for the effort),
but they seem to rely on a few of these shortcut techniques for reducing
recovery time.

Regards,
Ryusuke Konishi
Andreas Rohner
2014-02-11 14:07:48 UTC
Hi Ryusuke,
Post by Ryusuke Konishi
Hi Andreas,
Post by Andreas Rohner
This patch introduces a mount option bad_ftl that disables the
periodic overwrites of the super block to make the file system better
suitable for bad flash memory with a bad FTL. The super block is only
written at umount time. So if there is a unclean shutdown the file
system needs to be recovered by a linear scan of all segment summary
blocks.
The linear scan is only necessary if the file system wasn't umounted
properly. So the normal mount time is not affected.
Do we really need to add the third crc in segument summary headers ?
After all, we need to do a full check for a log with a super root
block to validate it.
I need a way to quickly decide if a segment could be potentially valid
without reading in more blocks. The third crc is there to make sure
that the segment is not a valid segment of a previous instance of NILFS2
on the same volume. Such a previous instance would have used a different
crc seed. I only keep a limited number of history entries. This history
could be easily filled up with old segments from a previous instance and
the recovery would fail.

I tried to use the ss_sumsum crc for that purpose, but for that I have
to read in on average 5 to 8 extra blocks per segment. I cannot read
ahead these blocks, so the whole search is slowed down.
Post by Ryusuke Konishi
This patch also seems to be using the nature that headers which have a
NILFS_SS_SR flag sometimes appear at the head of segments. But this
is not guranteed. Is this condition eliminable?
It uses that fact, but it does not rely on it. If there is a recent
segment with the NILFS_SS_SR flag at the top, it will use that and leave the
rest to the normal recovery function. But if none is found, it will scan
all partial segments for the NILFS_SS_SR flag. This is done in
nilfs_search_partial_log_cursor.
Post by Ryusuke Konishi
The measurement results are very interesting (thanks for the effort),
but they look to rely on a few these ellipsis techniques for reducing
recovery time.
We could easily make this more robust by increasing
NILFS_SEG_HISTORY_DEPTH without reducing the performance. The
performance is mainly determined by how fast the device can read in the
segment summary blocks.

It just scans all the segment summary blocks of all segments and keeps a
history of the most promising candidates for recovery. After that the
candidates are processed further, including a full crc check and search
for partial segments with the NILFS_SS_SR flag if necessary.

Best regards,
Andreas Rohner
Ryusuke Konishi
2014-02-11 18:11:15 UTC
Post by Andreas Rohner
Hi Ryusuke,
Post by Ryusuke Konishi
Hi Andreas,
Post by Andreas Rohner
This patch introduces a mount option bad_ftl that disables the
periodic overwrites of the super block to make the file system better
suitable for bad flash memory with a bad FTL. The super block is only
written at umount time. So if there is a unclean shutdown the file
system needs to be recovered by a linear scan of all segment summary
blocks.
The linear scan is only necessary if the file system wasn't umounted
properly. So the normal mount time is not affected.
Do we really need to add the third crc in segument summary headers ?
After all, we need to do a full check for a log with a super root
block to validate it.
I need a way to quickly decide if a segment could be potentially valid
without reading in more blocks. The third crc is there, to make sure,
that the segment is not a valid segment of a previous instance of NILFS2
on the same volume. Such a previous instance would have used a different
crc seed. I only keep a limited number of history entries. This history
could be easily filled up with old segments from a previous instance and
the recovery would fail.
I tried to use the ss_sumsum crc for that purpose, but for that I have
to read in on average 5 to 8 extra blocks per segment. I cannot read
ahead these blocks, so the whole search is slowed down.
Sounds reasonable. We still need to take care of the field name and disk
format compatibility (including compat flags), but it seems
inevitable for this approach.
Post by Andreas Rohner
Post by Ryusuke Konishi
This patch also seems to be using the nature that headers which have a
NILFS_SS_SR flag sometimes appear at the head of segments. But this
is not guranteed. Is this condition eliminable?
It uses that fact, but it does not rely on it. If there is a recent
segment with NILFS_SS_SR flag at the top it will use that and leave the
rest to the normal recovery function. But if none is found, it will scan
all partial segments for the NILFS_SS_SR flag. This is done in
nilfs_search_partial_log_cursor.
But the full segment scan by nilfs_search_partial_log_cursor() seems
to be performed only for segments whose sequence number is registered
in history[i].seq. If no registered segments have a super root block,
what will happen?
Post by Andreas Rohner
Post by Ryusuke Konishi
The measurement results are very interesting (thanks for the effort),
but they look to rely on a few these ellipsis techniques for reducing
recovery time.
We could easily increase the security by increasing the
NILFS_SEG_HISTORY_DEPTH, without reducing the performance. The
performance is mainly determined by how fast the device can read in the
segment summary blocks.
It just scans all the segment summary blocks of all segments and keeps a
history of the most promising candidates for recovery. After that the
candidates are processed further, including a full crc check and search
for partial segments with the NILFS_SS_SR flag if necessary.
Honestly, I'm still hesitant about the full scan approach since the
mount time depends on the device size and the medium type.

If we define some window size based on the performance of the device
(which would be measured and written to the super block with mkfs or
nilfs-tune), and can limit the range of the scan, things may become more
manageable.


Regards,
Ryusuke Konishi
Andreas Rohner
2014-02-11 19:58:45 UTC
Post by Ryusuke Konishi
Post by Andreas Rohner
Hi Ryusuke,
Post by Ryusuke Konishi
Hi Andreas,
Post by Andreas Rohner
This patch introduces a mount option bad_ftl that disables the
periodic overwrites of the super block to make the file system better
suitable for bad flash memory with a bad FTL. The super block is only
written at umount time. So if there is a unclean shutdown the file
system needs to be recovered by a linear scan of all segment summary
blocks.
The linear scan is only necessary if the file system wasn't umounted
properly. So the normal mount time is not affected.
Do we really need to add the third crc in segument summary headers ?
After all, we need to do a full check for a log with a super root
block to validate it.
I need a way to quickly decide if a segment could be potentially valid
without reading in more blocks. The third crc is there, to make sure,
that the segment is not a valid segment of a previous instance of NILFS2
on the same volume. Such a previous instance would have used a different
crc seed. I only keep a limited number of history entries. This history
could be easily filled up with old segments from a previous instance and
the recovery would fail.
I tried to use the ss_sumsum crc for that purpose, but for that I have
to read in on average 5 to 8 extra blocks per segment. I cannot read
ahead these blocks, so the whole search is slowed down.
Sound reasonable. We still need to care for the field name and disk
format compatibility (including compat flags), but it sounds
inevitable for this approach.
Post by Andreas Rohner
Post by Ryusuke Konishi
This patch also seems to be using the nature that headers which have a
NILFS_SS_SR flag sometimes appear at the head of segments. But this
is not guranteed. Is this condition eliminable?
It uses that fact, but it does not rely on it. If there is a recent
segment with NILFS_SS_SR flag at the top it will use that and leave the
rest to the normal recovery function. But if none is found, it will scan
all partial segments for the NILFS_SS_SR flag. This is done in
nilfs_search_partial_log_cursor.
But, the full segment scan by nilfs_search_partial_log_cursor() looks
to be performed only for segments whose sequence number is registered
in history[i].seq. If no registered semgents have a super root block,
what will happen?
It will try one of the older segments in history_sr. In that case, the
normal recovery function will have to do most of the work. But you are
right, ultimately it could fail. If it fails, it will fall back to the
values from the super block. I don't think it will be a problem in
practice, because in my tests, the super root was written very
frequently. Almost every second segment.

As far as I can tell, a super root is written for every checkpoint, and
there is a new checkpoint every 30 seconds. There is also the
NILFS_SB_FREQ, which is currently set to 10 seconds. So in fact a super
root is written every 10 seconds. We only have to set the size of the
history large enough, so that it is guaranteed to contain a super root.

Hmm but I agree, as it is now it could fail.
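
Just to put a rough number on that: if, as described above, a super root
shows up at least every NILFS_SB_FREQ = 10 seconds, then with (purely
illustrative) 8 MB segments and a sustained write rate of 100 MB/s, up to
about 100 * 10 / 8 = 125 segments could be written between two super roots,
so NILFS_SEG_HISTORY_DEPTH would have to be well above that worst case, not
3, for the guarantee to hold.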
Post by Ryusuke Konishi
Post by Andreas Rohner
Post by Ryusuke Konishi
The measurement results are very interesting (thanks for the effort),
but they look to rely on a few these ellipsis techniques for reducing
recovery time.
We could easily increase the security by increasing the
NILFS_SEG_HISTORY_DEPTH, without reducing the performance. The
performance is mainly determined by how fast the device can read in the
segment summary blocks.
It just scans all the segment summary blocks of all segments and keeps a
history of the most promising candidates for recovery. After that the
candidates are processed further, including a full crc check and search
for partial segments with the NILFS_SS_SR flag if necessary.
Honestly, I'm still hesitative about the full scan approach since the
mount time depends on the device size and the medium type.
I wouldn't recommend it as the default recovery option. The user has to
decide whether it is right for his or her device and activate it.
For now it is just a stupid experiment. It would only be useful in
certain corner cases anyway. Thanks for reviewing it!
Post by Ryusuke Konishi
If we define some window size based on the performance of the device
(which would be measured and written in super block with mkfs or
nilfs-tune), and can limit the range of scan, things may become more
manageable.
That would certainly be possible. The window would start at s_last_pseg
and end at (s_last_pseg + window size). We could then simply force a
super block write as soon as the first segment is allocated outside of
the window. This could still significantly reduce the number of writes
to the super block.
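
Just to make the idea concrete, here is a minimal sketch of that check;
ns_sb_window_segs is a made-up field that would hold the window size in
segments (written by mkfs or nilfs-tune), and wrap-around of the segment
space is ignored for brevity:

	/* hypothetical check in the segment allocation path */
	static bool nilfs_segment_outside_sb_window(struct the_nilfs *nilfs,
						    __u64 segnum)
	{
		__u64 first = nilfs_get_segnum_of_block(nilfs,
							nilfs->ns_last_pseg);

		return segnum > first + nilfs->ns_sb_window_segs;
	}

If this returns true for a newly allocated segment, the segment constructor
would update the super block before using it, and a recovery scan would then
never have to look at more than ns_sb_window_segs segments after ns_last_pseg.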

Thanks for your review,

Best regards,
Andreas Rohner
Ryusuke Konishi
2014-02-12 00:58:31 UTC
Post by Andreas Rohner
Post by Ryusuke Konishi
Honestly, I'm still hesitative about the full scan approach since the
mount time depends on the device size and the medium type.
I wouldn't recommend it as the default recovery option. The user has to
make a decision if it is right for his or her device and activate it.
But until now it is just a stupid experiment. It would only be useful in
certain corner cases anyway. Thanks for reviewing it!
Post by Ryusuke Konishi
If we define some window size based on the performance of the device
(which would be measured and written in super block with mkfs or
nilfs-tune), and can limit the range of scan, things may become more
manageable.
That would certainly be possible. The window would start at s_last_pseg
and end at (s_last_pseg + window size). We could then simply force a
super block write as soon as the first segment is allocated outside of
the window. This could still significantly reduce the number of writes
to the super block.
Thanks for your review,
You're welcome, thank you, too.

By the way, we have another todo for flash devices: FITRIM ioctl
support. FITRIM is an API to issue TRIM/DISCARD requests (through the
blkdev_issue_flash function) to a portion of the underlying device to
allow batch DISCARD by userland tools. It helps the GC optimization of
the underlying flash device or the thin-provisioning feature of block
storage. NILFS is well suited for implementing this feature since free
space is managed in segment units and the sufile is available, but it
has been left undone for a long time.
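
To make this a bit more concrete, a very rough sketch of the per-segment
loop such a FITRIM handler could use follows. nilfs_segment_is_clean() is a
made-up placeholder for the sufile lookup, the fstrim_range start/len/minlen
handling and the locking are omitted, and the block-to-sector conversion
assumes ns_blocksize_bits; treat it as an outline, not as the eventual
implementation:

	static int nilfs_trim_clean_segments(struct the_nilfs *nilfs)
	{
		/* block numbers are converted to 512-byte sectors for discard */
		unsigned int shift = nilfs->ns_blocksize_bits - 9;
		sector_t seg_start, seg_end;
		__u64 segnum;
		int ret;

		for (segnum = 0; segnum < nilfs->ns_nsegments; segnum++) {
			/* placeholder: ask the sufile whether the segment is free */
			if (!nilfs_segment_is_clean(nilfs, segnum))
				continue;

			nilfs_get_segment_range(nilfs, segnum,
						&seg_start, &seg_end);

			ret = blkdev_issue_discard(nilfs->ns_bdev,
					seg_start << shift,
					(sector_t)nilfs->ns_blocks_per_segment << shift,
					GFP_NOFS, 0);
			if (ret)
				return ret;
		}
		return 0;
	}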

If you have an interest, please take a look at it, too.

Thanks,
Ryusuke Konishi
Ryusuke Konishi
2014-02-12 01:23:40 UTC
Post by Ryusuke Konishi
You're welcome, thank you, too.
By the way, we have another todo for flash devices. It is FITRIM
ioctl support. FITRIM is an API to issue TRIM/DISCARD requests
(through blkdev_issue_flash function) to a portion of underlying
Oops, I made a mistake. It was blkdev_issue_discard().

Ryusuke Konishi
Post by Ryusuke Konishi
device to allow batch DISCARD by userland tools. It helps GC
optimization of underlying flash device or thinprovisioning feature of
block storage. NILFS is suit for implementing this feature since free
space is managed in segment unit and sufile is available, but was long
time left.
If you have an interest, please take a look at it, too.
Thanks,
Ryusuke Konishi
Clemens Eisserer
2014-02-09 15:36:02 UTC
Hi Andreas, hi Ryusuke,
Post by Andreas Rohner
Instead of periodically writing to the super block, this patch only
writes at mount and umount time and performs a linear scan for the
latest segment in case a recovery is necessary.
The SD-Cards and the USB-Stick are not particularly fast, but they are
small enough so that the recovery time is tolerable.
Finally I found some time to test your patch, and the new version also
works fine (and fast!) here:

[ 3.349464] NILFS warning: searching for latest log
[ 4.747552] NILFS warning: mounting unchecked fs
[ 5.214883] NILFS: recovery complete.

So your enhanced recovery code requires ~1.3s for a 12GB nilfs2
partition on the higher-end 16GB SD card I use in the raspberry.
Also, despite frequent power cuts I haven't observed any issues -
which made me switch to nilfs2+patch even for the rootfs (was ext4
ro).
Post by Ryusuke Konishi
I see. For further discussion on this approach, it looks like we need
some measurement data of the situation that this patch makes a
difference (for example, for an SD card or some device). Anyway, I
agree that the patch has a value for experiment purpose.
What do you think about the results obtained by Andreas and me?
With SD cards (in the Raspberry Pi) I experience linear scan times as low
as ~110ms/1GB, and for everything else avoiding superblock writes
probably doesn't make sense anyway. And if some techie enables the
option on his SSD, recovery is also blazingly fast.
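
For what it's worth, that rate is consistent with the dmesg timestamps above:
4.747s - 3.349s is roughly 1.4s spanning the scan of the 12GB partition, i.e.
a bit over 110ms per GB.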

Thanks a lot & best regards, Clemens
Andreas Rohner
2014-02-10 08:56:22 UTC
Hi Clemens,
Post by Clemens Eisserer
Hi Andreas, hi Ryusuke,
Post by Andreas Rohner
Instead of periodically writing to the super block, this patch only
writes at mount and umount time and performs a linear scan for the
latest segment in case a recovery is necessary.
The SD-Cards and the USB-Stick are not particularly fast, but they are
small enough so that the recovery time is tolerable.
Finally I found some time to test your patch and also the new version
[ 3.349464] NILFS warning: searching for latest log
[ 4.747552] NILFS warning: mounting unchecked fs
[ 5.214883] NILFS: recovery complete.
So your enhanced recovery code requires ~1.3s for a 12GB nilfs2
partition on the higher-end 16GB SD card I use in the raspberry.
Also, despite frequent power-cuts I haven't obsereved any issues -
which made me switch to nilfs2+patch even for the rootfs (was ext4
ro).
Thanks for testing it on your Raspberry Pi. I also own one, but I haven't
moved the root fs to nilfs2 yet. Please be careful using the patch on a
production system. Although I am quite confident that it is safe, there
may still be some horrible bug in my code. More testing is definitely
necessary. It is not an "enhanced recovery", as you put it, but more of
a corner-case, experimental brute-force recovery. Nevertheless I am
surprised how fast it is on the Pi.
Post by Clemens Eisserer
Post by Ryusuke Konishi
I see. For further discussion on this approach, it looks like we need
some measurement data of the situation that this patch makes a
difference (for example, for an SD card or some device). Anyway, I
agree that the patch has a value for experiment purpose.
What do you think about the results obtained by andreas and me?
With SD cards (in the raspberry) I experience linear scan times as low
as ~110ms/1GB, and for everything else avoiding superblock writes
probably doesn't make sense anyway. And if some techie enables the
option on his SSD, recovery is also blazingly fast.
Thanks a lot & best regards, Clemens
Best regards,
Andreas Rohner
