| CVE |
Vendors |
Products |
Updated |
CVSS v3.1 |
| In the Linux kernel, the following vulnerability has been resolved:
arm64/fpsimd: ptrace: Fix SVE writes on !SME systems
When SVE is supported but SME is not supported, a ptrace write to the
NT_ARM_SVE regset can place the tracee into an invalid state where
(non-streaming) SVE register data is stored in FP_STATE_SVE format but
TIF_SVE is clear. This can result in a later warning from
fpsimd_restore_current_state(), e.g.
WARNING: CPU: 0 PID: 7214 at arch/arm64/kernel/fpsimd.c:383 fpsimd_restore_current_state+0x50c/0x748
When this happens, fpsimd_restore_current_state() will set TIF_SVE,
placing the task into the correct state. This occurs before any other
check of TIF_SVE can possibly occur, as other checks of TIF_SVE only
happen while the FPSIMD/SVE/SME state is live. Thus, aside from the
warning, there is no functional issue.
This bug was introduced during rework to error handling in commit:
9f8bf718f2923 ("arm64/fpsimd: ptrace: Gracefully handle errors")
... where the setting of TIF_SVE was moved into a block which is only
executed when system_supports_sme() is true.
Fix this by removing the system_supports_sme() check. This ensures that
TIF_SVE is set for (SVE-formatted) writes to NT_ARM_SVE, at the cost of
unconditionally manipulating the tracee's saved svcr value. The
manipulation of svcr is benign and inexpensive, and we already do
similar elsewhere (e.g. during signal handling), so I don't think it's
worth guarding this with system_supports_sme() checks.
Aside from the above, there is no functional change. The 'type' argument
to sve_set_common() is only set to ARM64_VEC_SME (in ssve_set())) when
system_supports_sme(), so the ARM64_VEC_SME case in the switch statement
is still unreachable when !system_supports_sme(). When
CONFIG_ARM64_SME=n, the only caller of sve_set_common() is sve_set(),
and the compiler can constant-fold for the case where type is
ARM64_VEC_SVE, removing the logic for other cases. |
| In the Linux kernel, the following vulnerability has been resolved:
serial: Fix not set tty->port race condition
Revert commit bfc467db60b7 ("serial: remove redundant
tty_port_link_device()") because the tty_port_link_device() is not
redundant: the tty->port has to be confured before we call
uart_configure_port(), otherwise user-space can open console without TTY
linked to the driver.
This tty_port_link_device() was added explicitly to avoid this exact
issue in commit fb2b90014d78 ("tty: link tty and port before configuring
it as console"), so offending commit basically reverted the fix saying
it is redundant without addressing the actual race condition presented
there.
Reproducible always as tty->port warning on Qualcomm SoC with most of
devices disabled, so with very fast boot, and one serial device being
the console:
printk: legacy console [ttyMSM0] enabled
printk: legacy console [ttyMSM0] enabled
printk: legacy bootconsole [qcom_geni0] disabled
printk: legacy bootconsole [qcom_geni0] disabled
------------[ cut here ]------------
tty_init_dev: ttyMSM driver does not set tty->port. This would crash the kernel. Fix the driver!
WARNING: drivers/tty/tty_io.c:1414 at tty_init_dev.part.0+0x228/0x25c, CPU#2: systemd/1
Modules linked in: socinfo tcsrcc_eliza gcc_eliza sm3_ce fuse ipv6
CPU: 2 UID: 0 PID: 1 Comm: systemd Tainted: G S 6.19.0-rc4-next-20260108-00024-g2202f4d30aa8 #73 PREEMPT
Tainted: [S]=CPU_OUT_OF_SPEC
Hardware name: Qualcomm Technologies, Inc. Eliza (DT)
...
tty_init_dev.part.0 (drivers/tty/tty_io.c:1414 (discriminator 11)) (P)
tty_open (arch/arm64/include/asm/atomic_ll_sc.h:95 (discriminator 3) drivers/tty/tty_io.c:2073 (discriminator 3) drivers/tty/tty_io.c:2120 (discriminator 3))
chrdev_open (fs/char_dev.c:411)
do_dentry_open (fs/open.c:962)
vfs_open (fs/open.c:1094)
do_open (fs/namei.c:4634)
path_openat (fs/namei.c:4793)
do_filp_open (fs/namei.c:4820)
do_sys_openat2 (fs/open.c:1391 (discriminator 3))
...
Starting Network Name Resolution...
Apparently the flow with this small Yocto-based ramdisk user-space is:
driver (qcom_geni_serial.c): user-space:
============================ ===========
qcom_geni_serial_probe()
uart_add_one_port()
serial_core_register_port()
serial_core_add_one_port()
uart_configure_port()
register_console()
|
| open console
| ...
| tty_init_dev()
| driver->ports[idx] is NULL
|
tty_port_register_device_attr_serdev()
tty_port_link_device() <- set driver->ports[idx] |
| In the Linux kernel, the following vulnerability has been resolved:
pmdomain: imx8m-blk-ctrl: Remove separate rst and clk mask for 8mq vpu
For i.MX8MQ platform, the ADB in the VPUMIX domain has no separate reset
and clock enable bits, but is ungated and reset together with the VPUs.
So we can't reset G1 or G2 separately, it may led to the system hang.
Remove rst_mask and clk_mask of imx8mq_vpu_blk_ctl_domain_data.
Let imx8mq_vpu_power_notifier() do really vpu reset. |
| In the Linux kernel, the following vulnerability has been resolved:
ice: add missing ice_deinit_hw() in devlink reinit path
devlink-reload results in ice_init_hw failed error, and then removing
the ice driver causes a NULL pointer dereference.
[ +0.102213] ice 0000:ca:00.0: ice_init_hw failed: -16
...
[ +0.000001] Call Trace:
[ +0.000003] <TASK>
[ +0.000006] ice_unload+0x8f/0x100 [ice]
[ +0.000081] ice_remove+0xba/0x300 [ice]
Commit 1390b8b3d2be ("ice: remove duplicate call to ice_deinit_hw() on
error paths") removed ice_deinit_hw() from ice_deinit_dev(). As a result
ice_devlink_reinit_down() no longer calls ice_deinit_hw(), but
ice_devlink_reinit_up() still calls ice_init_hw(). Since the control
queues are not uninitialized, ice_init_hw() fails with -EBUSY.
Add ice_deinit_hw() to ice_devlink_reinit_down() to correspond with
ice_init_hw() in ice_devlink_reinit_up(). |
| In the Linux kernel, the following vulnerability has been resolved:
fou: Don't allow 0 for FOU_ATTR_IPPROTO.
fou_udp_recv() has the same problem mentioned in the previous
patch.
If FOU_ATTR_IPPROTO is set to 0, skb is not freed by
fou_udp_recv() nor "resubmit"-ted in ip_protocol_deliver_rcu().
Let's forbid 0 for FOU_ATTR_IPPROTO. |
| In the Linux kernel, the following vulnerability has been resolved:
rxrpc: Fix data-race warning and potential load/store tearing
Fix the following:
BUG: KCSAN: data-race in rxrpc_peer_keepalive_worker / rxrpc_send_data_packet
which is reporting an issue with the reads and writes to ->last_tx_at in:
conn->peer->last_tx_at = ktime_get_seconds();
and:
keepalive_at = peer->last_tx_at + RXRPC_KEEPALIVE_TIME;
The lockless accesses to these to values aren't actually a problem as the
read only needs an approximate time of last transmission for the purposes
of deciding whether or not the transmission of a keepalive packet is
warranted yet.
Also, as ->last_tx_at is a 64-bit value, tearing can occur on a 32-bit
arch.
Fix both of these by switching to an unsigned int for ->last_tx_at and only
storing the LSW of the time64_t. It can then be reconstructed at need
provided no more than 68 years has elapsed since the last transmission. |
| In the Linux kernel, the following vulnerability has been resolved:
bonding: provide a net pointer to __skb_flow_dissect()
After 3cbf4ffba5ee ("net: plumb network namespace into __skb_flow_dissect")
we have to provide a net pointer to __skb_flow_dissect(),
either via skb->dev, skb->sk, or a user provided pointer.
In the following case, syzbot was able to cook a bare skb.
WARNING: net/core/flow_dissector.c:1131 at __skb_flow_dissect+0xb57/0x68b0 net/core/flow_dissector.c:1131, CPU#1: syz.2.1418/11053
Call Trace:
<TASK>
bond_flow_dissect drivers/net/bonding/bond_main.c:4093 [inline]
__bond_xmit_hash+0x2d7/0xba0 drivers/net/bonding/bond_main.c:4157
bond_xmit_hash_xdp drivers/net/bonding/bond_main.c:4208 [inline]
bond_xdp_xmit_3ad_xor_slave_get drivers/net/bonding/bond_main.c:5139 [inline]
bond_xdp_get_xmit_slave+0x1fd/0x710 drivers/net/bonding/bond_main.c:5515
xdp_master_redirect+0x13f/0x2c0 net/core/filter.c:4388
bpf_prog_run_xdp include/net/xdp.h:700 [inline]
bpf_test_run+0x6b2/0x7d0 net/bpf/test_run.c:421
bpf_prog_test_run_xdp+0x795/0x10e0 net/bpf/test_run.c:1390
bpf_prog_test_run+0x2c7/0x340 kernel/bpf/syscall.c:4703
__sys_bpf+0x562/0x860 kernel/bpf/syscall.c:6182
__do_sys_bpf kernel/bpf/syscall.c:6274 [inline]
__se_sys_bpf kernel/bpf/syscall.c:6272 [inline]
__x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:6272
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xec/0xf80 arch/x86/entry/syscall_64.c:94 |
| In the Linux kernel, the following vulnerability has been resolved:
l2tp: avoid one data-race in l2tp_tunnel_del_work()
We should read sk->sk_socket only when dealing with kernel sockets.
syzbot reported the following data-race:
BUG: KCSAN: data-race in l2tp_tunnel_del_work / sk_common_release
write to 0xffff88811c182b20 of 8 bytes by task 5365 on cpu 0:
sk_set_socket include/net/sock.h:2092 [inline]
sock_orphan include/net/sock.h:2118 [inline]
sk_common_release+0xae/0x230 net/core/sock.c:4003
udp_lib_close+0x15/0x20 include/net/udp.h:325
inet_release+0xce/0xf0 net/ipv4/af_inet.c:437
__sock_release net/socket.c:662 [inline]
sock_close+0x6b/0x150 net/socket.c:1455
__fput+0x29b/0x650 fs/file_table.c:468
____fput+0x1c/0x30 fs/file_table.c:496
task_work_run+0x131/0x1a0 kernel/task_work.c:233
resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
__exit_to_user_mode_loop kernel/entry/common.c:44 [inline]
exit_to_user_mode_loop+0x1fe/0x740 kernel/entry/common.c:75
__exit_to_user_mode_prepare include/linux/irq-entry-common.h:226 [inline]
syscall_exit_to_user_mode_prepare include/linux/irq-entry-common.h:256 [inline]
syscall_exit_to_user_mode_work include/linux/entry-common.h:159 [inline]
syscall_exit_to_user_mode include/linux/entry-common.h:194 [inline]
do_syscall_64+0x1e1/0x2b0 arch/x86/entry/syscall_64.c:100
entry_SYSCALL_64_after_hwframe+0x77/0x7f
read to 0xffff88811c182b20 of 8 bytes by task 827 on cpu 1:
l2tp_tunnel_del_work+0x2f/0x1a0 net/l2tp/l2tp_core.c:1418
process_one_work kernel/workqueue.c:3257 [inline]
process_scheduled_works+0x4ce/0x9d0 kernel/workqueue.c:3340
worker_thread+0x582/0x770 kernel/workqueue.c:3421
kthread+0x489/0x510 kernel/kthread.c:463
ret_from_fork+0x149/0x290 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246
value changed: 0xffff88811b818000 -> 0x0000000000000000 |
| In the Linux kernel, the following vulnerability has been resolved:
mISDN: annotate data-race around dev->work
dev->work can re read locklessly in mISDN_read()
and mISDN_poll(). Add READ_ONCE()/WRITE_ONCE() annotations.
BUG: KCSAN: data-race in mISDN_ioctl / mISDN_read
write to 0xffff88812d848280 of 4 bytes by task 10864 on cpu 1:
misdn_add_timer drivers/isdn/mISDN/timerdev.c:175 [inline]
mISDN_ioctl+0x2fb/0x550 drivers/isdn/mISDN/timerdev.c:233
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:597 [inline]
__se_sys_ioctl+0xce/0x140 fs/ioctl.c:583
__x64_sys_ioctl+0x43/0x50 fs/ioctl.c:583
x64_sys_call+0x14b0/0x3000 arch/x86/include/generated/asm/syscalls_64.h:17
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xd8/0x2c0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
read to 0xffff88812d848280 of 4 bytes by task 10857 on cpu 0:
mISDN_read+0x1f2/0x470 drivers/isdn/mISDN/timerdev.c:112
do_loop_readv_writev fs/read_write.c:847 [inline]
vfs_readv+0x3fb/0x690 fs/read_write.c:1020
do_readv+0xe7/0x210 fs/read_write.c:1080
__do_sys_readv fs/read_write.c:1165 [inline]
__se_sys_readv fs/read_write.c:1162 [inline]
__x64_sys_readv+0x45/0x50 fs/read_write.c:1162
x64_sys_call+0x2831/0x3000 arch/x86/include/generated/asm/syscalls_64.h:20
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xd8/0x2c0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
value changed: 0x00000000 -> 0x00000001 |
| In the Linux kernel, the following vulnerability has been resolved:
gue: Fix skb memleak with inner IP protocol 0.
syzbot reported skb memleak below. [0]
The repro generated a GUE packet with its inner protocol 0.
gue_udp_recv() returns -guehdr->proto_ctype for "resubmit"
in ip_protocol_deliver_rcu(), but this only works with
non-zero protocol number.
Let's drop such packets.
Note that 0 is a valid number (IPv6 Hop-by-Hop Option).
I think it is not practical to encap HOPOPT in GUE, so once
someone starts to complain, we could pass down a resubmit
flag pointer to distinguish two zeros from the upper layer:
* no error
* resubmit HOPOPT
[0]
BUG: memory leak
unreferenced object 0xffff888109695a00 (size 240):
comm "syz.0.17", pid 6088, jiffies 4294943096
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 40 c2 10 81 88 ff ff 00 00 00 00 00 00 00 00 .@..............
backtrace (crc a84b336f):
kmemleak_alloc_recursive include/linux/kmemleak.h:44 [inline]
slab_post_alloc_hook mm/slub.c:4958 [inline]
slab_alloc_node mm/slub.c:5263 [inline]
kmem_cache_alloc_noprof+0x3b4/0x590 mm/slub.c:5270
__build_skb+0x23/0x60 net/core/skbuff.c:474
build_skb+0x20/0x190 net/core/skbuff.c:490
__tun_build_skb drivers/net/tun.c:1541 [inline]
tun_build_skb+0x4a1/0xa40 drivers/net/tun.c:1636
tun_get_user+0xc12/0x2030 drivers/net/tun.c:1770
tun_chr_write_iter+0x71/0x120 drivers/net/tun.c:1999
new_sync_write fs/read_write.c:593 [inline]
vfs_write+0x45d/0x710 fs/read_write.c:686
ksys_write+0xa7/0x170 fs/read_write.c:738
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xa4/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f |
| In the Linux kernel, the following vulnerability has been resolved:
uacce: fix cdev handling in the cleanup path
When cdev_device_add fails, it internally releases the cdev memory,
and if cdev_device_del is then executed, it will cause a hang error.
To fix it, we check the return value of cdev_device_add() and clear
uacce->cdev to avoid calling cdev_device_del in the uacce_remove. |
| In the Linux kernel, the following vulnerability has been resolved:
smb: server: fix leak of active_num_conn in ksmbd_tcp_new_connection()
On kthread_run() failure in ksmbd_tcp_new_connection(), the transport is
freed via free_transport(), which does not decrement active_num_conn,
leaking this counter.
Replace free_transport() with ksmbd_tcp_disconnect(). |
| In the Linux kernel, the following vulnerability has been resolved:
crypto: virtio - Add spinlock protection with virtqueue notification
When VM boots with one virtio-crypto PCI device and builtin backend,
run openssl benchmark command with multiple processes, such as
openssl speed -evp aes-128-cbc -engine afalg -seconds 10 -multi 32
openssl processes will hangup and there is error reported like this:
virtio_crypto virtio0: dataq.0:id 3 is not a head!
It seems that the data virtqueue need protection when it is handled
for virtio done notification. If the spinlock protection is added
in virtcrypto_done_task(), openssl benchmark with multiple processes
works well. |
| In the Linux kernel, the following vulnerability has been resolved:
migrate: correct lock ordering for hugetlb file folios
Syzbot has found a deadlock (analyzed by Lance Yang):
1) Task (5749): Holds folio_lock, then tries to acquire i_mmap_rwsem(read lock).
2) Task (5754): Holds i_mmap_rwsem(write lock), then tries to acquire
folio_lock.
migrate_pages()
-> migrate_hugetlbs()
-> unmap_and_move_huge_page() <- Takes folio_lock!
-> remove_migration_ptes()
-> __rmap_walk_file()
-> i_mmap_lock_read() <- Waits for i_mmap_rwsem(read lock)!
hugetlbfs_fallocate()
-> hugetlbfs_punch_hole() <- Takes i_mmap_rwsem(write lock)!
-> hugetlbfs_zero_partial_page()
-> filemap_lock_hugetlb_folio()
-> filemap_lock_folio()
-> __filemap_get_folio <- Waits for folio_lock!
The migration path is the one taking locks in the wrong order according to
the documentation at the top of mm/rmap.c. So expand the scope of the
existing i_mmap_lock to cover the calls to remove_migration_ptes() too.
This is (mostly) how it used to be after commit c0d0381ade79. That was
removed by 336bf30eb765 for both file & anon hugetlb pages when it should
only have been removed for anon hugetlb pages. |
| In the Linux kernel, the following vulnerability has been resolved:
audit: add missing syscalls to read class
The "at" variant of getxattr() and listxattr() are missing from the
audit read class. Calling getxattrat() or listxattrat() on a file to
read its extended attributes will bypass audit rules such as:
-w /tmp/test -p rwa -k test_rwa
The current patch adds missing syscalls to the audit read class. |
| In the Linux kernel, the following vulnerability has been resolved:
smb: client: split cached_fid bitfields to avoid shared-byte RMW races
is_open, has_lease and on_list are stored in the same bitfield byte in
struct cached_fid but are updated in different code paths that may run
concurrently. Bitfield assignments generate byte read–modify–write
operations (e.g. `orb $mask, addr` on x86_64), so updating one flag can
restore stale values of the others.
A possible interleaving is:
CPU1: load old byte (has_lease=1, on_list=1)
CPU2: clear both flags (store 0)
CPU1: RMW store (old | IS_OPEN) -> reintroduces cleared bits
To avoid this class of races, convert these flags to separate bool
fields. |
| In the Linux kernel, the following vulnerability has been resolved:
netfilter: nf_tables: fix use-after-free in nf_tables_addchain()
nf_tables_addchain() publishes the chain to table->chains via
list_add_tail_rcu() (in nft_chain_add()) before registering hooks.
If nf_tables_register_hook() then fails, the error path calls
nft_chain_del() (list_del_rcu()) followed by nf_tables_chain_destroy()
with no RCU grace period in between.
This creates two use-after-free conditions:
1) Control-plane: nf_tables_dump_chains() traverses table->chains
under rcu_read_lock(). A concurrent dump can still be walking
the chain when the error path frees it.
2) Packet path: for NFPROTO_INET, nf_register_net_hook() briefly
installs the IPv4 hook before IPv6 registration fails. Packets
entering nft_do_chain() via the transient IPv4 hook can still be
dereferencing chain->blob_gen_X when the error path frees the
chain.
Add synchronize_rcu() between nft_chain_del() and the chain destroy
so that all RCU readers -- both dump threads and in-flight packet
evaluation -- have finished before the chain is freed. |
| In the Linux kernel, the following vulnerability has been resolved:
scsi: qla2xxx: Fix bsg_done() causing double free
Kernel panic observed on system,
[5353358.825191] BUG: unable to handle page fault for address: ff5f5e897b024000
[5353358.825194] #PF: supervisor write access in kernel mode
[5353358.825195] #PF: error_code(0x0002) - not-present page
[5353358.825196] PGD 100006067 P4D 0
[5353358.825198] Oops: 0002 [#1] PREEMPT SMP NOPTI
[5353358.825200] CPU: 5 PID: 2132085 Comm: qlafwupdate.sub Kdump: loaded Tainted: G W L ------- --- 5.14.0-503.34.1.el9_5.x86_64 #1
[5353358.825203] Hardware name: HPE ProLiant DL360 Gen11/ProLiant DL360 Gen11, BIOS 2.44 01/17/2025
[5353358.825204] RIP: 0010:memcpy_erms+0x6/0x10
[5353358.825211] RSP: 0018:ff591da8f4f6b710 EFLAGS: 00010246
[5353358.825212] RAX: ff5f5e897b024000 RBX: 0000000000007090 RCX: 0000000000001000
[5353358.825213] RDX: 0000000000001000 RSI: ff591da8f4fed090 RDI: ff5f5e897b024000
[5353358.825214] RBP: 0000000000010000 R08: ff5f5e897b024000 R09: 0000000000000000
[5353358.825215] R10: ff46cf8c40517000 R11: 0000000000000001 R12: 0000000000008090
[5353358.825216] R13: ff591da8f4f6b720 R14: 0000000000001000 R15: 0000000000000000
[5353358.825218] FS: 00007f1e88d47740(0000) GS:ff46cf935f940000(0000) knlGS:0000000000000000
[5353358.825219] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[5353358.825220] CR2: ff5f5e897b024000 CR3: 0000000231532004 CR4: 0000000000771ef0
[5353358.825221] PKRU: 55555554
[5353358.825222] Call Trace:
[5353358.825223] <TASK>
[5353358.825224] ? show_trace_log_lvl+0x1c4/0x2df
[5353358.825229] ? show_trace_log_lvl+0x1c4/0x2df
[5353358.825232] ? sg_copy_buffer+0xc8/0x110
[5353358.825236] ? __die_body.cold+0x8/0xd
[5353358.825238] ? page_fault_oops+0x134/0x170
[5353358.825242] ? kernelmode_fixup_or_oops+0x84/0x110
[5353358.825244] ? exc_page_fault+0xa8/0x150
[5353358.825247] ? asm_exc_page_fault+0x22/0x30
[5353358.825252] ? memcpy_erms+0x6/0x10
[5353358.825253] sg_copy_buffer+0xc8/0x110
[5353358.825259] qla2x00_process_vendor_specific+0x652/0x1320 [qla2xxx]
[5353358.825317] qla24xx_bsg_request+0x1b2/0x2d0 [qla2xxx]
Most routines in qla_bsg.c call bsg_done() only for success cases.
However a few invoke it for failure case as well leading to a double
free. Validate before calling bsg_done(). |
| In the Linux kernel, the following vulnerability has been resolved:
Revert "f2fs: block cache/dio write during f2fs_enable_checkpoint()"
This reverts commit 196c81fdd438f7ac429d5639090a9816abb9760a.
Original patch may cause below deadlock, revert it.
write remount
- write_begin
- lock_page --- lock A
- prepare_write_begin
- f2fs_map_lock
- f2fs_enable_checkpoint
- down_write(cp_enable_rwsem) --- lock B
- sync_inode_sb
- writepages
- lock_page --- lock A
- down_read(cp_enable_rwsem) --- lock A |
| In the Linux kernel, the following vulnerability has been resolved:
f2fs: fix to avoid mapping wrong physical block for swapfile
Xiaolong Guo reported a f2fs bug in bugzilla [1]
[1] https://bugzilla.kernel.org/show_bug.cgi?id=220951
Quoted:
"When using stress-ng's swap stress test on F2FS filesystem with kernel 6.6+,
the system experiences data corruption leading to either:
1 dm-verity corruption errors and device reboot
2 F2FS node corruption errors and boot hangs
The issue occurs specifically when:
1 Using F2FS filesystem (ext4 is unaffected)
2 Swapfile size is less than F2FS section size (2MB)
3 Swapfile has fragmented physical layout (multiple non-contiguous extents)
4 Kernel version is 6.6+ (6.1 is unaffected)
The root cause is in check_swap_activate() function in fs/f2fs/data.c. When the
first extent of a small swapfile (< 2MB) is not aligned to section boundaries,
the function incorrectly treats it as the last extent, failing to map
subsequent extents. This results in incorrect swap_extent creation where only
the first extent is mapped, causing subsequent swap writes to overwrite wrong
physical locations (other files' data).
Steps to Reproduce
1 Setup a device with F2FS-formatted userdata partition
2 Compile stress-ng from https://github.com/ColinIanKing/stress-ng
3 Run swap stress test: (Android devices)
adb shell "cd /data/stressng; ./stress-ng-64 --metrics-brief --timeout 60
--swap 0"
Log:
1 Ftrace shows in kernel 6.6, only first extent is mapped during second
f2fs_map_blocks call in check_swap_activate():
stress-ng-swap-8990: f2fs_map_blocks: ino=11002, file offset=0, start
blkaddr=0x43143, len=0x1
(Only 4KB mapped, not the full swapfile)
2 in kernel 6.1, both extents are correctly mapped:
stress-ng-swap-5966: f2fs_map_blocks: ino=28011, file offset=0, start
blkaddr=0x13cd4, len=0x1
stress-ng-swap-5966: f2fs_map_blocks: ino=28011, file offset=1, start
blkaddr=0x60c84b, len=0xff
The problematic code is in check_swap_activate():
if ((pblock - SM_I(sbi)->main_blkaddr) % blks_per_sec ||
nr_pblocks % blks_per_sec ||
!f2fs_valid_pinned_area(sbi, pblock)) {
bool last_extent = false;
not_aligned++;
nr_pblocks = roundup(nr_pblocks, blks_per_sec);
if (cur_lblock + nr_pblocks > sis->max)
nr_pblocks -= blks_per_sec;
/* this extent is last one */
if (!nr_pblocks) {
nr_pblocks = last_lblock - cur_lblock;
last_extent = true;
}
ret = f2fs_migrate_blocks(inode, cur_lblock, nr_pblocks);
if (ret) {
if (ret == -ENOENT)
ret = -EINVAL;
goto out;
}
if (!last_extent)
goto retry;
}
When the first extent is unaligned and roundup(nr_pblocks, blks_per_sec)
exceeds sis->max, we subtract blks_per_sec resulting in nr_pblocks = 0. The
code then incorrectly assumes this is the last extent, sets nr_pblocks =
last_lblock - cur_lblock (entire swapfile), and performs migration. After
migration, it doesn't retry mapping, so subsequent extents are never processed.
"
In order to fix this issue, we need to lookup block mapping info after
we migrate all blocks in the tail of swapfile. |