|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6 |
|
| #
72d1e611 |
| 13-Sep-2022 |
Jiebin Sun <[email protected]> |
ipc/msg: mitigate the lock contention with percpu counter
The msg_bytes and msg_hdrs atomic counters are frequently updated when IPC msg queue is in heavy use, causing heavy cache bounce and overhea
ipc/msg: mitigate the lock contention with percpu counter
The msg_bytes and msg_hdrs atomic counters are frequently updated when IPC msg queue is in heavy use, causing heavy cache bounce and overhead. Change them to percpu_counter greatly improve the performance. Since there is one percpu struct per namespace, additional memory cost is minimal. Reading of the count done in msgctl call, which is infrequent. So the need to sum up the counts in each CPU is infrequent.
Apply the patch and test the pts/stress-ng-1.4.0 -- system v message passing (160 threads).
Score gain: 3.99x
CPU: ICX 8380 x 2 sockets Core number: 40 x 2 physical cores Benchmark: pts/stress-ng-1.4.0 -- system v message passing (160 threads)
[[email protected]: coding-style cleanups] [[email protected]: avoid negative value by overflow in msginfo] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: fix min() warnings] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jiebin Sun <[email protected]> Reviewed-by: Tim Chen <[email protected]> Cc: Alexander Mikhalitsyn <[email protected]> Cc: Alexey Gladkov <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Dennis Zhou <[email protected]> Cc: "Eric W . Biederman" <[email protected]> Cc: Manfred Spraul <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Vasily Averin <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
|
Revision tags: v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5 |
|
| #
1f5c135e |
| 14-Feb-2022 |
Alexey Gladkov <[email protected]> |
ipc: Store ipc sysctls in the ipc namespace
The ipc sysctls are not available for modification inside the user namespace. Following the mqueue sysctls, we changed the implementation to be more usern
ipc: Store ipc sysctls in the ipc namespace
The ipc sysctls are not available for modification inside the user namespace. Following the mqueue sysctls, we changed the implementation to be more userns friendly.
So far, the changes do not provide additional access to files. This will be done in a future patch.
Signed-off-by: Alexey Gladkov <[email protected]> Link: https://lkml.kernel.org/r/be6f9d014276f4dddd0c3aa05a86052856c1c555.1644862280.git.legion@kernel.org Signed-off-by: Eric W. Biederman <[email protected]>
show more ...
|
| #
dc55e35f |
| 14-Feb-2022 |
Alexey Gladkov <[email protected]> |
ipc: Store mqueue sysctls in the ipc namespace
Right now, the mqueue sysctls take ipc namespaces into account in a rather hacky way. This works in most cases, but does not respect the user namespace
ipc: Store mqueue sysctls in the ipc namespace
Right now, the mqueue sysctls take ipc namespaces into account in a rather hacky way. This works in most cases, but does not respect the user namespace.
Within the user namespace, the user cannot change the /proc/sys/fs/mqueue/* parametres. This poses a problem in the rootless containers.
To solve this I changed the implementation of the mqueue sysctls just like some other sysctls.
So far, the changes do not provide additional access to files. This will be done in a future patch.
v3: * Don't implemenet set_permissions to keep the current behavior.
v2: * Fixed compilation problem if CONFIG_POSIX_MQUEUE_SYSCTL is not specified.
Reported-by: kernel test robot <[email protected]> Signed-off-by: Alexey Gladkov <[email protected]> Link: https://lkml.kernel.org/r/b0ccbb2489119f1f20c737cf1930c3a9c4e4243a.1644862280.git.legion@kernel.org Signed-off-by: Eric W. Biederman <[email protected]>
show more ...
|
|
Revision tags: v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2 |
|
| #
85b6d246 |
| 20-Nov-2021 |
Alexander Mikhalitsyn <[email protected]> |
shm: extend forced shm destroy to support objects from several IPC nses
Currently, the exit_shm() function not designed to work properly when task->sysvshm.shm_clist holds shm objects from different
shm: extend forced shm destroy to support objects from several IPC nses
Currently, the exit_shm() function not designed to work properly when task->sysvshm.shm_clist holds shm objects from different IPC namespaces.
This is a real pain when sysctl kernel.shm_rmid_forced = 1, because it leads to use-after-free (reproducer exists).
This is an attempt to fix the problem by extending exit_shm mechanism to handle shm's destroy from several IPC ns'es.
To achieve that we do several things:
1. add a namespace (non-refcounted) pointer to the struct shmid_kernel
2. during new shm object creation (newseg()/shmget syscall) we initialize this pointer by current task IPC ns
3. exit_shm() fully reworked such that it traverses over all shp's in task->sysvshm.shm_clist and gets IPC namespace not from current task as it was before but from shp's object itself, then call shm_destroy(shp, ns).
Note: We need to be really careful here, because as it was said before (1), our pointer to IPC ns non-refcnt'ed. To be on the safe side we using special helper get_ipc_ns_not_zero() which allows to get IPC ns refcounter only if IPC ns not in the "state of destruction".
Q/A
Q: Why can we access shp->ns memory using non-refcounted pointer? A: Because shp object lifetime is always shorther than IPC namespace lifetime, so, if we get shp object from the task->sysvshm.shm_clist while holding task_lock(task) nobody can steal our namespace.
Q: Does this patch change semantics of unshare/setns/clone syscalls? A: No. It's just fixes non-covered case when process may leave IPC namespace without getting task->sysvshm.shm_clist list cleaned up.
Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: ab602f79915 ("shm: make exit_shm work proportional to task activity") Co-developed-by: Manfred Spraul <[email protected]> Signed-off-by: Manfred Spraul <[email protected]> Signed-off-by: Alexander Mikhalitsyn <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Greg KH <[email protected]> Cc: Andrei Vagin <[email protected]> Cc: Pavel Tikhomirov <[email protected]> Cc: Vasily Averin <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
|
Revision tags: v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1, v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5, v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12, v5.12-rc8, v5.12-rc7, v5.12-rc6, v5.12-rc5, v5.12-rc4, v5.12-rc3, v5.12-rc2, v5.12-rc1, v5.12-rc1-dontuse, v5.11, v5.11-rc7, v5.11-rc6, v5.11-rc5, v5.11-rc4, v5.11-rc3, v5.11-rc2, v5.11-rc1, v5.10, v5.10-rc7, v5.10-rc6, v5.10-rc5, v5.10-rc4, v5.10-rc3, v5.10-rc2, v5.10-rc1, v5.9, v5.9-rc8, v5.9-rc7, v5.9-rc6, v5.9-rc5, v5.9-rc4, v5.9-rc3, v5.9-rc2, v5.9-rc1 |
|
| #
137ec390 |
| 03-Aug-2020 |
Kirill Tkhai <[email protected]> |
ipc: Use generic ns_common::count
Switch over ipc namespaces to use the newly introduced common lifetime counter.
Currently every namespace type has its own lifetime counter which is stored in the
ipc: Use generic ns_common::count
Switch over ipc namespaces to use the newly introduced common lifetime counter.
Currently every namespace type has its own lifetime counter which is stored in the specific namespace struct. The lifetime counters are used identically for all namespaces types. Namespaces may of course have additional unrelated counters and these are not altered.
This introduces a common lifetime counter into struct ns_common. The ns_common struct encompasses information that all namespaces share. That should include the lifetime counter since its common for all of them.
It also allows us to unify the type of the counters across all namespaces. Most of them use refcount_t but one uses atomic_t and at least one uses kref. Especially the last one doesn't make much sense since it's just a wrapper around refcount_t since 2016 and actually complicates cleanup operations by having to use container_of() to cast the correct namespace struct out of struct ns_common.
Having the lifetime counter for the namespaces in one place reduces maintenance cost. Not just because after switching all namespaces over we will have removed more code than we added but also because the logic is more easily understandable and we indicate to the user that the basic lifetime requirements for all namespaces are currently identical.
Signed-off-by: Kirill Tkhai <[email protected]> Reviewed-by: Kees Cook <[email protected]> Acked-by: Christian Brauner <[email protected]> Link: https://lore.kernel.org/r/159644978697.604812.16592754423881032385.stgit@localhost.localdomain Signed-off-by: Christian Brauner <[email protected]>
show more ...
|
|
Revision tags: v5.8, v5.8-rc7, v5.8-rc6, v5.8-rc5, v5.8-rc4, v5.8-rc3, v5.8-rc2, v5.8-rc1 |
|
| #
e1eb26fa |
| 08-Jun-2020 |
Giuseppe Scrivano <[email protected]> |
ipc/namespace.c: use a work queue to free_ipc
the reason is to avoid a delay caused by the synchronize_rcu() call in kern_umount() when the mqueue mount is freed.
the code:
#define _GNU_SOURCE
ipc/namespace.c: use a work queue to free_ipc
the reason is to avoid a delay caused by the synchronize_rcu() call in kern_umount() when the mqueue mount is freed.
the code:
#define _GNU_SOURCE #include <sched.h> #include <error.h> #include <errno.h> #include <stdlib.h>
int main() { int i;
for (i = 0; i < 1000; i++) if (unshare(CLONE_NEWIPC) < 0) error(EXIT_FAILURE, errno, "unshare"); }
goes from
Command being timed: "./ipc-namespace" User time (seconds): 0.00 System time (seconds): 0.06 Percent of CPU this job got: 0% Elapsed (wall clock) time (h:mm:ss or m:ss): 0:08.05
to
Command being timed: "./ipc-namespace" User time (seconds): 0.00 System time (seconds): 0.02 Percent of CPU this job got: 96% Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.03
Signed-off-by: Giuseppe Scrivano <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Paul E. McKenney <[email protected]> Reviewed-by: Waiman Long <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Manfred Spraul <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
|
Revision tags: v5.7, v5.7-rc7, v5.7-rc6, v5.7-rc5, v5.7-rc4, v5.7-rc3, v5.7-rc2, v5.7-rc1, v5.6, v5.6-rc7, v5.6-rc6, v5.6-rc5, v5.6-rc4, v5.6-rc3, v5.6-rc2, v5.6-rc1, v5.5, v5.5-rc7, v5.5-rc6, v5.5-rc5, v5.5-rc4, v5.5-rc3, v5.5-rc2, v5.5-rc1, v5.4, v5.4-rc8, v5.4-rc7, v5.4-rc6, v5.4-rc5, v5.4-rc4, v5.4-rc3, v5.4-rc2, v5.4-rc1, v5.3, v5.3-rc8, v5.3-rc7, v5.3-rc6, v5.3-rc5, v5.3-rc4, v5.3-rc3, v5.3-rc2, v5.3-rc1, v5.2, v5.2-rc7, v5.2-rc6, v5.2-rc5, v5.2-rc4, v5.2-rc3, v5.2-rc2, v5.2-rc1 |
|
| #
3278a2c2 |
| 14-May-2019 |
Manfred Spraul <[email protected]> |
ipc: conserve sequence numbers in ipcmni_extend mode
Rewrite, based on the patch from Waiman Long:
The mixing in of a sequence number into the IPC IDs is probably to avoid ID reuse in userspace as
ipc: conserve sequence numbers in ipcmni_extend mode
Rewrite, based on the patch from Waiman Long:
The mixing in of a sequence number into the IPC IDs is probably to avoid ID reuse in userspace as much as possible. With ipcmni_extend mode, the number of usable sequence numbers is greatly reduced leading to higher chance of ID reuse.
To address this issue, we need to conserve the sequence number space as much as possible. Right now, the sequence number is incremented for every new ID created. In reality, we only need to increment the sequence number when new allocated ID is not greater than the last one allocated. It is in such case that the new ID may collide with an existing one. This is being done irrespective of the ipcmni mode.
In order to avoid any races, the index is first allocated and then the pointer is replaced.
Changes compared to the initial patch: - Handle failures from idr_alloc(). - Avoid that concurrent operations can see the wrong sequence number. (This is achieved by using idr_replace()). - IPCMNI_SEQ_SHIFT is not a constant, thus renamed to ipcmni_seq_shift(). - IPCMNI_SEQ_MAX is not a constant, thus renamed to ipcmni_seq_max().
Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Manfred Spraul <[email protected]> Signed-off-by: Waiman Long <[email protected]> Suggested-by: Matthew Wilcox <[email protected]> Acked-by: Waiman Long <[email protected]> Cc: Al Viro <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: "Eric W . Biederman" <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Kees Cook <[email protected]> Cc: "Luis R. Rodriguez" <[email protected]> Cc: Takashi Iwai <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
|
Revision tags: v5.1, v5.1-rc7, v5.1-rc6, v5.1-rc5, v5.1-rc4, v5.1-rc3, v5.1-rc2, v5.1-rc1, v5.0, v5.0-rc8, v5.0-rc7, v5.0-rc6, v5.0-rc5, v5.0-rc4, v5.0-rc3, v5.0-rc2, v5.0-rc1, v4.20, v4.20-rc7, v4.20-rc6, v4.20-rc5, v4.20-rc4, v4.20-rc3, v4.20-rc2, v4.20-rc1, v4.19, v4.19-rc8, v4.19-rc7, v4.19-rc6, v4.19-rc5, v4.19-rc4, v4.19-rc3, v4.19-rc2, v4.19-rc1 |
|
| #
27c331a1 |
| 22-Aug-2018 |
Manfred Spraul <[email protected]> |
ipc/util.c: further variable name cleanups
The varable names got a mess, thus standardize them again:
id: user space id. Called semid, shmid, msgid if the type is known. Most functions use "id"
ipc/util.c: further variable name cleanups
The varable names got a mess, thus standardize them again:
id: user space id. Called semid, shmid, msgid if the type is known. Most functions use "id" already. idx: "index" for the idr lookup Right now, some functions use lid, ipc_addid() already uses idx as the variable name. seq: sequence number, to avoid quick collisions of the user space id key: user space key, used for the rhash tree
Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Manfred Spraul <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Herbert Xu <[email protected]> Cc: Kees Cook <[email protected]> Cc: Michael Kerrisk <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
| #
dc2c8c84 |
| 22-Aug-2018 |
Davidlohr Bueso <[email protected]> |
ipc: get rid of ids->tables_initialized hack
In sysvipc we have an ids->tables_initialized regarding the rhashtable, introduced in 0cfb6aee70bd ("ipc: optimize semget/shmget/msgget for lots of keys"
ipc: get rid of ids->tables_initialized hack
In sysvipc we have an ids->tables_initialized regarding the rhashtable, introduced in 0cfb6aee70bd ("ipc: optimize semget/shmget/msgget for lots of keys")
It's there, specifically, to prevent nil pointer dereferences, from using an uninitialized api. Considering how rhashtable_init() can fail (probably due to ENOMEM, if anything), this made the overall ipc initialization capable of failure as well. That alone is ugly, but fine, however I've spotted a few issues regarding the semantics of tables_initialized (however unlikely they may be):
- There is inconsistency in what we return to userspace: ipc_addid() returns ENOSPC which is certainly _wrong_, while ipc_obtain_object_idr() returns EINVAL.
- After we started using rhashtables, ipc_findkey() can return nil upon !tables_initialized, but the caller expects nil for when the ipc structure isn't found, and can therefore call into ipcget() callbacks.
Now that rhashtable initialization cannot fail, we can properly get rid of the hack altogether.
[[email protected]: commit id extended to 12 digits] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Davidlohr Bueso <[email protected]> Signed-off-by: Manfred Spraul <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Herbert Xu <[email protected]> Cc: Kees Cook <[email protected]> Cc: Michael Kerrisk <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
|
Revision tags: v4.18, v4.18-rc8, v4.18-rc7, v4.18-rc6, v4.18-rc5, v4.18-rc4, v4.18-rc3, v4.18-rc2 |
|
| #
0eb71a9d |
| 18-Jun-2018 |
NeilBrown <[email protected]> |
rhashtable: split rhashtable.h
Due to the use of rhashtables in net namespaces, rhashtable.h is included in lots of the kernel, so a small changes can required a large recompilation. This makes deve
rhashtable: split rhashtable.h
Due to the use of rhashtables in net namespaces, rhashtable.h is included in lots of the kernel, so a small changes can required a large recompilation. This makes development painful.
This patch splits out rhashtable-types.h which just includes the major type declarations, and does not include (non-trivial) inline code. rhashtable.h is no longer included by anything in the include/ directory. Common include files only include rhashtable-types.h so a large recompilation is only triggered when that changes.
Acked-by: Herbert Xu <[email protected]> Signed-off-by: NeilBrown <[email protected]> Signed-off-by: David S. Miller <[email protected]>
show more ...
|
|
Revision tags: v4.18-rc1, v4.17, v4.17-rc7, v4.17-rc6, v4.17-rc5, v4.17-rc4, v4.17-rc3, v4.17-rc2, v4.17-rc1, v4.16, v4.16-rc7, v4.16-rc6, v4.16-rc5, v4.16-rc4, v4.16-rc3, v4.16-rc2, v4.16-rc1, v4.15, v4.15-rc9, v4.15-rc8, v4.15-rc7, v4.15-rc6, v4.15-rc5, v4.15-rc4, v4.15-rc3, v4.15-rc2, v4.15-rc1 |
|
| #
15df03c8 |
| 17-Nov-2017 |
Davidlohr Bueso <[email protected]> |
sysvipc: make get_maxid O(1) again
For a custom microbenchmark on a 3.30GHz Xeon SandyBridge, which calls IPC_STAT over and over, it was calculated that, on avg the cost of ipc_get_maxid() for incre
sysvipc: make get_maxid O(1) again
For a custom microbenchmark on a 3.30GHz Xeon SandyBridge, which calls IPC_STAT over and over, it was calculated that, on avg the cost of ipc_get_maxid() for increasing amounts of keys was:
10 keys: ~900 cycles 100 keys: ~15000 cycles 1000 keys: ~150000 cycles 10000 keys: ~2100000 cycles
This is unsurprising as maxid is currently O(n).
By having the max_id available in O(1) we save all those cycles for each semctl(_STAT) command, the idr_find can be expensive -- which some real (customer) workloads actually poll on.
Note that this used to be the case, until commit 7ca7e564e04 ("ipc: store ipcs into IDRs"). The cost is the extra idr_find when doing RMIDs, but we simply go backwards, and should not take too many iterations to find the new value.
[[email protected]: coding-style fixes] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Davidlohr Bueso <[email protected]> Cc: Manfred Spraul <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
| #
b8fd9983 |
| 17-Nov-2017 |
Davidlohr Bueso <[email protected]> |
sysvipc: unteach ids->next_id for !CHECKPOINT_RESTORE
Patch series "sysvipc: ipc-key management improvements".
Here are a few improvements I spotted while eyeballing Guillaume's rhashtable implemen
sysvipc: unteach ids->next_id for !CHECKPOINT_RESTORE
Patch series "sysvipc: ipc-key management improvements".
Here are a few improvements I spotted while eyeballing Guillaume's rhashtable implementation for ipc keys. The first and fourth patches are the interesting ones, the middle two are trivial.
This patch (of 4):
The next_id object-allocation functionality was introduced in commit 03f595668017 ("ipc: add sysctl to specify desired next object id").
Given that these new entries are _only_ exported under the CONFIG_CHECKPOINT_RESTORE option, there is no point for the common case to even know about ->next_id. As such rewrite ipc_buildid() such that it can do away with the field as well as unnecessary branches when adding a new identifier. The end result also better differentiates both cases, so the code ends up being cleaner; albeit the small duplications regarding the default case.
[[email protected]: coding-style fixes] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Davidlohr Bueso <[email protected]> Cc: Manfred Spraul <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
|
Revision tags: v4.14, v4.14-rc8 |
|
| #
b2441318 |
| 01-Nov-2017 |
Greg Kroah-Hartman <[email protected]> |
License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine
License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license identifiers to apply.
- when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary:
SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became the concluded license(s).
- when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time.
In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related.
Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches.
Reviewed-by: Kate Stewart <[email protected]> Reviewed-by: Philippe Ombredanne <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
show more ...
|
|
Revision tags: v4.14-rc7, v4.14-rc6, v4.14-rc5, v4.14-rc4, v4.14-rc3, v4.14-rc2, v4.14-rc1 |
|
| #
0cfb6aee |
| 08-Sep-2017 |
Guillaume Knispel <[email protected]> |
ipc: optimize semget/shmget/msgget for lots of keys
ipc_findkey() used to scan all objects to look for the wanted key. This is slow when using a high number of keys. This change adds an rhashtable
ipc: optimize semget/shmget/msgget for lots of keys
ipc_findkey() used to scan all objects to look for the wanted key. This is slow when using a high number of keys. This change adds an rhashtable of kern_ipc_perm objects in ipc_ids, so that one lookup cease to be O(n).
This change gives a 865% improvement of benchmark reaim.jobs_per_min on a 56 threads Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz with 256G memory [1]
Other (more micro) benchmark results, by the author: On an i5 laptop, the following loop executed right after a reboot took, without and with this change:
for (int i = 0, k=0x424242; i < KEYS; ++i) semget(k++, 1, IPC_CREAT | 0600);
total total max single max single KEYS without with call without call with
1 3.5 4.9 µs 3.5 4.9 10 7.6 8.6 µs 3.7 4.7 32 16.2 15.9 µs 4.3 5.3 100 72.9 41.8 µs 3.7 4.7 1000 5,630.0 502.0 µs * * 10000 1,340,000.0 7,240.0 µs * * 31900 17,600,000.0 22,200.0 µs * *
*: unreliable measure: high variance
The duration for a lookup-only usage was obtained by the same loop once the keys are present:
total total max single max single KEYS without with call without call with
1 2.1 2.5 µs 2.1 2.5 10 4.5 4.8 µs 2.2 2.3 32 13.0 10.8 µs 2.3 2.8 100 82.9 25.1 µs * 2.3 1000 5,780.0 217.0 µs * * 10000 1,470,000.0 2,520.0 µs * * 31900 17,400,000.0 7,810.0 µs * *
Finally, executing each semget() in a new process gave, when still summing only the durations of these syscalls:
creation: total total KEYS without with
1 3.7 5.0 µs 10 32.9 36.7 µs 32 125.0 109.0 µs 100 523.0 353.0 µs 1000 20,300.0 3,280.0 µs 10000 2,470,000.0 46,700.0 µs 31900 27,800,000.0 219,000.0 µs
lookup-only: total total KEYS without with
1 2.5 2.7 µs 10 25.4 24.4 µs 32 106.0 72.6 µs 100 591.0 352.0 µs 1000 22,400.0 2,250.0 µs 10000 2,510,000.0 25,700.0 µs 31900 28,200,000.0 115,000.0 µs
[1] http://lkml.kernel.org/r/20170814060507.GE23258@yexl-desktop
Link: http://lkml.kernel.org/r/20170815194954.ck32ta2z35yuzpwp@debix Signed-off-by: Guillaume Knispel <[email protected]> Reviewed-by: Marc Pardo <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Kees Cook <[email protected]> Cc: Manfred Spraul <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: "Peter Zijlstra (Intel)" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Serge Hallyn <[email protected]> Cc: Andrey Vagin <[email protected]> Cc: Guillaume Knispel <[email protected]> Cc: Marc Pardo <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
| #
a2e0602c |
| 08-Sep-2017 |
Elena Reshetova <[email protected]> |
ipc: convert ipc_namespace.count from atomic_t to refcount_t
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows t
ipc: convert ipc_namespace.count from atomic_t to refcount_t
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations.
Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Elena Reshetova <[email protected]> Signed-off-by: Hans Liljestrand <[email protected]> Signed-off-by: Kees Cook <[email protected]> Signed-off-by: David Windsor <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Serge Hallyn <[email protected]> Cc: <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Manfred Spraul <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
|
Revision tags: v4.13, v4.13-rc7, v4.13-rc6, v4.13-rc5, v4.13-rc4, v4.13-rc3, v4.13-rc2, v4.13-rc1, v4.12, v4.12-rc7, v4.12-rc6, v4.12-rc5, v4.12-rc4, v4.12-rc3, v4.12-rc2, v4.12-rc1, v4.11, v4.11-rc8, v4.11-rc7, v4.11-rc6, v4.11-rc5, v4.11-rc4, v4.11-rc3, v4.11-rc2, v4.11-rc1, v4.10, v4.10-rc8, v4.10-rc7, v4.10-rc6, v4.10-rc5, v4.10-rc4, v4.10-rc3, v4.10-rc2, v4.10-rc1, v4.9, v4.9-rc8, v4.9-rc7, v4.9-rc6, v4.9-rc5, v4.9-rc4, v4.9-rc3 |
|
| #
3859a271 |
| 28-Oct-2016 |
Kees Cook <[email protected]> |
randstruct: Mark various structs for randomization
This marks many critical kernel structures for randomization. These are structures that have been targeted in the past in security exploits, or con
randstruct: Mark various structs for randomization
This marks many critical kernel structures for randomization. These are structures that have been targeted in the past in security exploits, or contain functions pointers, pointers to function pointer tables, lists, workqueues, ref-counters, credentials, permissions, or are otherwise sensitive. This initial list was extracted from Brad Spengler/PaX Team's code in the last public patch of grsecurity/PaX based on my understanding of the code. Changes or omissions from the original code are mine and don't reflect the original grsecurity/PaX code.
Left out of this list is task_struct, which requires special handling and will be covered in a subsequent patch.
Signed-off-by: Kees Cook <[email protected]>
show more ...
|
|
Revision tags: v4.9-rc2, v4.9-rc1, v4.8, v4.8-rc8, v4.8-rc7, v4.8-rc6, v4.8-rc5, v4.8-rc4, v4.8-rc3, v4.8-rc2 |
|
| #
aba35661 |
| 08-Aug-2016 |
Eric W. Biederman <[email protected]> |
ipcns: Add a limit on the number of ipc namespaces
Acked-by: Kees Cook <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]>
|
|
Revision tags: v4.8-rc1 |
|
| #
3bd080e4 |
| 02-Aug-2016 |
Alexey Dobriyan <[email protected]> |
ipc: delete "nr_ipc_ns"
Write-only variable.
Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Alexey Dobriyan <[email protected]> Signed-off-by: Andrew Morton <
ipc: delete "nr_ipc_ns"
Write-only variable.
Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Alexey Dobriyan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
|
Revision tags: v4.7, v4.7-rc7, v4.7-rc6, v4.7-rc5, v4.7-rc4, v4.7-rc3, v4.7-rc2, v4.7-rc1, v4.6, v4.6-rc7, v4.6-rc6, v4.6-rc5, v4.6-rc4, v4.6-rc3, v4.6-rc2, v4.6-rc1, v4.5, v4.5-rc7, v4.5-rc6, v4.5-rc5, v4.5-rc4, v4.5-rc3, v4.5-rc2, v4.5-rc1, v4.4, v4.4-rc8, v4.4-rc7, v4.4-rc6, v4.4-rc5, v4.4-rc4, v4.4-rc3, v4.4-rc2, v4.4-rc1, v4.3, v4.3-rc7, v4.3-rc6, v4.3-rc5, v4.3-rc4, v4.3-rc3, v4.3-rc2, v4.3-rc1, v4.2, v4.2-rc8, v4.2-rc7, v4.2-rc6, v4.2-rc5, v4.2-rc4, v4.2-rc3, v4.2-rc2, v4.2-rc1, v4.1, v4.1-rc8, v4.1-rc7, v4.1-rc6, v4.1-rc5, v4.1-rc4, v4.1-rc3, v4.1-rc2, v4.1-rc1, v4.0, v4.0-rc7, v4.0-rc6, v4.0-rc5, v4.0-rc4, v4.0-rc3, v4.0-rc2, v4.0-rc1, v3.19, v3.19-rc7, v3.19-rc6, v3.19-rc5, v3.19-rc4, v3.19-rc3, v3.19-rc2, v3.19-rc1 |
|
| #
0050ee05 |
| 13-Dec-2014 |
Manfred Spraul <[email protected]> |
ipc/msg: increase MSGMNI, remove scaling
SysV can be abused to allocate locked kernel memory. For most systems, a small limit doesn't make sense, see the discussion with regards to SHMMAX.
Therefo
ipc/msg: increase MSGMNI, remove scaling
SysV can be abused to allocate locked kernel memory. For most systems, a small limit doesn't make sense, see the discussion with regards to SHMMAX.
Therefore: increase MSGMNI to the maximum supported.
And: If we ignore the risk of locking too much memory, then an automatic scaling of MSGMNI doesn't make sense. Therefore the logic can be removed.
The code preserves auto_msgmni to avoid breaking any user space applications that expect that the value exists.
Notes: 1) If an administrator must limit the memory allocations, then he can set MSGMNI as necessary.
Or he can disable sysv entirely (as e.g. done by Android).
2) MSGMAX and MSGMNB are intentionally not increased, as these values are used to control latency vs. throughput: If MSGMNB is large, then msgsnd() just returns and more messages can be queued before a task switch to a task that calls msgrcv() is forced.
[[email protected]: coding-style fixes] Signed-off-by: Manfred Spraul <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Rafael Aquini <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
|
Revision tags: v3.18, v3.18-rc7, v3.18-rc6, v3.18-rc5, v3.18-rc4, v3.18-rc3 |
|
| #
435d5f4b |
| 01-Nov-2014 |
Al Viro <[email protected]> |
common object embedded into various struct ....ns
for now - just move corresponding ->proc_inum instances over there
Acked-by: "Eric W. Biederman" <[email protected]> Signed-off-by: Al Viro <vi
common object embedded into various struct ....ns
for now - just move corresponding ->proc_inum instances over there
Acked-by: "Eric W. Biederman" <[email protected]> Signed-off-by: Al Viro <[email protected]>
show more ...
|
|
Revision tags: v3.18-rc2, v3.18-rc1, v3.17, v3.17-rc7, v3.17-rc6, v3.17-rc5, v3.17-rc4, v3.17-rc3, v3.17-rc2, v3.17-rc1, v3.16, v3.16-rc7, v3.16-rc6, v3.16-rc5, v3.16-rc4, v3.16-rc3, v3.16-rc2, v3.16-rc1, v3.15, v3.15-rc8, v3.15-rc7, v3.15-rc6, v3.15-rc5, v3.15-rc4, v3.15-rc3, v3.15-rc2, v3.15-rc1, v3.14, v3.14-rc8, v3.14-rc7, v3.14-rc6, v3.14-rc5 |
|
| #
f3713fd9 |
| 25-Feb-2014 |
Davidlohr Bueso <[email protected]> |
ipc,mqueue: remove limits for the amount of system-wide queues
Commit 93e6f119c0ce ("ipc/mqueue: cleanup definition names and locations") added global hardcoded limits to the amount of message queue
ipc,mqueue: remove limits for the amount of system-wide queues
Commit 93e6f119c0ce ("ipc/mqueue: cleanup definition names and locations") added global hardcoded limits to the amount of message queues that can be created. While these limits are per-namespace, reality is that it ends up breaking userspace applications. Historically users have, at least in theory, been able to create up to INT_MAX queues, and limiting it to just 1024 is way too low and dramatic for some workloads and use cases. For instance, Madars reports:
"This update imposes bad limits on our multi-process application. As our app uses approaches that each process opens its own set of queues (usually something about 3-5 queues per process). In some scenarios we might run up to 3000 processes or more (which of-course for linux is not a problem). Thus we might need up to 9000 queues or more. All processes run under one user."
Other affected users can be found in launchpad bug #1155695: https://bugs.launchpad.net/ubuntu/+source/manpages/+bug/1155695
Instead of increasing this limit, revert it entirely and fallback to the original way of dealing queue limits -- where once a user's resource limit is reached, and all memory is used, new queues cannot be created.
Signed-off-by: Davidlohr Bueso <[email protected]> Reported-by: Madars Vitolins <[email protected]> Acked-by: Doug Ledford <[email protected]> Cc: Manfred Spraul <[email protected]> Cc: <[email protected]> [3.5+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
|
Revision tags: v3.14-rc4, v3.14-rc3, v3.14-rc2, v3.14-rc1 |
|
| #
daf948c7 |
| 28-Jan-2014 |
Davidlohr Bueso <[email protected]> |
ipc: delete seq_max field in struct ipc_ids
This field is only used to reset the ids seq number if it exceeds the smaller of INT_MAX/SEQ_MULTIPLIER and USHRT_MAX, and can therefore be moved out of t
ipc: delete seq_max field in struct ipc_ids
This field is only used to reset the ids seq number if it exceeds the smaller of INT_MAX/SEQ_MULTIPLIER and USHRT_MAX, and can therefore be moved out of the structure and into its own macro. Since each ipc_namespace contains a table of 3 pointers to struct ipc_ids we can save space in instruction text:
text data bss dec hex filename 56232 2348 24 58604 e4ec ipc/built-in.o 56216 2348 24 58588 e4dc ipc/built-in.o-after
Signed-off-by: Davidlohr Bueso <[email protected]> Reviewed-by: Jonathan Gonzalez <[email protected]> Cc: Aswin Chandramouleeswaran <[email protected]> Cc: Rik van Riel <[email protected]> Acked-by: Manfred Spraul <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
|
Revision tags: v3.13, v3.13-rc8, v3.13-rc7, v3.13-rc6, v3.13-rc5, v3.13-rc4, v3.13-rc3, v3.13-rc2, v3.13-rc1, v3.12 |
|
| #
9bf76ca3 |
| 03-Nov-2013 |
Mathias Krause <[email protected]> |
ipc, msg: forbid negative values for "msg{max,mnb,mni}"
Negative message lengths make no sense -- so don't do negative queue lenghts or identifier counts. Prevent them from getting negative.
Also c
ipc, msg: forbid negative values for "msg{max,mnb,mni}"
Negative message lengths make no sense -- so don't do negative queue lenghts or identifier counts. Prevent them from getting negative.
Also change the underlying data types to be unsigned to avoid hairy surprises with sign extensions in cases where those variables get evaluated in unsigned expressions with bigger data types, e.g size_t.
In case a user still wants to have "unlimited" sizes she could just use INT_MAX instead.
Signed-off-by: Mathias Krause <[email protected]> Cc: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
|
Revision tags: v3.12-rc7, v3.12-rc6, v3.12-rc5, v3.12-rc4, v3.12-rc3, v3.12-rc2, v3.12-rc1 |
|
| #
d9a605e4 |
| 11-Sep-2013 |
Davidlohr Bueso <[email protected]> |
ipc: rename ids->rw_mutex
Since in some situations the lock can be shared for readers, we shouldn't be calling it a mutex, rename it to rwsem.
Signed-off-by: Davidlohr Bueso <[email protected]
ipc: rename ids->rw_mutex
Since in some situations the lock can be shared for readers, we shouldn't be calling it a mutex, rename it to rwsem.
Signed-off-by: Davidlohr Bueso <[email protected]> Tested-by: Sedat Dilek <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Manfred Spraul <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
|
Revision tags: v3.11, v3.11-rc7, v3.11-rc6, v3.11-rc5, v3.11-rc4, v3.11-rc3, v3.11-rc2, v3.11-rc1, v3.10, v3.10-rc7, v3.10-rc6, v3.10-rc5, v3.10-rc4, v3.10-rc3, v3.10-rc2, v3.10-rc1 |
|
| #
d69f3bad |
| 01-May-2013 |
Robin Holt <[email protected]> |
ipc: sysv shared memory limited to 8TiB
Trying to run an application which was trying to put data into half of memory using shmget(), we found that having a shmall value below 8EiB-8TiB would preven
ipc: sysv shared memory limited to 8TiB
Trying to run an application which was trying to put data into half of memory using shmget(), we found that having a shmall value below 8EiB-8TiB would prevent us from using anything more than 8TiB. By setting kernel.shmall greater than 8EiB-8TiB would make the job work.
In the newseg() function, ns->shm_tot which, at 8TiB is INT_MAX.
ipc/shm.c: 458 static int newseg(struct ipc_namespace *ns, struct ipc_params *params) 459 { ... 465 int numpages = (size + PAGE_SIZE -1) >> PAGE_SHIFT; ... 474 if (ns->shm_tot + numpages > ns->shm_ctlall) 475 return -ENOSPC;
[[email protected]: make ipc/shm.c:newseg()'s numpages size_t, not int] Signed-off-by: Robin Holt <[email protected]> Reported-by: Alex Thorlton <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|