forked from Minki/linux
493b0e9d94
/proc/pid/smaps_rollup is a new proc file that improves the performance of user programs that determine aggregate memory statistics (e.g., total PSS) of a process. Android regularly "samples" the memory usage of various processes in order to balance its memory pool sizes. This sampling process involves opening /proc/pid/smaps and summing certain fields. For very large processes, sampling memory use this way can take several hundred milliseconds, due mostly to the overhead of the seq_printf calls in task_mmu.c. smaps_rollup improves the situation. It contains most of the fields of /proc/pid/smaps, but instead of a set of fields for each VMA, smaps_rollup instead contains one synthetic smaps-format entry representing the whole process. In the single smaps_rollup synthetic entry, each field is the summation of the corresponding field in all of the real-smaps VMAs. Using a common format for smaps_rollup and smaps allows userspace parsers to repurpose parsers meant for use with non-rollup smaps for smaps_rollup, and it allows userspace to switch between smaps_rollup and smaps at runtime (say, based on the availability of smaps_rollup in a given kernel) with minimal fuss. By using smaps_rollup instead of smaps, a caller can avoid the significant overhead of formatting, reading, and parsing each of a large process's potentially very numerous memory mappings. For sampling system_server's PSS in Android, we measured a 12x speedup, representing a savings of several hundred milliseconds. One alternative to a new per-process proc file would have been including PSS information in /proc/pid/status. We considered this option but thought that PSS would be too expensive (by a few orders of magnitude) to collect relative to what's already emitted as part of /proc/pid/status, and slowing every user of /proc/pid/status for the sake of readers that happen to want PSS feels wrong. The code itself works by reusing the existing VMA-walking framework we use for regular smaps generation and keeping the mem_size_stats structure around between VMA walks instead of using a fresh one for each VMA. In this way, summation happens automatically. We let seq_file walk over the VMAs just as it does for regular smaps and just emit nothing to the seq_file until we hit the last VMA. Benchmarks: using smaps: iterations:1000 pid:1163 pss:220023808 0m29.46s real 0m08.28s user 0m20.98s system using smaps_rollup: iterations:1000 pid:1163 pss:220702720 0m04.39s real 0m00.03s user 0m04.31s system We're using the PSS samples we collect asynchronously for system-management tasks like fine-tuning oom_adj_score, memory use tracking for debugging, application-level memory-use attribution, and deciding whether we want to kill large processes during system idle maintenance windows. Android has been using PSS for these purposes for a long time; as the average process VMA count has increased and and devices become more efficiency-conscious, PSS-collection inefficiency has started to matter more. IMHO, it'd be a lot safer to optimize the existing PSS-collection model, which has been fine-tuned over the years, instead of changing the memory tracking approach entirely to work around smaps-generation inefficiency. Tim said: : There are two main reasons why Android gathers PSS information: : : 1. Android devices can show the user the amount of memory used per : application via the settings app. This is a less important use case. : : 2. We log PSS to help identify leaks in applications. We have found : an enormous number of bugs (in the Android platform, in Google's own : apps, and in third-party applications) using this data. : : To do this, system_server (the main process in Android userspace) will : sample the PSS of a process three seconds after it changes state (for : example, app is launched and becomes the foreground application) and about : every ten minutes after that. The net result is that PSS collection is : regularly running on at least one process in the system (usually a few : times a minute while the screen is on, less when screen is off due to : suspend). PSS of a process is an incredibly useful stat to track, and we : aren't going to get rid of it. We've looked at some very hacky approaches : using RSS ("take the RSS of the target process, subtract the RSS of the : zygote process that is the parent of all Android apps") to reduce the : accounting time, but it regularly overestimated the memory used by 20+ : percent. Accordingly, I don't think that there's a good alternative to : using PSS. : : We started looking into PSS collection performance after we noticed random : frequency spikes while a phone's screen was off; occasionally, one of the : CPU clusters would ramp to a high frequency because there was 200-300ms of : constant CPU work from a single thread in the main Android userspace : process. The work causing the spike (which is reasonable governor : behavior given the amount of CPU time needed) was always PSS collection. : As a result, Android is burning more power than we should be on PSS : collection. : : The other issue (and why I'm less sure about improving smaps as a : long-term solution) is that the number of VMAs per process has increased : significantly from release to release. After trying to figure out why we : were seeing these 200-300ms PSS collection times on Android O but had not : noticed it in previous versions, we found that the number of VMAs in the : main system process increased by 50% from Android N to Android O (from : ~1800 to ~2700) and varying increases in every userspace process. Android : M to N also had an increase in the number of VMAs, although not as much. : I'm not sure why this is increasing so much over time, but thinking about : ASLR and ways to make ASLR better, I expect that this will continue to : increase going forward. I would not be surprised if we hit 5000 VMAs on : the main Android process (system_server) by 2020. : : If we assume that the number of VMAs is going to increase over time, then : doing anything we can do to reduce the overhead of each VMA during PSS : collection seems like the right way to go, and that means outputting an : aggregate statistic (to avoid whatever overhead there is per line in : writing smaps and in reading each line from userspace). Link: http://lkml.kernel.org/r/20170812022148.178293-1-dancol@google.com Signed-off-by: Daniel Colascione <dancol@google.com> Cc: Tim Murray <timmurray@google.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Sonny Rao <sonnyrao@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
303 lines
8.4 KiB
C
303 lines
8.4 KiB
C
/* Internal procfs definitions
|
|
*
|
|
* Copyright (C) 2004 Red Hat, Inc. All Rights Reserved.
|
|
* Written by David Howells (dhowells@redhat.com)
|
|
*
|
|
* This program is free software; you can redistribute it and/or
|
|
* modify it under the terms of the GNU General Public License
|
|
* as published by the Free Software Foundation; either version
|
|
* 2 of the License, or (at your option) any later version.
|
|
*/
|
|
|
|
#include <linux/proc_fs.h>
|
|
#include <linux/proc_ns.h>
|
|
#include <linux/spinlock.h>
|
|
#include <linux/atomic.h>
|
|
#include <linux/binfmts.h>
|
|
#include <linux/sched/coredump.h>
|
|
#include <linux/sched/task.h>
|
|
|
|
struct ctl_table_header;
|
|
struct mempolicy;
|
|
|
|
/*
|
|
* This is not completely implemented yet. The idea is to
|
|
* create an in-memory tree (like the actual /proc filesystem
|
|
* tree) of these proc_dir_entries, so that we can dynamically
|
|
* add new files to /proc.
|
|
*
|
|
* parent/subdir are used for the directory structure (every /proc file has a
|
|
* parent, but "subdir" is empty for all non-directory entries).
|
|
* subdir_node is used to build the rb tree "subdir" of the parent.
|
|
*/
|
|
struct proc_dir_entry {
|
|
unsigned int low_ino;
|
|
umode_t mode;
|
|
nlink_t nlink;
|
|
kuid_t uid;
|
|
kgid_t gid;
|
|
loff_t size;
|
|
const struct inode_operations *proc_iops;
|
|
const struct file_operations *proc_fops;
|
|
struct proc_dir_entry *parent;
|
|
struct rb_root subdir;
|
|
struct rb_node subdir_node;
|
|
void *data;
|
|
atomic_t count; /* use count */
|
|
atomic_t in_use; /* number of callers into module in progress; */
|
|
/* negative -> it's going away RSN */
|
|
struct completion *pde_unload_completion;
|
|
struct list_head pde_openers; /* who did ->open, but not ->release */
|
|
spinlock_t pde_unload_lock; /* proc_fops checks and pde_users bumps */
|
|
u8 namelen;
|
|
char name[];
|
|
} __randomize_layout;
|
|
|
|
union proc_op {
|
|
int (*proc_get_link)(struct dentry *, struct path *);
|
|
int (*proc_show)(struct seq_file *m,
|
|
struct pid_namespace *ns, struct pid *pid,
|
|
struct task_struct *task);
|
|
};
|
|
|
|
struct proc_inode {
|
|
struct pid *pid;
|
|
unsigned int fd;
|
|
union proc_op op;
|
|
struct proc_dir_entry *pde;
|
|
struct ctl_table_header *sysctl;
|
|
struct ctl_table *sysctl_entry;
|
|
struct hlist_node sysctl_inodes;
|
|
const struct proc_ns_operations *ns_ops;
|
|
struct inode vfs_inode;
|
|
} __randomize_layout;
|
|
|
|
/*
|
|
* General functions
|
|
*/
|
|
static inline struct proc_inode *PROC_I(const struct inode *inode)
|
|
{
|
|
return container_of(inode, struct proc_inode, vfs_inode);
|
|
}
|
|
|
|
static inline struct proc_dir_entry *PDE(const struct inode *inode)
|
|
{
|
|
return PROC_I(inode)->pde;
|
|
}
|
|
|
|
static inline void *__PDE_DATA(const struct inode *inode)
|
|
{
|
|
return PDE(inode)->data;
|
|
}
|
|
|
|
static inline struct pid *proc_pid(struct inode *inode)
|
|
{
|
|
return PROC_I(inode)->pid;
|
|
}
|
|
|
|
static inline struct task_struct *get_proc_task(struct inode *inode)
|
|
{
|
|
return get_pid_task(proc_pid(inode), PIDTYPE_PID);
|
|
}
|
|
|
|
void task_dump_owner(struct task_struct *task, mode_t mode,
|
|
kuid_t *ruid, kgid_t *rgid);
|
|
|
|
static inline unsigned name_to_int(const struct qstr *qstr)
|
|
{
|
|
const char *name = qstr->name;
|
|
int len = qstr->len;
|
|
unsigned n = 0;
|
|
|
|
if (len > 1 && *name == '0')
|
|
goto out;
|
|
while (len-- > 0) {
|
|
unsigned c = *name++ - '0';
|
|
if (c > 9)
|
|
goto out;
|
|
if (n >= (~0U-9)/10)
|
|
goto out;
|
|
n *= 10;
|
|
n += c;
|
|
}
|
|
return n;
|
|
out:
|
|
return ~0U;
|
|
}
|
|
|
|
/*
|
|
* Offset of the first process in the /proc root directory..
|
|
*/
|
|
#define FIRST_PROCESS_ENTRY 256
|
|
|
|
/* Worst case buffer size needed for holding an integer. */
|
|
#define PROC_NUMBUF 13
|
|
|
|
/*
|
|
* array.c
|
|
*/
|
|
extern const struct file_operations proc_tid_children_operations;
|
|
|
|
extern int proc_tid_stat(struct seq_file *, struct pid_namespace *,
|
|
struct pid *, struct task_struct *);
|
|
extern int proc_tgid_stat(struct seq_file *, struct pid_namespace *,
|
|
struct pid *, struct task_struct *);
|
|
extern int proc_pid_status(struct seq_file *, struct pid_namespace *,
|
|
struct pid *, struct task_struct *);
|
|
extern int proc_pid_statm(struct seq_file *, struct pid_namespace *,
|
|
struct pid *, struct task_struct *);
|
|
|
|
/*
|
|
* base.c
|
|
*/
|
|
extern const struct dentry_operations pid_dentry_operations;
|
|
extern int pid_getattr(const struct path *, struct kstat *, u32, unsigned int);
|
|
extern int proc_setattr(struct dentry *, struct iattr *);
|
|
extern struct inode *proc_pid_make_inode(struct super_block *, struct task_struct *, umode_t);
|
|
extern int pid_revalidate(struct dentry *, unsigned int);
|
|
extern int pid_delete_dentry(const struct dentry *);
|
|
extern int proc_pid_readdir(struct file *, struct dir_context *);
|
|
extern struct dentry *proc_pid_lookup(struct inode *, struct dentry *, unsigned int);
|
|
extern loff_t mem_lseek(struct file *, loff_t, int);
|
|
|
|
/* Lookups */
|
|
typedef int instantiate_t(struct inode *, struct dentry *,
|
|
struct task_struct *, const void *);
|
|
extern bool proc_fill_cache(struct file *, struct dir_context *, const char *, int,
|
|
instantiate_t, struct task_struct *, const void *);
|
|
|
|
/*
|
|
* generic.c
|
|
*/
|
|
extern struct dentry *proc_lookup(struct inode *, struct dentry *, unsigned int);
|
|
extern struct dentry *proc_lookup_de(struct proc_dir_entry *, struct inode *,
|
|
struct dentry *);
|
|
extern int proc_readdir(struct file *, struct dir_context *);
|
|
extern int proc_readdir_de(struct proc_dir_entry *, struct file *, struct dir_context *);
|
|
|
|
static inline struct proc_dir_entry *pde_get(struct proc_dir_entry *pde)
|
|
{
|
|
atomic_inc(&pde->count);
|
|
return pde;
|
|
}
|
|
extern void pde_put(struct proc_dir_entry *);
|
|
|
|
static inline bool is_empty_pde(const struct proc_dir_entry *pde)
|
|
{
|
|
return S_ISDIR(pde->mode) && !pde->proc_iops;
|
|
}
|
|
|
|
/*
|
|
* inode.c
|
|
*/
|
|
struct pde_opener {
|
|
struct file *file;
|
|
struct list_head lh;
|
|
bool closing;
|
|
struct completion *c;
|
|
};
|
|
extern const struct inode_operations proc_link_inode_operations;
|
|
|
|
extern const struct inode_operations proc_pid_link_inode_operations;
|
|
|
|
extern void proc_init_inodecache(void);
|
|
void set_proc_pid_nlink(void);
|
|
extern struct inode *proc_get_inode(struct super_block *, struct proc_dir_entry *);
|
|
extern int proc_fill_super(struct super_block *, void *data, int flags);
|
|
extern void proc_entry_rundown(struct proc_dir_entry *);
|
|
|
|
/*
|
|
* proc_namespaces.c
|
|
*/
|
|
extern const struct inode_operations proc_ns_dir_inode_operations;
|
|
extern const struct file_operations proc_ns_dir_operations;
|
|
|
|
/*
|
|
* proc_net.c
|
|
*/
|
|
extern const struct file_operations proc_net_operations;
|
|
extern const struct inode_operations proc_net_inode_operations;
|
|
|
|
#ifdef CONFIG_NET
|
|
extern int proc_net_init(void);
|
|
#else
|
|
static inline int proc_net_init(void) { return 0; }
|
|
#endif
|
|
|
|
/*
|
|
* proc_self.c
|
|
*/
|
|
extern int proc_setup_self(struct super_block *);
|
|
|
|
/*
|
|
* proc_thread_self.c
|
|
*/
|
|
extern int proc_setup_thread_self(struct super_block *);
|
|
extern void proc_thread_self_init(void);
|
|
|
|
/*
|
|
* proc_sysctl.c
|
|
*/
|
|
#ifdef CONFIG_PROC_SYSCTL
|
|
extern int proc_sys_init(void);
|
|
extern void proc_sys_evict_inode(struct inode *inode,
|
|
struct ctl_table_header *head);
|
|
#else
|
|
static inline void proc_sys_init(void) { }
|
|
static inline void proc_sys_evict_inode(struct inode *inode,
|
|
struct ctl_table_header *head) { }
|
|
#endif
|
|
|
|
/*
|
|
* proc_tty.c
|
|
*/
|
|
#ifdef CONFIG_TTY
|
|
extern void proc_tty_init(void);
|
|
#else
|
|
static inline void proc_tty_init(void) {}
|
|
#endif
|
|
|
|
/*
|
|
* root.c
|
|
*/
|
|
extern struct proc_dir_entry proc_root;
|
|
extern int proc_parse_options(char *options, struct pid_namespace *pid);
|
|
|
|
extern void proc_self_init(void);
|
|
extern int proc_remount(struct super_block *, int *, char *);
|
|
|
|
/*
|
|
* task_[no]mmu.c
|
|
*/
|
|
struct mem_size_stats;
|
|
struct proc_maps_private {
|
|
struct inode *inode;
|
|
struct task_struct *task;
|
|
struct mm_struct *mm;
|
|
struct mem_size_stats *rollup;
|
|
#ifdef CONFIG_MMU
|
|
struct vm_area_struct *tail_vma;
|
|
#endif
|
|
#ifdef CONFIG_NUMA
|
|
struct mempolicy *task_mempolicy;
|
|
#endif
|
|
} __randomize_layout;
|
|
|
|
struct mm_struct *proc_mem_open(struct inode *inode, unsigned int mode);
|
|
|
|
extern const struct file_operations proc_pid_maps_operations;
|
|
extern const struct file_operations proc_tid_maps_operations;
|
|
extern const struct file_operations proc_pid_numa_maps_operations;
|
|
extern const struct file_operations proc_tid_numa_maps_operations;
|
|
extern const struct file_operations proc_pid_smaps_operations;
|
|
extern const struct file_operations proc_pid_smaps_rollup_operations;
|
|
extern const struct file_operations proc_tid_smaps_operations;
|
|
extern const struct file_operations proc_clear_refs_operations;
|
|
extern const struct file_operations proc_pagemap_operations;
|
|
|
|
extern unsigned long task_vsize(struct mm_struct *);
|
|
extern unsigned long task_statm(struct mm_struct *,
|
|
unsigned long *, unsigned long *,
|
|
unsigned long *, unsigned long *);
|
|
extern void task_mem(struct seq_file *, struct mm_struct *);
|