mergemem

[ Description | Status | Downloading | ChangeLog | ToDo | Links | Feedback ]

Description

What is mergemem?

Since I am not able to explain it as clearly myself, I quote a message from the kernel mailing list here.
> > The logic of mergemem is
> > that if you start several instances of a given program, there is a good bet
> > that some memory will be initialised exactly the same. So mergemem
> > compares different instances of a program and finds all the identical
> > pages. Then it makes those pages read-only and merges all processes to share
> > the same physical page. It also marks the page with the copy-on-write
> > flag so a process can still modify the page later, if needed.
> 
> Just curious, but couldn't one load a process with the data section shared
> and copy-on-write the instant a program is loaded? Isn't it true that the
> data section can be paged out of the executable file until it is written to?
> (Zero-filled pages are initially shared anyhow.)
> 
> I'm a little surprised that Linux doesn't do this, or does it? And if not,
> would doing it get some of the benefits of mergemem without the runtime
> overhead?

Linux does all this. The mergemem gadget goes much further. The idea is
that most programs start and then initialise various stuff. At this point,
the shared pages are not shared anymore since the program has written to
them. So the data has been loaded from the executable and modified.

The end result is that if you start two instances of the same
program, they will perform basically the same initialisation,
creating a set of duplicate "modified" pages. The OS can't easily
know that. Here is an example.

        int main ()
        {
                // Create a special table based on the hostname
                char table[10000];
                int i;
                for (i=0; i<10000; i++){
                        // ... (table[i] is computed here)
                }
                // Then from now on, use that table unmodified
                // as a lookup for example
                return 0;
        }

In this example, each instance of the program initialises a 10k table
with exactly the same data, yet that data is not fixed in the executable:
it is computed at run time.

The mergemem patch walks the pages allocated to all instances of a program
and finds the ones which are identical. It then removes the duplicates and
points all processes to the same "write-protected with copy-on-write" page.
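
The search for identical pages can be illustrated in user space. The following is only a sketch, not the patch itself: the real code works on kernel page tables, while this version compares two plain buffers. The name count_mergeable_pages, the constant SKETCH_PAGE_SIZE and the all-pairs search are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096  /* assumed page size, for illustration only */

/*
 * Count the pages of `a` that have a byte-identical page somewhere in `b`.
 * Both buffers hold `npages` pages of SKETCH_PAGE_SIZE bytes. Each page
 * counted here is a candidate for merging into one shared page.
 */
size_t count_mergeable_pages(const unsigned char *a,
                             const unsigned char *b, size_t npages)
{
        size_t i, j, found = 0;

        assert(a != NULL && b != NULL);
        for (i = 0; i < npages; i++) {
                for (j = 0; j < npages; j++) {
                        if (memcmp(a + i * SKETCH_PAGE_SIZE,
                                   b + j * SKETCH_PAGE_SIZE,
                                   SKETCH_PAGE_SIZE) == 0) {
                                found++;
                                break;  /* one match is enough for page i */
                        }
                }
        }
        return found;
}
```

Comparing every pair of pages byte by byte is quadratic, which is presumably why a checksum is computed first and only pages with equal checksums are compared in full.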

For example, I assume that if you have a text editor and load a 10 MB
document, then start another copy of the text editor and load the same
document, the mergemem patch will merge most of this 10 MB. Now, the
minute you start editing the document in one editor, the pages will start
to differ again.

The end result is about the same as if the original text editor had done a
fork().
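
The fork() behaviour referred to here can be checked with a small POSIX program. This is a minimal sketch and has nothing mergemem-specific in it; the function name parent_copy_survives_child_write is made up for illustration. After fork(), parent and child share the initialised pages, and a write in the child only changes the child's private copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fill `buf` with 'A', fork(), let the child overwrite it with 'B', and
 * report whether the parent's copy stayed intact.
 * Returns 1 if intact, 0 if changed, -1 on error.
 */
int parent_copy_survives_child_write(char *buf, size_t len)
{
        pid_t pid;
        int status;

        assert(buf != NULL && len > 0);
        memset(buf, 'A', len);          /* the common "initialisation" */

        pid = fork();
        if (pid < 0)
                return -1;
        if (pid == 0) {
                /* Child: the write gives it private copies of the pages. */
                memset(buf, 'B', len);
                _exit(buf[0] == 'B' ? 0 : 1);
        }

        /* Parent: wait for the child, then inspect our own copy. */
        if (waitpid(pid, &status, 0) < 0)
                return -1;
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
                return -1;
        return buf[0] == 'A' && buf[len - 1] == 'A';
}
```

Called with a 10000-byte table, the function should return 1: the child saw its own 'B' values while the parent still sees 'A'.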

But the idea of mergemem is that most programs start and perform some
initialisation, and all instances of the program share some amount of this
initialisation. It is this amount that mergemem can find.

The question is "is it worth it". From the data I have seen, it sounds
like it might be very useful on a multi-user server. Time will tell.

Also, the paper for the 6th International Linux-Kongress has a lot of interesting information.

Status

We think it is perfectly stable (in terms of machine crashes).

Downloading

You can download mergemem from our webserver.

You can also obtain the sources via CVS. To do this, set CVSROOT to :pserver:anonymous@das.ist.org:/cvshome/mergemem. Type 'cvs login' and press ENTER, then type 'cvs checkout mergemem'.

ChangeLog

Version 0.16
~ Development based on Linux 2.2.7 (on Alpha) and 2.2.9 (on i386)
+ SMP Support.
  (Thanks to Matthew Hunter for testing it)
+ New mergemem:
  If you are a mergemem user, you want to use the mergemem from the
  *_old directories. (This is also the mergemem which is installed
  by "make install")
  Developers want to look at the mergemem in the mergemem directory,
  which has the new features:
    * based on mmlib
    * checksums are calculated in userspace
    * checksum and search algorithms are pluggable modules.
+ First heuristic for the new mergemem called "equalv".
+ 4 Checksums
    * addrot   (as in the kernel chksum ioctl)
    * crc32    (from gzip)
    * const    (world's worst checksum)
    * scl      (faster and better than addrot)
~ Fixed mm_profile. (It was broken since 2.2.2 or so)
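
To illustrate what "pluggable" checksum modules could look like, here is a hedged sketch in C. The interface (checksum_fn, struct checksum_module, find_checksum) is hypothetical and is not mergemem's actual module API. Only "const" and "crc32" are shown: crc32 is the standard CRC-32 used by gzip, while addrot and scl are left out because their definitions are not given here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical plug-in interface; not mergemem's actual module API. */
typedef uint32_t (*checksum_fn)(const unsigned char *buf, size_t len);

struct checksum_module {
        const char *name;
        checksum_fn fn;
};

/* "const": every page gets the same value, so no page is ever filtered out. */
static uint32_t checksum_const(const unsigned char *buf, size_t len)
{
        (void)buf;
        (void)len;
        return 0;
}

/* Standard bitwise CRC-32 (reflected polynomial 0xEDB88320), as in gzip. */
static uint32_t checksum_crc32(const unsigned char *buf, size_t len)
{
        uint32_t crc = 0xFFFFFFFFu;
        size_t i;
        int bit;

        for (i = 0; i < len; i++) {
                crc ^= buf[i];
                for (bit = 0; bit < 8; bit++)
                        crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
        return crc ^ 0xFFFFFFFFu;
}

static const struct checksum_module checksum_modules[] = {
        { "const", checksum_const },
        { "crc32", checksum_crc32 },
};

/* Look up a module by name; returns NULL if no such module is registered. */
const struct checksum_module *find_checksum(const char *name)
{
        size_t i;

        assert(name != NULL);
        for (i = 0; i < sizeof checksum_modules / sizeof checksum_modules[0]; i++)
                if (strcmp(checksum_modules[i].name, name) == 0)
                        return &checksum_modules[i];
        return NULL;
}
```

A checksum only narrows down the candidates: pages with equal checksums still have to be compared byte by byte. With "const" every page collides, which makes it useful mainly as a worst-case baseline.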

Version 0.15
~ All further development based on Linux 2.2.1
- Dropped mergelib, since it is obsoleted by mmlib.
+ provide_pages IOCTL implemented. It maps target pages into the
  user's address space (mmlib application), thus allowing user-level
  hash functions to be at least as fast as kernel-level hash functions.
  (This is because provide_pages can process an arbitrary number of pages
  at a time, since mapping into the user's address space is not an
  expensive operation, while genchksum can only process one at a time,
  since calculating checksums _is_ expensive in terms of CPU cycles.)
  (PS: It took 3 whole days to iron out a bug related to Linux's swapping
  mechanism. I really believe that it works now! Tested on Alpha & i386.)
~ Updated mmlib's implementation of m_hashpages to make use of the new
  provide_pages IOCTL. (For now all checksums are calculated in user space)
~ Added "-ffixed-8" to the compiler flags on Alpha.
~ Added "If v_addr is NULL, the entry is ignored." to the m_hashpages section
  of the mmlib(3) manpage.
 
Version 0.14
~ mergemod_2?.c unified into mergemod.c.
+ get_pageinfos IOCTL implemented.
~ mmlib based on new IOCTLs plus emulation code for the not yet
  implemented IOCTLs.
~ Changes to mmlib.3
   (A lot of thanks to Ulrich Neumerkel, the driving force behind mmlib)
~ Permissions for /dev/mergemem are now 666, and the module ensures the
  same security as ptrace(2) (and /proc/[0-9]*/mem).

Version 0.13
+ mmlib, manual page and reference implementation by Ulrich Neumerkel;
  with this, mergelib is obsolete.
~ Interrupt disabling reactivated (it should work on SMP systems for now)
~ mm_profile is not built by default (see mm_profile.README for details)

[...]

ToDo

  • make it run on power pc
  • make it run on smp machines
  • think about all concerns of security
  • make a stable interface (via a library)

Links

Linux memory management patches page
mergemem page of our faculty advisor

Feedback

Please direct any feedback to us:



Last modified: Thu Feb 18 12:17:24 CET 1999