From raj@cfa272.harvard.edu Mon Oct 24 08:42:37 EDT 1994
Article: 18947 of comp.os.linux.development
Newsgroups: comp.os.linux.development
Path: bigblue.oit.unc.edu!concert!gatech!newsxfer.itd.umich.edu!ncar!hsdndev!cfanews!cfanews.harvard.edu!raj
From: raj@cfa272.harvard.edu (Raj Manandhar)
Subject: Alpha patch to control Cyrix 486DLC or SLC cache (Adaptec AHA-154x only, for now)
Message-ID:
Lines: 327
Sender: news@cfanews.harvard.edu
Organization: Harvard-Smithsonian Center for Astrophysics, Cambridge, Massachusetts, USA
Date: Sun, 23 Oct 1994 23:43:09 GMT

Hi all,

I hacked my kernel so that I could get better performance from my
Cyrix 486DLC-33, by controlling the chip's internal cache in software
(using the Linux kernel). I'm posting what I did since it may help
others. I would be interested in feedback, including being told that
someone else already did this :-). Please respond by email, as I don't
normally read this group, but will summarize. If it seems to work
well, I hope it can be merged with the CxPatch stuff (see below) and
announced more generally (I assume readers of this group are better
equipped to deal with alpha stuff than most, so I haven't posted it
elsewhere yet).

I have a 386 motherboard whose external cache is unusable, I think
because the chipset is too brain-dead to deal with busmastering DMA
boards such as those in the Adaptec 154x family. I hacked my kernel
rather than buying a new motherboard because my motherboard has 16
SIMM slots, and I'm using most of them (12 MB is much better than 8 MB
with X/gcc/emacs). Another approach would be to buy a Cyrix 486DRx2,
but they're fairly expensive, and I'd rather save my money for a real
486. (I got my 486DLC-33 for $39.)

My Adaptec 1542C is the only DMA peripheral (other than the floppy)
which I have, so it's basically the only module I had to mess with. I
think doing something similar with other peripherals wouldn't be very
hard.
I would guess that busmastering DMA boards are the main ones that
cause trouble with the external cache on older boards like mine,
though I don't really know.

1. Prerequisites

A Cyrix 486DLC or 486SLC chip and an Adaptec 154x SCSI board.

Ability to live without any other peripherals that use DMA, at least
long enough to try this patch. The floppy drive and some sound cards
come to mind, and perhaps there are some network cards too. I might do
an equivalent hack for the floppy, the next time I have occasion to
use it :-).

2. You may not want this patch

If you have a motherboard with working external cache, or one that
supports the Cyrix 486DLC directly, or you have a 486DRx2 or 486SRx2,
then this patch is probably of little interest to you.

If you have a non-Cyrix chip, like an IBM 486DLC/SLC, or you have a
Cyrix 486S/D/SX/DX (which are not 386-pin-compatible), then this patch
could well be actively dangerous. I believe that TI chips are really
Cyrix chips but with a different name on them, but I'm not sure. I
know that TI actually manufactures at least some of Cyrix's chips, and
my chip was sold to me as a "Cyrix/TI" chip, for what it's worth.

3. The basic idea (this information is from clau@acs.ucalgary.ca's
   CxPatch README.1st)

386 motherboards (and perhaps some really poor-quality Cyrix 486DLC
motherboards) don't have logic on them to control the internal cache
on the Cyrix 486DLC. When there is a DMA transfer, the memory cache
has to be flushed, or else the peripheral and the CPU will have
different ideas about what is in memory. The only straightforward way
to do this if you have a 386 motherboard is to enable the BARB input
to the CPU (BARB being asserted when there is DMA). Unfortunately,
DRAM refreshes also assert BARB, so the cache is flushed continuously.
Even with the bogoboost option in CxPatch I found I got horrible
performance. The patches below disable the BARB input in boot/setup.S.
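
The register values that appear in the patch further down (0x23, 0x22, 0x03, 0x02, and 0x12 for the DRX) can be read as combinations of a few CCR0 bits. The following is a minimal sketch of that reading, not part of the patch: the bit names and the helper `dlc_ccr0` are assumptions inferred from the patch's own comments ("#BARB/NC0/NC1", "#FLUSH/NC1"), not taken from any Cyrix header.

```c
#include <assert.h>

/* Bit names are assumptions inferred from the comments in the patch
   (NC0, NC1, #FLUSH, #BARB); they do not come from a Cyrix header. */
enum {
    CCR0_NC0   = 0x01,
    CCR0_NC1   = 0x02,
    CCR0_FLUSH = 0x10,  /* appears only in the DRX value 0x12 */
    CCR0_BARB  = 0x20   /* when set, DMA and refresh flush the cache */
};

/* Hypothetical helper: build the CCR0 value for a DLC/SLC.
   risky != 0 mirrors CONFIG_CYRIX_RISKY (NC0 left clear).
   barb  != 0 keeps the BARB input enabled. */
static unsigned char dlc_ccr0(int risky, int barb)
{
    unsigned char v = CCR0_NC1;

    if (!risky)
        v |= CCR0_NC0;
    if (barb)
        v |= CCR0_BARB;

    /* The DLC settings in the patch never use the #FLUSH bit. */
    assert((v & CCR0_FLUSH) == 0);
    return v;
}
```

With these assumed names, `dlc_ccr0(0, 1)` gives the stock CxPatch value 0x23 and `dlc_ccr0(0, 0)` gives the 0x03 that the modified setup.S loads when software cache control is selected.
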
When it looks like there's going to be DMA (in this case, when queuing
a SCSI command in drivers/scsi/aha1542.c), I reenable BARB, and then
disable it again after the command finishes. Thus, except perhaps
during heavy disk access, the cache is rarely flushed, greatly
improving the cache performance.

4. My results

I started off with a 386SX-25 with 32K external cache, which gave me
4.08 bogomips (on good days). Then I switched to a 386DX-33 with a
non-working external cache (and no internal cache of course), which
gave me 5.37 bogomips. I bought a 486DLC-33, popped it into the
motherboard, and got 7.16 bogomips (with the internal cache turned
off). It was slightly faster than the 386, but not by much. I applied
the CxPatch (more on that below), which turned on my internal cache,
but continuously flushed it with the BARB input, and I got 7.29
bogomips. I was not very happy with this rather minute speed increase,
and so I hacked together the patch below. Now I get 13.10 bogomips. I
haven't tried any benchmarks yet, but the computer definitely seems
faster.

5. Instructions

First, this disclaimer: This patch works for me on my system. However,
it hasn't been tried on any other system, so it is very much alpha (or
even pre-alpha). Therefore, if you try it, you do so entirely at your
own risk. I am not responsible for any damage that it may do to your
system. In particular, it could trash your disk or even (who knows)
damage your CPU.

a. Get 'CxPatch' and apply it, to turn your CPU cache on. I got my
copy from sunsite (according to the LSM,
ftp://sunsite.unc.edu/pub/Linux/kernel/misc-patches/CxPatch030.tar.gz).
The README.1st file has useful information if you're curious about how
all this works. Make sure you read and understand the disclaimer in
that package, as well as mine.

b. Apply the patch below. I'm still running version 1.1.8 of the
kernel, which is somewhat antique, so unless you have the same version
you'll probably have to apply it partly by hand.
If you fix the patch to work with the latest kernel, let me know.

c. Run make config, answering yes to the question about software cache
control, then recompile the kernel.

d. Make sure you have read and understood my disclaimer and the
prerequisites and anti-prerequisites at the beginning of this message.

e. Back up your system, making sure that you also have a boot/rootdisk
combination in case the worst happens. (You should have one anyway. I
use the ones that came with my Slackware installation, personally.)

f. Save a copy of your old kernel, then install the new one that you
just compiled. Make sure the copy of the old kernel is known to Lilo
(in /etc/lilo.conf on my system). Even if the new kernel works
perfectly, you'll need the old kernel if you want to use any other DMA
peripherals, like the floppy drive. My makefile overwrites
/vmlinuz.old. If yours does something like that too, name the old
kernel something else.

g. Then reboot with the new kernel. Your bogomips should have gone up,
hopefully by a lot. Remember, don't try to use any DMA peripherals
besides your Adaptec 154x SCSI bus. I renamed my /dev/fd0* to be on
the safe side. If you need to use other peripherals, reboot with your
old kernel.

It's possible that because of an oversight on my part, the code will
eventually forget to turn the cache on. If you suspect this is the
case, I would be interested to know (though I can't promise to do
anything about it). You can check by clipping out the bogo.c module
from the README.ARGGH which comes with 'CxPatch', assuming your kernel
can deal with loadable modules. Incidentally, I find that this gives
me 11 rather than 13 bogomips for some reason.

The patch follows.

Raj Manandhar  (617) 495-8348 or -2038  raj@cfa.harvard.edu
Harvard-Smithsonian Center for Astrophysics  ...!harvard!cfa!raj
60 Garden Street, Cambridge, MA 02138  cfa::raj, raj@cfa.bitnet
My URL is http://hea-www.harvard.edu/~raj/home.html

*** boot/setup.S.old Sun Oct 23 16:32:47 1994
--- boot/setup.S Sun Oct 23 16:00:20 1994
***************
*** 107,115 ****
--- 107,123 ----
        out     #0x22,al        ! CCR0
  #ifdef CONFIG_CYRIX_DLC
  # ifdef CONFIG_CYRIX_RISKY
+ # ifdef CONFIG_CYRIX_SOFTWARE_TOGGLE
+       mov     al,#0x02        ! enable NC1 (may not be 100% DMA safe)
+ # else
        mov     al,#0x22        ! enable #BARB/NC1 (may not be 100% DMA safe)
+ # endif
  # else
+ # ifdef CONFIG_CYRIX_SOFTWARE_TOGGLE
+       mov     al,#0x03        ! enable NC0/NC1 for DLC
+ # else
        mov     al,#0x23        ! enable #BARB/NC0/NC1 for DLC
+ # endif
  # endif
  #else
        mov     al,#0x12        ! enable #FLUSH/NC1 for DRX
*** drivers/scsi/aha1542.c.old Sun Oct 23 15:56:19 1994
--- drivers/scsi/aha1542.c Sun Oct 23 14:54:20 1994
***************
*** 75,80 ****
--- 75,88 ----
  #define WAITnexttimeout 3000000
  static void setup_mailboxes(int base_io, struct Scsi_Host * shpnt);
+ #define CONFIG_CYRIX_SOFTWARE_TOGGLE
+ #ifdef CONFIG_CYRIX_SOFTWARE_TOGGLE
+ static void setup_cache(Scsi_Cmnd *SCpnt, void (*done)(Scsi_Cmnd *));
+ static void done_wrapper(Scsi_Cmnd *SCpnt);
+ static void cyrix_cache_toggle(int flag);
+ # define cyrix_cache_disable() cyrix_cache_toggle(0) /* note -- call these */
+ # define cyrix_cache_enable() cyrix_cache_toggle(1)  /* two with ints off */
+ #endif
  #define aha1542_intr_reset(base) outb(IRST, CONTROL(base))
***************
*** 560,566 ****
--- 568,583 ----
      if (done) {
        DEB(printk("aha1542_queuecommand: now waiting for interrupt "); aha1542_stat());
+ #ifdef CONFIG_CYRIX_SOFTWARE_TOGGLE
+       setup_cache(SCpnt, done);
+ #else
+ # define done_wrapper done
+ #endif
+ # if 1
+       SCpnt->scsi_done = done_wrapper;
+ # else
        SCpnt->scsi_done = done;
+ # endif
        mb[mbo].status = 1;
        aha1542_out(SCpnt->host->io_port, &ahacmd, 1); /* start scsi command */
        DEB(aha1542_stat());
***************
*** 899,904 ****
--- 916,926 ----
      for(mbo = 0; mbo < AHA1542_MAILBOXES; mbo++)
        if (SCpnt == HOSTDATA(SCpnt->host)->SCint[mbo]){
          mb[mbo].status = 2; /* Abort command */
+ # ifdef CONFIG_CYRIX_SOFTWARE_TOGGLE
+         /* Don't update dma_cmds_outstanding since this abort command replaces
+            the command we're aborting (I think?). */
+         cyrix_cache_disable();
+ # endif
          aha1542_out(&ahacmd, 1); /* start scsi command */
          sti();
          break;
***************
*** 945,947 ****
--- 967,1058 ----
  #endif
    return 0;
  }
+
+ #ifdef CONFIG_CYRIX_SOFTWARE_TOGGLE
+ static int dma_cmds_outstanding = 0;
+ static int cache_permanently_killed = 0;
+
+ static void setup_cache(Scsi_Cmnd *SCpnt, void (*done)(Scsi_Cmnd *)) {
+
+   SCpnt->SCp.ptr = (char *)done;
+
+   if (!cache_permanently_killed) { /* prevent dma_cmds_outstanding from
+                                       rolling over */
+
+     cli();
+     if (dma_cmds_outstanding++ == 0) { /* This probably doesn't
+                                           need cli()--*/
+       DEB(printk("setup: cache off "));
+       cyrix_cache_disable(); /* But this does. */
+     } else {
+       DEB(printk("setup: %d were still outstanding ",
+                  dma_cmds_outstanding-1));
+     }
+     sti();
+   }
+
+ }
+
+ static void done_wrapper(Scsi_Cmnd *SCpnt) {
+
+   if (! cache_permanently_killed) {
+
+     cli();
+     if (--dma_cmds_outstanding <= 0) {
+       if (dma_cmds_outstanding < 0) {
+         cache_permanently_killed++;
+         sti();
+         printk("\
+ aha1542.c: dma_cmds_outstanding = %d, disabled cache permanently.\n",
+                dma_cmds_outstanding);
+       } else {
+         DEB(printk("cache on "));
+         cyrix_cache_enable();
+         sti();
+       }
+     }
+   } else {
+     DEB(printk("dead cache, %d times ", cache_permanently_killed));
+   }
+
+   if (SCpnt->SCp.ptr) {
+     ((void (*)(Scsi_Cmnd *))SCpnt->SCp.ptr)(SCpnt);
+   } else {
+     printk("aha1542.c: done_wrapper: NULL done\n");
+   }
+
+ #if 0
+   SCpnt->SCp.ptr = NULL; /* May help catch errors. */
+ #endif /* Actually, it results in us getting a NULL
+           done after a few calls. Why? */
+
+ }
+
+ /* This function should be called with interrupts off. I don't include a
+    cli()/sti() in it because the calling function might not want interrupts
+    turned on immediately. */
+ static void cyrix_cache_toggle(int flag) {
+
+   /* Actually, just controls whether DMA/memory refresh will flush the
+      cache (using the BARB input to the CPU). If your motherboard doesn't
+      seem to assert BARB on DMA, then you could change this to turn off
+      the cache directly. */
+
+   outb(0xc0, 0x22); /* select CCR0 register for next outb */
+ #ifdef CONFIG_CYRIX_DLC
+ # ifdef CONFIG_CYRIX_RISKY
+ # define CACHE_OFF 0x22 /* enable #BARB/NC1 (may not be 100% DMA safe)
+                          */
+ # else
+ # define CACHE_OFF 0x23 /* enable #BARB/NC0/NC1 for DLC */
+ # endif
+ # define CACHE_ON 0x02 /* enable NC1 only (NC0 doesn't matter since by
+                           definition we're not doing DMA if we turn
+                           the cache on) */
+ #else
+ # define CACHE_OFF 0x12 /* enable #FLUSH/NC1 for DRX */
+ #endif
+   outb(flag? CACHE_ON : CACHE_OFF, 0x23);
+
+ }
+ #endif /* defined(CONFIG_CYRIX_SOFTWARE_TOGGLE) */
--- config.in~ Fri Oct 21 21:12:55 1994
+++ config.in Sun Oct 23 17:55:04 1994
@@ -16,7 +16,8 @@
 bool 'Enable cache on Cyrix Cx486 series CPU' CONFIG_CYRIX_CPU y
 bool 'Modify cache coherency for Cyrix Cx486DLC CPU (needed only for DLC/SLC)' CONFIG_CYRIX_DLC y
 if [ "$CONFIG_CYRIX_DLC" = "y" ]; then
-bool 'Try the alternate DLC cache setting (not safe on all machines)' CONFIG_CYRIX_RISKY y
+ bool 'Try software cache control (only if you use the Adaptec 154x and no other DMA peripherals)' CONFIG_CYRIX_SOFTWARE_TOGGLE y
+ bool 'Try the alternate DLC cache setting (not safe on all machines)' CONFIG_CYRIX_RISKY y
 fi
 comment 'Program binary formats'
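
For readers who want to follow the bookkeeping in setup_cache() and done_wrapper() without a kernel at hand, here is a small user-space model of it. It is only a sketch: it performs no port I/O and no cli()/sti(), and the names `cache_model`, `model_init`, `model_queue` and `model_done` are made up for this illustration. The field `barb_enabled` stands in for the BARB bit that cyrix_cache_toggle() writes to CCR0.

```c
#include <assert.h>

/* User-space model of the patch's counters; nothing here touches hardware. */
struct cache_model {
    int outstanding;   /* plays the role of dma_cmds_outstanding */
    int killed;        /* plays the role of cache_permanently_killed */
    int barb_enabled;  /* 1: DMA/refresh may flush the cache ("cache off") */
};

static void model_init(struct cache_model *m)
{
    m->outstanding = 0;
    m->killed = 0;
    m->barb_enabled = 0;  /* setup.S leaves BARB disabled at boot */
}

/* Mirrors setup_cache(): the first outstanding command re-enables BARB. */
static void model_queue(struct cache_model *m)
{
    if (m->killed)
        return;
    if (m->outstanding++ == 0)
        m->barb_enabled = 1;
}

/* Mirrors done_wrapper(): the last completion disables BARB again.
   On underflow only the flag is set, as in the posted code. */
static void model_done(struct cache_model *m)
{
    if (m->killed)
        return;
    if (--m->outstanding <= 0) {
        if (m->outstanding < 0)
            m->killed = 1;        /* barb_enabled is left as it was */
        else
            m->barb_enabled = 0;
    }
    /* The count goes negative only in the "killed" state. */
    assert(m->killed || m->outstanding >= 0);
}
```

Queuing two commands and completing both returns the model to `barb_enabled == 0`. One detail the model makes visible: in the posted done_wrapper(), the underflow branch sets the flag and prints its message but does not itself call cyrix_cache_disable(), so the BARB state at that point is whatever the previous call left behind.
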