Full disclosure: I run Oracle on Windows. I know that makes me some kind of red-headed stepchild. I'm not at liberty to change platforms, so it is what it is. Although I further acknowledge that running Oracle on 32-bit Windows is a commonly recognized sign of masochism.
Server setup
The server in question has Windows Server 2003, Enterprise Edition, 32-bit. 2 quad-core processors. 16 GB RAM. VLM (Very Large Memory) is in use. The boot.ini has /PAE, /3GB, and /USERVA=2900. I'll discuss the boot.ini setup more in a later post on VLM.
Oracle setup
10.2.0.4
shared_pool_size=600M
java_pool_size=16M
large_pool_size=90M
streams_pool_size=8M
db_block_buffers=667347
db_block_size=16384
pat=200M
use_indirect_data_buffers = TRUE
In the registry, awe_window_memory is set to 1.2 GB.
Having VLM enabled requires that the sga components be set manually. The total SGA there is 1942 MB. That leaves under a gigabyte for everything else that needs to occur under oracle.exe - namely, the PGA required for 90 users. And, as it turned out, the PGA required for this particular set of 90 users was just too much.
Crashes. Oh, the crashes. 4030s. 4031s. The works. Restart after restart. We eventually told the site to just reboot first and then call second. We dropped the values on every parameter. We had the shared pool set so low I'm amazed the instance stayed up. We had PAT down around 50M. Not cool.
And then I noticed something 'strange.' Strange, in this case, means:
strange (adj.) - a quality that indicates someone smarter than yourself would have known about this in advance (or at least figured it out faster. Dummy.)
What was strange was that as we continued decreasing PAT, the total PGA allocation for the instance was not staying any lower than previously. We were buying some time between reboots, but the high water mark was roughly the same. Also, as a bonus, performance was crap.
I started doing some research and discovered the following critically important information:
The PAT -> PGA relationship involves a set of rules that was likely drafted by Satan or some equally devious entity. Oracle hints subtly at the nefarious nature of PAT by the name of the parameter - PGA_AGGREGATE_TARGET. The English translation is "the desired allocation of PGA memory across all sessions." This is not a maximum. It's not a minimum. It's a polite request for the database to consider your wishes, ala the following vignette:
//
DBA: "Please, please, nice Oracle, could you keep this wonky, misunderstood pool of memory around this tiny number without knowing anything in advance about the work that will be attempted? Because I haven't migrated to a 64-bit platform yet and I don't really have enough memory to run this many users in this database."
Oracle: "How dare you ask the mighty Oracle to cap PGA at this ludicrously low value?! I am angered by your incompetence! I shall allocate as I wish, and no mere DBA shall stop me lest he wield undocumented parameters!!!!"
//
And that, dear friends, is why v$pgastat has an 'over allocation count' statistic. That's how many times Oracle has pwn3d your feeble limit on PGA per session since the instance was restarted. Despite my
I determined that the answer was ditching VLM. Calculating backwards, based on the PGA high water mark of roughly 800 MB (note: NOT THE SAME as PAT) and before adding AWE_WINDOW_MEMORY we were at 1500 MB (with a properly sized shared pool.) Tack on the window size and that's 2700 MB. Not much wiggle room, is there? Truth is, you gotta count for PGA first if at all possible since Oracle will try to take it anyway. Then you'll have a clearer picture of where you should set sga_target and sga_max_size. More on how PAT affects PGA in 10gR2 in a later post.
Oh, and the site is now running 64-bit. Go figure.