queryperformancecounter

QueryPerformanceCounter Not Working Properly on Amazon EC2 with Win Server 2008

- by timeitquery

Our application makes use of QueryPerformanceCounter to measure timings. When running it on Amazon EC2 instances we are noticing number that are out of whack. The symptoms look very similar to what has been reported with virtualized machines on HP proreliant servers, however all the articles state that this was not the case for Windows Server 2008. The articles recommended modifing the boot.ini to /USEPMTIMER - however, this seem to be missing from the documentation on bcdedit - which replaced boot.ini

Read the article

C linux equivalent of windows QueryPerformanceCounter

- by myforwik

Is there an equivalent C function in linux for reading the CPU counter and its frequency? I am looking for something similair to QueryPerformanceCounter function that reads the 64bit counter in modern CPU's

Read the article

using LARGE_INTEGER gives me back error error C2679: '=' binary no operator found which takes a right-hand operand

- by rekotc

i have the following code: QueryPerformanceCounter(&timeStart); winMain::render(); //do stuff QueryPerformanceCounter(&timeEnd); numCounts = ( timeEnd.QuadPart - timeStart.QuadPart); All the 3 variables are declared as LARGE_INTEGER, the code should work since im following a book example, but i get: error C2679: '=' binary no operator found which takes a right-hand operand of type LONGLONG it might be '_LARGE_INTEGER &_LARGE_INTEGER::operator =(const _LARGE_INTEGER &)' 1 durante la ricerca di corrispondenza con l'elenco di argomenti '(LARGE_INTEGER, LONGLONG)'

Read the article

My rhythm game runs choppy even with high frame rate

- by felipedrl

I'm coding a rhythm game and the game runs smoothly with uncapped fps. But when I try to cap it around 60 the game updates in little chunks, like hiccups, as if it was skipping frames or at a very low frame rate. The reason I need to cap frame rate is because in some computers I tested, the fps varies a lot (from ~80 - ~250 fps) and those drops are noticeable and degrade response time. Since this is a rhythm game this is very important. This issue is driving me crazy. I've spent a few weeks already on it and still can't figure out the problem. I hope someone more experienced than me could shed some light on it. I'll try to put here all the hints I've tried along with two pseudo codes for game loops I tried, so I apologize if this post gets too lengthy. 1st GameLoop: const uint UPDATE_SKIP = 1000 / 60; uint nextGameTick = SDL_GetTicks(); while(isNotDone) { // only false when a QUIT event is generated! if (processEvents()) { if (SDL_GetTicks() > nextGameTick) { update(UPDATE_SKIP); render(); nextGameTick += UPDATE_SKIP; } } } 2nd Game Loop: const uint UPDATE_SKIP = 1000 / 60; while (isNotDone) { LARGE_INTEGER startTime; QueryPerformanceCounter(&startTime); // process events will return false in case of a QUIT event processed if (processEvents()) { update(frameTime); render(); } LARGE_INTEGER endTime; do { QueryPerformanceCounter(&endTime); frameTime = static_cast<uint>((endTime.QuadPart - startTime.QuadPart) * 1000.0 / frequency.QuadPart); } while (frameTime < UPDATE_SKIP); } [1] At first I thought it was a timer resolution problem. I was using SDL_GetTicks, but even when I switched to QueryPerformanceCounter, supposedly less granular, I saw no difference. [2] Then I thought it could be due to a rounding error in my position computation and since game updates are smaller in high FPS that would be less noticeable. Indeed there is an small error, but from my tests I realized that it is not enough to produce the position jumps I'm getting. Also, another intriguing factor is that if I enable vsync I'll get smooth updates @60fps regardless frame cap code. So why not rely on vsync? Because some computers can force a disable on gfx card config. [3] I started printing the maximum and minimum frame time measured in 1sec span, in the hope that every a few frames one would take a long time but still not enough to drop my fps computation. It turns out that, with frame cap code I always get frame times in the range of [16, 18]ms, and still, the game "does not moves like jagger". [4] My process' priority is set to HIGH (Windows doesn't allow me to set REALTIME for some reason). As far as I know there is only one thread running along with the game (a sound callback, which I really don't have access to it). I'm using AudiereLib. I then disabled Audiere by removing it from the project and still got the issue. Maybe there are some others threads running and one of them is taking too long to come back right in between when I measured frame times, I don't know. Is there a way to know which threads are attached to my process? [5] There are some dynamic data being created during game run. But It is a little bit hard to remove it to test. Maybe I'll have to try harder this one. Well, as I told you I really don't know what to try next. Anything, I mean, anything would be of great help. What bugs me more is why at 60fps & vsync enabled I get an smooth update and at 60fps & no vsync I don't. Is there a way to implement software vsync? I mean, query display sync info? Thanks in advance. I appreciate the ones that got this far and yet again I apologize for the long post. Best Regards from a fellow coder.

Read the article

How to calculate a operation's time in micro second precision

- by Sanjeet Daga

I want to calculate performance of a function in micro second precision on Windows platform. Now Windows itself has milisecond granuality, so how can I achieve this. I tried following sample, but not getting correct results. LARGE_INTEGER ticksPerSecond = {0}; LARGE_INTEGER tick_1 = {0}; LARGE_INTEGER tick_2 = {0}; double uSec = 1000000; // Get the frequency QueryPerformanceFrequency(&ticksPerSecond); //Calculate per uSec freq double uFreq = ticksPerSecond.QuadPart/uSec; // Get counter b4 start of op QueryPerformanceCounter(&tick_1); // The ope itself Sleep(10); // Get counter after opfinished QueryPerformanceCounter(&tick_2); // And now the op time in uSec double diff = (tick_2.QuadPart/uFreq) - (tick_1.QuadPart/uFreq);

Read the article

Game programming and quantity of timers

- by andresjb

I've made a simple 2D game engine using C# and DirectX and it's fully functional for the demo I made to test it. I have a Timer object that uses QueryPerformanceCounter and I don't know what's the better choice: use only one timer in the game loop to update everything in the game, or an independent timer in every object that needs one. My worry is that when I try to implement threads, what will happen with timers? What happens with the sync?

Read the article

Delphi: Why does IdHTTP.ConnectTimeout make requests slower?

- by K.Sandell

I discovered that when setting the ConnectTimeoout property for a TIdHTTP component, it makes the requests (GET and POST) become about 120ms slower? Why is this, and can I avoid/bypass this somehow? Env: D2010 with shipped Indy components, all updates installed for D2010. OS is WinXP (32bit) SP3 with most patches... My timing routine is: Procedure DoGet; Var Freq,T1,T2 : Int64; Cli : TIdHTTP; S : String; begin QueryPerformanceFrequency(Freq); Try QueryPerformanceCounter(T1); Cli := TIdHTTP.Create( NIL ); Cli.ConnectTimeout := 1000; // without this we get < 15ms!! S := Cli.Get('http://127.0.0.1/empty_page.php'); Finally FreeAndNil(Cli); QueryPerformanceCounter(T2); End; Memo1.Lines.Add('Time = '+FormatFloat('0.000',(T2-T1)/Freq) ); End; With the ConnectTimeout set in code I get avg. times of 130-140ms, without it's about 5-15ms ...

Read the article

Micro Second resolution timestamps on windows.

- by Nikhil

How to get micro second resolution timestamps on windows? I am loking for something better than QueryPerformanceCounter, QueryPerformanceFrequency (these can only give you an elapsed time since boot, and are not necessarily accurate if they are called on different threads - ie QueryPerformanceCounter may return different results on different CPUs. There are also some processors that adjust their frequency for power saving, which apparently isn't always reflected in their QueryPerformanceFrequency result.) There is this, http://msdn.microsoft.com/en-us/magazine/cc163996.aspx but it does not seem to be solid. This looks great but its not available for download any more. http://www.ibm.com/developerworks/library/i-seconds/ This is another resource. http://www.lochan.org/2005/keith-cl/useful/win32time.html But requires a number of steps, running a helper program plus some init stuff also, I am not sure if it works on multiple CPUs Also looked at the Wikipedia link on the subject which is interesting but not that useful. http://en.wikipedia.org/wiki/Time_Stamp_Counter If the answer is just do this with BSD or Linux, its a lot easier thats fine, but I would like to confirm this and get some explanation as to why this is so hard in windows and so easy in linux and bsd. Its the same damm hardware...

Read the article

Did I implement clock drift properly?

- by David Titarenco

I couldn't find any clock drift RNG code for Windows anywhere so I attempted to implement it myself. I haven't run the numbers through ent or DIEHARD yet, and I'm just wondering if this is even remotely correct... void QueryRDTSC(__int64* tick) { __asm { xor eax, eax cpuid rdtsc mov edi, dword ptr tick mov dword ptr [edi], eax mov dword ptr [edi+4], edx } } __int64 clockDriftRNG() { __int64 CPU_start, CPU_end, OS_start, OS_end; // get CPU ticks -- uses RDTSC on the Processor QueryRDTSC(&CPU_start); Sleep(1); QueryRDTSC(&CPU_end); // get OS ticks -- uses the Motherboard clock QueryPerformanceCounter((LARGE_INTEGER*)&OS_start); Sleep(1); QueryPerformanceCounter((LARGE_INTEGER*)&OS_end); // CPU clock is ~1000x faster than mobo clock // return raw return ((CPU_end - CPU_start)/(OS_end - OS_start)); // or // return a random number from 0 to 9 // return ((CPU_end - CPU_start)/(OS_end - OS_start)%10); } If you're wondering why I Sleep(1), it's because if I don't, OS_end - OS_start returns 0 consistently (because of the bad timer resolution, I presume). Basically, (CPU_end - CPU_start)/(OS_end - OS_start) always returns around 1000 with a slight variation based on the entropy of CPU load, maybe temperature, quartz crystal vibration imperfections, etc. Anyway, the numbers have a pretty decent distribution, but this could be totally wrong. I have no idea.

Read the article

HPET for x86 BSP (how to build it for WCE8)

- by Werner Willemsens

Originally posted on: http://geekswithblogs.net/WernerWillemsens/archive/2014/08/02/157895.aspx"I needed a timer". That is how we started a few blogs ago our series about APIC and ACPI. Well, here it is. HPET (High Precision Event Timer) was introduced by Intel in early 2000 to: Replace old style Intel 8253 (1981!) and 8254 timers Support more accurate timers that could be used for multimedia purposes. Hence Microsoft and Intel sometimes refers to HPET as Multimedia timers. An HPET chip consists of a 64-bit up-counter (main counter) counting at a frequency of at least 10 MHz, and a set of (at least three, up to 256) comparators. These comparators are 32- or 64-bit wide. The HPET is discoverable via ACPI. The HPET circuit in recent Intel platforms is integrated into the SouthBridge chip (e.g. 82801) All HPET timers should support one-shot interrupt programming, while optionally they can support periodic interrupts. In most Intel SouthBridges I worked with, there are three HPET timers. TIMER0 supports both one-shot and periodic mode, while TIMER1 and TIMER2 are one-shot only. Each HPET timer can generate interrupts, both in old-style PIC mode and in APIC mode. However in PIC mode, interrupts cannot freely be chosen. Typically IRQ11 is available and cannot be shared with any other interrupt! Which makes the HPET in PIC mode virtually unusable. In APIC mode however more IRQs are available and can be shared with other interrupt generating devices. (Check the datasheet of your SouthBridge) Because of this higher level of freedom, I created the APIC BSP (see previous posts). The HPET driver code that I present you here uses this APIC mode. Hpet.reg [HKEY_LOCAL_MACHINE\Drivers\BuiltIn\Hpet] "Dll"="Hpet.dll" "Prefix"="HPT" "Order"=dword:10 "IsrDll"="giisr.dll" "IsrHandler"="ISRHandler" "Priority256"=dword:50 Because HPET does not reside on the PCI bus, but can be found through ACPI as a memory mapped device, you don't need to specify the "Class", "SubClass", "ProgIF" and other PCI related registry keys that you typically find for PCI devices. If a driver needs to run its internal thread(s) at a certain priority level, by convention in Windows CE you add the "Priority256" registry key. Through this key you can easily play with the driver's thread priority for better response and timer accuracy. See later. Hpet.cpp (Hpet.dll) This cpp file contains the complete HPET driver code. The file is part of a folder that you typically integrate in your BSP (\src\drivers\Hpet). It is written as sample (example) code, you most likely want to change this code to your specific needs. There are two sets of #define's that I use to control how the driver works. _TRIGGER_EVENT or _TRIGGER_SEMAPHORE: _TRIGGER_EVENT will let your driver trigger a Windows CE Event when the timer expires, _TRIGGER_SEMAPHORE will trigger a Windows CE counting Semaphore. The latter guarantees that no events get lost in case your application cannot always process the triggers fast enough. _TIMER0 or _TIMER2: both timers will trigger an event or semaphore periodically. _TIMER0 will use a periodic HPET timer interrupt, while _TIMER2 will reprogram a one-shot HPET timer after each interrupt. The one-shot approach is interesting if the frequency you wish to generate is not an even multiple of the HPET main counter frequency. The sample code uses an algorithm to generate a more correct frequency over a longer period (by reducing rounding errors). _TIMER1 is not used in the sample source code. HPT_Init() will locate the HPET I/O memory space, setup the HPET counter (_TIMER0 or _TIMER2) and install the Interrupt Service Thread (IST). Upon timer expiration, the IST will run and on its turn will generate a Windows CE Event or Semaphore. In case of _TIMER2 a new one-shot comparator value is calculated and set for the timer. The IRQ of the HPET timers are programmed to IRQ22, but you can choose typically from 20-23. The TIMERn_INT_ROUT_CAP bits in the TIMn_CONF register will tell you what IRQs you can choose from. HPT_IOControl() can be used to set a new HPET counter frequency (actually you configure the counter timeout value in microseconds), start and stop the timer, and request the current HPET counter value. The latter is interesting because the Windows CE QueryPerformanceCounter() and QueryPerformanceFrequency() APIs implement the same functionality, albeit based on other counter implementations. HpetDrvIst() contains the IST code. DWORD WINAPI HpetDrvIst(LPVOID lpArg) { psHpetDeviceContext pHwContext = (psHpetDeviceContext)lpArg; DWORD mainCount = READDWORD(pHwContext->g_hpet_va, GenCapIDReg + 4); // Main Counter Tick period (fempto sec 10E-15) DWORD i = 0; while (1) { WaitForSingleObject(pHwContext->g_isrEvent, INFINITE); #if defined(_TRIGGER_SEMAPHORE) LONG p = 0; BOOL b = ReleaseSemaphore(pHwContext->g_triggerEvent, 1, &p); #elif defined(_TRIGGER_EVENT) BOOL b = SetEvent(pHwContext->g_triggerEvent); #else #pragma error("Unknown TRIGGER") #endif #if defined(_TIMER0) DWORD currentCount = READDWORD(pHwContext->g_hpet_va, MainCounterReg); DWORD comparator = READDWORD(pHwContext->g_hpet_va, Tim0_ComparatorReg + 0); SETBIT(pHwContext->g_hpet_va, GenIntStaReg, 0); // clear interrupt on HPET level InterruptDone(pHwContext->g_sysIntr); // clear interrupt on OS level _LOGMSG(ZONE_INTERRUPT, (L"%s: HpetDrvIst 0 %06d %08X %08X", pHwContext->g_id, i++, currentCount, comparator)); #elif defined(_TIMER2) DWORD currentCount = READDWORD(pHwContext->g_hpet_va, MainCounterReg); DWORD previousComparator = READDWORD(pHwContext->g_hpet_va, Tim2_ComparatorReg + 0); pHwContext->g_counter2.QuadPart += pHwContext->g_comparator.QuadPart; // increment virtual counter (higher accuracy) DWORD comparator = (DWORD)(pHwContext->g_counter2.QuadPart >> 8); // "round" to real value WRITEDWORD(pHwContext->g_hpet_va, Tim2_ComparatorReg + 0, comparator); SETBIT(pHwContext->g_hpet_va, GenIntStaReg, 2); // clear interrupt on HPET level InterruptDone(pHwContext->g_sysIntr); // clear interrupt on OS level _LOGMSG(ZONE_INTERRUPT, (L"%s: HpetDrvIst 2 %06d %08X %08X (%08X)", pHwContext->g_id, i++, currentCount, comparator, comparator - previousComparator)); #else #pragma error("Unknown TIMER") #endif } return 1; } The following figure shows how the HPET hardware interrupt via ISR -> IST is translated in a Windows CE Event or Semaphore by the HPET driver. The Event or Semaphore can be used to trigger a Windows CE application. HpetTest.cpp (HpetTest.exe)This cpp file contains sample source how to use the HPET driver from an application. The file is part of a separate (smart device) VS2013 solution. It contains code to measure the generated Event/Semaphore times by means of GetSystemTime() and QueryPerformanceCounter() and QueryPerformanceFrequency() APIs. HPET evaluation If you scan the internet about HPET, you'll find many remarks about buggy HPET implementations and bad performance. Unfortunately that is true. I tested the HPET driver on an Intel ICH7M SBC (release date 2008). When a HPET timer expires on the ICH7M, an interrupt indeed is generated, but right after you clear the interrupt, a few more unwanted interrupts (too soon!) occur as well. I tested and debugged it for a loooong time, but I couldn't get it to work. I concluded ICH7M's HPET is buggy Intel hardware. I tested the HPET driver successfully on a more recent NM10 SBC (release date 2013). With the NM10 chipset however, I am not fully convinced about the timer's frequency accuracy. In the long run - on average - all is fine, but occasionally I experienced upto 20 microseconds delays (which were immediately compensated on the next interrupt). Of course, this was all measured by software, but I still experienced the occasional delay when both the HPET driver IST thread as the application thread ran at CeSetThreadPriority(1). If it is not the hardware, only the kernel can cause this delay. But Windows CE is an RTOS and I have never experienced such long delays with previous versions of Windows CE. I tested and developed this on WCE8, I am not heavily experienced with it yet. Internet forum threads however mention inaccurate HPET timer implementations as well. At this moment I haven't figured out what is going on here. Useful references: http://www.intel.com/content/dam/www/public/us/en/documents/technical-specifications/software-developers-hpet-spec-1-0a.pdf http://en.wikipedia.org/wiki/High_Precision_Event_Timer http://wiki.osdev.org/HPET Windows CE BSP source file package for HPET in MyBsp Note that this source code is "As Is". It is still under development and I cannot (and never will) guarantee the correctness of the code. Use it as a guide for your own HPET integration.

Read the article

Log with timestamps that have millisecond accuracy & resolution in Windows C++

- by Psychic

I'm aware that for timing accuracy, functions like timeGetTime, timeBeginPeriod, QueryPerformanceCounter etc are great, giving both good resolution & accuracy, but only based on time-since-boot, with no direct link to clock time. However, I don't want to time events as such. I want to be able to produce an exact timestamp (local time) so that I can display it in a log file, eg 31-12-2010 12:38:35.345, for each entry made. (I need the millisecond accuracy) The standard Windows time functions, like GetLocalTime, whilst they give millisecond values, don't have millisecond resolution, depending on the OS running. I'm using XP, so I can't expect much better than about a 15ms resolution. What I need is a way to get the best of both worlds, without creating a large overhead to get the required output. Overly large methods/calculations would mean that the logger would start to eat up too much time during its operation. What would be the best/simplest way to do this?

Read the article

Compiling a Windows C++ program in g++

- by Phenom

I'm trying to compile a Windows C++ program in g++. This is what I get. /usr/include/c++/4.4/backward/backward_warning.h:28:2: warning: #warning This file includes at least one deprecated or antiquated header which may be removed without further notice at a future date. Please use a non-deprecated interface with equivalent functionality instead. For a listing of replacement headers and interfaces, consult the file backward_warning.h. To disable this warning use -Wno-deprecated. btree.cpp:1204: error: ‘_TCHAR’ has not been declared btree.cpp: In function ‘int _tmain(int, int**)’: btree.cpp:1218: error: ‘__int64’ was not declared in this scope btree.cpp:1218: error: expected ‘;’ before ‘frequency’ btree.cpp:1220: error: ‘LARGE_INTEGER’ was not declared in this scope btree.cpp:1220: error: expected primary-expression before ‘)’ token btree.cpp:1220: error: ‘frequency’ was not declared in this scope btree.cpp:1220: error: ‘QueryPerformanceFrequency’ was not declared in this scope btree.cpp:1262: error: expected primary-expression before ‘)’ token btree.cpp:1262: error: ‘start’ was not declared in this scope btree.cpp:1262: error: ‘QueryPerformanceCounter’ was not declared in this scope btree.cpp:1264: error: name lookup of ‘i’ changed for ISO ‘for’ scoping btree.cpp:1264: note: (if you use ‘-fpermissive’ G++ will accept your code) btree.cpp:1304: error: expected primary-expression before ‘)’ token btree.cpp:1304: error: ‘end’ was not declared in this scope btree.cpp:1306: error: ‘total’ was not declared in this scope btree.cpp:1316: error: ‘getchar’ was not declared in this scope The first thing I noticed is that there are these variable types called _TCHAR, _int64, and LARGE_INTEGER, which is probably a Windows thing. What can these be changed to so that they will work in g++? Also, if there's anything else in here that you know can be converted to g++, that would be helpful. I got the code from here: http://touc.org/btree.html

Read the article

How to get timestamp of tick precision in .NET / C#?

- by Hermann

Up until now I used DateTime.Now for getting timestamps, but I noticed that if you print DateTime.Now in a loop you will see that it increments in descrete jumps of approx. 15 ms. But for certain scenarios in my application I need to get the most accurate timestamp possible, preferably with tick (=100 ns) precision. Any ideas? Update: Apparently, StopWatch / QueryPerformanceCounter is the way to go, but it can only be used to measure time, so I was thinking about calling DateTime.Now when the application starts up and then just have StopWatch run and then just add the elapsed time from StopWatch to the initial value returned from DateTime.Now. At least that should give me accurate relative timestamps, right? What do you think about that (hack)? NOTE: StopWatch.ElapsedTicks is different from StopWatch.Elapsed.Ticks! I used the former assuming 1 tick = 100 ns, but in this case 1 tick = 1 / StopWatch.Frequency. So to get ticks equivalent to DateTime use StopWatch.Elapsed.Ticks. I just learned this the hard way. NOTE 2: Using the StopWatch approach, I noticed it gets out of sync with the real time. After about 10 hours, it was ahead by 5 seconds. So I guess one would have to resync it every X or so where X could be 1 hour, 30 min, 15 min, etc. I am not sure what the optimal timespan for resyncing would be since every resync will change the offset which can be up to 20 ms.

Read the article

Issues with HLSL and lighting

- by numerical25

I am trying figure out whats going on with my HLSL code but I have no way of debugging it cause C++ gives off no errors. The application just closes when I run it. I am trying to add lighting to a 3d plane I made. below is my HLSL. The problem consist when my Pixel shader method returns the struct "outColor" . If I change the return value back to the struct "psInput" , everything goes back to working again. My light vectors and colors are at the top of the fx file // PS_INPUT - input variables to the pixel shader // This struct is created and fill in by the // vertex shader cbuffer Variables { matrix Projection; matrix World; float TimeStep; }; struct PS_INPUT { float4 Pos : SV_POSITION; float4 Color : COLOR0; float3 Normal : TEXCOORD0; float3 ViewVector : TEXCOORD1; }; float specpower = 80.0f; float3 camPos = float3(0.0f, 9.0, -256.0f); float3 DirectLightColor = float3(1.0f, 1.0f, 1.0f); float3 DirectLightVector = float3(0.0f, 0.602f, 0.70f); float3 AmbientLightColor = float3(1.0f, 1.0f, 1.0f); /*************************************** * Lighting functions ***************************************/ /********************************* * CalculateAmbient - * inputs - * vKa material's reflective color * lightColor - the ambient color of the lightsource * output - ambient color *********************************/ float3 CalculateAmbient(float3 vKa, float3 lightColor) { float3 vAmbient = vKa * lightColor; return vAmbient; } /********************************* * CalculateDiffuse - * inputs - * material color * The color of the direct light * the local normal * the vector of the direct light * output - difuse color *********************************/ float3 CalculateDiffuse(float3 baseColor, float3 lightColor, float3 normal, float3 lightVector) { float3 vDiffuse = baseColor * lightColor * saturate(dot(normal, lightVector)); return vDiffuse; } /********************************* * CalculateSpecular - * inputs - * viewVector * the direct light vector * the normal * output - specular highlight *********************************/ float CalculateSpecular(float3 viewVector, float3 lightVector, float3 normal) { float3 vReflect = reflect(lightVector, normal); float fSpecular = saturate(dot(vReflect, viewVector)); fSpecular = pow(fSpecular, specpower); return fSpecular; } /********************************* * LightingCombine - * inputs - * ambient component * diffuse component * specualr component * output - phong color color *********************************/ float3 LightingCombine(float3 vAmbient, float3 vDiffuse, float fSpecular) { float3 vCombined = vAmbient + vDiffuse + fSpecular.xxx; return vCombined; } //////////////////////////////////////////////// // Vertex Shader - Main Function /////////////////////////////////////////////// PS_INPUT VS(float4 Pos : POSITION, float4 Color : COLOR, float3 Normal : NORMAL) { PS_INPUT psInput; float4 newPosition; newPosition = Pos; newPosition.y = sin((newPosition.x * TimeStep) + (newPosition.z / 3.0f)) * 5.0f; // Pass through both the position and the color psInput.Pos = mul(newPosition , Projection ); psInput.Color = Color; psInput.ViewVector = normalize(camPos - psInput.Pos); return psInput; } /////////////////////////////////////////////// // Pixel Shader /////////////////////////////////////////////// //Anthony!!!!!!!!!!! Find out how color works when multiplying them float4 PS(PS_INPUT psInput) : SV_Target { float3 normal = -normalize(psInput.Normal); float3 vAmbient = CalculateAmbient(psInput.Color, AmbientLightColor); float3 vDiffuse = CalculateDiffuse(psInput.Color, DirectLightColor, normal, DirectLightVector); float fSpecular = CalculateSpecular(psInput.ViewVector, DirectLightVector, normal); float4 outColor; outColor.rgb = LightingCombine(vAmbient, vDiffuse, fSpecular); outColor.a = 1.0f; //Below is where the error begins return outColor; } // Define the technique technique10 Render { pass P0 { SetVertexShader( CompileShader( vs_4_0, VS() ) ); SetGeometryShader( NULL ); SetPixelShader( CompileShader( ps_4_0, PS() ) ); } } Below is some of my c++ code. Reason I am showing this is because it is pretty much what creates the surface normals for my shaders to evaluate. for the lighting for(int z=0; z < NUM_ROWS; ++z) { for(int x = 0; x < NUM_COLS; ++x) { int curVertex = x + (z * NUM_VERTSX); indices[curIndex] = curVertex; indices[curIndex + 1] = curVertex + NUM_VERTSX; indices[curIndex + 2] = curVertex + 1; D3DXVECTOR3 v0 = vertices[indices[curIndex]].pos; D3DXVECTOR3 v1 = vertices[indices[curIndex + 1]].pos; D3DXVECTOR3 v2 = vertices[indices[curIndex + 2]].pos; D3DXVECTOR3 normal; D3DXVECTOR3 cross; D3DXVec3Cross(&cross, &D3DXVECTOR3(v2 - v0),&D3DXVECTOR3(v1 - v0)); D3DXVec3Normalize(&normal, &cross); vertices[indices[curIndex]].normal = normal; vertices[indices[curIndex + 1]].normal = normal; vertices[indices[curIndex + 2]].normal = normal; indices[curIndex + 3] = curVertex + 1; indices[curIndex + 4] = curVertex + NUM_VERTSX; indices[curIndex + 5] = curVertex + NUM_VERTSX + 1; v0 = vertices[indices[curIndex + 3]].pos; v1 = vertices[indices[curIndex + 4]].pos; v2 = vertices[indices[curIndex + 5]].pos; D3DXVec3Cross(&cross, &D3DXVECTOR3(v2 - v0),&D3DXVECTOR3(v1 - v0)); D3DXVec3Normalize(&normal, &cross); vertices[indices[curIndex + 3]].normal = normal; vertices[indices[curIndex + 4]].normal = normal; vertices[indices[curIndex + 5]].normal = normal; curIndex += 6; } } and below is my c++ code, in it's entirety. showing the drawing and also calling on the passes #include "MyGame.h" //#include "CubeVector.h" /* This code sets a projection and shows a turning cube. What has been added is the project, rotation and a rasterizer to change the rasterization of the cube. The issue that was going on was something with the effect file which was causing the vertices not to be rendered correctly.*/ typedef struct { ID3D10Effect* pEffect; ID3D10EffectTechnique* pTechnique; //vertex information ID3D10Buffer* pVertexBuffer; ID3D10Buffer* pIndicesBuffer; ID3D10InputLayout* pVertexLayout; UINT numVertices; UINT numIndices; }ModelObject; ModelObject modelObject; // World Matrix D3DXMATRIX WorldMatrix; // View Matrix D3DXMATRIX ViewMatrix; // Projection Matrix D3DXMATRIX ProjectionMatrix; ID3D10EffectMatrixVariable* pProjectionMatrixVariable = NULL; //grid information #define NUM_COLS 16 #define NUM_ROWS 16 #define CELL_WIDTH 32 #define CELL_HEIGHT 32 #define NUM_VERTSX (NUM_COLS + 1) #define NUM_VERTSY (NUM_ROWS + 1) // timer variables LARGE_INTEGER timeStart; LARGE_INTEGER timeEnd; LARGE_INTEGER timerFreq; double currentTime; float anim_rate; // Variable to hold how long since last frame change float lastElaspedFrame = 0; // How long should the frames last float frameDuration = 0.5; bool MyGame::InitDirect3D() { if(!DX3dApp::InitDirect3D()) { return false; } // Get the timer frequency QueryPerformanceFrequency(&timerFreq); float freqSeconds = 1.0f / timerFreq.QuadPart; lastElaspedFrame = 0; D3D10_RASTERIZER_DESC rastDesc; rastDesc.FillMode = D3D10_FILL_WIREFRAME; rastDesc.CullMode = D3D10_CULL_FRONT; rastDesc.FrontCounterClockwise = true; rastDesc.DepthBias = false; rastDesc.DepthBiasClamp = 0; rastDesc.SlopeScaledDepthBias = 0; rastDesc.DepthClipEnable = false; rastDesc.ScissorEnable = false; rastDesc.MultisampleEnable = false; rastDesc.AntialiasedLineEnable = false; ID3D10RasterizerState *g_pRasterizerState; mpD3DDevice->CreateRasterizerState(&rastDesc, &g_pRasterizerState); mpD3DDevice->RSSetState(g_pRasterizerState); // Set up the World Matrix D3DXMatrixIdentity(&WorldMatrix); D3DXMatrixLookAtLH(&ViewMatrix, new D3DXVECTOR3(200.0f, 60.0f, -20.0f), new D3DXVECTOR3(200.0f, 50.0f, 0.0f), new D3DXVECTOR3(0.0f, 1.0f, 0.0f)); // Set up the projection matrix D3DXMatrixPerspectiveFovLH(&ProjectionMatrix, (float)D3DX_PI * 0.5f, (float)mWidth/(float)mHeight, 0.1f, 100.0f); pTimeVariable = NULL; if(!CreateObject()) { return false; } return true; } //These are actions that take place after the clearing of the buffer and before the present void MyGame::GameDraw() { static float rotationAngle = 0.0f; // create the rotation matrix using the rotation angle D3DXMatrixRotationY(&WorldMatrix, rotationAngle); rotationAngle += (float)D3DX_PI * 0.0f; // Set the input layout mpD3DDevice->IASetInputLayout(modelObject.pVertexLayout); // Set vertex buffer UINT stride = sizeof(VertexPos); UINT offset = 0; mpD3DDevice->IASetVertexBuffers(0, 1, &modelObject.pVertexBuffer, &stride, &offset); mpD3DDevice->IASetIndexBuffer(modelObject.pIndicesBuffer, DXGI_FORMAT_R32_UINT, 0); pTimeVariable->SetFloat((float)currentTime); // Set primitive topology mpD3DDevice->IASetPrimitiveTopology(D3D10_PRIMITIVE_TOPOLOGY_TRIANGLELIST); // Combine and send the final matrix to the shader D3DXMATRIX finalMatrix = (WorldMatrix * ViewMatrix * ProjectionMatrix); pProjectionMatrixVariable->SetMatrix((float*)&finalMatrix); // make sure modelObject is valid // Render a model object D3D10_TECHNIQUE_DESC techniqueDescription; modelObject.pTechnique->GetDesc(&techniqueDescription); // Loop through the technique passes for(UINT p=0; p < techniqueDescription.Passes; ++p) { modelObject.pTechnique->GetPassByIndex(p)->Apply(0); // draw the cube using all 36 vertices and 12 triangles mpD3DDevice->DrawIndexed(modelObject.numIndices,0,0); } } //Render actually incapsulates Gamedraw, so you can call data before you actually clear the buffer or after you //present data void MyGame::Render() { // Get the start timer count QueryPerformanceCounter(&timeStart); currentTime += anim_rate; DX3dApp::Render(); QueryPerformanceCounter(&timeEnd); anim_rate = ( (float)timeEnd.QuadPart - (float)timeStart.QuadPart ) / timerFreq.QuadPart; } bool MyGame::CreateObject() { VertexPos vertices[NUM_VERTSX * NUM_VERTSY]; for(int z=0; z < NUM_VERTSY; ++z) { for(int x = 0; x < NUM_VERTSX; ++x) { vertices[x + z * NUM_VERTSX].pos.x = (float)x * CELL_WIDTH; vertices[x + z * NUM_VERTSX].pos.z = (float)z * CELL_HEIGHT; vertices[x + z * NUM_VERTSX].pos.y = (float)(rand() % CELL_HEIGHT); vertices[x + z * NUM_VERTSX].color = D3DXVECTOR4(1.0, 0.0f, 0.0f, 0.0f); } } DWORD indices[NUM_VERTSX * NUM_VERTSY * 6]; int curIndex = 0; for(int z=0; z < NUM_ROWS; ++z) { for(int x = 0; x < NUM_COLS; ++x) { int curVertex = x + (z * NUM_VERTSX); indices[curIndex] = curVertex; indices[curIndex + 1] = curVertex + NUM_VERTSX; indices[curIndex + 2] = curVertex + 1; D3DXVECTOR3 v0 = vertices[indices[curIndex]].pos; D3DXVECTOR3 v1 = vertices[indices[curIndex + 1]].pos; D3DXVECTOR3 v2 = vertices[indices[curIndex + 2]].pos; D3DXVECTOR3 normal; D3DXVECTOR3 cross; D3DXVec3Cross(&cross, &D3DXVECTOR3(v2 - v0),&D3DXVECTOR3(v1 - v0)); D3DXVec3Normalize(&normal, &cross); vertices[indices[curIndex]].normal = normal; vertices[indices[curIndex + 1]].normal = normal; vertices[indices[curIndex + 2]].normal = normal; indices[curIndex + 3] = curVertex + 1; indices[curIndex + 4] = curVertex + NUM_VERTSX; indices[curIndex + 5] = curVertex + NUM_VERTSX + 1; v0 = vertices[indices[curIndex + 3]].pos; v1 = vertices[indices[curIndex + 4]].pos; v2 = vertices[indices[curIndex + 5]].pos; D3DXVec3Cross(&cross, &D3DXVECTOR3(v2 - v0),&D3DXVECTOR3(v1 - v0)); D3DXVec3Normalize(&normal, &cross); vertices[indices[curIndex + 3]].normal = normal; vertices[indices[curIndex + 4]].normal = normal; vertices[indices[curIndex + 5]].normal = normal; curIndex += 6; } } //Create Layout D3D10_INPUT_ELEMENT_DESC layout[] = { {"POSITION",0,DXGI_FORMAT_R32G32B32_FLOAT, 0 , 0, D3D10_INPUT_PER_VERTEX_DATA, 0}, {"COLOR",0,DXGI_FORMAT_R32G32B32A32_FLOAT, 0 , 12, D3D10_INPUT_PER_VERTEX_DATA, 0}, {"NORMAL",0,DXGI_FORMAT_R32G32B32A32_FLOAT, 0 , 28, D3D10_INPUT_PER_VERTEX_DATA, 0} }; UINT numElements = (sizeof(layout)/sizeof(layout[0])); modelObject.numVertices = sizeof(vertices)/sizeof(VertexPos); //Create buffer desc D3D10_BUFFER_DESC bufferDesc; bufferDesc.Usage = D3D10_USAGE_DEFAULT; bufferDesc.ByteWidth = sizeof(VertexPos) * modelObject.numVertices; bufferDesc.BindFlags = D3D10_BIND_VERTEX_BUFFER; bufferDesc.CPUAccessFlags = 0; bufferDesc.MiscFlags = 0; D3D10_SUBRESOURCE_DATA initData; initData.pSysMem = vertices; //Create the buffer HRESULT hr = mpD3DDevice->CreateBuffer(&bufferDesc, &initData, &modelObject.pVertexBuffer); if(FAILED(hr)) return false; modelObject.numIndices = sizeof(indices)/sizeof(DWORD); bufferDesc.ByteWidth = sizeof(DWORD) * modelObject.numIndices; bufferDesc.BindFlags = D3D10_BIND_INDEX_BUFFER; initData.pSysMem = indices; hr = mpD3DDevice->CreateBuffer(&bufferDesc, &initData, &modelObject.pIndicesBuffer); if(FAILED(hr)) return false; ///////////////////////////////////////////////////////////////////////////// //Set up fx files LPCWSTR effectFilename = L"effect.fx"; modelObject.pEffect = NULL; hr = D3DX10CreateEffectFromFile(effectFilename, NULL, NULL, "fx_4_0", D3D10_SHADER_ENABLE_STRICTNESS, 0, mpD3DDevice, NULL, NULL, &modelObject.pEffect, NULL, NULL); if(FAILED(hr)) return false; pProjectionMatrixVariable = modelObject.pEffect->GetVariableByName("Projection")->AsMatrix(); pTimeVariable = modelObject.pEffect->GetVariableByName("TimeStep")->AsScalar(); //Dont sweat the technique. Get it! LPCSTR effectTechniqueName = "Render"; modelObject.pTechnique = modelObject.pEffect->GetTechniqueByName(effectTechniqueName); if(modelObject.pTechnique == NULL) return false; //Create Vertex layout D3D10_PASS_DESC passDesc; modelObject.pTechnique->GetPassByIndex(0)->GetDesc(&passDesc); hr = mpD3DDevice->CreateInputLayout(layout, numElements, passDesc.pIAInputSignature, passDesc.IAInputSignatureSize, &modelObject.pVertexLayout); if(FAILED(hr)) return false; return true; }

Read the article

Very different I/O performance in C++ on Windows

- by Mr.Gate

Hi all, I'm a new user and my english is not so good so I hope to be clear. We're facing a performance problem using large files (1GB or more) expecially (as it seems) when you try to grow them in size. Anyway... to verify our sensations we tryed the following (on Win 7 64Bit, 4core, 8GB Ram, 32 bit code compiled with VC2008) a) Open an unexisting file. Write it from the beginning up to 1Gb in 1Mb slots. Now you have a 1Gb file. Now randomize 10000 positions within that file, seek to that position and write 50 bytes in each position, no matter what you write. Close the file and look at the results. Time to create the file is quite fast (about 0.3"), time to write 10000 times is fast all the same (about 0.03"). Very good, this is the beginnig. Now try something else... b) Open an unexisting file, seek to 1Gb-1byte and write just 1 byte. Now you have another 1Gb file. Follow the next steps exactly same way of case 'a', close the file and look at the results. Time to create the file is the faster you can imagine (about 0.00009") but write time is something you can't believe.... about 90"!!!!! b.1) Open an unexisting file, don't write any byte. Act as before, ramdomizing, seeking and writing, close the file and look at the result. Time to write is long all the same: about 90"!!!!! Ok... this is quite amazing. But there's more! c) Open again the file you crated in case 'a', don't truncate it... randomize again 10000 positions and act as before. You're fast as before, about 0,03" to write 10000 times. This sounds Ok... try another step. d) Now open the file you created in case 'b', don't truncate it... randomize again 10000 positions and act as before. You're slow again and again, but the time is reduced to... 45"!! Maybe, trying again, the time will reduce. I actually wonder why... Any Idea? The following is part of the code I used to test what I told in previuos cases (you'll have to change someting in order to have a clean compilation, I just cut & paste from some source code, sorry). The sample can read and write, in random, ordered or reverse ordered mode, but write only in random order is the clearest test. We tryed using std::fstream but also using directly CreateFile(), WriteFile() and so on the results are the same (even if std::fstream is actually a little slower). Parameters for case 'a' = -f_tempdir_\casea.dat -n10000 -t -p -w Parameters for case 'b' = -f_tempdir_\caseb.dat -n10000 -t -v -w Parameters for case 'b.1' = -f_tempdir_\caseb.dat -n10000 -t -w Parameters for case 'c' = -f_tempdir_\casea.dat -n10000 -w Parameters for case 'd' = -f_tempdir_\caseb.dat -n10000 -w Run the test (and even others) and see... // iotest.cpp : Defines the entry point for the console application. // #include <windows.h> #include <iostream> #include <set> #include <vector> #include "stdafx.h" double RealTime_Microsecs() { LARGE_INTEGER fr = {0, 0}; LARGE_INTEGER ti = {0, 0}; double time = 0.0; QueryPerformanceCounter(&ti); QueryPerformanceFrequency(&fr); time = (double) ti.QuadPart / (double) fr.QuadPart; return time; } int main(int argc, char* argv[]) { std::string sFileName ; size_t stSize, stTimes, stBytes ; int retval = 0 ; char *p = NULL ; char *pPattern = NULL ; char *pReadBuf = NULL ; try { // Default stSize = 1<<30 ; // 1Gb stTimes = 1000 ; stBytes = 50 ; bool bTruncate = false ; bool bPre = false ; bool bPreFast = false ; bool bOrdered = false ; bool bReverse = false ; bool bWriteOnly = false ; // Comsumo i parametri for(int index=1; index < argc; ++index) { if ( '-' != argv[index][0] ) throw ; switch(argv[index][1]) { case 'f': sFileName = argv[index]+2 ; break ; case 's': stSize = xw::str::strtol(argv[index]+2) ; break ; case 'n': stTimes = xw::str::strtol(argv[index]+2) ; break ; case 'b':stBytes = xw::str::strtol(argv[index]+2) ; break ; case 't': bTruncate = true ; break ; case 'p' : bPre = true, bPreFast = false ; break ; case 'v' : bPreFast = true, bPre = false ; break ; case 'o' : bOrdered = true, bReverse = false ; break ; case 'r' : bReverse = true, bOrdered = false ; break ; case 'w' : bWriteOnly = true ; break ; default: throw ; break ; } } if ( sFileName.empty() ) { std::cout << "Usage: -f<File Name> -s<File Size> -n<Number of Reads and Writes> -b<Bytes per Read and Write> -t -p -v -o -r -w" << std::endl ; std::cout << "-t truncates the file, -p pre load the file, -v pre load 'veloce', -o writes in order mode, -r write in reverse order mode, -w Write Only" << std::endl ; std::cout << "Default: 1Gb, 1000 times, 50 bytes" << std::endl ; throw ; } if ( !stSize || !stTimes || !stBytes ) { std::cout << "Invalid Parameters" << std::endl ; return -1 ; } size_t stBestSize = 0x00100000 ; std::fstream fFile ; fFile.open(sFileName.c_str(), std::ios_base::binary|std::ios_base::out|std::ios_base::in|(bTruncate?std::ios_base::trunc:0)) ; p = new char[stBestSize] ; pPattern = new char[stBytes] ; pReadBuf = new char[stBytes] ; memset(p, 0, stBestSize) ; memset(pPattern, (int)(stBytes&0x000000ff), stBytes) ; double dTime = RealTime_Microsecs() ; size_t stCopySize, stSizeToCopy = stSize ; if ( bPre ) { do { stCopySize = std::min(stSizeToCopy, stBestSize) ; fFile.write(p, stCopySize) ; stSizeToCopy -= stCopySize ; } while (stSizeToCopy) ; std::cout << "Creating time is: " << xw::str::itoa(RealTime_Microsecs()-dTime, 5, 'f') << std::endl ; } else if ( bPreFast ) { fFile.seekp(stSize-1) ; fFile.write(p, 1) ; std::cout << "Creating Fast time is: " << xw::str::itoa(RealTime_Microsecs()-dTime, 5, 'f') << std::endl ; } size_t stPos ; ::srand((unsigned int)dTime) ; double dReadTime, dWriteTime ; stCopySize = stTimes ; std::vector<size_t> inVect ; std::vector<size_t> outVect ; std::set<size_t> outSet ; std::set<size_t> inSet ; // Prepare vector and set do { stPos = (size_t)(::rand()<<16) % stSize ; outVect.push_back(stPos) ; outSet.insert(stPos) ; stPos = (size_t)(::rand()<<16) % stSize ; inVect.push_back(stPos) ; inSet.insert(stPos) ; } while (--stCopySize) ; // Write & read using vectors if ( !bReverse && !bOrdered ) { std::vector<size_t>::iterator outI, inI ; outI = outVect.begin() ; inI = inVect.begin() ; stCopySize = stTimes ; dReadTime = 0.0 ; dWriteTime = 0.0 ; do { dTime = RealTime_Microsecs() ; fFile.seekp(*outI) ; fFile.write(pPattern, stBytes) ; dWriteTime += RealTime_Microsecs() - dTime ; ++outI ; if ( !bWriteOnly ) { dTime = RealTime_Microsecs() ; fFile.seekg(*inI) ; fFile.read(pReadBuf, stBytes) ; dReadTime += RealTime_Microsecs() - dTime ; ++inI ; } } while (--stCopySize) ; std::cout << "Write time is " << xw::str::itoa(dWriteTime, 5, 'f') << " (Ave: " << xw::str::itoa(dWriteTime/stTimes, 10, 'f') << ")" << std::endl ; if ( !bWriteOnly ) { std::cout << "Read time is " << xw::str::itoa(dReadTime, 5, 'f') << " (Ave: " << xw::str::itoa(dReadTime/stTimes, 10, 'f') << ")" << std::endl ; } } // End // Write in order if ( bOrdered ) { std::set<size_t>::iterator i = outSet.begin() ; dWriteTime = 0.0 ; stCopySize = 0 ; for(; i != outSet.end(); ++i) { stPos = *i ; dTime = RealTime_Microsecs() ; fFile.seekp(stPos) ; fFile.write(pPattern, stBytes) ; dWriteTime += RealTime_Microsecs() - dTime ; ++stCopySize ; } std::cout << "Ordered Write time is " << xw::str::itoa(dWriteTime, 5, 'f') << " in " << xw::str::itoa(stCopySize) << " (Ave: " << xw::str::itoa(dWriteTime/stCopySize, 10, 'f') << ")" << std::endl ; if ( !bWriteOnly ) { i = inSet.begin() ; dReadTime = 0.0 ; stCopySize = 0 ; for(; i != inSet.end(); ++i) { stPos = *i ; dTime = RealTime_Microsecs() ; fFile.seekg(stPos) ; fFile.read(pReadBuf, stBytes) ; dReadTime += RealTime_Microsecs() - dTime ; ++stCopySize ; } std::cout << "Ordered Read time is " << xw::str::itoa(dReadTime, 5, 'f') << " in " << xw::str::itoa(stCopySize) << " (Ave: " << xw::str::itoa(dReadTime/stCopySize, 10, 'f') << ")" << std::endl ; } }// End // Write in reverse order if ( bReverse ) { std::set<size_t>::reverse_iterator i = outSet.rbegin() ; dWriteTime = 0.0 ; stCopySize = 0 ; for(; i != outSet.rend(); ++i) { stPos = *i ; dTime = RealTime_Microsecs() ; fFile.seekp(stPos) ; fFile.write(pPattern, stBytes) ; dWriteTime += RealTime_Microsecs() - dTime ; ++stCopySize ; } std::cout << "Reverse ordered Write time is " << xw::str::itoa(dWriteTime, 5, 'f') << " in " << xw::str::itoa(stCopySize) << " (Ave: " << xw::str::itoa(dWriteTime/stCopySize, 10, 'f') << ")" << std::endl ; if ( !bWriteOnly ) { i = inSet.rbegin() ; dReadTime = 0.0 ; stCopySize = 0 ; for(; i != inSet.rend(); ++i) { stPos = *i ; dTime = RealTime_Microsecs() ; fFile.seekg(stPos) ; fFile.read(pReadBuf, stBytes) ; dReadTime += RealTime_Microsecs() - dTime ; ++stCopySize ; } std::cout << "Reverse ordered Read time is " << xw::str::itoa(dReadTime, 5, 'f') << " in " << xw::str::itoa(stCopySize) << " (Ave: " << xw::str::itoa(dReadTime/stCopySize, 10, 'f') << ")" << std::endl ; } }// End dTime = RealTime_Microsecs() ; fFile.close() ; std::cout << "Flush/Close Time is " << xw::str::itoa(RealTime_Microsecs()-dTime, 5, 'f') << std::endl ; std::cout << "Program Terminated" << std::endl ; } catch(...) { std::cout << "Something wrong or wrong parameters" << std::endl ; retval = -1 ; } if ( p ) delete []p ; if ( pPattern ) delete []pPattern ; if ( pReadBuf ) delete []pReadBuf ; return retval ; }

Read the article

Very different IO performance in C/C++

- by Roberto Tirabassi

Hi all, I'm a new user and my english is not so good so I hope to be clear. We're facing a performance problem using large files (1GB or more) expecially (as it seems) when you try to grow them in size. Anyway... to verify our sensations we tryed the following (on Win 7 64Bit, 4core, 8GB Ram, 32 bit code compiled with VC2008) a) Open an unexisting file. Write it from the beginning up to 1Gb in 1Mb slots. Now you have a 1Gb file. Now randomize 10000 positions within that file, seek to that position and write 50 bytes in each position, no matter what you write. Close the file and look at the results. Time to create the file is quite fast (about 0.3"), time to write 10000 times is fast all the same (about 0.03"). Very good, this is the beginnig. Now try something else... b) Open an unexisting file, seek to 1Gb-1byte and write just 1 byte. Now you have another 1Gb file. Follow the next steps exactly same way of case 'a', close the file and look at the results. Time to create the file is the faster you can imagine (about 0.00009") but write time is something you can't believe.... about 90"!!!!! b.1) Open an unexisting file, don't write any byte. Act as before, ramdomizing, seeking and writing, close the file and look at the result. Time to write is long all the same: about 90"!!!!! Ok... this is quite amazing. But there's more! c) Open again the file you crated in case 'a', don't truncate it... randomize again 10000 positions and act as before. You're fast as before, about 0,03" to write 10000 times. This sounds Ok... try another step. d) Now open the file you created in case 'b', don't truncate it... randomize again 10000 positions and act as before. You're slow again and again, but the time is reduced to... 45"!! Maybe, trying again, the time will reduce. I actually wonder why... Any Idea? The following is part of the code I used to test what I told in previuos cases (you'll have to change someting in order to have a clean compilation, I just cut & paste from some source code, sorry). The sample can read and write, in random, ordered or reverse ordered mode, but write only in random order is the clearest test. We tryed using std::fstream but also using directly CreateFile(), WriteFile() and so on the results are the same (even if std::fstream is actually a little slower). Parameters for case 'a' = -f_tempdir_\casea.dat -n10000 -t -p -w Parameters for case 'b' = -f_tempdir_\caseb.dat -n10000 -t -v -w Parameters for case 'b.1' = -f_tempdir_\caseb.dat -n10000 -t -w Parameters for case 'c' = -f_tempdir_\casea.dat -n10000 -w Parameters for case 'd' = -f_tempdir_\caseb.dat -n10000 -w Run the test (and even others) and see... // iotest.cpp : Defines the entry point for the console application. // #include <windows.h> #include <iostream> #include <set> #include <vector> #include "stdafx.h" double RealTime_Microsecs() { LARGE_INTEGER fr = {0, 0}; LARGE_INTEGER ti = {0, 0}; double time = 0.0; QueryPerformanceCounter(&ti); QueryPerformanceFrequency(&fr); time = (double) ti.QuadPart / (double) fr.QuadPart; return time; } int main(int argc, char* argv[]) { std::string sFileName ; size_t stSize, stTimes, stBytes ; int retval = 0 ; char *p = NULL ; char *pPattern = NULL ; char *pReadBuf = NULL ; try { // Default stSize = 1<<30 ; // 1Gb stTimes = 1000 ; stBytes = 50 ; bool bTruncate = false ; bool bPre = false ; bool bPreFast = false ; bool bOrdered = false ; bool bReverse = false ; bool bWriteOnly = false ; // Comsumo i parametri for(int index=1; index < argc; ++index) { if ( '-' != argv[index][0] ) throw ; switch(argv[index][1]) { case 'f': sFileName = argv[index]+2 ; break ; case 's': stSize = xw::str::strtol(argv[index]+2) ; break ; case 'n': stTimes = xw::str::strtol(argv[index]+2) ; break ; case 'b':stBytes = xw::str::strtol(argv[index]+2) ; break ; case 't': bTruncate = true ; break ; case 'p' : bPre = true, bPreFast = false ; break ; case 'v' : bPreFast = true, bPre = false ; break ; case 'o' : bOrdered = true, bReverse = false ; break ; case 'r' : bReverse = true, bOrdered = false ; break ; case 'w' : bWriteOnly = true ; break ; default: throw ; break ; } } if ( sFileName.empty() ) { std::cout << "Usage: -f<File Name> -s<File Size> -n<Number of Reads and Writes> -b<Bytes per Read and Write> -t -p -v -o -r -w" << std::endl ; std::cout << "-t truncates the file, -p pre load the file, -v pre load 'veloce', -o writes in order mode, -r write in reverse order mode, -w Write Only" << std::endl ; std::cout << "Default: 1Gb, 1000 times, 50 bytes" << std::endl ; throw ; } if ( !stSize || !stTimes || !stBytes ) { std::cout << "Invalid Parameters" << std::endl ; return -1 ; } size_t stBestSize = 0x00100000 ; std::fstream fFile ; fFile.open(sFileName.c_str(), std::ios_base::binary|std::ios_base::out|std::ios_base::in|(bTruncate?std::ios_base::trunc:0)) ; p = new char[stBestSize] ; pPattern = new char[stBytes] ; pReadBuf = new char[stBytes] ; memset(p, 0, stBestSize) ; memset(pPattern, (int)(stBytes&0x000000ff), stBytes) ; double dTime = RealTime_Microsecs() ; size_t stCopySize, stSizeToCopy = stSize ; if ( bPre ) { do { stCopySize = std::min(stSizeToCopy, stBestSize) ; fFile.write(p, stCopySize) ; stSizeToCopy -= stCopySize ; } while (stSizeToCopy) ; std::cout << "Creating time is: " << xw::str::itoa(RealTime_Microsecs()-dTime, 5, 'f') << std::endl ; } else if ( bPreFast ) { fFile.seekp(stSize-1) ; fFile.write(p, 1) ; std::cout << "Creating Fast time is: " << xw::str::itoa(RealTime_Microsecs()-dTime, 5, 'f') << std::endl ; } size_t stPos ; ::srand((unsigned int)dTime) ; double dReadTime, dWriteTime ; stCopySize = stTimes ; std::vector<size_t> inVect ; std::vector<size_t> outVect ; std::set<size_t> outSet ; std::set<size_t> inSet ; // Prepare vector and set do { stPos = (size_t)(::rand()<<16) % stSize ; outVect.push_back(stPos) ; outSet.insert(stPos) ; stPos = (size_t)(::rand()<<16) % stSize ; inVect.push_back(stPos) ; inSet.insert(stPos) ; } while (--stCopySize) ; // Write & read using vectors if ( !bReverse && !bOrdered ) { std::vector<size_t>::iterator outI, inI ; outI = outVect.begin() ; inI = inVect.begin() ; stCopySize = stTimes ; dReadTime = 0.0 ; dWriteTime = 0.0 ; do { dTime = RealTime_Microsecs() ; fFile.seekp(*outI) ; fFile.write(pPattern, stBytes) ; dWriteTime += RealTime_Microsecs() - dTime ; ++outI ; if ( !bWriteOnly ) { dTime = RealTime_Microsecs() ; fFile.seekg(*inI) ; fFile.read(pReadBuf, stBytes) ; dReadTime += RealTime_Microsecs() - dTime ; ++inI ; } } while (--stCopySize) ; std::cout << "Write time is " << xw::str::itoa(dWriteTime, 5, 'f') << " (Ave: " << xw::str::itoa(dWriteTime/stTimes, 10, 'f') << ")" << std::endl ; if ( !bWriteOnly ) { std::cout << "Read time is " << xw::str::itoa(dReadTime, 5, 'f') << " (Ave: " << xw::str::itoa(dReadTime/stTimes, 10, 'f') << ")" << std::endl ; } } // End // Write in order if ( bOrdered ) { std::set<size_t>::iterator i = outSet.begin() ; dWriteTime = 0.0 ; stCopySize = 0 ; for(; i != outSet.end(); ++i) { stPos = *i ; dTime = RealTime_Microsecs() ; fFile.seekp(stPos) ; fFile.write(pPattern, stBytes) ; dWriteTime += RealTime_Microsecs() - dTime ; ++stCopySize ; } std::cout << "Ordered Write time is " << xw::str::itoa(dWriteTime, 5, 'f') << " in " << xw::str::itoa(stCopySize) << " (Ave: " << xw::str::itoa(dWriteTime/stCopySize, 10, 'f') << ")" << std::endl ; if ( !bWriteOnly ) { i = inSet.begin() ; dReadTime = 0.0 ; stCopySize = 0 ; for(; i != inSet.end(); ++i) { stPos = *i ; dTime = RealTime_Microsecs() ; fFile.seekg(stPos) ; fFile.read(pReadBuf, stBytes) ; dReadTime += RealTime_Microsecs() - dTime ; ++stCopySize ; } std::cout << "Ordered Read time is " << xw::str::itoa(dReadTime, 5, 'f') << " in " << xw::str::itoa(stCopySize) << " (Ave: " << xw::str::itoa(dReadTime/stCopySize, 10, 'f') << ")" << std::endl ; } }// End // Write in reverse order if ( bReverse ) { std::set<size_t>::reverse_iterator i = outSet.rbegin() ; dWriteTime = 0.0 ; stCopySize = 0 ; for(; i != outSet.rend(); ++i) { stPos = *i ; dTime = RealTime_Microsecs() ; fFile.seekp(stPos) ; fFile.write(pPattern, stBytes) ; dWriteTime += RealTime_Microsecs() - dTime ; ++stCopySize ; } std::cout << "Reverse ordered Write time is " << xw::str::itoa(dWriteTime, 5, 'f') << " in " << xw::str::itoa(stCopySize) << " (Ave: " << xw::str::itoa(dWriteTime/stCopySize, 10, 'f') << ")" << std::endl ; if ( !bWriteOnly ) { i = inSet.rbegin() ; dReadTime = 0.0 ; stCopySize = 0 ; for(; i != inSet.rend(); ++i) { stPos = *i ; dTime = RealTime_Microsecs() ; fFile.seekg(stPos) ; fFile.read(pReadBuf, stBytes) ; dReadTime += RealTime_Microsecs() - dTime ; ++stCopySize ; } std::cout << "Reverse ordered Read time is " << xw::str::itoa(dReadTime, 5, 'f') << " in " << xw::str::itoa(stCopySize) << " (Ave: " << xw::str::itoa(dReadTime/stCopySize, 10, 'f') << ")" << std::endl ; } }// End dTime = RealTime_Microsecs() ; fFile.close() ; std::cout << "Flush/Close Time is " << xw::str::itoa(RealTime_Microsecs()-dTime, 5, 'f') << std::endl ; std::cout << "Program Terminated" << std::endl ; } catch(...) { std::cout << "Something wrong or wrong parameters" << std::endl ; retval = -1 ; } if ( p ) delete []p ; if ( pPattern ) delete []pPattern ; if ( pReadBuf ) delete []pReadBuf ; return retval ; }

Search Results

Search found 16 results on 1 pages for 'queryperformancecounter'.

Page 1/1 | 1

- by timeitquery

- by myforwik

- by rekotc

- by felipedrl

- by Sanjeet Daga

- by andresjb

- by K.Sandell

- by Nikhil

- by David Titarenco

- by Werner Willemsens

- by Psychic

- by Phenom

- by Hermann

- by numerical25

- by Mr.Gate

- by Roberto Tirabassi