x86 - Page 15 - Developer IT

How do you use printf from Assembly?

- by bobobobo

I have an MSVC++ project set up to compile and run assembly code. In main.c: #include <stdio.h> void go() ; int main() { go() ; // call the asm routine } In go.asm: .586 .model flat, c .code go PROC invoke puts,"hi" RET go ENDP end But when I compile and run, I get an error in go.asm: error A2006: undefined symbol : puts How do I define the symbols in <stdio.h> for the .asm files in the project?

Read the article

Where is the bottleneck in this code?

- by Mikhail

I have the following tight loop that makes up the serial bottle neck of my code. Ideally I would parallelize the function that calls this but that is not possible. //n is about 60 for (int k = 0;k < n;k++) { double fone = z[k*n+i+1]; double fzer = z[k*n+i]; z[k*n+i+1]= s*fzer+c*fone; z[k*n+i] = c*fzer-s*fone; } Are there any optimizations that can be made such as vectorization or some evil inline that can help this code? I am looking into finding eigen solutions of tridiagonal matrices. http://www.cimat.mx/~posada/OptDoglegGraph/DocLogisticDogleg/projects/adjustedrecipes/tqli.cpp.html

Read the article

Why is FLD1 loading NaN instead?

- by Bernd Jendrissek

I have a one-liner C function that is just return value * pow(1.+rate, -delay); - it discounts a future value to a present value. The interesting part of the disassembly is 0x080555b9 : neg %eax 0x080555bb : push %eax 0x080555bc : fildl (%esp) 0x080555bf : lea 0x4(%esp),%esp 0x080555c3 : fldl 0xfffffff0(%ebp) 0x080555c6 : fld1 0x080555c8 : faddp %st,%st(1) 0x080555ca : fxch %st(1) 0x080555cc : fstpl 0x8(%esp) 0x080555d0 : fstpl (%esp) 0x080555d3 : call 0x8051ce0 0x080555d8 : fmull 0xfffffff8(%ebp) While single-stepping through this function, gdb says (rate is 0.02, delay is 2; you can see them on the stack): (gdb) si 0x080555c6 30 return value * pow(1.+rate, -delay); (gdb) info float R7: Valid 0x4004a6c28f5c28f5c000 +41.68999999999999773 R6: Valid 0x4004e15c28f5c28f6000 +56.34000000000000341 R5: Valid 0x4004dceb851eb851e800 +55.22999999999999687 R4: Valid 0xc0008000000000000000 -2 =R3: Valid 0x3ff9a3d70a3d70a3d800 +0.02000000000000000042 R2: Valid 0x4004ff147ae147ae1800 +63.77000000000000313 R1: Valid 0x4004e17ae147ae147800 +56.36999999999999744 R0: Valid 0x4004efb851eb851eb800 +59.92999999999999972 Status Word: 0x1861 IE PE SF TOP: 3 Control Word: 0x037f IM DM ZM OM UM PM PC: Extended Precision (64-bits) RC: Round to nearest Tag Word: 0x0000 Instruction Pointer: 0x73:0x080555c3 Operand Pointer: 0x7b:0xbff41d78 Opcode: 0xdd45 And after the fld1: (gdb) si 0x080555c8 30 return value * pow(1.+rate, -delay); (gdb) info float R7: Valid 0x4004a6c28f5c28f5c000 +41.68999999999999773 R6: Valid 0x4004e15c28f5c28f6000 +56.34000000000000341 R5: Valid 0x4004dceb851eb851e800 +55.22999999999999687 R4: Valid 0xc0008000000000000000 -2 R3: Valid 0x3ff9a3d70a3d70a3d800 +0.02000000000000000042 =R2: Special 0xffffc000000000000000 Real Indefinite (QNaN) R1: Valid 0x4004e17ae147ae147800 +56.36999999999999744 R0: Valid 0x4004efb851eb851eb800 +59.92999999999999972 Status Word: 0x1261 IE PE SF C1 TOP: 2 Control Word: 0x037f IM DM ZM OM UM PM PC: Extended Precision (64-bits) RC: Round to nearest Tag Word: 0x0020 Instruction Pointer: 0x73:0x080555c6 Operand Pointer: 0x7b:0xbff41d78 Opcode: 0xd9e8 After this, everything goes to hell. Things get grossly over or undervalued, so even if there were no other bugs in my freeciv AI attempt, it would choose all the wrong strategies. Like sending the whole army to the arctic. (Sigh, if only I were getting that far.) I must be missing something obvious, or getting blinded by something, because I can't believe that fld1 should ever possibly fail. Even less that it should fail only after a handful of passes through this function. On earlier passes the FPU correctly loads 1 into ST(0). The bytes at 0x080555c6 definitely encode fld1 - checked with x/... on the running process. What gives?

Read the article

Poco C++ library on OSX 10.8.2: Undefined symbols for architecture x86_64

- by Arman

I'm trying to use Poco C++ library to do the simple http requests in C++ on Mac OS X 10.8.2. I installed Poco, copy-pasted the http_request.cc code from this tutorial, ran it with 'g++ -o http_get http_get.cc -lPocoNet', but got: Undefined symbols for architecture x86_64: "Poco::StreamCopier::copyStream(std::basic_istream<char, std::char_traits<char> >&, std::basic_ostream<char, std::char_traits<char> >&, unsigned long)", referenced from: _main in ccKuZb1g.o "Poco::URI::URI(char const*)", referenced from: _main in ccKuZb1g.o "Poco::URI::~URI()", referenced from: _main in ccKuZb1g.o "Poco::URI::getPathAndQuery() const", referenced from: _main in ccKuZb1g.o "Poco::URI::getPort() const", referenced from: _main in ccKuZb1g.o "Poco::Exception::displayText() const", referenced from: _main in ccKuZb1g.o "typeinfo for Poco::Exception", referenced from: GCC_except_table1 in ccKuZb1g.o ld: symbol(s) not found for architecture x86_64 collect2: ld returned 1 exit status Have been struggling with this for couple of hours. Any idea how to fix this? Thanks in advance!

Read the article

Odd optimization problem under MSVC

- by Goz

I've seen this blog: http://igoro.com/archive/gallery-of-processor-cache-effects/ The "weirdness" in part 7 is what caught my interest. My first thought was "Thats just C# being weird". Its not I wrote the following C++ code. volatile int* p = (volatile int*)_aligned_malloc( sizeof( int ) * 8, 64 ); memset( (void*)p, 0, sizeof( int ) * 8 ); double dStart = t.GetTime(); for (int i = 0; i < 200000000; i++) { //p[0]++;p[1]++;p[2]++;p[3]++; // Option 1 //p[0]++;p[2]++;p[4]++;p[6]++; // Option 2 p[0]++;p[2]++; // Option 3 } double dTime = t.GetTime() - dStart; The timing I get on my 2.4 Ghz Core 2 Quad go as follows: Option 1 = ~8 cycles per loop. Option 2 = ~4 cycles per loop. Option 3 = ~6 cycles per loop. Now This is confusing. My reasoning behind the difference comes down to the cache write latency (3 cycles) on my chip and an assumption that the cache has a 128-bit write port (This is pure guess work on my part). On that basis in Option 1: It will increment p[0] (1 cycle) then increment p[2] (1 cycle) then it has to wait 1 cycle (for cache) then p[1] (1 cycle) then wait 1 cycle (for cache) then p[3] (1 cycle). Finally 2 cycles for increment and jump (Though its usually implemented as decrement and jump). This gives a total of 8 cycles. In Option 2: It can increment p[0] and p[4] in one cycle then increment p[2] and p[6] in another cycle. Then 2 cycles for subtract and jump. No waits needed on cache. Total 4 cycles. In option 3: It can increment p[0] then has to wait 2 cycles then increment p[2] then subtract and jump. The problem is if you set case 3 to increment p[0] and p[4] it STILL takes 6 cycles (which kinda blows my 128-bit read/write port out of the water). So ... can anyone tell me what the hell is going on here? Why DOES case 3 take longer? Also I'd love to know what I've got wrong in my thinking above, as i obviously have something wrong! Any ideas would be much appreciated! :) It'd also be interesting to see how GCC or any other compiler copes with it as well! Edit: Jerry Coffin's idea gave me some thoughts. I've done some more tests (on a different machine so forgive the change in timings) with and without nops and with different counts of nops case 2 - 0.46 00401ABD jne (401AB0h) 0 nops - 0.68 00401AB7 jne (401AB0h) 1 nop - 0.61 00401AB8 jne (401AB0h) 2 nops - 0.636 00401AB9 jne (401AB0h) 3 nops - 0.632 00401ABA jne (401AB0h) 4 nops - 0.66 00401ABB jne (401AB0h) 5 nops - 0.52 00401ABC jne (401AB0h) 6 nops - 0.46 00401ABD jne (401AB0h) 7 nops - 0.46 00401ABE jne (401AB0h) 8 nops - 0.46 00401ABF jne (401AB0h) 9 nops - 0.55 00401AC0 jne (401AB0h) I've included the jump statetements so you can see that the source and destination are in one cache line. You can also see that we start to get a difference when we are 13 bytes or more apart. Until we hit 16 ... then it all goes wrong. So Jerry isn't right (though his suggestion DOES help a bit), however something IS going on. I'm more and more intrigued to try and figure out what it is now. It does appear to be more some sort of memory alignment oddity rather than some sort of instruction throughput oddity. Anyone want to explain this for an inquisitive mind? :D Edit 3: Interjay has a point on the unrolling that blows the previous edit out of the water. With an unrolled loop the performance does not improve. You need to add a nop in to make the gap between jump source and destination the same as for my good nop count above. Performance still sucks. Its interesting that I need 6 nops to improve performance though. I wonder how many nops the processor can issue per cycle? If its 3 then that account for the cache write latency ... But, if thats it, why is the latency occurring? Curiouser and curiouser ...

Read the article

Examining C/C++ Heap memory statistics in gdb

- by fd

I'm trying to investigate the state of the C/C++ heap from within gdb on Linux amd64, is there a nice way to do this? One approach I've tried is to "call mallinfo()" but unfortunately I can't then extract the values I want since gdb deal with the return value properly. I'm not easily able to write a function to be compiled into the binary for the process I am attached to, so I can simply implement my own function to extract the values by calling mallinfo() in my own code this way. Is there perhaps a clever trick that will allow me to do this on-the-fly? Another option could be to locate the heap and traverse the malloc headers / free list; I'd appreciate any pointers to where I could start in finding the location and layout of these. I've been trying to Google and read around the problem for about 2 hours and I've learnt some fascinating stuff but still not found what I need.

Read the article

Why does 64-bit Windows need a separate "Program Files (x86)" folder?

- by Stephen Jennings

I know that on a 64-bit version of Windows the "Program Files" folder is for 64-bit programs and the "Program Files (x86)" folder is for 32-bit programs, but why is this even necessary? By "necessary", I don't mean "why could Microsoft not have made any other design decisions?" because of course they could have. Rather, I mean, "why, given the current design of 64-bit Windows, must 32-bit programs have a separate top-level folder from 64-bit programs?" There are plenty of questions on Super User and elsewhere that assert "one is for 32-bit programs, one is for 64-bit programs", but none that I can find give the reason. From my experience, it doesn't seem to matter whether a 32-bit program is installed in the correct place or not. Does Windows somehow present itself differently to a program running out of "Program Files (x86)"? Is there a description that shows exactly what's different for a program installed in "Program Files (x86)" instead of "Program Files"? I think it's unlikely that Microsoft would introduce a new folder without a legitimate technical reason.

Read the article

What am I doing wrong? (Simple Assembly Loop)

- by sunnyohno

It won't let me post the picture. Btw, Someone from Reddit.programming sent me over here. So thanks! TITLE MASM Template ; Description ; ; Revision date: INCLUDE Irvine32.inc .data myArray BYTE 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 .code main PROC call Clrscr mov esi, OFFSET myArray mov ecx, LENGTHOF myArray mov eax, 0 L1: add eax, [esi] inc esi loop L1 call WriteInt exit main ENDP END main Results in: -334881242

Read the article

"cpuid" before "rdtsc"

- by Alex B

Sometimes I encounter code that reads TSC with rdtsc instruction, but calls cpuid before it? Why?

Read the article

Calling assembly in GCC????

- by rbr200

include static inline uint xchg(volatile unsigned int *addr, unsigned int newval) { uint result; asm volatile("lock; xchgl %0, %1" : "+m" (*addr), "=a" (result) : "1" (newval) : "cc"); return result; } Can some one tell me what this code does exactly. I mean I have an idea or the parts of this command. "1" newval is the input, "=a" is to flush out its previous value and update it. "m" is for the memory operation but I am confused about the functionality of this function. What does the "+m" sign do? Does this function do sumthing like m=a; m = newval; return a

Read the article

Accessing PCI Device from user space programs

- by crissangel

I have a device which would be interface with my processor through pcie. I have written driver for it using the existing pci file operations. Now my problem is how do I access it from user space programs? PCI File operations do not have IOCTL support and hence I cant make an ioctl call unlike other char devices. I cannot use pci_config_read_byte etc. functions as they are meant for kernel space(included in linux/pci.h).

Read the article

How to move value from the stack to ST(0)?

- by George Edison

I am having trouble believing the following code is the most efficient way to move a value from the stack to ST(0): .data var dd 4.2 tmp dd ? .code mov EAX, var push EAX ; top of stack now contains a value ; move it to ST(0) pop EAX mov tmp, EAX fld tmp Is the temporary variable really necessary? Further, is there an easier way to get a value from the stack to ST(0)?

Read the article

PC boot: dl register and drive number

- by kikou

I read somewhere in the internet that, before jumping to 0x7c00, the BIOS loads into %dl the "drive number" of the booted device. But what is this "drive number"? Each device attached to the computer is assigned a number by the BIOS? If so, how can I know which number is a given device assigned to? Reading GRUB's source code I found when %dl has bits 0x80 and 0x70 set, it overwrites the whole register with 0x80. Why is that? Here is the code: jmp 3f /* grub-setup may overwrite this jump */ testb $0x80, %dl jz 2f 3: /* Ignore %dl different from 0-0x0f and 0x80-0x8f. */ testb $0x70, %dl jz 1f 2: movb $0x80, %dl 1: By the way. Is there any detailed resource on the boot process of PC's in the web? Specially about what the BIOS does before giving the control to the bootloader and also the standard codes used to communicate with it (like that "drive numer"). I was hoping to write my own bootloader and everything I found is a bit too vague, not technical enough to the point of informing of the exact state of the computer when my bootloader starts to run.

Read the article

How To Completely Move Users/Program Files/Program Files (x86)/ProgramData (Folders) To Another Partition(s) On Windows 8?

- by Enigma83

I am attempting to move folders Users Program Files Program Files (x86), ProgramData (at the root of the C drive) to at least 2 other partitions, preferably on a fresh install. I have read that there are methods for doing this post-install, but it seems like it would be a bit more tedious to do things that way. I want to move the 2 Program Files folders to another partition on the same HDD, and Users/ProgramData will go to yet another partition on same HDD. I have done a bit of research on this, read up on some things that involved booting into Audit Mode, using the RoboCopy command to copy folders via booting into my Windows 8 USB drive, creating NTFS junctions/symbolic links, Registry edits, as well as accomplishing this automatically by creating an auto-attend file which Windows Setup processes automatically before the user is ever booted in for the 1st time. I tried this morning and now have a basic installation in which programs like Internet Explorer fail to open, certain files can't be found/opened (even if I click on them directly), an example is Regedit. Also, I can't run the Command/DOS (CMD) prompt as Administrator (or otherwise, as any other user), can't activate the real Administrator account or open any of the Administrative Tools (despite having added them to my Start Screen). So far I have only tried RoboCopy-ing Program Files and Program Files (x86) so far, creating junction points for them, and editing the Registry in the relevant locations. This is what I'm left with now. I also found the following blog article which describes how to do this for Windows 7 So, where should I go from here and where can I find more information? And how can this be done without disabling the Metro apps, which I've read will stop working if you move ProgramData. Once I have everything moved, where do I install programs to? Do I tell them to install to C:\Program Files\Program Files (x86) or to the junctioned/symbolic-linked partition/drive? I plan to test in VMware virtual machines from here on until things are working correctly, while using a baseline default install for daily tasks.

Read the article

How do machine code instructions get transferred to the CPU?

- by user3711789

I'm currently investigating what the runtime of different programming languages looks like behind the scenes. For a compiled language like C, people usually give the explanation of "Code is compiled to assembly which is assembled and linked into a binary executable. The executable is then loaded into memory and the CPU interprets it." My question is how does the CPU know where to look for the next instruction to execute? Is it a memory address stored in one of the registers?

Read the article

What binary architectures should be cross-compiled when building Mac OS X packages?

- by Alex Leach

Currently, Apple's native binaries and libraries are distributed as fat files, with support for both i386 and x86_64 architectures. The SDK (Xcode 4.4 w/ command line tools) doesn't support cross-compiling powerpc binaries any more, so they can be safely ignored I think, but there doesn't seem to be any specific guidelines or recommendations about which Intel architectures to support. So, when compiling code for distribution on OS X, do people still cross-compile for the i386 architecture? Or are x86_64 binaries the only architecture worth bothering with nowadays?

Read the article

Implementing traceback on i386

- by markelliott2000

Hi, I am currently porting our code from an alpha (Tru64) to an i386 processor (Linux) in C. Everything has gone pretty smoothly up until I looked into porting our exception handling routine. Currently we have a parent process which spawns lots of sub processes, and when one of these sub-processes fatal's (unfielded) I have routines to catch the process. I am currently struggling to find the best method of implementing a traceback routine which can list the function addresses in the error log, currently my routine just prints the the signal which caused the exception and the exception qualifier code. Any help would be greatly received, ideally I would write error handling for all processors, however at this stage I only really care about i386, and x86_64. Thanks Mark

Read the article

Effective optimization strategies on modern C++ compilers

- by user168715

I'm working on scientific code that is very performance-critical. An initial version of the code has been written and tested, and now, with profiler in hand, it's time to start shaving cycles from the hot spots. It's well-known that some optimizations, e.g. loop unrolling, are handled these days much more effectively by the compiler than by a programmer meddling by hand. Which techniques are still worthwhile? Obviously, I'll run everything I try through a profiler, but if there's conventional wisdom as to what tends to work and what doesn't, it would save me significant time. I know that optimization is very compiler- and architecture- dependent. I'm using Intel's C++ compiler targeting the Core 2 Duo, but I'm also interested in what works well for gcc, or for "any modern compiler." Here are some concrete ideas I'm considering: Is there any benefit to replacing STL containers/algorithms with hand-rolled ones? In particular, my program includes a very large priority queue (currently a std::priority_queue) whose manipulation is taking a lot of total time. Is this something worth looking into, or is the STL implementation already likely the fastest possible? Along similar lines, for std::vectors whose needed sizes are unknown but have a reasonably small upper bound, is it profitable to replace them with statically-allocated arrays? I've found that dynamic memory allocation is often a severe bottleneck, and that eliminating it can lead to significant speedups. As a consequence I'm interesting in the performance tradeoffs of returning large temporary data structures by value vs. returning by pointer vs. passing the result in by reference. Is there a way to reliably determine whether or not the compiler will use RVO for a given method (assuming the caller doesn't need to modify the result, of course)? How cache-aware do compilers tend to be? For example, is it worth looking into reordering nested loops? Given the scientific nature of the program, floating-point numbers are used everywhere. A significant bottleneck in my code used to be conversions from floating point to integers: the compiler would emit code to save the current rounding mode, change it, perform the conversion, then restore the old rounding mode --- even though nothing in the program ever changed the rounding mode! Disabling this behavior significantly sped up my code. Are there any similar floating-point-related gotchas I should be aware of? One consequence of C++ being compiled and linked separately is that the compiler is unable to do what would seem to be very simple optimizations, such as move method calls like strlen() out of the termination conditions of loop. Are there any optimization like this one that I should look out for because they can't be done by the compiler and must be done by hand? On the flip side, are there any techniques I should avoid because they are likely to interfere with the compiler's ability to automatically optimize code? Lastly, to nip certain kinds of answers in the bud: I understand that optimization has a cost in terms of complexity, reliability, and maintainability. For this particular application, increased performance is worth these costs. I understand that the best optimizations are often to improve the high-level algorithms, and this has already been done.

Read the article

How can a extend memory space at 8086 up to 1 GB ???

- by user354186

How can a extend memory space at 8086 up to 1 GB ???

Read the article

I've built a Windows service as "Any CPU". Why does it run in 32-bit mode on my 64 bit machine?

- by Mark

I've built a Windows service as "Any CPU". However, when I run it on my 64 bit machine it runs in 32 bit. How can I fix it? I'm using .NET and C#, and my operating system is Windows 2008 R2. If I build it in x64 it correctly loads in 64 bit mode. However, "Any Cpu" -- which is what I want -- loads in 32 bit, even though the machine it's running on perfectly supports 64 bit. Thanks for any help.

Read the article

Fixing the timezone on Solaris 10

- by Nik

I have no ideas how to fix this. In my /etc/TIMEZONE file the TZ variable has the correct value (Canada/Eastern) and still it's showing a -1 hour lag. Where else should I be looking?

Read the article

Can I load a 32 bit DLL into a 64 bit process on Windows?

- by Lee

I recently upgraded a c# windows service to run as a 64 bit .net process. Normally, this would be trivial, but the system makes use of a 32-bit DLL written in C++. It is not an option to convert this DLL to 64 bit, so I wrapped the DLL in a separate 32 bit .net process and exposed a .net interface via remoting. This is quite a reliable solution, but I would prefer to run the system as a single process. Is there any way I can load my 32 bit DLL into a 64 bit process and access it directly (perhaps through some sort of thunking layer)?

Read the article

Open x64 'SOFTWARE' registry key in C#

- by Lance May

I am trying to read the 64-bit HKLM\SOFTWARE registry key from a 32-bit (C#) application. This, of course, keeps redirecting my view to HKLM\SOFTWARE\Wow6432Node. According to what I've found this is doable, but I can't seem to find a .NET example anywhere. I just need to read; not write. Anyone ran across this before?

Read the article

How is the implicit segment register of a near pointer determined?

- by Daniel Trebbien

In section 4.3 of Intel 64® and IA-32 Architectures Software Developer's Manual. Volume 1: Basic Architecture, it says: A near pointer is a 32-bit offset ... within a segment. Near pointers are used for all memory references in a flat memory model or for references in a segmented model where the identity of the segment being accessed is implied. This leads me to wondering: how is the implied segment register determined? I know that (%eip) and displaced (%eip) (e.g. -4(%eip)) addresses use %cs by default, and that (%esp) and displaced (%esp) addresses use %ss, but what about (%eax), (%edx), (%edi), (%ebp) etc., and can the implicit segment register depend also on the instruction that the memory address operand appears in?

Search Results

Search found 2568 results on 103 pages for 'x86'.

Page 15/103 | < Previous Page | 11 12 13 14 15 16 17 18 19 20 21 22 | Next Page >

- by bobobobo

- by Mikhail

- by Bernd Jendrissek

- by Arman

- by Goz

- by fd

- by Stephen Jennings

- by sunnyohno

- by Alex B

- by rbr200

- by crissangel

- by George Edison

- by kikou

- by Enigma83

- by user3711789

- by Alex Leach

- by markelliott2000

- by user168715

- by user354186

- by Mark

- by Nik

- by Lee

- by Lance May

- by Daniel Trebbien

- by anonymous

< Previous Page | 11 12 13 14 15 16 17 18 19 20 21 22 | Next Page >