Search Results

Search found 740 results on 30 pages for 'processors'.

Page 20/30 | < Previous Page | 16 17 18 19 20 21 22 23 24 25 26 27  | Next Page >

  • Running 32 bit assembly code on a 64 bit Linux & 64 bit Processor : Explain the anomaly.

    - by claws
    Hello, I'm in an interesting problem.I forgot I'm using 64bit machine & OS and wrote a 32 bit assembly code. I don't know how to write 64 bit code. This is the x86 32-bit assembly code for Gnu Assembler (AT&T syntax) on Linux. //hello.S #include <asm/unistd.h> #include <syscall.h> #define STDOUT 1 .data hellostr: .ascii "hello wolrd\n"; helloend: .text .globl _start _start: movl $(SYS_write) , %eax //ssize_t write(int fd, const void *buf, size_t count); movl $(STDOUT) , %ebx movl $hellostr , %ecx movl $(helloend-hellostr) , %edx int $0x80 movl $(SYS_exit), %eax //void _exit(int status); xorl %ebx, %ebx int $0x80 ret Now, This code should run fine on a 32bit processor & 32 bit OS right? As we know 64 bit processors are backward compatible with 32 bit processors. So, that also wouldn't be a problem. The problem arises because of differences in system calls & call mechanism in 64-bit OS & 32-bit OS. I don't know why but they changed the system call numbers between 32-bit linux & 64-bit linux. asm/unistd_32.h defines: #define __NR_write 4 #define __NR_exit 1 asm/unistd_64.h defines: #define __NR_write 1 #define __NR_exit 60 Anyway using Macros instead of direct numbers is paid off. Its ensuring correct system call numbers. when I assemble & link & run the program. $cpp hello.S hello.s //pre-processor $as hello.s -o hello.o //assemble $ld hello.o // linker : converting relocatable to executable Its not printing helloworld. In gdb its showing: Program exited with code 01. I don't know how to debug in gdb. using tutorial I tried to debug it and execute instruction by instruction checking registers at each step. its always showing me "program exited with 01". It would be great if some on could show me how to debug this. (gdb) break _start Note: breakpoint -10 also set at pc 0x4000b0. Breakpoint 8 at 0x4000b0 (gdb) start Function "main" not defined. Make breakpoint pending on future shared library load? (y or [n]) y Temporary breakpoint 9 (main) pending. Starting program: /home/claws/helloworld Program exited with code 01. (gdb) info breakpoints Num Type Disp Enb Address What 8 breakpoint keep y 0x00000000004000b0 <_start> 9 breakpoint del y <PENDING> main I tried running strace. This is its output: execve("./helloworld", ["./helloworld"], [/* 39 vars */]) = 0 write(0, NULL, 12 <unfinished ... exit status 1> Explain the parameters of write(0, NULL, 12) system call in the output of strace? What exactly is happening? I want to know the reason why exactly its exiting with exitstatus=1? Can some one please show me how to debug this program using gdb? Why did they change the system call numbers? Kindly change this program appropriately so that it can run correctly on this machine. EDIT: After reading Paul R's answer. I checked my files claws@claws-desktop:~$ file ./hello.o ./hello.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped claws@claws-desktop:~$ file ./hello ./hello: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, not stripped All of my questions still hold true. What exactly is happening in this case? Can someone please answer my questions and provide an x86-64 version of this code?

    Read the article

  • Running 32 bit assembly code on a 64 bit Linux & 64 bit Processor : Expalin the anomaly.

    - by claws
    Hello, I'm in an interesting problem.I forgot I'm using 64bit machine & OS and wrote a 32 bit assembly code. I don't know how to write 64 bit code. This is the x86 32-bit assembly code for Gnu Assembler (AT&T syntax) on Linux. #include <asm/unistd.h> #include <syscall.h> #define STDOUT 1 .data hellostr: .ascii "hello wolrd\n"; helloend: .text .globl _start _start: movl $(SYS_write) , %eax //ssize_t write(int fd, const void *buf, size_t count); movl $(STDOUT) , %ebx movl $hellostr , %ecx movl $(helloend-hellostr) , %edx int $0x80 movl $(SYS_exit), %eax //void _exit(int status); xorl %ebx, %ebx int $0x80 ret Now, This code should run fine on a 32bit processor & 32 bit OS right? As we know 64 bit processors are backward compatible with 32 bit processors. So, that also wouldn't be a problem. The problem arises because of differences in system calls & call mechanism in 64-bit OS & 32-bit OS. I don't know why but they changed the system call numbers between 32-bit linux & 64-bit linux. asm/unistd_32.h defines: #define __NR_write 4 #define __NR_exit 1 asm/unistd_64.h defines: #define __NR_write 1 #define __NR_exit 60 Anyway using Macros instead of direct numbers is paid off. Its ensuring correct system call numbers. when I assemble & link & run the program. Its not printing helloworld. In gdb its showing: Program exited with code 01. I don't know how to debug in gdb. using tutorial I tried to debug it and execute instruction by instruction checking registers at each step. its always showing me "program exited with 01". It would be great if some on could show me how to debug this. (gdb) break _start Note: breakpoint -10 also set at pc 0x4000b0. Breakpoint 8 at 0x4000b0 (gdb) start Function "main" not defined. Make breakpoint pending on future shared library load? (y or [n]) y Temporary breakpoint 9 (main) pending. Starting program: /home/claws/helloworld Program exited with code 01. (gdb) info breakpoints Num Type Disp Enb Address What 8 breakpoint keep y 0x00000000004000b0 <_start> 9 breakpoint del y <PENDING> main I tried running strace. This is its output: execve("./helloworld", ["./helloworld"], [/* 39 vars */]) = 0 write(0, NULL, 12 <unfinished ... exit status 1> Explain the parameters of write(0, NULL, 12) system call in the output of strace? What exactly is happening? I want to know the reason why exactly its exiting with exitstatus=1? Can some one please show me how to debug this program using gdb? Why did they change the system call numbers? Change this program appropriately so that it can run correctly on this machine.

    Read the article

  • Building Android NDK Toolchain for x86 Android on Windows via Cygwin

    - by grrussel
    The Android SDK includes the Android NDK, which in turn contains a customised GCC based tool chain for Android on ARM processors; The question is how to build the NDK tool chain to run on Windows to target x86 Android? The tool chain is already setup to build on Windows (cygwin) targeting ARM; There are also existing pre-built (unofficial) NDKs for targeting x86, but these contain pre-built tools for x86 Linux, not Windows. The NDK contains a build-toolchain.sh script to rebuild its tool chain; the question is, what specifically needs done to get that to build a tool chain targeting Android x86?

    Read the article

  • Why aren't we programming on the GPU???

    - by Chris
    So I finally took the time to learn CUDA and get it installed and configured on my computer and I have to say, I'm quite impressed! Here's how it does rendering the Mandelbrot set at 1280 x 678 pixels on my home PC with a Q6600 and a GeForce 8800GTS (max of 1000 iterations): Maxing out all 4 CPU cores with OpenMP: 2.23 fps Running the same algorithm on my GPU: 104.7 fps And here's how fast I got it to render the whole set at 8192 x 8192 with a max of 1000 iterations: Serial implemetation on my home PC: 81.2 seconds All 4 CPU cores on my home PC (OpenMP): 24.5 seconds 32 processors on my school's super computer (MPI with master-worker): 1.92 seconds My home GPU (CUDA): 0.310 seconds 4 GPUs on my school's super computer (CUDA with static domain decomposition): 0.0547 seconds So here's my question - if we can get such huge speedups by programming the GPU instead of the CPU, why is nobody doing it??? I can think of so many things we could speed up like this, and yet I don't know of many commercial apps that are actually doing it. Also, what kinds of other speedups have you seen by offloading your computations to the GPU?

    Read the article

  • Was Joel right about XML being slow?

    - by Will
    A long time ago Joel explained how various every-day coding things were slow, and this led to XML as a data store being slow: http://www.joelonsoftware.com/articles/fog0000000319.html Are those every-day coding things - strcat and malloc - still slow in a std::string and dlmalloc world? What else has changed in modern processors and mainstream frameworks? And is XML still slow? You can't find an RDBMS that doesn't claim some kind of native XML support these days; haven't they got it faster - a single pass to index it for example - yet?

    Read the article

  • PLINQ delayed execution

    - by tbischel
    I'm trying to understand how parallelism might work using PLINQ, given delayed execution. Here is a simple example. string[] words = { "believe", "receipt", "relief", "field" }; bool result = words.AsParallel().Any(w => w.Contains("ei")); With LINQ, I would expect the execution to reach the "receipt" value and return true, without executing the query for rest of the values. If we do this in parallel, the evaluation of "relief" may have began before the result of "receipt" has returned. But once the query knows that "receipt" will cause a true result, will the other threads yield immediately? In my case, this is important because the "any" test may be very expensive, and I would want to free up the processors for execution of other tasks.

    Read the article

  • When to use Shift operators << >> in C# ?

    - by Junior Mayhé
    I was studying shift operators in C#, trying to find out when to use them in my code. I found an answer but for Java, you could: a) Make faster integer multiplication and division operations: *4839534 * 4* can be done like this: 4839534 << 2 or 543894 / 2 can be done like this: 543894 1 Shift operations much more faster than multiplication for most of processors. b) Reassembling byte streams to int values c) For accelerating operations with graphics since Red, Green and Blue colors coded by separate bytes. d) Packing small numbers into one single long... For b, c and d I can't imagine here a real sample. Does anyone know if we can accomplish all these items in C#? Is there more practical use for shift operators in C#?

    Read the article

  • Alternative to distributed caching

    - by Chen
    Hi, There is a technical requirement to scale a new system easily. This new system consists of three tiered applications (as a batch processors). Each tier will contains at least 2 servers with the same application resides on each server. So, when one of the tier reaches peak performance, we could extend the scalability easily by adding a new server and the same application to off-load some of the processing loads. The problem is that one or two of the three tiers require heavy caching (about 3 million records and increasing). I'm thinking of using distributed caching system to overcome this problem but the new distributed caching system will means an additional point of failure as applications now need to interact with additional caching systems for processing. I'm currently looking at ncache but just wondering if there is an alternatives to this problem? or is there any other comparable distributed caching system that maybe similar or better than ncache and provide enterprise supports too? Thanks, Chen

    Read the article

  • SSRS 2005 - Usability analysis - Is SSRS a good option for this scenario?

    - by Sach
    How practical is it to consider SSRS 2005 or SSRS 2008 as a reporting solution for a report that has to show reports with millions of records (records vary from 3 to 10 million)? Is there any threshold on the size of report in SSRS? How do I know that for a huge report, wheather SSRS will consume the whole memory and start paging the operations to disk or it will give a memory leak error? Even if I keep on increasing the memory how can I be sure that certain memory will be sufficient for such huge reports for the report server? All the above questions are haunting me because I have a dedicated report server with a decent hardware and OS configuration (8 processors, 8GB RAM, 64 bit OS and 64 bit SQL Server 2005). Still my report with around 2 million records is taking more than 6 minutes and going from one page to another takes 3 minutes!!! My datasource is on separate server and when I execute only the stored proc there, it returns the results in less than 2 minutes.

    Read the article

  • F# performance in scientific computing

    - by aaa
    hello. I am curious as to how F# performance compares to C++ performance? I asked a similar question with regards to Java, and the impression I got was that Java is not suitable for heavy numbercrunching. I have read that F# is supposed to be more scalable and more performant, but how is this real-world performance compares to C++? specific questions about current implementation are: How well does it do floating-point? Does it allow vector instructions how friendly is it towards optimizing compilers? How big a memory foot print does it have? Does it allow fine-grained control over memory locality? does it have capacity for distributed memory processors, for example Cray? what features does it have that may be of interest to computational science where heavy number processing is involved? Are there actual scientific computing implementations that use it? Thanks

    Read the article

  • Best CPUs for speeding up compiling times of C++ w/ DistGCC

    - by Jay
    I'm putting together a distributed build farm with DistGCC to speed up our teams compile times and just looking for thoughts on which processors to use in the hosts. Are we going to get a noticeable decrease in time using 8 cores vs. 4-hyperthreaded cores? Big difference in time between i7 and Xeon? etc, etc. Just need advice from people who've put together kick-a build clusters. We've got a majority of the normal things to speed up builds in place (pre-compiled headers, ccache, local gigabit connections between them, tons of ram, etc) so please just give advice on the best processor to use. And money is a factor, but anythings doable if the performance increase is noticeable. Thanks. Jay

    Read the article

  • Free static checker for C99 code

    - by detly
    I am looking for a free static checker for C99 code (including GCC extensions) with the ability to explicitly say "these preprocessor macros are always defined." I need that last part because I am compiling embedded code for a single target processor. The compiler (Microchip's C32, GCC based) sets a macro based on the selected processor, which is then used in the PIC32 header files to select a processor-specific header file to include. cppcheck therefore fails because it detects the 30 different #ifdefs used to select one of the many possible PIC32 processors, tries to analyse all possible combinations of these plus all other #defines, and fails. For example, if splint could process C99 code, I would use splint -D__PIC32_FEATURE_SET__=460 -D__32MX460F512L__ \ -D__LANGUAGE_C__ -I/path/to/my/includes source.c

    Read the article

  • A/UX cc compiler errors on trivial code: "declared argument argc is missing"

    - by Fzn
    On a quite ancient UNIX (Apple A/UX 3.0.1 for 680x0 processors) using the built-in c compiler (cc), this issue arrises. Here is the code I'm trying to compile: #include <stdlib.h> #include <stdio.h> int main() int argc; char **argv; { if (argc > 1) puts(argv[1]); return (EXIT_SUCCESS); } And here is the output I get: pigeonz.root # cc -c test.c "test.c", line 5: declared argument argc is missing "test.c", line 6: declared argument argv is missing Using a more modern prototype did not help, nor did the manual page, nor a quick google search. What am I doing wrong?

    Read the article

  • Mapping of memory addresses to physical modules in Windows XP

    - by Josef Grahn
    I plan to run 32-bit Windows XP on a workstation with dual processors, based on Intel's Nehalem microarchitecture, and triple channel RAM. Even though XP is limited to 4 GB of RAM, my understanding is that it will function with more than 4 GB installed, but will only expose 4 GB (or slightly less). My question is: Assuming that 6 GB of RAM is installed in six 1 GB modules, which physical 4 GB will Windows actually map into its address space? In particular: Will it use all six 1 GB modules, taking advantage of all memory channels? (My guess is yes, and that the mapping to individual modules within a group happens in hardware.) Will it map 2 GB of address space to each of the two NUMA nodes (as each processor has it's own memory interface), or will one processor get fast access to 3 GB of RAM, while the other only has 1 GB? Thanks!

    Read the article

  • How about the Asp.net processes and threads and apppools?

    - by Michel
    Hi, as i understand, when i load a asp.net .aspx page on the (iis)server, it's processed via the w3p.exe process. But when iis gets multiple requests, are they all processed by the same w3p process? And does this process automaticly use all my processors and cores? And after that: when i start i new thread in my page, this thread still works when the pages is already served to the client. Where does this thread live? also in the w3p.exe process? And what if i assign another apppool to my site, what does that do? Michel

    Read the article

  • how to optimize illustrator artwork in flash?

    - by sasi
    I'm working on a flash project that incorporates a lot of artwork done in Illustrator CS4. I've been copy-pasting directly from Illustrator into Flash, and I add some animations as well. Final file is going to be a one single swf file which will be a part of UI for an application and .net will be the core for this. But now flash becomes unusable slow to respond for actions. My machine is a fast i7 with 6gb of RAM, so I don't think that's the issue. We are going to use this file with dual core atom processors. Does anyone have ideas for alternative importing techniques, optimizations within illustrator, anything at all that will make this more manageable? Thanks

    Read the article

  • Terminate long running thread in thread pool that was created using QueueUserWorkItem(win 32/nt5).

    - by Jake
    I am programming in a win32 nt5 environment. I have a function that is going to be called many times. Each call is atomic. I would like to use QueueUserWorkItem to take advantage of multicore processors. The problem I am having is I only want to give the function 3 seconds to complete. If it has not completed in 3 seconds I want to terminate the thread. Currently I am doing something like this: HANDLE newThreadFuncCall= CreateThread(NULL,0,funcCall,&func_params,0,NULL); DWORD result = WaitForSingleObject(newThreadFuncCall, 3000); if(result == WAIT_TIMEOUT) { TerminateThread(newThreadFuncCall,WAIT_TIMEOUT); } I just spawn a single thread and wait for 3 seconds or it to complete. Is there anyway to do something similar to but using QueueUserWorkItem to queue up the work? Thanks!

    Read the article

  • Endianness inside CPU registers

    - by Abhishek Tamhane
    I need help understanding endianness inside CPU registers of x86 processors. I wrote this small assembly program: section .data section .bss section .text global _start _start: nop mov eax, 0x78FF5ABC mov ebx,'WXYZ' nop ; GDB breakpoint here. mov eax, 1 mov ebx, 0 int 0x80 I ran this program in GDB with a breakpoint on line number 10 (commented in the source above). At this breakpoint, info registers shows the value of eax=0x78ff5abc and ebx=0x5a595857. Since the ASCII codes for W, X, Y, Z are 57, 58, 59, 5A respectively; and intel is little endian, 0x5a595857 seems like the correct byte order (least significant byte first). Why isn't then the output for eax register 0xbc5aff78 (least significant byte of the number 0x78ff5abc first) instead of 0x78ff5abc?

    Read the article

  • deciding between subprocess, multiprocesser and thread in Python?

    - by user248237
    I'd like to parallelize my Python program so that it can make use of multiple processors on the machine that it runs on. My parallelization is very simple, in that all the parallel "threads" of the program are independent and write their output to separate files. I don't need the threads to exchange information but it is imperative that I know when the threads finish since some steps of my pipeline depend on their output. Portability is important, in that I'd like this to run on any Python version on Mac, Linux and Windows. Given these constraints, which is the most appropriate Python module for implementing this? I am tryign to decide between thread, subprocess and multiprocessing, which all seem to provide related functionality. Any thoughts on this? I'd like the simplest solution that's portable. Thanks.

    Read the article

  • Is AsParallel() good practice in a web environment?

    - by Bjorn Bailleul
    I have no doubt that for client applications, AsParallel() will bring some out-of-the-box performance gains. But what if I would use it in a web environment. Let's say I have a widget framework that loops over all widgets to get their data and render output. This would parallelize great no? I do have my doubts on using AsParallel() in this scenario. What if I have a large number of visitors for my site, isn't IIS going to use multiple threads to handle all requests? Aren't there going to be locking issues presented after a while, or threads dying because all processors are in use? It's just a thought, what do you think about this?

    Read the article

  • JavaScript multithreading

    - by Krzysztof Hasinski
    I'm working on comparison for several different methods of implementing (real or fake) multithreading in JavaScript. As far as I know only webworkers and Google Gears WorkerPool can give you real threads (ie. spread across multiple processors with real parallel execution). I've found the following methods: switch between tasks using yield() use setInterval() (or other non-blocking function) with threads waiting one for another use Google Gears WorkerPool threads (with plugin) use html5 web workers I read related questions and found several variations of the above methods, but most of those questions are old, so there might be a few new ideas. I'm wondering - how else can you achieve multithreading in JavaScript? Any other important methods? UPDATE: As pointed out in comments what I really meant was concurrency. UPDATE 2: I found information that Silverlight + JScript supports multithreading, but I'm unable to verify this. UPDATE 3: Google deprecated Gears: http://code.google.com/apis/gears/api_workerpool.html

    Read the article

  • Web App Server hardware question. Which configuration?

    - by JBeckton
    I am pricing some new servers and I am not sure which configuration to get. The server will be running some web applications for our company. Some of them are ASP.Net sites and some are ColdFusion. The OS will be Win Server 2008 Web or Standard Edition. Do I need 2 processors or will a single quad core handle it? Xeon multi core Hyperthreading or non Hyperthreading? I am going 64bit so I can go higher than 4 Gigs of Ram. I am shopping at Dell and there are so many options, i do not want to get too much hardware and not use half of it because that would be a waste of money and I do not want to get too little and have to ask for more money to upgrade it later.

    Read the article

  • Branchless memory manager?

    - by Richard Fabian
    Anyone thought about how to write a memory manager (in C++) that is completely branch free? I've written a pool, a stack, a queue, and a linked list (allocating from the pool), but I am wondering how plausible it is to write a branch free general memory manager. This is all to help make a really reusable framework for doing solid concurrent, in-order CPU, and cache friendly development. Edit: by branchless I mean without doing direct or indirect function calls, and without using ifs. I've been thinking that I can probably implement something that first changes the requested size to zero for false calls, but haven't really got much more than that. I feel that it's not impossible, but the other aspect of this exercise is then profiling it on said "unfriendly" processors to see if it's worth trying as hard as this to avoid branching.

    Read the article

  • Does the .NET CLR Really Optimize for the Current Processor

    - by dewald
    When I read about the performance of JITted languages like C# or Java, authors usually say that they should/could theoretically outperform many native-compiled applications. The theory being that native applications are usually just compiled for a processor family (like x86), so the compiler cannot make certain optimizations as they may not truly be optimizations on all processors. On the other hand, the CLR can make processor-specific optimizations during the JIT process. Does anyone know if Microsoft's (or Mono's) CLR actually performs processor-specific optimizations during the JIT process? If so, what kind of optimizations?

    Read the article

  • C#: Farm out jobs to worker processes on a multi-processor machine

    - by Andrew White
    Hi there, I have a generic check that needs to be run on ca. 1000 objects. The check takes about 3 seconds. We have a server with 4 processors (and we also have other multi-processor servers in our network) so we would like to create an exe / dll to do the checking and return the results to the "master". Does anyone know of a framework for this, or how would one go about it in C#? Specifically: * What's the best way to transfer data between the master and the worker process? * How would the master ensure that always 4 processes are running at any one time and as soon as a worker process is finished start a new one. * How to register that the worker is finished and append it's results to a list? Hope it's clear enough but happy to clarify. A.

    Read the article

< Previous Page | 16 17 18 19 20 21 22 23 24 25 26 27  | Next Page >