Search Results

Search found 15224 results on 609 pages for 'parallel python'.

Page 444/609 | < Previous Page | 440 441 442 443 444 445 446 447 448 449 450 451  | Next Page >

  • for ps aux what are Ss Sl Ssl proccess types UNIX

    - by JiminyCricket
    when doing a "ps aux" command I get some process listed as Ss, Ssl and Sl what do these mean? root 24653 0.0 0.0 2256 8 ? Ss Apr12 0:00 /bin/bash -c /usr/bin/python /var/python/report_watchman.py root 24654 0.0 0.0 74412 88 ? Sl Apr12 0:01 /usr/bin/python /var/python/report_watchman.py root 21976 0.0 0.0 2256 8 ? Ss Apr14 0:00 /bin/bash -c /usr/bin/python /var/python/report_watchman.py root 21977 0.0 0.0 73628 88 ? Sl Apr14 0:01 /usr/bin/python /var/python/report_watchman.py

    Read the article

  • Installing Django on Windows

    - by Pranav
    Ever needed to install Django in a Microsoft Windows environment, here is a quick start guide to make that happen: Read through the official Django installation documentation, it might just save you a world of hut down the road. Download Python for your version of Windows. Install Python, my preference here is to put it into the Program Files folder under a folder named Python<Version> Add your chosen Python installation path into your Windows path environment variable. This is an optional step, however it allows you to just type python in the command line and have it fire up the Python interpreter. An easy way of adding it is going into Control Panel, System and into the Environment Variables section. Download Django, you can either download a compressed file or if you’re comfortable with using version control – check it out from the Django Subversion repository. Create a folder named django under your <Python installation directory>\Lib\site-packages\ folder. Using my example above that would have been C:\Program Files\Python25\Lib\site-packages\. If you chose to download the compressed file, open it and extract the contents of the django folder into your newly created folder. If you’d prefer to check it out from Subversion, the normal check out points are http://code.djangoproject.com/svn/django/trunk/ for the latest development copy or a named release which you’ll find under http://code.djangoproject.com/svn/django/tags/releases/. Done, you now have a working Django installation on Windows. At this point, it’d be pertinent to confirm that everything is working properly, which you can do by following the first Django tutorial. The tutorial will make mention of django-admin.py, which is a utility which offers some basic functionality to get you off the ground. The file is located in the bin folder under your Django installation directory. When you need to use it, you can either type in the full path to it or simply add that file path into your environment variables as well. Hope this helps!

    Read the article

  • What design pattern (in python) to use for properly seperate runtime infos with core code?

    - by user1824372
    I am not sure if this is a clear question. I work on a python project that is based on terminal(console), for which I am planning to implement a GUI. I am not major in CS so I really have no idea about how to effectively design a message system such that: in console, it provide nice look info when runtime. in GUI, it is directed to a certain widget, let's say, a text label, or a bottom bar, or a hide-able frame. Do you have any suggestions? Currently, I am using print function to provide essential informations on stdout during runtime. So a lot of print .... are distributed here and there among the code. I am thinking to use macro-like variables such as 'FILE_NOT_EXTIS_MESSAGE' for printing, and define the variables in one file. Is this a standard way that people always do? How about I introduce a logging system? In sum, I am ask for a pattern that people are commonly using for handling of screen output information with high effectiveness and adaptivity.

    Read the article

  • Is it an MD5 digest in this Python script?

    - by brilliant
    Hello, I am trying to understand this simple hashlib code in Python that has been given to me the other day on "Stackoverflow": import hashlib m = hashlib.md5() m.update("Nobody inspects") m.update(" the spammish repetition here") m.digest() '\xbbd\x9c\x83\xdd\x1e\xa5\xc9\xd9\xde\xc9\xa1\x8d\xf0\xff\xe9' m.digest_size 16 m.block_size 64 print m I thought that "print m" would show me the MD5 digest of the phrase: "Nobody inspects the spammish repetition here", but as a result I got this line on my local host: <md5 HASH object @ 01806220> Strange, when I refreshed the page, I got another line: <md5 HASH object @ 018062E0> and every time when I refresh it, I get another value: md5 HASH object @ 017F8AE0 md5 HASH object @ 01806220 md5 HASH object @ 01806360 md5 HASH object @ 01806400 md5 HASH object @ 01806220 Why is it so? I guess, what I have in each line flowing "@" is not really a digest. Then, what is it? And how can I display MD5 digest here in this code? My python version is Python 2.5 and the framework I am currently using is webapp (I have downloaded it together with SDK from "Google App Engine")

    Read the article

  • Programação paralela no .NET Framework 4 – Parte I

    - by anobre
    Introdução O avanço de tecnologia nos últimos anos forneceu, a baixo custo, acesso  a workstations com inúmeros CPUs. Facilmente encontramos hoje máquinas clientes com 2, 4 e até 8 núcleos, sem considerar os “super-servidores” com até 36 processadores :) Da wikipedia: A Unidade central de processamento (CPU, de acordo com as iniciais em inglês) ou o processador é a parte de um sistema de computador que executa as instruções de um programa de computador, e é o elemento primordial na execução das funções de um computador. Este termo tem sido usado na indústria de computadores pelo menos desde o início dos anos 1960[1]. A forma, desenho e implementação de CPUs têm mudado dramaticamente desde os primeiros exemplos, mas o seu funcionamento fundamental permanece o mesmo. Fazendo uma analogia, seria muito interessante delegarmos tarefas no mundo real que podem ser executadas independentemente a pessoas diferentes, atingindo desta forma uma  maior performance / produtividade na sua execução. A computação paralela se baseia na idéia que um problema maior pode ser dividido em problemas menores, sendo resolvidos de forma paralela. Este pensamento é utilizado há algum tempo por HPC (High-performance computing), e através das facilidades dos últimos anos, assim como a preocupação com consumo de energia, tornaram esta idéia mais atrativa e de fácil acesso a qualquer ambiente. No .NET Framework A plataforma .NET apresenta um runtime, bibliotecas e ferramentas para fornecer uma base de acesso fácil e rápido à programação paralela, sem trabalhar diretamente com threads e thread pool. Esta série de posts irá apresentar todos os recursos disponíveis, iniciando os estudos pela TPL, ou Task Parallel Library. Task Parallel Library A TPL é um conjunto de tipos localizados no namespace System.Threading e System.Threading.Tasks, a partir da versão 4 do framework. A partir da versão 4 do framework, o TPL é a maneira recomendada para escrever código paralelo e multithreaded. http://msdn.microsoft.com/en-us/library/dd460717(v=VS.100).aspx Task Parallelism O termo “task parallelism”, ou em uma tradução live paralelismo de tarefas, se refere a uma ou mais tarefas sendo executadas de forma simultanea. Considere uma tarefa como um método. A maneira mais fácil de executar tarefas de forma paralela é o código abaixo: Parallel.Invoke(() => TrabalhoInicial(), () => TrabalhoSeguinte()); O que acontece de verdade? Por trás nos panos, esta instrução instancia de forma implícita objetos do tipo Task, responsável por representar uma operação assíncrona, não exatamente paralela: public class Task : IAsyncResult, IDisposable É possível instanciar Tasks de forma explícita, sendo uma alternativa mais complexa ao Parallel.Invoke. var task = new Task(() => TrabalhoInicial()); task.Start(); Outra opção de instanciar uma Task e já executar sua tarefa é: var t = Task<int>.Factory.StartNew(() => TrabalhoInicialComValor());var t2 = Task<int>.Factory.StartNew(() => TrabalhoSeguinteComValor()); A diferença básica entre as duas abordagens é que a primeira tem início conhecido, mais utilizado quando não queremos que a instanciação e o agendamento da execução ocorra em uma só operação, como na segunda abordagem. Data Parallelism Ainda parte da TPL, o Data Parallelism se refere a cenários onde a mesma operação deva ser executada paralelamente em elementos de uma coleção ou array, através de instruções paralelas For e ForEach. A idéia básica é pegar cada elemento da coleção (ou array) e trabalhar com diversas threads concomitantemente. A classe-chave para este cenário é a System.Threading.Tasks.Parallel // Sequential version foreach (var item in sourceCollection) { Process(item); } // Parallel equivalent Parallel.ForEach(sourceCollection, item => Process(item)); Complicado né? :) Demonstração Acesse aqui um vídeo com exemplos (screencast). Cuidado! Apesar da imensa vontade de sair codificando, tome cuidado com alguns problemas básicos de paralelismo. Neste link é possível conhecer algumas situações. Abraços.

    Read the article

  • Best of "The Moth" 2009

    Not wanting to break the tradition (2004, 2005, 2006, 2007, 2008) below are some blog posts I picked from my blogging last year. As you can see by comparing with the links above, 2009 marks my lowest output yet with only 64 posts, but hopefully the quality has not been lowered ;-) 1. Parallel Computing was a strong focus of course. You can find links to most of that content aggregated in the post where I shared my entire parallelism session. Related to that was the link to the screencast I shared of the Parallel Computing Features Tour.2. Parallel Debugging is obviously part of the parallel computing links above, but I created more in depth content around that area of Visual Studio 2010 since it is the one I directly own. I aggregated all the links to that content in my post: Parallel Debugging.3. High Performance Computing through clusters is an area I'll be focusing more next year (besides parallelism on a single node on the client captured above) and I started introducing the topic on my blog this year. Read the (currently) 6 posts bottom up from my category on HPC.4. Windows 7 Task Manager. In April I shared a screenshot which was the most "borrowed" item from my blog (I should have watermarked it ;-)5. Windows Phone non-support in VS2010. Did my bit to spread clarification of the story.6. Window positions in Visual Studio is a long post, but one that I strongly advise all VS users to read and benefit from.7. Bug Triage gives you a glimpse into one thing all (Microsoft) product teams do.If you haven't yet, you can subscribe via one of the options on the left. Either way, thank you for staying tuned… Happy New Year! Comments about this post welcome at the original blog.

    Read the article

  • Parallelism in .NET – Part 8, PLINQ’s ForAll Method

    - by Reed
    Parallel LINQ extends LINQ to Objects, and is typically very similar.  However, as I previously discussed, there are some differences.  Although the standard way to handle simple Data Parellelism is via Parallel.ForEach, it’s possible to do the same thing via PLINQ. PLINQ adds a new method unavailable in standard LINQ which provides new functionality… LINQ is designed to provide a much simpler way of handling querying, including filtering, ordering, grouping, and many other benefits.  Reading the description in LINQ to Objects on MSDN, it becomes clear that the thinking behind LINQ deals with retrieval of data.  LINQ works by adding a functional programming style on top of .NET, allowing us to express filters in terms of predicate functions, for example. PLINQ is, generally, very similar.  Typically, when using PLINQ, we write declarative statements to filter a dataset or perform an aggregation.  However, PLINQ adds one new method, which provides a very different purpose: ForAll. The ForAll method is defined on ParallelEnumerable, and will work upon any ParallelQuery<T>.  Unlike the sequence operators in LINQ and PLINQ, ForAll is intended to cause side effects.  It does not filter a collection, but rather invokes an action on each element of the collection. At first glance, this seems like a bad idea.  For example, Eric Lippert clearly explained two philosophical objections to providing an IEnumerable<T>.ForEach extension method, one of which still applies when parallelized.  The sole purpose of this method is to cause side effects, and as such, I agree that the ForAll method “violates the functional programming principles that all the other sequence operators are based upon”, in exactly the same manner an IEnumerable<T>.ForEach extension method would violate these principles.  Eric Lippert’s second reason for disliking a ForEach extension method does not necessarily apply to ForAll – replacing ForAll with a call to Parallel.ForEach has the same closure semantics, so there is no loss there. Although ForAll may have philosophical issues, there is a pragmatic reason to include this method.  Without ForAll, we would take a fairly serious performance hit in many situations.  Often, we need to perform some filtering or grouping, then perform an action using the results of our filter.  Using a standard foreach statement to perform our action would avoid this philosophical issue: // Filter our collection var filteredItems = collection.AsParallel().Where( i => i.SomePredicate() ); // Now perform an action foreach (var item in filteredItems) { // These will now run serially item.DoSomething(); } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } This would cause a loss in performance, since we lose any parallelism in place, and cause all of our actions to be run serially. We could easily use a Parallel.ForEach instead, which adds parallelism to the actions: // Filter our collection var filteredItems = collection.AsParallel().Where( i => i.SomePredicate() ); // Now perform an action once the filter completes Parallel.ForEach(filteredItems, item => { // These will now run in parallel item.DoSomething(); }); This is a noticeable improvement, since both our filtering and our actions run parallelized.  However, there is still a large bottleneck in place here.  The problem lies with my comment “perform an action once the filter completes”.  Here, we’re parallelizing the filter, then collecting all of the results, blocking until the filter completes.  Once the filtering of every element is completed, we then repartition the results of the filter, reschedule into multiple threads, and perform the action on each element.  By moving this into two separate statements, we potentially double our parallelization overhead, since we’re forcing the work to be partitioned and scheduled twice as many times. This is where the pragmatism comes into play.  By violating our functional principles, we gain the ability to avoid the overhead and cost of rescheduling the work: // Perform an action on the results of our filter collection .AsParallel() .Where( i => i.SomePredicate() ) .ForAll( i => i.DoSomething() ); The ability to avoid the scheduling overhead is a compelling reason to use ForAll.  This really goes back to one of the key points I discussed in data parallelism: Partition your problem in a way to place the most work possible into each task.  Here, this means leaving the statement attached to the expression, even though it causes side effects and is not standard usage for LINQ. This leads to my one guideline for using ForAll: The ForAll extension method should only be used to process the results of a parallel query, as returned by a PLINQ expression. Any other usage scenario should use Parallel.ForEach, instead.

    Read the article

  • Microsoft Technical Computing

    - by Daniel Moth
    In the past I have described the team I belong to here at Microsoft (Parallel Computing Platform) in terms of contributing to Visual Studio and related products, e.g. .NET Framework. To be more precise, our team is part of the Technical Computing group, which is still part of the Developer Division. This was officially announced externally earlier this month in an exec email (from Bob Muglia, the president of STB, to which DevDiv belongs). Here is an extract: "… As we build the Technical Computing initiative, we will invest in three core areas: 1. Technical computing to the cloud: Microsoft will play a leading role in bringing technical computing power to scientists, engineers and analysts through the cloud. Existing high- performance computing users will benefit from the ability to augment their on-premises systems with cloud resources that enable ‘just-in-time’ processing. This platform will help ensure processing resources are available whenever they are needed—reliably, consistently and quickly. 2. Simplify parallel development: Today, computers are shipping with more processing power than ever, including multiple cores, but most modern software only uses a small amount of the available processing power. Parallel programs are extremely difficult to write, test and trouble shoot. However, a consistent model for parallel programming can help more developers unlock the tremendous power in today’s modern computers and enable a new generation of technical computing. We are delivering new tools to automate and simplify writing software through parallel processing from the desktop… to the cluster… to the cloud. 3. Develop powerful new technical computing tools and applications: We know scientists, engineers and analysts are pushing common tools (i.e., spreadsheets and databases) to the limits with complex, data-intensive models. They need easy access to more computing power and simplified tools to increase the speed of their work. We are building a platform to do this. Our development efforts will yield new, easy-to-use tools and applications that automate data acquisition, modeling, simulation, visualization, workflow and collaboration. This will allow them to spend more time on their work and less time wrestling with complicated technology. …" Our Parallel Computing Platform team is directly responsible for item #2, and we work very closely with the teams delivering items #1 and #3. At the same time as the exec email, our marketing team unveiled a website with interviews that I invite you to check out: Modeling the World. Comments about this post welcome at the original blog.

    Read the article

  • What does the - option to env do?

    - by grifaton
    From the man page for env: The historic - option has been deprecated but is still supported in this implementation. What does the "historic - option" do? In particular, why does it change which version of python is run? ~:$ env python Python 2.6.5 Stackless 3.1b3 060516 (release26-maint, Mar 24 2010, 09:47:07) but: ~:$ env - python Python 2.5.1 (r251:54863, Feb 6 2009, 19:02:12)

    Read the article

  • In Djano, why do I get a 500 server error when browsing, but "python mysite.fcgi" from SSH works fin

    - by Jim
    If I browse to my site, I get a 500 "internal server error." However, if I SSH into my server and go to my site's folder and run "python mysite.fcgi" I see the HTML rendered fine. Obviously, something is wrong, but I'm not sure what. Here is my .htaccess file: AddHandler fastcgi-script .fcgi RewriteEngine On RewriteRule ^(media/.*)$ - [L] RewriteRule ^(static/.*)$ - [L] RewriteCond %{REQUEST_FILENAME} !-f RewriteRule ^(.*)$ mysite.fcgi/$1 [QSA,L] Here is my mysite.fcgi file: #!/usr/bin/python2.5 import sys, os sys.path.insert(0, "/kunden/homepages/34/[mydir]/htdocs/projects/django") sys.path.insert(1, "/kunden/homepages/34/[mydir]/lib/python/site-packages") os.chdir("/kunden/homepages/34/[mydir]/htdocs/projects/django/mysite") os.environ['DJANGO_SETTINGS_MODULE'] = 'mysite.settings' from django.core.servers.fastcgi import runfastcgi runfastcgi(["method=threaded", "daemonize=false"]) I'm setting this up on 1and1. It has been a pain, but I think I'm close.

    Read the article

  • Condensing/abstracting the path shown on each line in PowerShell

    - by kRON
    I've started using virtualenv for my Python projects. Since the working directory for my projects is now deeply nested, the path has started to take up half of my screen! Is it possible to somehow have PowerShell condense the path to the current working directory, from something like C:\Users\kRON\Desktop\Current work\Python\dsp\src to C:\Users\kRON\Deskto~1\Curren~2\Python\dsp\src or, better yet, to match C:\Users\kRON\Desktop\Current work\Python\and just replace it with ~python\?

    Read the article

  • How to Load Oracle Tables From Hadoop Tutorial (Part 5 - Leveraging Parallelism in OSCH)

    - by Bob Hanckel
    Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 Using OSCH: Beyond Hello World In the previous post we discussed a “Hello World” example for OSCH focusing on the mechanics of getting a toy end-to-end example working. In this post we are going to talk about how to make it work for big data loads. We will explain how to optimize an OSCH external table for load, paying particular attention to Oracle’s DOP (degree of parallelism), the number of external table location files we use, and the number of HDFS files that make up the payload. We will provide some rules that serve as best practices when using OSCH. The assumption is that you have read the previous post and have some end to end OSCH external tables working and now you want to ramp up the size of the loads. Using OSCH External Tables for Access and Loading OSCH external tables are no different from any other Oracle external tables.  They can be used to access HDFS content using Oracle SQL: SELECT * FROM my_hdfs_external_table; or use the same SQL access to load a table in Oracle. INSERT INTO my_oracle_table SELECT * FROM my_hdfs_external_table; To speed up the load time, you will want to control the degree of parallelism (i.e. DOP) and add two SQL hints. ALTER SESSION FORCE PARALLEL DML PARALLEL  8; ALTER SESSION FORCE PARALLEL QUERY PARALLEL 8; INSERT /*+ append pq_distribute(my_oracle_table, none) */ INTO my_oracle_table SELECT * FROM my_hdfs_external_table; There are various ways of either hinting at what level of DOP you want to use.  The ALTER SESSION statements above force the issue assuming you (the user of the session) are allowed to assert the DOP (more on that in the next section).  Alternatively you could embed additional parallel hints directly into the INSERT and SELECT clause respectively. /*+ parallel(my_oracle_table,8) *//*+ parallel(my_hdfs_external_table,8) */ Note that the "append" hint lets you load a target table by reserving space above a given "high watermark" in storage and uses Direct Path load.  In other doesn't try to fill blocks that are already allocated and partially filled. It uses unallocated blocks.  It is an optimized way of loading a table without incurring the typical resource overhead associated with run-of-the-mill inserts.  The "pq_distribute" hint in this context unifies the INSERT and SELECT operators to make data flow during a load more efficient. Finally your target Oracle table should be defined with "NOLOGGING" and "PARALLEL" attributes.   The combination of the "NOLOGGING" and use of the "append" hint disables REDO logging, and its overhead.  The "PARALLEL" clause tells Oracle to try to use parallel execution when operating on the target table. Determine Your DOP It might feel natural to build your datasets in Hadoop, then afterwards figure out how to tune the OSCH external table definition, but you should start backwards. You should focus on Oracle database, specifically the DOP you want to use when loading (or accessing) HDFS content using external tables. The DOP in Oracle controls how many PQ slaves are launched in parallel when executing an external table. Typically the DOP is something you want to Oracle to control transparently, but for loading content from Hadoop with OSCH, it's something that you will want to control. Oracle computes the maximum DOP that can be used by an Oracle user. The maximum value that can be assigned is an integer value typically equal to the number of CPUs on your Oracle instances, times the number of cores per CPU, times the number of Oracle instances. For example, suppose you have a RAC environment with 2 Oracle instances. And suppose that each system has 2 CPUs with 32 cores. The maximum DOP would be 128 (i.e. 2*2*32). In point of fact if you are running on a production system, the maximum DOP you are allowed to use will be restricted by the Oracle DBA. This is because using a system maximum DOP can subsume all system resources on Oracle and starve anything else that is executing. Obviously on a production system where resources need to be shared 24x7, this can’t be allowed to happen. The use cases for being able to run OSCH with a maximum DOP are when you have exclusive access to all the resources on an Oracle system. This can be in situations when your are first seeding tables in a new Oracle database, or there is a time where normal activity in the production database can be safely taken off-line for a few hours to free up resources for a big incremental load. Using OSCH on high end machines (specifically Oracle Exadata and Oracle BDA cabled with Infiniband), this mode of operation can load up to 15TB per hour. The bottom line is that you should first figure out what DOP you will be allowed to run with by talking to the DBAs who manage the production system. You then use that number to derive the number of location files, and (optionally) the number of HDFS data files that you want to generate, assuming that is flexible. Rule 1: Find out the maximum DOP you will be allowed to use with OSCH on the target Oracle system Determining the Number of Location Files Let’s assume that the DBA told you that your maximum DOP was 8. You want the number of location files in your external table to be big enough to utilize all 8 PQ slaves, and you want them to represent equally balanced workloads. Remember location files in OSCH are metadata lists of HDFS files and are created using OSCH’s External Table tool. They also represent the workload size given to an individual Oracle PQ slave (i.e. a PQ slave is given one location file to process at a time, and only it will process the contents of the location file.) Rule 2: The size of the workload of a single location file (and the PQ slave that processes it) is the sum of the content size of the HDFS files it lists For example, if a location file lists 5 HDFS files which are each 100GB in size, the workload size for that location file is 500GB. The number of location files that you generate is something you control by providing a number as input to OSCH’s External Table tool. Rule 3: The number of location files chosen should be a small multiple of the DOP Each location file represents one workload for one PQ slave. So the goal is to keep all slaves busy and try to give them equivalent workloads. Obviously if you run with a DOP of 8 but have 5 location files, only five PQ slaves will have something to do and the other three will have nothing to do and will quietly exit. If you run with 9 location files, then the PQ slaves will pick up the first 8 location files, and assuming they have equal work loads, will finish up about the same time. But the first PQ slave to finish its job will then be rescheduled to process the ninth location file, potentially doubling the end to end processing time. So for this DOP using 8, 16, or 32 location files would be a good idea. Determining the Number of HDFS Files Let’s start with the next rule and then explain it: Rule 4: The number of HDFS files should try to be a multiple of the number of location files and try to be relatively the same size In our running example, the DOP is 8. This means that the number of location files should be a small multiple of 8. Remember that each location file represents a list of unique HDFS files to load, and that the sum of the files listed in each location file is a workload for one Oracle PQ slave. The OSCH External Table tool will look in an HDFS directory for a set of HDFS files to load.  It will generate N number of location files (where N is the value you gave to the tool). It will then try to divvy up the HDFS files and do its best to make sure the workload across location files is as balanced as possible. (The tool uses a greedy algorithm that grabs the biggest HDFS file and delegates it to a particular location file. It then looks for the next biggest file and puts in some other location file, and so on). The tools ability to balance is reduced if HDFS file sizes are grossly out of balance or are too few. For example suppose my DOP is 8 and the number of location files is 8. Suppose I have only 8 HDFS files, where one file is 900GB and the others are 100GB. When the tool tries to balance the load it will be forced to put the singleton 900GB into one location file, and put each of the 100GB files in the 7 remaining location files. The load balance skew is 9 to 1. One PQ slave will be working overtime, while the slacker PQ slaves are off enjoying happy hour. If however the total payload (1600 GB) were broken up into smaller HDFS files, the OSCH External Table tool would have an easier time generating a list where each workload for each location file is relatively the same.  Applying Rule 4 above to our DOP of 8, we could divide the workload into160 files that were approximately 10 GB in size.  For this scenario the OSCH External Table tool would populate each location file with 20 HDFS file references, and all location files would have similar workloads (approximately 200GB per location file.) As a rule, when the OSCH External Table tool has to deal with more and smaller files it will be able to create more balanced loads. How small should HDFS files get? Not so small that the HDFS open and close file overhead starts having a substantial impact. For our performance test system (Exadata/BDA with Infiniband), I compared three OSCH loads of 1 TiB. One load had 128 HDFS files living in 64 location files where each HDFS file was about 8GB. I then did the same load with 12800 files where each HDFS file was about 80MB size. The end to end load time was virtually the same. However when I got ridiculously small (i.e. 128000 files at about 8MB per file), it started to make an impact and slow down the load time. What happens if you break rules 3 or 4 above? Nothing draconian, everything will still function. You just won’t be taking full advantage of the generous DOP that was allocated to you by your friendly DBA. The key point of the rules articulated above is this: if you know that HDFS content is ultimately going to be loaded into Oracle using OSCH, it makes sense to chop them up into the right number of files roughly the same size, derived from the DOP that you expect to use for loading. Next Steps So far we have talked about OLH and OSCH as alternative models for loading. That’s not quite the whole story. They can be used together in a way that provides for more efficient OSCH loads and allows one to be more flexible about scheduling on a Hadoop cluster and an Oracle Database to perform load operations. The next lesson will talk about Oracle Data Pump files generated by OLH, and loaded using OSCH. It will also outline the pros and cons of using various load methods.  This will be followed up with a final tutorial lesson focusing on how to optimize OLH and OSCH for use on Oracle's engineered systems: specifically Exadata and the BDA. /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin-top:0in; mso-para-margin-right:0in; mso-para-margin-bottom:10.0pt; mso-para-margin-left:0in; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin;}

    Read the article

  • How can I select which Java to use?

    - by user6611
    I have installed both OpenJDK 6 and 7. When I run "java somefile" from the command line, OpenJDK 6 is invoked. I do not want to change this default behavior. What command can I use to run my non-default OpenJDK 7 installation instead? (I am used to running "python somefile" to invoke the default Python, "python2.7 somefile" to use Python 2.7 specifically and "python3 somefile" to use Python 3 specifically.)

    Read the article

  • Adding items to a files right click menu in Windows Explorer

    - by Fire Lancer
    What do I need to do to add an item to the right click menu for files with certain file extensions, along with sub menus? An example would be adding items to run Python files (.py, .pyw, .pyc) with a specific version of Python, so the menu for a .py files would look like say: Open 7-Zip > ...7zip stuff Run > Python 2.5 Python 2.6 Python 3.1 Edit > IDLE 2.5 IDLE 2.6 IDLE 3.1 various other items

    Read the article

  • Does "for" in .Net Framework 4.0 execute loops in parallel? Or why is the total not the sum of the p

    - by Shiraz Bhaiji
    I am writing code to performance test a web site. I have the following code: string url = "http://xxxxxx"; System.Diagnostics.Stopwatch stopwatch = new System.Diagnostics.Stopwatch(); System.Diagnostics.Stopwatch totalTime = new System.Diagnostics.Stopwatch(); totalTime.Start(); for (int i = 0; i < 10; i++) { stopwatch.Start(); WebRequest request = HttpWebRequest.Create(url); WebResponse webResponse = request.GetResponse(); webResponse.Close(); stopwatch.Stop(); textBox1.Text += "Time Taken " + i.ToString() + " = " + stopwatch.Elapsed.Milliseconds.ToString() + Environment.NewLine; stopwatch.Reset(); } totalTime.Stop(); textBox1.Text += "Total Time Taken = " + totalTime.Elapsed.Milliseconds.ToString() + Environment.NewLine; Which is giving the following result: Time Taken 0 = 88 Time Taken 1 = 161 Time Taken 2 = 218 Time Taken 3 = 417 Time Taken 4 = 236 Time Taken 5 = 217 Time Taken 6 = 217 Time Taken 7 = 218 Time Taken 8 = 409 Time Taken 9 = 48 Total Time Taken = 257 I had expected the total time to be the sum of the individual times. Can anybody see why it is not?

    Read the article

  • How to execute a program on PostBuild event in parallel?

    - by John
    I managed to set the compiler to execute another program when the project is built/ran with the following directive in project options: call program.exe param1 param2 The problem is that the compiler executes "program.exe" and waits for it to terminate and THEN the project executable is ran. What I ask: How to set the compiler to run both executables in paralel without waiting for the one in PostBuild event to terminate? Thanks in advance

    Read the article

  • What are the Ruby equivalent of Python itertools, esp. combinations/permutations/groupby?

    - by Amadeus
    Python's itertools module provides a lots of goodies with respect to processing an iterable/iterator by use of generators. For example, permutations(range(3)) --> 012 021 102 120 201 210 combinations('ABCD', 2) --> AB AC AD BC BD CD [list(g) for k, g in groupby('AAAABBBCCD')] --> AAAA BBB CC D What are the equivalent in Ruby? By equivalent, I mean fast and memory efficient (Python's itertools module is written in C).

    Read the article

  • Python4Delphi: Returning a python object in a function. (DelphiWrapper)

    - by Gabriel Fonseca
    I am using python4delphi. ow can I return an object from a wrapped Delphi class function? Code Snippet: I have a simple Delphi Class that i wrapped to Python Script, right? TSimple = Class Private function getvar1:string; Public Published property var1:string read getVar1; function getObj:TSimple; end; ... function TSimple.getVar1:string; begin result:='hello'; end; function TSimple.getObj:TSimple; begin result:=self; end; I made the TPySimple like the demo32 to give class access to the python code. My python module name is test. TPyDado = class(TPyDelphiPersistent) // Constructors & Destructors constructor Create( APythonType : TPythonType ); override; constructor CreateWith( PythonType : TPythonType; args : PPyObject ); override; // Basic services function Repr : PPyObject; override; class function DelphiObjectClass : TClass; override; end; ... { TPyDado } constructor TPyDado.Create(APythonType: TPythonType); begin inherited; // we need to set DelphiObject property DelphiObject := TDado.Create; with TDado(DelphiObject) do begin end; Owned := True; // We own the objects we create end; constructor TPyDado.CreateWith(PythonType: TPythonType; args: PPyObject); begin inherited; with GetPythonEngine, DelphiObject as TDado do begin if PyArg_ParseTuple( args, ':CreateDado' ) = 0 then Exit; end; end; class function TPyDado.DelphiObjectClass: TClass; begin Result := TDado; end; function TPyDado.Repr: PPyObject; begin with GetPythonEngine, DelphiObject as TDado do Result := VariantAsPyObject(Format('',[])); // or Result := PyString_FromString( PAnsiChar(Format('(%d, %d)',[x, y])) ); end; And now the python code: import test a = test.Simple() # try access the property var1 and everything is right print a.var1 # work's, but.. b = a.getObj(); # raise a exception that not find any attributes named getObj. # if the function returns a string for example, it's work.

    Read the article

< Previous Page | 440 441 442 443 444 445 446 447 448 449 450 451  | Next Page >