Search Results

Search found 18542 results on 742 pages for 'nhibernate query'.


What is spreadsheet useful for?

- by zvrba

I have been in the computer business for 15 years in various roles (sysadmin, developer, researcher), and I have never encountered anyone using Excel for anything more advanced than formatting tables, or as an ad-hoc database that could have been maintained in a text file. I had to do heavy data processing and plotting, and for that I used some Perl scripts + gnuplot, got tired of it, and eventually went over to R. A 2D spreadsheet just didn't seem well-suited for doing statistical analyses over 5-dimensional datasets (not to mention that it produces UGLY plots). I attempted to use a spreadsheet for time-tracking, and found out that I would have been better served by a relational database, so I gave up on using Excel for that too. For example, it's important to name tasks consistently, and I needed to find the unique task names in a given column across several sheets (I had one timesheet for each month). How do you make such a "query" in a program that essentially evaluates independent cells and has little notion of relations between them? So, what are spreadsheets useful for? Why do they have a bunch of mathematical stuff built into them when, AFAICT, people use them mostly as table formatters or bad substitutes for databases?
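For comparison, the "unique task names across several sheets" query is only a few lines once the sheets are treated as rows of data. This is a minimal Python sketch, assuming each monthly timesheet has been exported to a list of row dictionaries with a hypothetical "task" column; the sheet names and sample rows are made up for illustration.

```python
def unique_task_names(sheets, column="task"):
    """Return the sorted distinct task names found in any sheet."""
    names = set()
    for rows in sheets.values():
        for row in rows:
            # Treat a missing or empty cell as "no task" and trim stray spaces.
            name = (row.get(column) or "").strip()
            if name:
                names.add(name)
    return sorted(names)


# Hypothetical data: one list of rows per monthly timesheet.
timesheets = {
    "january": [{"task": "backup", "hours": 2}, {"task": "Backup ", "hours": 1}],
    "february": [{"task": "backup", "hours": 3}, {"task": "plotting", "hours": 5}],
}

print(unique_task_names(timesheets))
```

Because the comparison is case-sensitive, inconsistent spellings such as "Backup" and "backup" appear side by side in the result, which is exactly the naming problem described above.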

logparser Message with error codes

- by nsr81

Hi all, is there any way to get the complete error message using LogParser? When I run the following query:

    logparser -i:EVT -o:NAT "SELECT TimeGenerated,EventID,Message from System WHERE EventTypeName='Error event'"

I get the following output:

    2009-09-02 19:35:44 7000 The USB Mass Storage Driver service failed to start due to the following error: %%1058

The full "Message" in Event Viewer is:

    Description: The USB Mass Storage Driver service failed to start due to the following error: The service cannot be started, either because it is disabled or because it has no enabled devices associated with it.

How can I obtain the complete message using LogParser?

debian modem problems !!!

- by Raafat

Hey there guys... I'm a new Debian user. It looks like a very good choice for me: everything is stable, free and easy to use. The problem is that I'm using my modem to establish a dial-up connection to the internet (PPP) (a very old, stupid way that I'm forced to use for now), with the KPPP application, and nothing is working properly for me. It seems like it didn't recognize my modem or something. I have already tried a few things, and now I know my modem is on /dev/tty0, so I made a link to it at /dev/modem and queried the modem using KPPP. It responded with something like: Ati : Ati0: Ati1: ... Ati7:, with a text box to fill in next to each one of these Atis. Now, when I press Connect in KPPP, it says "modem ready", and that's it. BTW, my modem is an MDC AC'97. Any suggestions, please?

Developers are strange

- by DavidWimbush

Why do developers always use the GUI tools in SQL Server? I've always found this irritating and just vaguely assumed it's because they aren't familiar with SQL syntax. But when you think about it, it's a genuine puzzle. Developers type code all day - really heavy code too, like generics, lambda functions and extension methods. They (thankfully) scorn the Visual Studio stuff where you drag a table onto the class and it pastes in lots of code to query the table into a DataSet or something. But when they want to add a column to a table, without fail they dive into the graphical table designer. And half the time the script it generates does horrible things like copy the table to another one with the new column, delete the old table, and rename the new table. Which is fine if your users don't care about uptime. Is ALTER TABLE ADD <column definition> really that hard? I just don't get it.

install yum on fedora core 6

- by Thomas

Hi, I have installed yum through rpm -ivh yum-3.0-6.noarch.rpm. The result came as:

    [root@02e7709 ~]# rpm -ivh yum-3.0-6.noarch.rpm
    Preparing... ########################################### [100%]
    package yum-3.0-6 is already installed

I then queried the RPM package using this command, which replied as follows:

    [root@02e7709 ~]# rpm -q yum-3.0-6.noarch.rpm
    package yum-3.0-6.noarch.rpm is not installed

The two commands give different replies, but I think yum is not installed. What's the problem here? I then ran yum install subversion, and this follows:

    [root@02e7709 ~]# yum install subversion
    Loading "installonlyn" plugin
    Setting up Install Process
    Setting up repositories
    core 100% |=========================| 1.1 kB 00:00
    rpmforge 100% |=========================| 1.1 kB 00:00
    Error: Cannot find a valid baseurl for repo: updates

What does the baseurl error for the repo mean?

NAT for Sprint Nexus S "Portable Wi-Fi hotspot"

- by Jon Rodriguez

I am on a 2010 Macbook Air connected to the web over wifi tethering on my Sprint Nexus S. I want to be able to host a few files using MAMP, but it seems that Sprint is running a NAT. When I query checkip.dyndns.org right now, it returns 68.27.228.75. However, trying to navigate to that IP fails (even though I do have MAMP's Apache running on port 80, as verified via loopback). When I whois 68.27.228.75, it appears to be a Sprint address, with NetName "SPRINTPCS" and OrgName "Sprint Nextel Corporation". So, is there some way I can circumvent Sprint's NAT to allow people to connect to my server that is running on a Nexus S Portable Wi-Fi hotspot?

Is Internet routing (BGP) fully automated?

- by Adal

If all the routing tables on the Internet would be erased simultaneously, will the routers be able to rediscover them automatically? I'm having an argument with a colleague who says that the RIPE routing tables are essential, but I remember reading that if the tables disappeared, the BGP protocol will allow routers to rediscover working routes between nodes by querying their neighbors which in turn will query their neighbors until a working route will be detected. Then that route will be used to repopulate the routing tables. After a while, all the routes will be restored (not necessarily the optimal routes). Is that correct?

multiple wildcard entries

- by Murali

My client has around 300,000 domains, and they just have a wildcard for all of them:

    * A 12.12.12.12

Now they want to create a subdomain that points to a different IP and still have the flexibility of the wildcard, something like:

    ww1.* A 24.24.24.24
    * A 12.12.12.12

It looks like in BIND the lower "*" is a catch-all that takes over every query, and hence ww1 is not working. One of the solutions offered by the IT folks was to create 300K separate zones just for "ww1" and leave the "*" wildcard. Is there any other DNS software that can achieve this task easily? Any other ways to deal with this?

SQL SERVER – Table Variables and Transactions – SQL in Sixty Seconds #007 – Video

- by pinaldave

Today’s SQL in Sixty Seconds video is inspired by my presentation at TechEd India 2012 on Misconception and Resolution. Quite often I have seen people get confused by certain behavior of T-SQL. They expect SQL to behave a certain way, and SQL Server behaves differently. This kind of issue often creates confusion and frustration. Sometimes I have also seen them mistake it for a bug and submit a bug report, when the reality is totally different. We are going to see a similar concept today. I have quite commonly seen developers assume that table variables will be rolled back when a transaction is rolled back. This sixty-second video shows that table variables are not rolled back when transactions are rolled back.

More on Errors:
    Difference Temp Table and Table Variable – Effect of Transaction
    Effect of TRANSACTION on Local Variable – After ROLLBACK and After COMMIT
    Debate – Table Variables vs Temporary Tables – Quiz – Puzzle – 13 of 31

I encourage you to submit your ideas for SQL in Sixty Seconds. We will try to accommodate as many as we can.

Reference: Pinal Dave (http://blog.sqlauthority.com)

How to boot between OSes from inside each OS? in a Windows/Ubuntu dual boot system

- by TheCompander

My ideal scenario is a script/command that boots into the alternate OS from whichever OS you are currently in; restarting without running the script/command should return to the same OS. Currently I have GRUB set up to remember the last OS booted, using GRUB_DEFAULT=saved and GRUB_SAVEDEFAULT=true, and I'd like to keep this option. I have read about the ability to manipulate GRUB from within Ubuntu to boot into Windows, shown in this link. Is there a similar way to boot into Ubuntu from within Windows? I primarily connect to this device remotely, hence my query.

Document-oriented vs Column-oriented database fit

- by user1007922

I have a data-intensive application that desperately needs a database make-over.

The general data model: There are records with RIDs, grouped together by group IDs (GID). The records have arbitrary data fields (maybe 5-15), with a few of them mandatory and the rest optional, and thus sparse.

The general use model: There are LOTS and LOTS of writes. Millions to billions of records are stored. Very often they are associated with new GIDs, but sometimes they are associated with existing GIDs. There aren't as many reads, but when they happen, they need to be pretty fast, or at least constant speed regardless of the database size. And when the reads happen, they will need to retrieve all the records/RIDs with a certain GID. I don't have a need to search by the record field values. Primarily, I will need to query by the GID and maybe the RID.

What database implementation should I use? I did some initial research on document-oriented and column-oriented databases, and it seems the document-oriented ones are a good fit, model-wise. I could store all the records together under the same document key using the GID. But I don't really have any use for their ability to search the document contents themselves. I like the simplicity and scalability of column-oriented databases like Cassandra, but how should I model my data in this paradigm for optimal performance? Should my key be the GID, and should I create a column for each record/RID? (There may be thousands or hundreds of thousands of records in a group/GID.) Or should my key be the RID, with each row having a column for the GID value? What results in faster writes and reads under this model?
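To make the first option concrete (row key = GID, one column per RID), here is a small in-memory Python sketch of that access pattern. It is only a stand-in for the data layout, not Cassandra's API; the class name GroupStore, its methods and the sample fields are hypothetical.

```python
from collections import defaultdict


class GroupStore:
    """In-memory stand-in for a wide-row layout: row key = GID, column = RID."""

    def __init__(self):
        self._rows = defaultdict(dict)

    def write(self, gid, rid, fields):
        # A write touches exactly one row; sparse optional fields are simply absent.
        self._rows[gid][rid] = dict(fields)

    def read_group(self, gid):
        # Fetching every record of a group is a single lookup by row key.
        return dict(self._rows.get(gid, {}))

    def read_record(self, gid, rid):
        return self._rows.get(gid, {}).get(rid)


store = GroupStore()
store.write("g1", "r1", {"name": "alpha"})
store.write("g1", "r2", {"name": "beta", "note": "optional field"})
store.write("g2", "r3", {"name": "gamma"})

print(sorted(store.read_group("g1")))
print(store.read_record("g2", "r3"))
```

With this layout the dominant read ("all records for a GID") never scans other groups. The cost is that a lookup by RID alone needs the GID as well, or a separate RID-to-GID index.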

Revamped Google Webmaster Tools

I was pleasantly surprised to realize today that Google's Webmaster Tools had a minor overhaul and now provide more detail than before. The most obvious changes are on the dashboard, where the Top Search Queries now provide information about impressions and clickthroughs instead of the rankings shown before. Only the links for the search expressions are missing. It seems that the Top Search Queries were the focus of this update. The section is now spiced up with detailed graphs about what happened on your site during selectable periods. Well, it seems that the Webmaster Tools mimic a stripped-down version of Google Analytics... I was very pleased by the details that are offered when you click on a single query term. It is really nice to see the search rankings and your responsible URLs at the same time. Before, you had to put two browser instances side by side to achieve this kind of overview. Personally, I like the way Google and other providers visualize statistics. It gives you a quick and informative overview, and enables you to dig further into details about peaks and lows in your visits, page impressions or clickthroughs.

SQL University: Database testing and refactoring tools and examples

- by Mladen Prajdic

This is a post for a great idea called SQL University, started by Jorge Segarra, also famously known as SqlChicken on Twitter. It’s a collection of blog posts on different database-related topics contributed by several smart people all over the world. So this week is mine, and we’ll be talking about database testing and refactoring. In 3 posts we’ll cover:

SQLU part 1 - What and why of database testing
SQLU part 2 - What and why of database refactoring
SQLU part 3 - Database testing and refactoring tools and examples

This is the third and last part of the series, and in it we’ll take a look at tools we can test and refactor with, plus an example of both.

Tools of the trade

First, a few thoughts about how to go about testing a database. I'm firmly against any testing tools that go into the database itself or need an extra database. Unit tests for the database and for applications using the database should all be in one place, using the same technology. By using database-specific frameworks we fragment our tests into many places and increase test system complexity. Let’s take a look at some testing tools.

1. NUnit, xUnit, MbUnit
All three are .Net testing frameworks meant to unit test .Net applications, but we can test databases with them just fine. I use NUnit because I’ve always used it for work and personal projects. One day this might change, so the thing to remember is to be flexible if something better comes along. All three are quite similar, and you should be able to switch between them without much problem.

2. TSQLUnit
As helpful as this framework is for the non-C#-savvy folks, I don’t like it for the reason I stated above. It lives in the database and thus fragments the testing infrastructure. It also appears that it’s not being actively developed anymore.

3. DbFit
I haven’t had the pleasure of trying this tool just yet, but it’s on my to-do list. From what I’ve read and heard, Gojko Adzic (@gojkoadzic on Twitter) has done a remarkable job with it.

4. Redgate SQL Refactor and Apex SQL Refactor
Neither of these refactoring tools is free; however, if you have hardcore refactoring planned, they are worth looking into. I’ve only used Red Gate’s Refactor and was quite impressed with it.

5. Reverting the database state
I’ve talked before about ways to revert a database to its pre-test state after unit testing. This still holds and I haven’t changed my mind. Also make sure to read the comments, as they are quite informative. I especially like the idea of setting up and tearing down the schema for each test group with NHibernate.

Testing and refactoring example

We’ll take a look at a simple schema and data test for a view, and at refactoring the SELECT * in that view. We’ll use a single table PhoneNumbers with ID and Phone columns. Then we’ll refactor the Phone column into 3 columns: Prefix, Number and Suffix. Lastly we’ll remove the original Phone column. Then we’ll check how the view behaves with tests in NUnit. The comments in the code explain the problem, so be sure to read them. I’m assuming you know NUnit and C#.

T-SQL code:

    USE tempdb
    GO
    CREATE TABLE PhoneNumbers
    (
        ID INT IDENTITY(1,1),
        Phone VARCHAR(20)
    )
    GO
    INSERT INTO PhoneNumbers(Phone)
    SELECT '111 222333 444' UNION ALL
    SELECT '555 666777 888'
    GO
    -- notice we don't have WITH SCHEMABINDING
    CREATE VIEW vPhoneNumbers
    AS
    SELECT * FROM PhoneNumbers
    GO
    -- Let's take a look at what the view returns
    -- If we add new columns and rows both tests will fail
    SELECT * FROM vPhoneNumbers
    GO
    -- DoesViewReturnCorrectColumns test will SUCCEED
    -- DoesViewReturnCorrectData test will SUCCEED

    -- refactor to split Phone column into 3 parts
    ALTER TABLE PhoneNumbers ADD Prefix VARCHAR(3)
    ALTER TABLE PhoneNumbers ADD Number VARCHAR(6)
    ALTER TABLE PhoneNumbers ADD Suffix VARCHAR(3)
    GO
    -- update the new columns
    UPDATE PhoneNumbers
    SET Prefix = LEFT(Phone, 3),
        Number = SUBSTRING(Phone, 5, 6),
        Suffix = RIGHT(Phone, 3)
    GO
    -- remove the old column
    ALTER TABLE PhoneNumbers DROP COLUMN Phone
    GO
    -- This returns unexpected results!
    -- it returns 2 columns ID and Phone even though
    -- we don't have a Phone column anymore.
    -- Notice that the data is from the Prefix column
    -- This is a danger of SELECT *
    SELECT * FROM vPhoneNumbers
    -- DoesViewReturnCorrectColumns test will SUCCEED
    -- DoesViewReturnCorrectData test will FAIL

    -- for a fix we have to call sp_refreshview
    -- to refresh the view definition
    EXEC sp_refreshview 'vPhoneNumbers'
    -- after the refresh the view returns 4 columns
    -- this breaks the input/output behavior of the database
    -- which refactoring MUST NOT do
    SELECT * FROM vPhoneNumbers
    GO
    -- DoesViewReturnCorrectColumns test will FAIL
    -- DoesViewReturnCorrectData test will FAIL

    -- to fix the input/output behavior change problem
    -- we have to concat the 3 columns into one named Phone
    -- (ALTER VIEW has to be the first statement in its batch)
    ALTER VIEW vPhoneNumbers
    AS
    SELECT ID, Prefix + ' ' + Number + ' ' + Suffix AS Phone
    FROM PhoneNumbers
    GO
    -- now it works as expected
    SELECT * FROM vPhoneNumbers
    -- DoesViewReturnCorrectColumns test will SUCCEED
    -- DoesViewReturnCorrectData test will SUCCEED

    -- clean up
    DROP VIEW vPhoneNumbers
    DROP TABLE PhoneNumbers

C# test code:

    [Test]
    public void DoesViewReturnCorrectColumns()
    {
        // conn is a valid SqlConnection to the server's tempdb
        // note the SET FMTONLY ON with which we return only schema and no data
        using (SqlCommand cmd = new SqlCommand("SET FMTONLY ON; SELECT * FROM vPhoneNumbers", conn))
        {
            DataTable dt = new DataTable();
            dt.Load(cmd.ExecuteReader(CommandBehavior.CloseConnection));
            // test returned schema: number of columns, column names and data types
            // (NUnit's AreEqual takes the expected value first)
            Assert.AreEqual(2, dt.Columns.Count);
            Assert.AreEqual("ID", dt.Columns[0].Caption);
            Assert.AreEqual(typeof(int), dt.Columns[0].DataType);
            Assert.AreEqual("Phone", dt.Columns[1].Caption);
            Assert.AreEqual(typeof(string), dt.Columns[1].DataType);
        }
    }

    [Test]
    public void DoesViewReturnCorrectData()
    {
        // conn is a valid SqlConnection to the server's tempdb
        using (SqlCommand cmd = new SqlCommand("SELECT * FROM vPhoneNumbers", conn))
        {
            DataTable dt = new DataTable();
            dt.Load(cmd.ExecuteReader(CommandBehavior.CloseConnection));
            // test returned data: number of rows and their values
            Assert.AreEqual(2, dt.Rows.Count);
            Assert.AreEqual(1, dt.Rows[0]["ID"]);
            Assert.AreEqual("111 222333 444", dt.Rows[0]["Phone"]);
            Assert.AreEqual(2, dt.Rows[1]["ID"]);
            Assert.AreEqual("555 666777 888", dt.Rows[1]["Phone"]);
        }
    }

With this simple example we’ve seen how a very simple schema can cause a lot of problems in the whole application/database system if it doesn’t have tests. Imagine what would happen if some outside process depended on that view. It would get wrong data and propagate it silently throughout the system. And that is not good. So have tests at least for the crucial parts of your systems. And with that we conclude the Database Testing and Refactoring week at SQL University. Hope you learned something new and enjoy the learning weeks to come. Have fun!
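The same two checks (schema and data) can be sketched without a SQL Server instance. The following Python example uses the standard-library sqlite3 module as a stand-in, so it is an illustration of the testing idea rather than the post's NUnit setup. SQLite handles SELECT * views differently from SQL Server, so the sketch only covers the final, fixed view with the explicit column list; the helper names are hypothetical.

```python
import sqlite3


def build_database():
    """Create the refactored table and the fixed view in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE PhoneNumbers(
            ID INTEGER PRIMARY KEY AUTOINCREMENT,
            Prefix TEXT, Number TEXT, Suffix TEXT);
        INSERT INTO PhoneNumbers(Prefix, Number, Suffix) VALUES ('111', '222333', '444');
        INSERT INTO PhoneNumbers(Prefix, Number, Suffix) VALUES ('555', '666777', '888');
        -- explicit column list instead of SELECT *; || is SQLite's string concatenation
        CREATE VIEW vPhoneNumbers AS
            SELECT ID, Prefix || ' ' || Number || ' ' || Suffix AS Phone
            FROM PhoneNumbers;
        """
    )
    return conn


def view_columns(conn):
    # LIMIT 0 returns no rows, but the cursor still describes the result columns.
    cursor = conn.execute("SELECT * FROM vPhoneNumbers LIMIT 0")
    return [column[0] for column in cursor.description]


def view_rows(conn):
    return conn.execute("SELECT ID, Phone FROM vPhoneNumbers ORDER BY ID").fetchall()


conn = build_database()
print(view_columns(conn))
print(view_rows(conn))
```

The schema check and the data check stay separate on purpose, mirroring DoesViewReturnCorrectColumns and DoesViewReturnCorrectData, so a failure points at either the shape of the view or its contents.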

SEO impact on subdomain for full name and obscure ccTLD

- by Dan Christian

There have been a few questions on subdomains and their impact on SEO, mostly in comparison to subfolders. The closest question I've found is this question, but it still doesn't completely answer my query. I'm setting up a blog for 'Sam Smith'. It's imperative that the SEO is based around his full name, as he is a prominent blogger and his name is his value. All TLD variations of 'samsmith' (samsmith.com, samsmith.cc etc.) are taken. However, there has been an opportunity to register an obscure ccTLD for 'smith'. In regard to SEO value purely from the URL...

1) Will there be any negative SEO implications for searches for 'Sam Smith' when setting up the subdomain as 'sam.smith.' compared to a more regular 'samsmith.' domain? Will a search engine recognise the subdomain as the full name, as opposed to just 'smith'?

2) Are there any negative SEO implications with an obscure ccTLD? For instance, if Sam Smith was a prominent blogger in Canada with most of his audience based there, would there be any negative SEO effect if he had, for example, a .co ccTLD?

#altnetseattle – CQRS

- by GeekAgilistMercenary

This is a topic I know nothing about, and thus these may be supremely disparate notes. Have fun translating. : ) . . .and coolness that the session is well past capacity. CQRS separates things from the UI, and everything that needs to be populated is done through commands. The domain and reports have separate storage. Events populate these stores of data, such as a "sold event". What it looks like is that the domain controls the requests by event, which would be a product order or something similar. Event sourcing is a key element of the logic. DDD (Domain Driven Design) is part of the core basis for this methodology/structure. The architecture/methodology/structure is perfect for blade-style plugin hardware as needed. Good blog entries: DDDD: Why I love CQRS, another on Command and Query Responsibility Segregation (CQRS), more in CQRS à la Greg Young, a bit by Udi Dahan, and there are more. Google, Bing, etc. are there for a reason. It appears the core underpinning architectural element of this is the breakout of uniquely identifiable actions, or I suppose better described as events. Those events then act upon specific pipelines such as read requests, write requests, etc. I will be doing more research on this topic and will have something written up shortly. At this time it seems like nothing new, just a large architectural breakout of the identifiable needs of the entire enterprise system. The reporting is in one segment of the architecture, the domain is in another, hydration is broken out to interfaces, and events are executed to incur events on the reports, or what appear by the description to be events on the domain. Anyway, more to come on this later.

Apache disable DNS lookups

- by odeceixe

I'm using Debian 4.3.2-1 and Apache 2 on my production server. Watching the logs, I noticed Apache is resolving clients' hostnames even with HostnameLookups Off in apache2.conf. I want to avoid these lookups, and I'm guessing Apache is making this DNS query because I have mod_authz_host enabled. When I try to unlink this module, several modules complain because they use the Order directive. What is the clean way to go? Should I comment out all Order directives like

    Order allow,deny
    Deny from all

Is this the only way to stop Apache from making DNS requests? I would still like to deny access to .htaccess files and keep some rules like that.

Thinktecture.IdentityModel: WRAP and SWT Support

- by Your DisplayName here!

The latest drop of Thinktecture.IdentityModel contains some helpers for the Web Resource Authorization Protocol (WRAP) and Simple Web Tokens (SWT).

WRAP

The WrapClient class is a helper to request SWT tokens via WRAP. It supports issuer/key, SWT and SAML input credentials, e.g.:

    var client = new WrapClient(wrapEp);
    var swt = client.Issue(issuerName, issuerKey, scope);

All Issue overloads return a SimpleWebToken type, which brings me to the next helper class.

SWT

The SimpleWebToken class wraps a SWT token. It combines a number of features:

    - conversion between string format and CLR type representation
    - creation of SWT tokens
    - validation of SWT tokens
    - projection of a SWT token as IClaimsIdentity
    - helpers to embed a SWT token in headers and query strings

The following sample code generates a SWT token using the helper class:

    private static string CreateSwtToken()
    {
        var signingKey = "wA…";
        var audience = "http://websample";
        var issuer = "http://self";

        var token = new SimpleWebToken(
            issuer, audience, Convert.FromBase64String(signingKey));

        token.AddClaim(ClaimTypes.Name, "dominick");
        token.AddClaim(ClaimTypes.Role, "Users");
        token.AddClaim(ClaimTypes.Role, "Administrators");
        token.AddClaim("simple", "test");

        return token.ToString();
    }
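To make the token layout concrete, here is a minimal Python sketch of what a signed SWT looks like on the wire. It is not Thinktecture.IdentityModel's implementation. It assumes the usual SWT layout (form-encoded name/value pairs including Issuer, Audience and ExpiresOn, followed by an HMACSHA256 pair computed over everything before it). The function names create_swt and validate_swt, the one-hour lifetime and the demo key are hypothetical.

```python
import base64
import hashlib
import hmac
import time
from urllib.parse import quote, unquote, urlencode

SIGNATURE_MARKER = "&HMACSHA256="


def create_swt(issuer, audience, key, claims, lifetime_seconds=3600, now=None):
    """Build a form-encoded token and append its HMAC-SHA256 signature."""
    issued = int(time.time() if now is None else now)
    pairs = list(claims) + [
        ("Issuer", issuer),
        ("Audience", audience),
        ("ExpiresOn", str(issued + lifetime_seconds)),
    ]
    unsigned = urlencode(pairs)
    digest = hmac.new(key, unsigned.encode("ascii"), hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return unsigned + SIGNATURE_MARKER + quote(signature, safe="")


def validate_swt(token, key):
    """Check the signature only; expiry and audience checks are left out of this sketch."""
    unsigned, marker, encoded_signature = token.rpartition(SIGNATURE_MARKER)
    if not marker:
        return False
    expected = hmac.new(key, unsigned.encode("ascii"), hashlib.sha256).digest()
    try:
        actual = base64.b64decode(unquote(encoded_signature))
    except ValueError:
        return False
    return hmac.compare_digest(expected, actual)


demo_key = b"not-a-real-key"  # hypothetical key, for illustration only
demo_token = create_swt(
    "http://self", "http://websample", demo_key,
    [("name", "dominick"), ("simple", "test")], now=0,
)
print(demo_token)
print(validate_swt(demo_token, demo_key))
```

The sketch deliberately sidesteps repeated claim names (the C# sample adds two Role claims); how those are combined is left to the real helper class.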

database replication for new user signup

- by Jeff Storey

I have a database that stores the users of my application. When a new user signs up, a record is inserted into the database for that user. I have a replicated version (slave) of this database (using MySQL for now). What I'm concerned about is this scenario:

Step 1: The user signs up and the user record is inserted into the database.
Step 2: The user then tries to log in, and the login process queries the database for the user. However, this query hits the slave database, where the user record has not yet been replicated, and it returns an error that the user does not exist.

This is a pretty trivial example, but I can see how it can apply to a lot of cases. Is there a strategy for configuring replicated databases to help prevent this situation?
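The two steps can be reproduced with a toy model. The Python sketch below uses plain dictionaries in place of the master and the slave, and shows one possible mitigation: retrying a missed lookup on the master. This is an assumption added for illustration, not something the question proposes; the class and method names are hypothetical.

```python
class UserDirectory:
    """Toy model of a master/slave pair with explicit, delayed replication."""

    def __init__(self):
        self.master = {}
        self.slave = {}

    def sign_up(self, name, record):
        # Step 1: the write goes to the master only.
        self.master[name] = record

    def replicate(self):
        # Stands in for asynchronous replication catching up.
        self.slave.update(self.master)

    def find_on_slave(self, name):
        # Step 2 as described: the read hits the slave and may miss.
        return self.slave.get(name)

    def find(self, name):
        # Possible mitigation: fall back to the master when the slave misses.
        record = self.slave.get(name)
        if record is None:
            record = self.master.get(name)
        return record


directory = UserDirectory()
directory.sign_up("jeff", {"email": "jeff@example.com"})
print(directory.find_on_slave("jeff"))  # stale read before replication
print(directory.find("jeff"))           # fallback finds the new user
```

The fallback only costs a master query on misses, so ordinary logins still read from the slave.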

how to pass traffic for port 80 not through openvpn?

- by moti

Is there a way to configure OpenVPN clients to route traffic for HTTP port 80 and HTTPS port 443 directly (i.e. not through the VPN), through the regular default gateway the clients have? All other traffic should go through the VPN. My client is running OpenVPN on Windows, and my current configuration looks like this:

    client
    dev tun
    proto tcp
    remote my-server-2 1194
    resolv-retry infinite
    nobind
    persist-key
    persist-tun
    ca ../keys/ca.crt
    cert ../keys/client1.crt
    key ../keys/client1.key
    ns-cert-type server
    verb 3
    route-metric 1
    show-net-up
    dhcp-renew
    dhcp-release
    route-delay 0 120
    hand-window 180
    management localhost 13010
    management-hold
    management-query-passwords
    management-forget-disconnect
    management-signal
    auth-user-pass

Lucene and .NET Part I

- by javarg

I’ve been playing around with Lucene.NET, trying to get a feeling for what is required to develop and implement a full business application using it. As you would imagine, many things are required to implement a robust solution for indexing content and searching it afterwards. Lucene is a great and robust solution for indexing content. It offers a fast, performance-enhanced search engine library available in Java and .NET.

You will want to use this library in several particular scenarios:

    - In Windows Azure, to support Full Text Search (a functionality not currently supported by SQL Azure)
    - When storing files outside of, or not managed by, your database (as in large document storage solutions that use the file system)
    - When Full Text Search is not really what you need

Lucene is more than a Full Text Search solution. It has several analyzers that let you process and search content in different ways (decomposing sentences, deriving words, removing articles, etc.).

When deciding to implement indexing using Lucene, you will need to take into account the following:

    - How content is to be indexed by Lucene, and when
        - Using a service that runs after a specific interval
        - Immediately when content changes
    - When content is to be available for searching / availability of indexed content (as in real-time content search)
        - Immediately when content changes = near-real-time searching
        - After a few minutes..
    - Ease of maintainability and development

Some Technical Concerns..

When indexing content, indexes are locked for writing operations by the Index Writer. This means that Lucene is best suited to indexing content with a single-writer approach. When searching, Index Readers take a snapshot of the indexes. This has the following implications:

    - Setting up an index reader is a costly task. You are not supposed to create one for each query or search. A good practice is to create readers and reuse them for several searches.
    - Reusing a reader means that even when the content gets updated, you won't be able to see the changes. You will need to recycle the reader.

In the second part of this post we will review some alternatives and design considerations.
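The reuse-then-recycle pattern described above can be sketched independently of Lucene. In this minimal Python example, the ReaderHolder class and the open_reader callback are hypothetical names, and a tuple copy of a list stands in for the snapshot an index reader takes; none of it is Lucene.NET API.

```python
import threading


class ReaderHolder:
    """Hands out one shared reader and reopens it only after the index changed."""

    def __init__(self, open_reader):
        self._open_reader = open_reader  # costly factory, called as rarely as possible
        self._lock = threading.Lock()
        self._reader = None
        self._stale = True

    def mark_changed(self):
        # Called by the single writer after it commits new content.
        with self._lock:
            self._stale = True

    def get(self):
        with self._lock:
            if self._stale:
                # Recycle: take a fresh snapshot (closing the old reader is omitted here).
                self._reader = self._open_reader()
                self._stale = False
            return self._reader


index = ["doc one", "doc two"]          # stand-in for the index on disk
holder = ReaderHolder(lambda: tuple(index))

first = holder.get()
index.append("doc three")               # the writer adds content
unchanged = holder.get()                # same snapshot: the change is not visible yet
holder.mark_changed()
refreshed = holder.get()                # recycled reader sees the new document

print(len(first), len(unchanged), len(refreshed))
```

Searches between two commits share one reader, so the costly setup happens once per change rather than once per query.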

Building Simple Workflows in Oozie

- by dan.mcclary

Introduction

More often than not, data doesn't come packaged exactly as we'd like it for analysis. Transformation, match-merge operations, and a host of data munging tasks are usually needed before we can extract insights from our Big Data sources. Few people find data munging exciting, but it has to be done. Once we've suffered that boredom, we should take steps to automate the process. We want to codify our work into repeatable units and create workflows which we can leverage over and over again without having to write new code. In this article, we'll look at how to use Oozie to create a workflow for the parallel machine learning task I described on Cloudera's site.

Hive Actions: Prepping for Pig

In my parallel machine learning article, I use data from the National Climatic Data Center to build weather models on a state-by-state basis. NCDC makes the data freely available as gzipped files of day-over-day observations stretching from the 1930s to today. In reading that post, one might get the impression that the data came in handy, ready-to-model files with convenient delimiters. The truth of it is that I need to perform some parsing and projection on the dataset before it can be modeled. If I get more observations, I'll want to retrain and test those models, which will require more parsing and projection. This is a good opportunity to start building up a workflow with Oozie.

I store the data from the NCDC in HDFS and create an external Hive table partitioned by year. This gives me the flexibility of Hive's query language when I want it, but lets me put the dataset in a directory of my choosing in case I want to treat the same data with Pig or MapReduce code.

    CREATE EXTERNAL TABLE IF NOT EXISTS historic_weather(column1, column2)
    PARTITIONED BY (yr string)
    STORED AS ...
    LOCATION '/user/oracle/weather/historic';

As new weather data comes in from NCDC, I'll need to add partitions to my table. That's an action I should put in the workflow.

Similarly, the weather data requires parsing in order to be useful as a set of columns. Because of its long history, the weather data is broken up into fields of specific byte lengths: x bytes for the station ID, y bytes for the dew point, and so on. The delimiting is consistent from year to year, so writing a SerDe or a parser for transformation is simple. Once that's done, I want to select columns on which to train, classify certain features, and place the training data in an HDFS directory for my Pig script to access.

    ALTER TABLE historic_weather ADD IF NOT EXISTS
    PARTITION (yr='2010')
    LOCATION '/user/oracle/weather/historic/yr=2010';

    INSERT OVERWRITE DIRECTORY '/user/oracle/weather/cleaned_history'
    SELECT w.stn, w.wban, w.weather_year, w.weather_month,
           w.weather_day, w.temp, w.dewp, w.weather
    FROM (
        FROM historic_weather
        SELECT TRANSFORM(...)
        USING '/path/to/hive/filters/ncdc_parser.py'
        as stn, wban, weather_year, weather_month,
           weather_day, temp, dewp, weather
    ) w;

Since I'm going to prepare training directories with at least the same frequency that I add partitions, I should also add that to my workflow. Oozie is going to invoke these Hive actions using what's somewhat obviously referred to as a Hive action. Hive actions amount to Oozie running a script file containing our query language statements, so we can place them in a file called ncdc_parse.hql.

Starting Our Workflow

Oozie offers two types of jobs: workflows and coordinator jobs. Workflows are straightforward: they define a set of actions to perform as a sequence or directed acyclic graph. Coordinator jobs can take all the same actions as workflow jobs, but they can be automatically started either periodically or when new data arrives in a specified location. To keep things simple we'll make a workflow job; coordinator jobs simply require another XML file for scheduling.
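The fixed-width parsing handed to ncdc_parser.py in the TRANSFORM clause could look roughly like the Python sketch below. The article does not give the real field widths, so every offset in FIELDS is a made-up placeholder, as is the sample record; only the column names come from the query above.

```python
# Hypothetical layout: (column name, start offset, end offset). The real NCDC
# widths are not given in the article, so these numbers are placeholders.
FIELDS = [
    ("stn", 0, 6),
    ("wban", 6, 11),
    ("weather_year", 11, 15),
    ("weather_month", 15, 17),
    ("weather_day", 17, 19),
    ("temp", 19, 25),
    ("dewp", 25, 31),
    ("weather", 31, 37),
]


def parse_line(line):
    """Slice one fixed-width record into its trimmed column values."""
    return [line[start:end].strip() for _, start, end in FIELDS]


def transform(lines):
    """Yield one tab-separated row per record, the default shape Hive TRANSFORM reads."""
    for line in lines:
        if line.strip():
            yield "\t".join(parse_line(line.rstrip("\n")))


# Made-up record matching the placeholder layout above.
sample = "722950" + "23174" + "2010" + "01" + "15" + "  57.3" + "  44.1" + "010000"

for row in transform([sample]):
    print(row)
```

When used as the actual TRANSFORM script, the loop at the bottom would iterate over sys.stdin instead of the in-memory sample, since Hive streams rows to the script on standard input and reads the result from standard output.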
The bare minimum for workflow XML defines a name, a starting point, and an end point: <workflow-app name="WeatherMan" xmlns="uri:oozie:workflow:0.1"> <start to="ParseNCDCData"/> <end name="end"/> </workflow-app> To this we need to add an action, and within that we'll specify the hive parameters Also, keep in mind that actions require <ok> and <error> tags to direct the next action on success or failure. <action name="ParseNCDCData"> <hive xmlns="uri:oozie:hive-action:0.2"> <job-tracker>localhost:8021</job-tracker> <name-node>localhost:8020</name-node> <configuration> <property> <name>oozie.hive.defaults</name> <value>/user/oracle/weather_ooze/hive-default.xml</value> </property> </configuration> <script>ncdc_parse.hql</script> </hive> <ok to="WeatherMan"/> <error to="end"/> </action> There are a couple of things to note here: I have to give the FQDN (or IP) and port of my JobTracker and NameNode. I have to include a hive-default.xml file. I have to include a script file. The hive-default.xml and script file must be stored in HDFS That last point is particularly important. Oozie doesn't make assumptions about where a given workflow is being run. You might submit workflows against different clusters, or have different hive-defaults.xml on different clusters (e.g. MySQL or Postgres-backed metastores). A quick way to ensure that all the assets end up in the right place in HDFS is just to make a working directory locally, build your workflow.xml in it, and copy the assets you'll need to it as you add actions to workflow.xml. At this point, our local directory should contain: workflow.xml hive-defaults.xml (make sure this file contains your metastore connection data) ncdc_parse.hql Adding Pig to the Ooze Adding our Pig script as an action is slightly simpler from an XML standpoint. 
All we do is add an action to workflow.xml as follows:

    <action name="WeatherMan">
        <pig>
            <job-tracker>localhost:8021</job-tracker>
            <name-node>localhost:8020</name-node>
            <script>weather_train.pig</script>
        </pig>
        <ok to="end"/>
        <error to="end"/>
    </action>

Once we've done this, we'll copy weather_train.pig to our working directory. However, there's a bit of a "gotcha" here. My Pig script registers the Weka jar and a chunk of Jython. If those aren't also in HDFS, our action will fail from the outset, but where do we put them? The Jython script goes into the working directory at the same level as the Pig script, because Pig attempts to load Jython files in the directory from which the script executes. However, that's not where our Weka jar goes.

While Oozie doesn't assume much, it does make an assumption about the Pig classpath. Anything under working_directory/lib gets automatically added to the Pig classpath and no longer requires a REGISTER statement in the script. Anything that uses a REGISTER statement cannot be in the working_directory/lib directory. Instead, it needs to be in a different HDFS directory and attached to the Pig action with an <archive> tag. Yes, that's as confusing as you think it is. You can get the exact rules for adding jars to the distributed cache from Oozie's Pig Cookbook.

Making the Workflow Work

We've got a workflow defined and have collected all the components we'll need to run. But we can't run anything yet, because we still have to define some properties about the job and submit it to Oozie. We need to start with the job properties, as this is essentially the "request" we'll submit to the Oozie server.
In the same working directory, we'll make a file called job.properties as follows:

    nameNode=hdfs://localhost:8020
    jobTracker=localhost:8021
    queueName=default
    weatherRoot=weather_ooze
    mapreduce.jobtracker.kerberos.principal=foo
    dfs.namenode.kerberos.principal=foo
    oozie.libpath=${nameNode}/user/oozie/share/lib
    oozie.wf.application.path=${nameNode}/user/${user.name}/${weatherRoot}
    outputDir=weather-ooze

While some of the pieces of the properties file are familiar (e.g., the JobTracker address), others take a bit of explaining. The first is weatherRoot: this is essentially an environment variable for the script (as are jobTracker and queueName). We're simply using them to simplify the directives for the Oozie job.

The oozie.libpath piece is extremely important. This is a directory in HDFS which holds Oozie's shared libraries: a collection of jars necessary for invoking Hive, Pig, and other actions. It's a good idea to make sure this has been installed and copied up to HDFS.

The last two lines are straightforward: run the application defined by workflow.xml at the application path listed and write the output to the output directory.

We're finally ready to submit our job! After all that work we only need to do a few more things:

1. Validate our workflow.xml
2. Copy our working directory to HDFS
3. Submit our job to the Oozie server
4. Run our workflow

Let's do them in order. First validate the workflow:

    oozie validate workflow.xml

Next, copy the working directory up to HDFS, to the location named by oozie.wf.application.path (for the oracle user, /user/oracle/weather_ooze):

    hadoop fs -put working_dir /user/oracle/weather_ooze

Now we submit the job to the Oozie server. We need to ensure that we've got the correct URL for the Oozie server, and we need to specify our job.properties file as an argument.

    oozie job -oozie http://url.to.oozie.server:port_number/ -config /path/to/working_dir/job.properties -submit

We've submitted the job, but we don't see any activity on the JobTracker?
All I got was this funny bit of output:

    14-20120525161321-oozie-oracle

This is because submitting a job to Oozie creates an entry for the job and places it in PREP status. What we got back, in essence, is a ticket for our workflow to ride the Oozie train. We're responsible for redeeming our ticket and running the job.

    oozie job -oozie http://url.to.oozie.server:port_number/ -start 14-20120525161321-oozie-oracle

Of course, if we really want to run the job from the outset, we can change the "-submit" argument above to "-run". This will prep and run the workflow immediately.

Takeaway

So, there you have it: the somewhat laborious process of building an Oozie workflow. It's a bit tedious the first time out, but it does present a pair of real benefits to those of us who spend a great deal of time data munging. First, when new data arrives that requires the same processing, we already have the workflow defined and ready to run. Second, as we build up a set of useful action definitions over time, creating new workflows becomes quicker and quicker.
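Because every <start>, <ok>, and <error> tag has to point at a node that actually exists, one easy mistake is a transition that names a missing action. As a rough local check to run alongside `oozie validate`, here is a minimal sketch using Python's standard xml.etree module. The function names local_name and dangling_transitions are hypothetical, and the sketch only looks at start, ok, and error transitions (decision, fork, and join nodes are ignored).

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def dangling_transitions(workflow_xml):
    """Return (source, target) pairs whose target is not a defined node.

    Only <start to=...> and the <ok>/<error> children of top-level nodes
    are checked; this is a simplification, not a full Oozie validator.
    """
    root = ET.fromstring(workflow_xml)
    defined = set()
    transitions = []
    for node in root:
        if local_name(node.tag) == "start":
            transitions.append(("start", node.get("to")))
            continue
        name = node.get("name")
        if name is not None:
            defined.add(name)
        for child in node:
            if local_name(child.tag) in ("ok", "error"):
                transitions.append((name, child.get("to")))
    return [(src, dst) for src, dst in transitions if dst not in defined]


if __name__ == "__main__":
    import sys

    # Usage (assumed): python check_workflow.py workflow.xml
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as handle:
            for src, dst in dangling_transitions(handle.read()):
                print(f"{src} -> {dst} (no such node)")
```

Run against the bare-minimum workflow from earlier, before any actions are added, this would flag the start transition to ParseNCDCData, which is exactly the gap the Hive action fills.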

Read the article
Simple recursive DNS resolver for debugging (app or VM)

- by notpeter

I have an issue which I believe is caused by incorrect DNS queries (doubled subdomains like _record.host.subdomain.tld.subdomain.tld) when querying for SRV records. So I need an alternate DNS server with heavy logging so I can see every query (especially stupid ones), acting as a recursive resolver with the ability to create records which override real DNS records, so I can not only find the records it's (wrongly) looking for, but populate those records as well. I know I could install a DNS server on yet another Linux box, but I feel like this is the sort of thing for which someone may have already set up a simple Python script or single-use VM.

Read the article
Take Control of Workflow with Workflow Analyzer!

- by user793553

Immediate Analysis and Output of your EBS Workflow Environment

The EBS Workflow Analyzer is a script that reviews the current Workflow footprint, analyzes the configuration and environment, and provides feedback and recommendations on best practices and areas of concern. Go to Doc ID 1369938.1 for more details, the script download, and a short overview video.

Proactive Benefits:

- Immediate Analysis and Output of Workflow Environment
- Identifies Aged Records
- Identifies Workflow Errors & Volumes
- Identifies looping Workflow items and stuck activities
- Identifies Workflow System Setup and configurations
- Identifies and Recommends Workflow Best Practices
- Easy To Add Tool for regular Workflow Maintenance
- Execute Analysis anytime to compare trending from past outputs

The Workflow Analyzer presents key details in an easy-to-review graphical manner:

- Workflow Runtime Data Table Gauge: shows critical (red), bad (yellow), or good (green) depending on the number of workflow items (WF_ITEMS).
- Workflow Error Notifications Pie Chart: shows the workflow error notification types.
- Workflow Runtime Table Footprint Bar Chart: shows the workflow runtime table footprint.

The analyzer also gives detailed listings of setups and configurations. As an example, the workflow services are listed along with their status for review. The analyzer draws attention to key details with yellow and red boxes highlighting areas of review. You can extend any query by reviewing the SQL script and then running it on your own or modifying it for your own needs.

Find more details in these notes:

- Doc ID 1369938.1: Workflow Analyzer script for E-Business Suite Workflow Monitoring and Maintenance
- Doc ID 1425053.1: How to run EBS Workflow Analyzer Tool as a Concurrent Request

Or visit the My Oracle Support EBS - Core Workflow Community.

Read the article
An XML file or Database?

- by webnoob

I am rewriting a section of my site and am trying to decide how much of a rewrite this will be. At the moment I have a web service feed that generates an XML file once per day. I then use this XML file on my website to generate the general structure. I am trying to decide if this information should be located in the database or stay in the XML file. The file can range from 4 MB to 12 MB. The file's depth can go on and on, so I have to recurse to find the data I want. I use the .NET serializer classes and store the deserialized file in a global variable to avoid deserializing it each time the page is loaded.

My reasons for thinking a database would be better are:

- I would know exactly where I am in the file by using an internal ID, so I wouldn't have to recurse the file to get information.
- I wouldn't have to load and deserialize the XML and could just use my already open database connections.
- Searching for the data would be quicker(?) as I would just perform an SQL query rather than recursing the file.

Has anyone got any ideas about which is better, and which option would use more resources on the server or be quicker?

EDIT: The file is read every time the web page is loaded (although only deserialized once). It isn't written to by standard users (only by an admin task that runs in the middle of the night). This is my initial investigation before mocking up.

Read the article
DNS Server Spoofed Request Amplification DDoS - Prevention

- by Shackrock

I've been conducting security scans, and a new finding popped up for me:

DNS Server Spoofed Request Amplification DDoS

The remote DNS server answers to any request. It is possible to query the name servers (NS) of the root zone ('.') and get an answer which is bigger than the original request. By spoofing the source IP address, a remote attacker can leverage this 'amplification' to launch a denial of service attack against a third-party host using the remote DNS server.

General Solution: Restrict access to your DNS server from public network or reconfigure it to reject such queries.

I'm hosting my own DNS for my website. I'm not sure what the solution is here... I'm really looking for some concrete, detailed steps to patch this, but haven't found any yet. Any ideas? CentOS 5 with WHM and cPanel.

Also see: http://securitytnt.com/dns-amplification-attack/

Read the article
