Search Results

Search found 45245 results on 1810 pages for 'html content extraction'.

  • Why does a webpage sometimes lose its formatting?

    - by eSKay
    Sometimes, a webpage gets loaded in the browser but is not displayed properly: all the elements of the page are there, but they are not where they should be. For example (A, B and C are three elements of the page), this layout:

        -----------------------
        |       |       |     |
        |   A   |   B   |  C  |
        |       |       |     |
        -----------------------

    may be displayed as:

        ---------
        |       |
        |   A   |
        |       |
        ---------
        |       |
        |   B   |
        |       |
        ---------
        |       |
        |   C   |
        |       |
        ---------

    i.e. the formatting is missing. How does that happen?
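
    One common way this happens (not the only one): the HTML arrives, but the stylesheet does not (a slow connection, a 404, or a blocked request), so the browser falls back to its default block layout, and elements that were placed side by side stack vertically instead. A minimal sketch of that mechanism, with hypothetical file names:

        <!-- index.html: if style.css fails to load, the three divs render in
             normal block flow, one under another (the second picture above). -->
        <link rel="stylesheet" href="style.css">
        <div class="col">A</div>
        <div class="col">B</div>
        <div class="col">C</div>

        /* style.css: lays A, B and C out side by side (the first picture) */
        .col { float: left; width: 30%; }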

  • Lucene and .NET Part I

    - by javarg
    I've been playing around with Lucene.NET, trying to get a feel for what is required to develop and implement a full business application with it. As you would imagine, many things are needed to build a robust solution for indexing content and searching it afterwards. Lucene is a great, robust solution for indexing content: a fast, performance-enhanced search engine library available in Java and .NET. You will want to use this library in several particular scenarios:

    - In Windows Azure, to support full-text search (a functionality not currently supported by SQL Azure)
    - When storing files outside of, or not managed by, your database (as in large document storage solutions that use the file system)
    - When full-text search is not really what you need

    Lucene is more than a full-text search solution. It has several analyzers that let you process and search content in different ways (decomposing sentences, deriving words, removing articles, etc.). When deciding to implement indexing with Lucene, you will need to take the following into account:

    - How content is to be indexed by Lucene and when: using a service that runs at a specific interval, or immediately when content changes
    - When indexed content is to be available for searching: immediately when content changes (near-real-time searching), or after a few minutes
    - Ease of maintainability and development

    Some technical concerns: when indexing content, indexes are locked for writing operations by the IndexWriter. This means Lucene is best designed to index content using a single-writer approach. When searching, index readers take a snapshot of the indexes, which has the following implications: setting up an index reader is a costly task, so you are not supposed to create one for each query or search; a good practice is to create readers and reuse them for several searches. The latter means that even when the content gets updated, you won't see the changes; you will need to recycle the reader (see the sketch below). In the second part of this post we will review some alternatives and design considerations.
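
    The single-writer / reusable-reader advice translates roughly into the following; a minimal sketch, assuming the Lucene.NET 2.9-era API (the directory path and field names are illustrative):

        using System.IO;
        using Lucene.Net.Analysis.Standard;
        using Lucene.Net.Documents;
        using Lucene.Net.Index;
        using Lucene.Net.Search;
        using Lucene.Net.Store;
        using Version = Lucene.Net.Util.Version;

        var dir = FSDirectory.Open(new DirectoryInfo("index"));
        var analyzer = new StandardAnalyzer(Version.LUCENE_29);

        // Single writer: the index is locked for writes while this is open.
        using (var writer = new IndexWriter(dir, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED))
        {
            var doc = new Document();
            doc.Add(new Field("body", "some content", Field.Store.YES, Field.Index.ANALYZED));
            writer.AddDocument(doc);
            writer.Commit();
        }

        // Readers see a snapshot: open one read-only and reuse it for many searches...
        var reader = IndexReader.Open(dir, true);
        var searcher = new IndexSearcher(reader);

        // ...then recycle it after updates, or the new content stays invisible.
        var reopened = reader.Reopen();
        if (reopened != reader)
        {
            reader.Close();
            reader = reopened;
            searcher = new IndexSearcher(reader);
        }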

  • Why does modx-based site start using different domains for some content?

    - by naxa
    Situation: I have a MODX site on a VPS with multiple domain and subdomain names. The MODX site should use what I call the 'primary' domain name's 'primary' subdomain, i.e. www.intendedname.tld. The problem is that as time passes, the site mysteriously starts using another subdomain in links to content like videos, images, and even pages and (internal) links. The other subdomains don't serve this content, of course. If I clear the MODX cache, the original state is restored; however, the problem comes back again later. The VPS has a domain registered and multiple A records pointing to the VPS's IP, as subdomains. There is the 'primary' one, which is intended to be used as the public content server; the other ones are like docs. and test., etc. On top of that, I have the dynamic-DNS service client from no-ip installed on the machine, with a dynamic domain name bound; it gives a completely different domain name. I originally used it for ssh login and to serve a completely different site. An nginx server is put to good use to rewrite the different subdomains to the right places.

    Edit: the MODX templates use <base href="[[++site_url]]" />.

    Current attempt to fix: the current 'solution' to the problem is to also use nginx to rewrite everything to the 'primary' domain and subdomain. The nginx config file for the site utilizes (unsurprisingly) the rewrite directive to rewrite the unexpected server_name entries (i.e. the other subdomains) in a server block dedicated to this task (sketched below). With this, the main site basically works (sort of), but it renders all the other functions (docs) useless. Before this rewrite was set up, the 'solution' was to clear the MODX cache on a regular basis. The original MODX content is not getting corrupted; only the files in the cache are. What can I do to find out what the actual problem is and fix it?
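
    For reference, the catch-all block described above would look something like this; a sketch, with hypothetical names standing in for the real (sub)domains:

        # Any hostname other than the primary gets bounced to the canonical URL.
        server {
            listen 80 default_server;
            server_name docs.example.tld test.example.tld dynamic-name.no-ip.org;
            return 301 http://www.example.tld$request_uri;
        }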

  • How to find out what is installed on a server

    - by masfenix
    Hi, I'm an undergrad student and we have special access to a server. It is a Unix server (I don't know the OS or anything; is there a way to find out?). We also get a website associated with the server, with a public_html folder. I uploaded a test file with phpinfo(), but the server didn't parse it (yes, I had the right extension), so I'm guessing PHP isn't installed. Is there a way to see which "common" languages ARE installed?
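
    A few standard probes can answer both questions; a sketch, assuming a POSIX shell on the server (output formats vary by OS and distribution):

        # Identify the OS and kernel
        uname -a
        cat /etc/*release 2>/dev/null

        # Check which common interpreters/compilers are on the PATH
        for cmd in php perl python ruby gcc; do
            command -v "$cmd" && "$cmd" --version 2>&1 | head -n 1
        done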

  • Mysteriously empty $_POST array

    - by Lex
    Hi! I have this question running on StackOverflow.com, but it turns out that it may be related to a setting on the server, so I wanted to ask you too. Please follow the link to SO and help me. Thanks a lot!

  • How to protect a peer-to-peer network from inappropriate content?

    - by Mike
    I'm developing a simple peer-to-peer app in .NET which should enable users to share specific content (text and picture files). As I've learned with my last question, inappropriate content can "relatively" easily be identified/controlled in a centralized environment. But what about a peer-to-peer network: what are the best methods to protect a decentralized system from unwanted (illegal) content? At the moment I only see the following two methods (the first is sketched below):

    1. A protocol (a set of rules) defines what kind of data (e.g. only .txt and .jpg files, no bigger than 20KB, etc.) can be shared over the p2p network, and all clients (peers) must implement this protocol. If a peer doesn't, it gets blocked by other peers. Pro: easy to implement. Con: it's not possible to define the perfect protocol (I think e-mail spam filters have the same problem).

    2. Some kind of rating/reputation system (similar to Stack Overflow), so "bad guys" and inappropriate content can be identified/blocked by other users. Pro: would be very accurate. Con: would be slow, and in my view technically very hard to implement.

    Are there other/better solutions? Any answer or comment is highly appreciated.
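
    The first method boils down to every peer validating incoming content against the agreed rules before accepting or relaying it. A minimal .NET sketch of such a check, using the example rules above (names are illustrative):

        using System.IO;
        using System.Linq;

        static class ProtocolRules
        {
            static readonly string[] AllowedExtensions = { ".txt", ".jpg" };
            const long MaxSizeBytes = 20 * 1024; // 20 KB, per the example rule

            // Reject anything that is not a small .txt or .jpg file.
            public static bool IsAllowed(FileInfo file)
            {
                return AllowedExtensions.Contains(file.Extension.ToLowerInvariant())
                    && file.Length <= MaxSizeBytes;
            }
        }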

  • Nginx to act as both a webserver and a file server

    - by Simon Naude
    I would like to use Nginx as a webserver on my Ubuntu 12.04 server, but I would also like to use it for file transfers. I have been able to set it up as a webserver (very simple), and I have been able to set it up for file transfers (using the autoindex on directive), but I have not been able to do both at the same time. Is it possible to have Nginx act as a webserver, and then show your file directories when you click a link?
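
    Both roles can live in one server block by scoping autoindex to a single location; a sketch, assuming the site lives in /var/www/site and the downloadable files in /srv/files:

        server {
            listen 80;
            server_name example.com;

            # The normal website
            root /var/www/site;
            index index.html;

            # Links to /files/ show a browsable directory listing instead
            location /files/ {
                alias /srv/files/;
                autoindex on;
            }
        }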

  • How do I get Google to crawl my content when it's only displayed when you fill in a form?

    - by Sarang Patil
    I have a webpage. It has a form, and the "results" section is blank. When the user searches for items, a list pops up; he/she chooses one option from the list, and then the corresponding results are displayed in the results section. I once decided to log the IP, URL and time of every visit to my page. One IP was 66.249.73.26, and on doing a Google search I came to know it is the IP of the Google bot (link: whatmyipaddress google bot). When I searched for the links this IP visited, it was like this: search?id=100, search?id=110, ... search?id=200, ... and afterwards it incremented in steps of 1, like 400, 401, ... But people search for strings, not numbers. Because Googlebot queries numbers like this, I think the corresponding content is never displayed, and so my page content is never indexed, even though it is rich content. So I want to ask: in order to show the Google bot all the content that the webpage has, should I list all the results on the index page (see the sketch below) and ask users to enter a string to filter results?
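
    One common remedy is to expose the form-driven content through plain links that a crawler can follow, e.g. a browsable index page. A sketch (the URL pattern mirrors the logged requests; the item labels are illustrative):

        <!-- Crawlable entry points to content otherwise reachable only via the form -->
        <ul>
          <li><a href="/search?id=100">Result 100</a></li>
          <li><a href="/search?id=110">Result 110</a></li>
          <li><a href="/search?id=200">Result 200</a></li>
        </ul>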

  • How to change the document root to public_html from the root directory

    - by manish
    For testing I hosted my website on a free server from 000webhost.com. They have a directory structure of:

        (root folder) \
        (public folder) \public_html

    This directory structure makes it possible to keep all the library files in the root folder and all public data in \public_html, so I developed my website accordingly, and my final structure looked like:

        /
        /include                (this folder contains library files)
        /logs                   (log files)
        /public_html
        /public_html/index.php
        /public_html/home.php
        /public_html/...        (and other public files)

    000webhost makes only public_html available for access via URL, so my URLs looked neat and clean, like www.example.com/index.php or www.example.com/home.php. But after completing development I moved the website to shared hosting purchased from godaddy.com, which does not have any such directory arrangement: all the files are kept in the root folder and are all accessible via URL, so the URLs have become www.example.com/public_html/home.php or www.example.com/public_html/index.php. How should I redirect URL requests to the public_html folder again, so as to make the library files unavailable to public access and make the URLs neat and clean?
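
    On hosts that don't let you move the document root, this is commonly done with a rewrite in a .htaccess file in the root folder; a sketch, assuming mod_rewrite is enabled on the account:

        RewriteEngine On
        # Serve everything from /public_html without showing it in the URL
        RewriteCond %{REQUEST_URI} !^/public_html/
        RewriteRule ^(.*)$ /public_html/$1 [L]

    A separate .htaccess containing "deny from all" inside /include and /logs would keep the library and log files unreachable from the web.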

  • Empty $_POST data

    - by Antimony
    I am trying to post a post to my MyBB server from a Python script, but try as I might, I can't get it to work. The request shows up in the forensic log and the headers are in the $_SERVER variable, but $_POST is always an empty array. The error log shows nothing, even at the debug level. I've already tried searching, but I haven't found anything that's helped. I already checked the post_max_size setting, which is 8M. Another factor is that it's just my own requests that aren't going through; browser-generated requests seem to do just fine. I've looked and looked, but I can't find anything I'm doing differently that should matter. Anyway, here is an example request:

        POST /newreply.php?tid=1&processed=1 HTTP/1.1
        Host: <redacted>
        Accept-Encoding: identity
        Content-Length: 1153
        Content-Type: multipart/form-data; boundary=-->0xa216654L
        Cookie: sid=<redacted>; mybb[lastvisit]=1354995469; mybb[lastactive]=1354995500; mybb[threadread]=a%3A1%3A%7Bi%3A1%3Bi%3A1354995469%3B%7D; mybb[forumread]=a%3A1%3A%7Bi%3A2%3Bi%3A1354995469%3B%7D; loginattempts=1; mybbuser=2_ZlVVfaYS9FstZGQzr4KiNRUm3Z4xAgJkTPPq2ouFcuaragOTVQ
        Accept: text/html
        User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:14.0) Gecko/20100101 Firefox/14.0.1

        -->0xa216654L
        Content-Disposition: form-data; name="my_post_key"

        257b2bbef4334000d9088169154900a3
        -->0xa216654L
        Content-Disposition: form-data; name="quoted_ids"

        -->0xa216654L
        Content-Disposition: form-data; name="tid"

        1
        -->0xa216654L
        Content-Disposition: form-data; name="message"

        foo!2
        -->0xa216654L
        Content-Disposition: form-data; name="attachmentact"

        -->0xa216654L
        Content-Disposition: form-data; name="attachmentaid"

        -->0xa216654L
        Content-Disposition: form-data; name="icon"

        -1
        -->0xa216654L
        Content-Disposition: form-data; name="posthash"

        e93a2c78ce3f6807a86fd475ef4178cf
        -->0xa216654L
        Content-Disposition: form-data; name="postoptions[subscriptionmethod]"

        -->0xa216654L
        Content-Disposition: form-data; name="replyto"

        -->0xa216654L
        Content-Disposition: form-data; name="message_new"

        foo!2
        -->0xa216654L
        Content-Disposition: form-data; name="submit"

        Post Reply
        -->0xa216654L
        Content-Disposition: form-data; name="attachment"; filename=""
        Content-Type: application/octet-stream

        -->0xa216654L
        Content-Disposition: form-data; name="action"

        do_newreply
        -->0xa216654L
        Content-Disposition: form-data; name="subject"

        Lol
        -->0xa216654L
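
    One detail worth checking in a hand-rolled body: per RFC 2046, every boundary line inside the body must be the boundary string prefixed with two extra hyphens ("--" followed by the boundary token), which the lines above do not have. Letting a library assemble the multipart body sidesteps the issue entirely; a sketch using the requests package (URL and values taken from the capture, cookies abbreviated):

        import requests

        data = {
            "my_post_key": "257b2bbef4334000d9088169154900a3",
            "tid": "1",
            "message": "foo!2",
            "message_new": "foo!2",
            "action": "do_newreply",
            "subject": "Lol",
            "submit": "Post Reply",
        }
        cookies = {"sid": "<redacted>", "mybbuser": "<redacted>"}

        # Passing files= makes requests encode everything as multipart/form-data
        # and generate a correct boundary by itself.
        resp = requests.post(
            "http://example.com/newreply.php?tid=1&processed=1",
            data=data,
            files={"attachment": ("", b"")},
            cookies=cookies,
        )
        print(resp.status_code)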

  • Unable to convert file to UTF-8

    - by antoniocs
    I am on Windows XP SP3 and I am trying to convert a file from ASCII to UTF-8. I use Notepad++ to do this: I go to Encoding > Convert to UTF-8 without BOM. I save the file and reopen it, and it is still ASCII. I am using this file in a webpage and I need it to be UTF-8, because I have strings in UTF-8 and I am seeing little squares with ? in them.

  • How to organise storage for media content such as video and music?

    - by thor
    Currently, we have a single server hosting all content: music, video and software. This content is downloaded by users through HTTP. Now free space is coming to an end, and we are exploring different ways of extending our storage capacity. We want to do it cheaply, simply and reliably (protected from disk/server faults). Currently, we see two ways:

    1. Add a couple of cheap servers with 4 disks (RAID1?) and run some distributed file system on top, like GlusterFS. Pros: hopefully, we will see all our disks as a single flat file system; just dump content into it and be done. Cons: could be tricky to configure and in the handling of faults.

    2. Add a couple of cheap servers, all running HTTP servers. Each piece of content (be it a music file or a video) is placed on two randomly selected servers (sketched below). Pros: no need to deal with RAID, as content is duplicated; a single server failure does not bring down any part of the content; doubled distribution capacity (as any single file can be downloaded from either of the two servers hosting it). Cons: requires some scripting for the distribution of content and for adding/removing servers.

    Do we miss any other ways? Which of the aforementioned options seems to be the best?
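
    Option 2's placement rule is only a few lines of scripting; a sketch, with hypothetical server names (the file-to-servers mapping still has to be recorded somewhere, which is part of the scripting cost mentioned above):

        import random

        SERVERS = [
            "http://media1.example",
            "http://media2.example",
            "http://media3.example",
        ]

        def place(filename):
            """Pick the two distinct servers that will store this file."""
            return random.sample(SERVERS, 2)

        print(place("song.mp3"))  # e.g. two of the three hosts, chosen at random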

  • How to display "Comic Sans MS" in Linux?

    - by Roman
    I use "Comic Sans MS" font on my web page. The web page looks OK if I open it under Windows and MAC. But it does not work under Linux. How can I solve this problem? May be I can put the font on my web server? Is this font available for free? Can it slow down my page? Or may be I can replace "Comic Sans MS" by another font which is similar and is available on the 3 operation systems?

  • Selectively allow unsafe HTML tags in Plone

    - by dhill
    I'm searching for a way to put widgets from several services (PicasaWeb, Yahoo Pipes, Delicious bookmarks, etc.) on the community site I host on Plone (currently 3.2.1). That means I'm looking for a way to allow a group of users to use dangerous HTML tags. There are some ways I see, but I don't know how to implement them. One would be changing safe_html for the pages those editors own (1). Another would be to allow those tags on some subtree (2). And yet another would be finding an equivalent of the "static text portlet" that displays in the middle panel (3). We could then use one of the composite products (I stumbled upon Collage and CMFContentPanels) to include the unsafe content on other pages. My site has been plagued by advert bots, so I don't want to remove the filtering altogether, and I don't have an easy (no false positives) way of checking which users are bots, so deploying a captcha wouldn't help either. The question is: how do I implement any of these solutions? (I already asked this on the Plone mailing list without an answer, so I thought I would give it another try here.)

  • How do I set up a virtual host? (It's not working, and I've done everything right)

    - by piratepartypumpkin
    My router redirects port 80 to port 8080. My router works fine and my domain name is routed properly. This is my virtual hosts file:

        NameVirtualHost *:80

        <VirtualHost *:80>
            DocumentRoot /home/admins/lampstack-5.3.16-0/apps/wordpress
            ServerName example.com
            ServerAlias www.example.com
        </VirtualHost>

    I can access my website by entering "mywebsite.com:8080", but I cannot access it by entering "mywebsite.com". For further information, this is part of my httpd.conf:

        Listen 8080
        ServerName localhost:8080
        DocumentRoot "/home/admins/lampstack-5.3.16-0/apache2/htdocs"

        <Directory />
            Options FollowSymLinks
            AllowOverride None
            Order deny, allow
            deny from all
        </Directory>

        <Directory "/home/admins/lampstack-5.3.16-0/apache2/htdocs">
            Options FollowSymLinks
            AllowOverride None
            Order allow, deny
            allow from all
        </Directory>
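
    One mismatch stands out: Apache is listening on 8080, while the name-based virtual hosts are declared for *:80, so the vhost never matches the port requests actually arrive on. A sketch of the likely fix:

        NameVirtualHost *:8080

        <VirtualHost *:8080>
            DocumentRoot /home/admins/lampstack-5.3.16-0/apps/wordpress
            ServerName example.com
            ServerAlias www.example.com
        </VirtualHost>

    (Separately, "Order deny, allow" with a space after the comma is not valid Apache syntax; the directive expects a single argument such as "Order deny,allow".)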

  • AdBlock Plus Advanced Element Hiding?

    - by funkafied
    I'm trying to block a certain element on a site using AdBlock Plus's element-hiding feature. The problem is that there are two elements with the exact same name and type, and I only want to hide one of them, so there's no way to tell the filter which one to keep and which one not to keep. So I figure there might be a way to hide only the second element, by telling the filter to skip the first match and hide the second occurrence. Or, alternatively, maybe hide the one that also has a certain other element in front of it. Is there any way to do this, like regular expressions or something?
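
    Element-hiding rules accept full CSS selectors, so an nth-of-type pseudo-class can distinguish the two occurrences; a sketch, assuming the duplicated element is a div with class "ad" on example.com:

        ! Hide only the second matching element on the page
        example.com##div.ad:nth-of-type(2)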

  • How can I pass an external instance to the constructor of an object that's being created using the default XNA XML content loader?

    - by Michael
    I'm trying to understand how to use the XNA XML content importer to instantiate non-trivial objects that are more than a collection of basic properties (e.g., a class that inherits from DrawableGameComponent or GameComponent and requires other things to be passed into its constructor). Is it possible to pass existing external instances (e.g., an instance of the current Game) to the constructor of an object that's being created using the default XNA XML content loader? For example, imagine that I have the following class, inheriting from DrawableGameComponent:

        public class Character : DrawableGameComponent
        {
            public string Name { get; set; }

            public Character(Game game) : base(game) { }

            public override void Update(GameTime gameTime) { }
            public override void Draw(GameTime gameTime) { }
        }

    If I had a simple class that did not need other parameters in its constructor (i.e., the Game instance), then I could simply use this XML:

        <XnaContent>
          <Asset Type="MyNamespace.Character">
            <Name>John Doe</Name>
          </Asset>
        </XnaContent>

    ...and then create an instance of Character using this code:

        var character = Content.Load<Character>("MyXmlAssetName");

    But that won't work, because I need to pass the Game into the constructor. What's the best way to handle this situation? Is there a way to pass in things like the current Game using the default XNA XML content loader? Do I need to write my own XML loader? (If so, how?) Is there a better object-oriented design that I should be using for my classes? Note: although I used Game in this example, I'm really just asking how to pass any type of existing instance to my constructors. (For example, I'm using the Farseer Physics Engine, and some of my classes also need a reference to the Farseer World object.) Thanks in advance.
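
    One common workaround (a sketch of the pattern, not necessarily the best design): let the XML importer build a plain data object, then construct the component manually so the Game can be passed in:

        // Data-only class: the default XML loader can create this because it
        // needs no constructor arguments.
        public class CharacterData
        {
            public string Name { get; set; }
        }

        // In the Game's LoadContent(), where 'this' is the current Game:
        var data = Content.Load<CharacterData>("MyXmlAssetName");
        var character = new Character(this) { Name = data.Name };
        Components.Add(character);

    The XML asset would then declare Type="MyNamespace.CharacterData" instead of the component type.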

  • I want a hyperlink to open a browser tab, then all subsequent link clicks go to the same tab

    - by rossmcm
    I suspect I'm out of luck on this one, but here goes... Say I have a CHM help file that has http:// hyperlinks embedded in the help pages. When the user clicks on a hyperlink of the style:

        <a href="http://www.example.com" target="_blank">click here!</a>

    a browser window is opened and the target web page is displayed. If a browser is already open, a new tab is created and the target is displayed in that. If the user clicks on another link (or the same link), another browser window/tab opens, and so on. Is there any way I can force all clicks of the links to go to the same tab/browser window?
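
    A named target (any name other than the special _blank) makes every link sharing that name reuse one window or tab; a sketch:

        <!-- The first click opens a window/tab named "helpcontent"; later clicks
             from any link with the same target load into that same window. -->
        <a href="http://www.example.com" target="helpcontent">click here!</a>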

  • PHP registration form - limit emails [closed]

    - by Daniel
    I want to restrict certain email domains on my website. An example would be that I only want people with Gmail accounts to be able to register. This is what I have so far, to check whether the address is valid at all:

        {
            /* Check if valid email address */
            $regex = "^[_+a-z0-9-]+(\.[_+a-z0-9-]+)*"
                   ."@[a-z0-9-]+(\.[a-z0-9-]{1,})*"
                   ."\.([a-z]{2,}){1}$";
            if (!eregi($regex, $subemail)) {
                $form->setError($field, "* Email invalid");
            }
            $subemail = stripslashes($subemail);
        }
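
    The domain restriction itself can be a second check right after the validity test; a sketch using preg_match (the eregi function above is deprecated as of PHP 5.3), with $subemail and $form as in the snippet above:

        <?php
        // Accept only addresses ending in @gmail.com, case-insensitively.
        if (!preg_match('/@gmail\.com$/i', $subemail)) {
            $form->setError($field, "* Only Gmail addresses may register");
        }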

  • Trouble opening my router to my web server

    - by Justin Heather Barrios
    Here's the story: I have a web server created and connected to my router. The website works great when I'm connected to the router, but when I'm off the network I can't access it. I got the IP for my router by googling "what is my ip". I have opened ports 80 to 10080 in the router to link to the server. One odd thing that I don't understand: when I am on the network, if I access XXX.XXX.XX.XX:80 I can reach the web page no problem, but if I access XXX.XXX.XX.XX:81 (or any other port) I get the error "Cannot access server." Any idea what the problem could be? Could it be my ISP?

  • Getting link to abstract indexed in Google Scholar

    - by JordanReiter
    We have a large digital library with thousands of papers indexed in Google Scholar. We allow Google Scholar to index our PDFs, but they're blocked unless you have a subscription. So Google has full-text indexing/searching of our PDFs (great!), but then the links point straight to those PDFs (boo!) instead of the more helpful abstract pages. Does anyone know what could cause an issue like this? I am, to the best of my knowledge, following all of the guidelines laid out in their Inclusion Guidelines. Here's some example metadata:

        <meta name="citation_title" content="Sample Title"/>
        <meta name="citation_author" content="LastName, FirstName"/>
        <meta name="citation_publication_date" content="2012/06/26"/>
        <meta name="citation_volume" content="1"/>
        <meta name="citation_issue" content="1"/>
        <meta name="citation_firstpage" content="10"/>
        <meta name="citation_lastpage" content="20"/>
        <meta name="citation_conference_title" content="Name of the Conference"/>
        <meta name="citation_isbn" content="1-234567-89-X"/>
        <meta name="citation_pdf_url" content="http://www.example.org/p/1234/proceeding_1234.pdf"/>
        <meta name="citation_fulltext_html_url" content="http://www.example.org/f/1234/"/>
        <meta name="citation_abstract_html_url" content="http://www.example.org/p/1234/"/>
        <link rel="canonical" href="http://www.example.org/p/1234/" />

    example.org/p/1234 is the abstract page for the article; example.org/f/1234 is the full-text link, accessible to subscribers only (and to Google Scholar); example.org/p/1234/proceeding_1234.pdf is the full-text PDF link.
