Wednesday, April 22, 2009

Using meta tags to block search engine access to your site

The noindex meta standard is described at http://www.robotstxt.org/meta.html. This method is useful if you don't have root access to your server, as it allows you to control access to your site on a page-by-page basis.

To prevent all robots from indexing a page on your site, place the following meta tag into the &lt;head&gt; section of your page:

<meta name="robots" content="noindex">

To allow other robots to index the page while preventing only Google's robots from indexing it:

<meta name="googlebot" content="noindex">
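
For context, here is a minimal sketch of where the tag sits in a page; the title and body content are only placeholders:

<!DOCTYPE html>
<html>
<head>
  <title>Example page</title>
  <!-- Tells all compliant crawlers not to index this page -->
  <meta name="robots" content="noindex">
</head>
<body>
  <p>Content that should stay out of search results.</p>
</body>
</html>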

When we see the noindex meta tag on a page, Google will completely drop the page from our search results, even if other pages link to it. Other search engines, however, may interpret this directive differently. As a result, a link to the page can still appear in their search results.

Note that because we have to crawl your page in order to see the noindex meta tag, there's a small chance that Googlebot won't see and respect the noindex meta tag. If your page is still appearing in results, it's probably because we haven't crawled your site since you added the tag. (Also, if you've used your robots.txt file to block this page, we won't be able to see the tag either.)

If the content is currently in our index, we will remove it after the next time we crawl it. To expedite removal, use the URL removal request tool in Google Webmaster Tools.

Source: Google Support

Robots.txt

A robots.txt file restricts access to your site by search engine robots that crawl the web. These bots are automated, and before they access pages of a site, they check to see if a robots.txt file exists that prevents them from accessing certain pages. (All respectable robots will respect the directives in a robots.txt file, although some may interpret them differently. However, a robots.txt is not enforceable, and some spammers and other troublemakers may ignore it. For this reason, we recommend password protecting confidential information.)

You need a robots.txt file only if your site includes content that you don't want search engines to index. If you want search engines to index everything in your site, you don't need a robots.txt file (not even an empty one).
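
As an illustration, a minimal robots.txt looks like this; the file must live at the root of your domain (e.g. http://www.example.com/robots.txt), and the directory names here are only placeholders:

# Keep all compliant robots out of these directories
User-agent: *
Disallow: /private/
Disallow: /tmp/

# To block the entire site instead, you would use:
# User-agent: *
# Disallow: /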

While Google won't crawl or index the content of pages blocked by robots.txt, we may still index the URLs if we find them on other pages on the web. As a result, the URL of the page and, potentially, other publicly available information such as anchor text in links to the site, or the title from the Open Directory Project (www.dmoz.org), can appear in Google search results.

In order to use a robots.txt file, you'll need to have access to the root of your domain (if you're not sure, check with your web host). If you don't have access to the root of a domain, you can restrict access using the robots meta tag.

To entirely prevent a page's contents from being listed in the Google web index even if other sites link to it, use a noindex meta tag. As long as Googlebot fetches the page, it will see the noindex meta tag and prevent that page from showing up in the web index.

Source: Google Support

Sunday, April 19, 2009

Some WAS (WebSphere Application Server) commands and tips

- Start a WAS server: Open cmd and navigate to the profile bin directory
bin > startServer server1

- Stop a WAS server: Open cmd and navigate to the profile bin directory
Security disabled:
bin > stopServer server1
Security enabled:
bin > stopServer server1 -username <user> -password <password>

- Check the server status: Open cmd and navigate to the profile bin directory
bin > serverStatus server1

- To modify the Java process definition (JVM arguments, initial heap size, maximum heap size) directly, open server.xml (%WAS Install Dir%\profiles\AppSrv01\config\cells\%cell name%\nodes\%node name%\servers\server1\server.xml) and make the changes. Keep in mind that the admin console is the recommended way to make these changes; editing the XML directly is just a quick shortcut.
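
For reference, these settings live on the jvmEntries element inside the server's process definition; a rough sketch, where the xmi:id and all values are only illustrative:

<!-- Heap sizes are in megabytes; extra JVM arguments go in genericJvmArguments -->
<jvmEntries xmi:id="JavaVirtualMachine_1"
    initialHeapSize="256" maximumHeapSize="1024"
    genericJvmArguments="-Dmy.custom.property=value -verbose:gc"/>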

- If you modify a file used by an installed application, you need to delete the WAS temporary directory and restart the server. Location of the temp directory:
%WAS Install Dir%\profiles\AppSrv01\temp
WebSphere Application Server caches all application files in this directory, so for the changes to take effect it needs to be deleted and the cache recreated with the updated files.
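
From cmd, a quick sketch of the cleanup (stop the server first, and substitute your actual install path for the placeholder):

rem Clear the cached application files; WAS recreates them on restart
cd /d "%WAS Install Dir%\profiles\AppSrv01"
rmdir /s /q temp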

- To check which ports the WebSphere Application Server is listening on: Open the file
%WAS Install Dir%\profiles\AppSrv01\config\cells\%cell name%\nodes\%node name%\serverindex.xml. Look for WC_defaulthost; the corresponding port number (default 9080) is the port where WAS listens for web requests. Look for WC_adminhost; the corresponding port number is where the admin console listens. Information about all other ports used by WAS is also in this file.
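
The entries in serverindex.xml look roughly like this (the xmi:id values and hosts will differ on your install):

<!-- Port where WAS listens for web requests -->
<specialEndpoints xmi:id="NamedEndPoint_1" endPointName="WC_defaulthost">
  <endPoint xmi:id="EndPoint_1" host="*" port="9080"/>
</specialEndpoints>
<!-- Port where the admin console listens -->
<specialEndpoints xmi:id="NamedEndPoint_2" endPointName="WC_adminhost">
  <endPoint xmi:id="EndPoint_2" host="*" port="9060"/>
</specialEndpoints>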

Note: These commands and tips apply to WAS v6.0 and WAS v6.1.