Frequently Asked Questions
- What is the Ask Joe Howe Search?
- Do I have to pay for this service?
- How do I put the search box on my Web page?
- Does Service Nova Scotia provide assistance for implementing the search box code?
- Can't find a document that I know is on a Web server
- What gets indexed by the appliance?
- What is indexing?
- What is a collection?
- How often is the collection updated?
- Site Search vs Collection parameter?
- What is my collection?
- What is a frontend?
- How can I get a frontend created?
- What is my frontend?
- What types of documents get indexed?
- How are the page descriptions created on the search results page?
- Prevent Web servers or directories from being indexed (robots.txt)
- Prevent individual Web pages from being indexed (Meta Tag exclusion of ROBOTS)
What is the Ask Joe Howe Search?
The "Ask Joe Howe" search engine facility uses the Google Enterprise Search Appliance (Google ESA) for indexing documents on Government websites. The Google ESA uses the same search technology that's used for the commercial version of the Google Search Engine. For more information, review the questions below or contact us.
Do I have to pay for this service?
At this time, the Ask Joe Howe Search service is a baseline service provided by Service Nova Scotia and Municipal Relations. You don't have to pay any fees to put a search box on your Official Government of Nova Scotia Web pages or Municipalities of Nova Scotia Web pages.
How do I put the search box on my Web page?
We have made available HTML code examples for placing a search box form onto Official Government of Nova Scotia Web pages. You can use a standard search box that performs searches across all Government of Nova Scotia Web sites, or you can create a custom search experience (localized searches for your department). For customized search results, Service Nova Scotia and Municipal Relations has developed a Google Developer Kit.
Does Service Nova Scotia provide assistance for implementing the search box code?
To assist you with implementing the search box code, Service Nova Scotia has made available the HTML code examples for placing a search box form onto Official Government of Nova Scotia Web pages. If assistance is required, please contact your local webmaster as implementation is a fairly straightforward process and doesn't require advanced Web developer support. If you want to experiment with various implementations, you can find additional assistance within the Service Nova Scotia and Municipal Relations Google Developer Kit. Please note: Service Nova Scotia does not provide assistance with tailoring the XSLT (XSL stylesheet for transformation of the XML results) page for custom search results.
Can't find a document that I know is on a Web server
If you perform a search and the specific page you are looking for cannot be found, one or more of the following is happening:
- Your search query was not constructed correctly
- The webserver or the document was not available at the time the system was indexed
- There is no link to the document from any other document that was indexed
- Least likely, we've blocked either the directory or the webserver that contains the page. (We have blocked some pages at the request of some Web Coordinators.)
What gets indexed by the appliance?
The indexing process crawls nearly all pages (URLs) contained on all webservers in the following domains:
| Search Domain | Web sites Associated with the Search Domain |
| gov.ns.ca | Government Web sites in the gov.ns.ca domain (e.g. www.gov.ns.ca and museums.gov.ns.ca) |
| ednet.ns.ca | Department of Education Domain |
| region.halifax.ns.ca | Halifax Regional Municipality Domain |
| cbrm.ns.ca | Cape Breton Regional Municipality Domain |
| library.ns.ca | Nova Scotia Provincial Library Domain |
| cbv.ns.ca | Cape Breton-Victoria Regional School Board Domain |
| hrsb.ns.ca | Halifax Regional School Board Domain |
| apef-fepa.org | Atlantic Provinces Education Foundation Domain |
| apsea.ca | Atlantic Provinces Special Education Authority Domain |
| wcb.ns.ca | Workers' Compensation Board of Nova Scotia Domain |
| businessgateway.ca | BusinessGateway.ca Domain |
| film.ns.ca | Nova Scotia Film Development Corporation Domain |
| innovacorp.ns.ca | InNOVAcorp Domain |
| lawreform.ns.ca | Law Reform Commission of Nova Scotia Domain |
| nsliquor.ns.ca | Nova Scotia Liquor Corporation Domain |
| wdclhalifax.com | Waterfront Development Corporation Limited Domain |
| wtcchalifax.com | World Trade and Convention Centre Domain |
The indexing process begins crawling at the home page of each of the above domains and proceeds to crawl ALL URLs (http based web links) that are linked from these pages and contained within the above domains. It may take 100 or even 1000 jumps for the crawler to find a page, but it will find it if it is somehow linked from another page that gets indexed.
Exceptions:
- If a government webserver has a robots.txt file or an individual webpage has a Meta Tag specifying that it not be indexed, the Ask Joe Howe Search will honor these requests and not index the documents.
- Though not likely, we may have blocked either the directory or webserver that contains the page you are looking for. (We have blocked some pages at the request of some Government Web Coordinators because they did not meet Government of Nova Scotia Standards (personal pages on EDnet servers) or were redundant with other pages.)
- We have allowed only certain file-types to be indexed in order to maintain a high level of integrity and a quality user experience with search results.
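The link-following behaviour described above can be illustrated with a minimal sketch. It is not how the appliance is implemented; the page URLs, the `LINKS` graph, and the `crawl` function are hypothetical, and only two of the domains from the table are used. It shows why a page that nothing links to is never discovered, and why links that leave the listed domains are not followed.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "http://www.gov.ns.ca/": ["http://www.gov.ns.ca/a.html", "http://example.com/"],
    "http://www.gov.ns.ca/a.html": ["http://www.gov.ns.ca/b.html"],
    "http://www.gov.ns.ca/b.html": [],
    "http://www.gov.ns.ca/orphan.html": [],  # no page links here
    "http://example.com/": [],
}

# Two of the search domains listed in the table above.
ALLOWED_DOMAINS = ("gov.ns.ca", "ednet.ns.ca")


def in_allowed_domain(url):
    """True if the URL's host is one of the allowed domains or a subdomain."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in ALLOWED_DOMAINS)


def crawl(start_url):
    """Return the set of in-domain pages reachable from start_url by links."""
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        page = queue.popleft()
        for link in LINKS.get(page, []):
            if link not in seen and in_allowed_domain(link):
                seen.add(link)
                queue.append(link)
    return seen


found = crawl("http://www.gov.ns.ca/")
print(sorted(found))
```

In this sketch `b.html` is found two links away from the home page, while `orphan.html` is never reached because no crawled page links to it.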
What is indexing?
Okay, here's the really basic explanation of the indexing process: Indexing is the process of sending a robot program to a starting URL (web address) to discover the various files (web pages and other files) that are interconnected from this start URL. The files are returned to the search server and filtered for indexable documents (e.g. certain image files are not indexable). The search server then extracts each document's location, a summary of the page's content, all of the keywords, and special fields. This information is then inserted into a collection for searching.
What is a collection?
The Google Search Appliance stores every URL that was crawled and accepted by the search process in an index; each entry contains, among other things, the document location, document attributes, and keywords. A collection lets your users search over a specific part of the index. For example, we have created a GOVNS_MUSEUMS collection and a GOVNS_JUST collection that support searches only within the museums.gov.ns.ca and the gov.ns.ca/just parts of the index, respectively.
How often is the collection updated?
Ask Joe Howe Search (the Google ESA) is continuously sending out the Googlebot web crawler to crawl Government Web sites. The continuous crawl system provides fresher content; the appliance crawls and indexes as an ongoing process. You will know that the Googlebot has visited your Web sites by looking at your Web server log files for the User Agent Name: govns-gsa2-crawler. If your web page has been changed, the search results should reflect your updated webpage following the next indexing of the Web site. If your changed document is NOT reflected in the search results, see the "Can't find a document that I know is on a Web server" section above.
Site Search vs Collection parameter?
When you look at the sample form code provided by Service Nova Scotia, one of the hidden parameters is called "sitesearch". If you wish to search only your domain or sub folder(s), this parameter should be used.
In some cases there may be benefit to searching content from folders or domains that are not part of the same tree. In these cases Service Nova Scotia must create what is called a collection for you.
For example, we created a collection called "GOVNS" that combines the URLs crawled on both the gov.ns.ca and ednet.ns.ca domains into one integrated search. This way, even though the content comes from two different domains (trees), it may be searched from one search box.
In these cases a developer would have to replace the sitesearch parameter:
<input type="hidden" name="sitesearch" value="http://www.gov.ns.ca/yourdirectory">
With the collection parameter:
<input type="hidden" name="site" value="your_collection">
To have a collection created on the Ask Joe Howe search engine, please contact Service Nova Scotia and we can arrange for it to be done.
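The difference between the two hidden parameters can also be seen in the query string the form submits. The sketch below is only an illustration: the endpoint `http://search.example.invalid/search`, the `search_url` helper, and the use of `q` as the query field name are assumptions, not taken from the sample form code. Only the `sitesearch` and `site` parameter names come from the text above.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; use the form action from the sample code you were given.
SEARCH_URL = "http://search.example.invalid/search"


def search_url(query, sitesearch=None, collection=None):
    """Build a search URL restricted by directory (sitesearch) or collection (site)."""
    params = {"q": query}  # "q" as the query field name is an assumption
    if collection is not None:
        params["site"] = collection        # collection parameter
    elif sitesearch is not None:
        params["sitesearch"] = sitesearch  # single domain or sub-folder
    return SEARCH_URL + "?" + urlencode(params)


by_folder = search_url("permits", sitesearch="http://www.gov.ns.ca/yourdirectory")
by_collection = search_url("permits", collection="GOVNS")
print(by_folder)
print(by_collection)
```

The helper sends one parameter or the other, never both, which mirrors the replacement described above.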
What is my collection?
Your collection will be chosen depending on what domain(s) you would like searched. Therefore, choose a collection that corresponds to your search domain(s).
What is a frontend?
At its most basic level, a "frontend" is simply a lens for presenting the search results from a "collection" in the Google appliance. To accomplish this, frontends can be edited to control the results from a search, ranging from restricting content access to shaping the presentation of search results:
- Frontends can be used to control search results based on domain names, directories, file-types, languages used, and even by matching meta tags. Specific URL patterns can also be restricted from the search results.
- Frontends can be used to pinpoint specific words or phrases for specific URLs or pages ("KeyMatch") and can also be used to customize how Google handles certain synonyms.
- Frontends are used to control the presentation of search results, replacing the use of external XSLT stylesheets.
How can I get a frontend created?
The first step to working with frontends is requesting a frontend through email. In requesting a new frontend, please indicate the following information:
- Contact information for the Web developer(s)/webmaster(s) associated with the frontend.
- The search results page style you would like to use. The default style looks like this.
- If you have prepared a custom XSLT stylesheet to display search results, your XSLT sheet must be forwarded to SNSMR to be applied and stored on the Google appliance.
SNSMR will try to process all Google frontend requests as soon as they come in. Once a frontend has been created, it is usually available for search results within a matter of hours (if not sooner), since it is simply a filter against the main collection of the appliance.
What is my frontend?
If your website uses an external XSLT stylesheet to present search results in a customized display, that stylesheet must now be stored on the Google appliance rather than served on a URL-accessible file system. (This change is due to security considerations about cross-site scripting attacks using external URL sources.) SNSMR has tried to save all of the URL-accessible stylesheets associated with a department, agency or municipality and import them into the new appliance, creating a frontend for each custom XSLT being used on the old appliance. Each department's frontend on the new appliance already has the appropriate stylesheet code associated with it.
The value for the "proxystylesheet" parameter must be changed from a fully-qualified URL to the name of your site's frontend. The following frontends are associated with existing XSLT stylesheets:
| XSLT stylesheet | Frontend Name |
| http://www.gov.ns.ca/snsmr/xslt/askjoehowe.xslt | GOVNS |
| http://www.gov.ns.ca/snsmr/xslt/askjoehowe_fr.xslt | GOVNS_FR |
| http://www.gov.ns.ca/acadian/xslt/acadian.xslt | GOVNS_ACADIAN |
| http://www.gov.ns.ca/cmns/advertising/google/askjoehowe_custom.xslt | GOVNS_CNS_ADS |
| http://www.gov.ns.ca/econ/search/google_econ.xslt | GOVNS_ECON |
| http://www.gov.ns.ca/emo/emo.xslt | GOVNS_EMO |
| http://www.gov.ns.ca/enla/google/enla.xslt | GOVNS_ENLA |
| http://www.gov.ns.ca/finance/google/finance.xslt | GOVNS_FIN |
| http://www.gov.ns.ca/geonova/ask_joe/askjoehowe_custom_v2.xslt | GOVNS_GEONOVA |
| http://www.gov.ns.ca/health/google/health.xslt | GOVNS_HEALTH |
| http://www.gov.ns.ca/just/google/JusticeSearchResults.xslt | GOVNS_JUST |
| http://www.gov.ns.ca/just/regulations/google/RegulationsSearchResults.xslt | GOVNS_JUST_REGS |
| http://www.gov.ns.ca/legislature/HOUSE_BUSINESS/google/custom3.xslt | GOVNS_LEGI |
| http://www.gov.ns.ca/legislature/HOUSE_BUSINESS/google/custom1.xslt | GOVNS_LEGI_HANSARD |
| http://www.gov.ns.ca/legislature/legc/google/askjoe.xsl | GOVNS_LEGI_ASKJOE |
| http://www.gov.ns.ca/nsarm/scripts/dcmetadata.xslt | GOVNS_NSARM |
| http://www.gov.ns.ca/scs/google/scs.xslt | GOVNS_SCS |
| http://www.gov.ns.ca/snsmr/xslt/snsmr.xslt | SNSMR |
| http://www.gov.ns.ca/snsmr/xslt/snsmr_fr.xslt | SNSMR_FR |
| http://www.gov.ns.ca/snsmr/paal/paal4.xslt | SNSMR_PAAL |
| XSLT Sheets within the www.ednet.ns.ca domain | |
| http://www.ednet.ns.ca/google/askjoehowe_custom2.xslt | GOVNS_EDNET |
| XSLT Sheets for Municipalities | |
| http://www.gov.ns.ca/snsmr/xslt/hrm.xslt | HRM |
| http://www.gov.ns.ca/snsmr/xslt/municipal/municipal.xslt | MUNICIPAL |
| http://www.wearepictoucounty.com/images/prdc_google.xslt | NSARDA_PICTOU |
| http://66.111.106.68/xslt/prdc_google.xslt | NSARDA_PICTOU_PORTAL |
| http://www.town.amherst.ns.ca/site/search/amherst.xslt | NSMUN_AMHERST |
| http://www.queens.ca/google/regionofqueens.xslt | NSMUN_QUEENS |
| http://www.truro.ca/google/truro_google.xslt | NSMUN_TRURO |
| http://www.gov.ns.ca/snsmr/xslt/parl/parl.xslt | PARL |
| Other XSLT Sheets | |
| http://www.novascotialife.com/novascotialife2.xslt | NSLIFE |
| http://www.gov.ns.ca/snsmr/xslt/HealthNetwork.xslt | HEALTHNETWORK |
Additional frontends:
Two new frontends have also been created; the frontend names are GOVNS and GOVNS_FR (French frontend). The search results format looks like this.
If SNSMR did not successfully identify and import your custom XSLT stylesheet, then please contact us and we will set up a frontend on the new appliance for your custom XSLT stylesheet.
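As a rough illustration of the "proxystylesheet" change described in this section, the sketch below rewrites a search URL so that the parameter carries a frontend name instead of a stylesheet URL. The `FRONTENDS` mapping is an excerpt of the table above; the endpoint `http://search.example.invalid/search`, the `q` parameter, and the `use_frontend` helper are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Excerpt of the stylesheet-to-frontend table above.
FRONTENDS = {
    "http://www.gov.ns.ca/snsmr/xslt/askjoehowe.xslt": "GOVNS",
    "http://www.gov.ns.ca/health/google/health.xslt": "GOVNS_HEALTH",
    "http://www.gov.ns.ca/emo/emo.xslt": "GOVNS_EMO",
}


def use_frontend(url):
    """Replace a stylesheet URL in proxystylesheet with its frontend name."""
    parts = urlsplit(url)
    params = [
        (key, FRONTENDS.get(value, value) if key == "proxystylesheet" else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(params)))


# Hypothetical old-style search URL with a fully-qualified stylesheet URL.
old = ("http://search.example.invalid/search?q=flu"
       "&proxystylesheet=http%3A%2F%2Fwww.gov.ns.ca%2Fhealth%2Fgoogle%2Fhealth.xslt")
new = use_frontend(old)
print(new)
```

Stylesheet URLs that are not in the mapping are left unchanged, so an unlisted stylesheet still needs a frontend to be requested for it.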
What types of documents get indexed?
The Googlebot indexing agent will do a full index on several types of documents. Although the Google Appliance is capable of indexing WELL OVER 200 different file-types, ITS has limited the variety of file-types indexed in order to maintain a high quality user experience. Too many file-types will tend to clutter search results with "junk" links. Service Nova Scotia has configured the Google ESA to index URLs that end with the following file-type extensions: .stm, .html, .htm, .ihtml, .ghtml, .phtml, .shtml, .asp, .jsp, .pl, .php, .cfm, .xml, .doc, .dot, .xls, .pdf. These include the following standard and vendor formats:
| File-Type | File Format |
| HTML | HTML output based Web documents (this includes .html, .htm, .ihtml, .ghtml, .phtml, .shtml, .stm, .asp, .aspx, .jsp, .pl, .php, .cfm, and .xml) |
| PDF | Adobe Acrobat PDF documents |
| DOC, DOT | Microsoft Word (and other formats that use the .doc file-type extension) |
| XLS | Microsoft Excel |
| PPT | Microsoft PowerPoint |
| WPD | WordPerfect |
If you would like to see other file formats indexed into the Google ESA collection, please email the Search Engine Administrator with your request, specifying the file-type to be indexed, the types of files and/or applications that are involved, and your reasons why this file-type should be included. We will consider your request and weigh it against our stated goal of maintaining an overall high quality user experience. Please note: As stated above in the "What gets indexed by the appliance?" section, a URL can only be indexed if it is ultimately linked from the Home Page of a domain we crawl (even if the link is 1000 links deep from the Home Page).
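To check quickly whether a URL ends with one of the file-type extensions listed in this section, a sketch like the following may help. It uses only the extensions named in the paragraph before the table; the `has_indexed_extension` helper is hypothetical, and how the appliance treats URLs with no extension (such as a bare directory) is not stated here, so this sketch simply reports them as not matching.

```python
import posixpath
from urllib.parse import urlparse

# Extensions listed in the paragraph before the file-type table.
INDEXED_EXTENSIONS = {
    ".stm", ".html", ".htm", ".ihtml", ".ghtml", ".phtml", ".shtml",
    ".asp", ".jsp", ".pl", ".php", ".cfm", ".xml",
    ".doc", ".dot", ".xls", ".pdf",
}


def has_indexed_extension(url):
    """True if the URL path ends with one of the listed extensions."""
    path = urlparse(url).path  # drops any query string or fragment
    extension = posixpath.splitext(path)[1].lower()
    return extension in INDEXED_EXTENSIONS


for url in (
    "http://www.gov.ns.ca/report.PDF",
    "http://www.gov.ns.ca/page.asp?id=3",
    "http://www.gov.ns.ca/logo.gif",
):
    print(url, has_indexed_extension(url))
```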
How are the page descriptions (the snippet) created on the search results page?
The summary, or as Google calls it, the snippet, is determined by word concentration in the document, keyword proximity, and other factors. Service Nova Scotia has no control over how the summary is developed.
Prevent web servers or directories from being indexed (robots.txt)
The robots.txt file is used on many web sites to specify what parts of the site indexers should avoid. See http://www.robotstxt.org/wc/exclusion-admin.html for a discussion of robots.txt. To limit the indexing ability of the Ask Joe Howe indexing process on a web server you will need to use a robots.txt file.
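As a sketch of what such a file might contain, the example below defines a hypothetical robots.txt and checks it with Python's standard-library parser. The directory names are examples only, and it assumes the crawler's User Agent Name from the server logs (govns-gsa2-crawler) is also the name it matches in robots.txt; if in doubt, rules under `User-agent: *` apply to all well-behaved crawlers.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the directory names are examples only.
ROBOTS_TXT = """\
User-agent: govns-gsa2-crawler
Disallow: /drafts/

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

AGENT = "govns-gsa2-crawler"  # assumed to match the robots.txt token

# The first group applies to the named crawler; the second to everyone else.
drafts_allowed = parser.can_fetch(AGENT, "http://www.gov.ns.ca/drafts/plan.html")
home_allowed = parser.can_fetch(AGENT, "http://www.gov.ns.ca/index.html")
other_private = parser.can_fetch("OtherBot", "http://www.gov.ns.ca/private/x.html")
print(drafts_allowed, home_allowed, other_private)
```

The file itself must be served from the top level of the web server (for example `/robots.txt`), which is why it requires access to the whole server.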
Prevent individual web pages from being indexed (Meta Tag exclusion of ROBOTS)
What if I can't make a robots.txt file?
Sometimes you cannot make a robots.txt file, because you don't administer the entire server. All is not lost: you can use a <META> tag within the <HEAD> tags of your HTML document (in the head section of your web page [it's a hidden field if you're using an HTML Editor]):
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
This tag causes the indexing engine to NOT index the page and NOT follow the link(s) on the page (see additional info below).
For example, insert the meta tag into your HTML like so:
<HTML>
<HEAD>
<TITLE> The title of your page</TITLE>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
</HEAD>
<BODY....>
The body of your web page
.
.
.
</BODY>
</HTML>
If you want the indexing engine to index your page but NOT follow the links on your page then use the following:
<META NAME="ROBOTS" CONTENT="NOFOLLOW">
If you want the indexing engine to NOT index your page but TO FOLLOW the links on your page then use the following (this might be used for a table of contents page or something like a large list of links):
<META NAME="ROBOTS" CONTENT="NOINDEX">
Additional information on the Robots META Tag is available at http://www.robotstxt.org/wc/exclusion.html#meta
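If you want to confirm which ROBOTS directives a page actually carries, a small check along these lines may be useful. It is only a sketch using Python's standard-library HTML parser; the `RobotsMetaParser` class, the `robots_directives` helper, and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <META NAME="ROBOTS" CONTENT="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html):
    """Return the set of lowercase ROBOTS directives found in the HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = (
    '<HTML><HEAD><TITLE>Links</TITLE>'
    '<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">'
    '</HEAD><BODY></BODY></HTML>'
)
directives = robots_directives(page)
print("page may be indexed:", "noindex" not in directives)
print("links may be followed:", "nofollow" not in directives)
```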
