Search Plugin Module: Preserve title HTML

almostcompletely · April 2015

Here's the problem: my site has a bunch of pages detailing animal species. Each page title is the species' scientific name, and the international standard is to write scientific names in italics. When Fuel creates the search index, all the HTML formatting is removed (good, I understand why). Unfortunately, that also removes the <i> tags from the page titles, so the search results page lists every page's scientific name without italics, which is a bad thing.

So what we want is the ability to index a page but preserve the inner HTML of the element containing the page title.

Here's how I've done it (possibly a bit clunky).

In ./config/search.php, I've added:
$config['search']['preserveTitleHTML'] = true;

In ./modules/search/libraries/Fuel_search.php, I've added:

function _find_title_tag($xpath, $tags)
{
   // accept either an array of tags or a comma-separated string
   if (is_string($tags))
   {
      $tags = preg_split('#,\s*#', $tags);
   }

   foreach ($tags as $tag)
   {
      // get the xpath equation for querying if it is not already in xpath format
      if (preg_match('#^<.+>#', $tag))
      {
         $tag = $this->get_xpath_from_node($tag);
      }

      // find the title element (e.g. the h1)
      $tag_results = $xpath->query('//'.$tag);

      if ($tag_results->length)
      {
         // use the first match and rebuild its inner HTML from its child nodes,
         // instead of taking the plain-text nodeValue
         $innerHTML = '';
         foreach ($tag_results->item(0)->childNodes as $child)
         {
            $innerHTML .= $child->ownerDocument->saveXML($child);
         }
         return $innerHTML;
      }
   }

   return FALSE;
}

This is basically the same as _find_tag(), but with the extra $innerHTML bits.
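
For anyone who wants to see the difference outside of Fuel, here is a minimal standalone sketch using PHP's DOM extension. The inner_html() helper and the sample markup are made up for illustration and are not part of the search module; the only assumption is PHP 7+ with the DOM extension enabled.

```php
<?php
// Hypothetical helper for illustration only; not part of Fuel_search.
// Rebuilds the markup inside a node by serialising each child in turn.
function inner_html(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        $html .= $child->ownerDocument->saveXML($child);
    }
    return $html;
}

$doc = new DOMDocument();
// Sample page, assumed for the demo; @ hides warnings from imperfect markup.
@$doc->loadHTML('<html><body><h1><i>Panthera leo</i> (lion)</h1></body></html>');

$xpath = new DOMXPath($doc);
$h1 = $xpath->query('//h1')->item(0);

echo $h1->nodeValue, "\n";   // plain text: the <i> tags are gone
echo inner_html($h1), "\n";  // inner markup: the <i> tags survive
```

The first echo is what the indexer stores by default, and the second is what the preserveTitleHTML path stores.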

Then, also in ./modules/search/libraries/Fuel_search.php, I edited the find_page_title() function:

function find_page_title($xpath)
{
   // keep the title's inner HTML only when the config option is switched on
   if ($this->config('preserveTitleHTML')) {
      $t = $this->_find_title_tag($xpath, $this->config('title_tag'));
   } else {
      $t = $this->_find_tag($xpath, $this->config('title_tag'));
   }
   return $t;
}

Finally, in the create() function of ./modules/search/libraries/Fuel_search.php, I've wrapped the format_title() call in an "if" statement:

if (!$this->config('preserveTitleHTML')) {
   $values['title'] = $this->format_title($values['title']);
}

Now, when I index my site, the title field in the search index table is populated with my page titles - with italics intact - and the search results page shows as desired.
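
One thing to keep in mind is that this stores whatever markup sits inside the title element, not just italics. A possible refinement, sketched below, is to pass the title through PHP's strip_tags() with a short allowlist before it is saved. The clean_title_html() name and the choice of allowed tags are assumptions for illustration, not something the module provides.

```php
<?php
// Hypothetical helper: keep only a small allowlist of inline tags in a title.
function clean_title_html(string $title, string $allowed = '<i><em>'): string
{
    // strip_tags() drops every tag not named in $allowed but keeps the text.
    return trim(strip_tags($title, $allowed));
}

echo clean_title_html('<a href="/x"><i>Panthera leo</i></a> <span>(lion)</span>'), "\n";
```

Note that strip_tags() does not remove attributes from the tags it lets through, so this is only enough if the page titles come from a trusted source.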

admin · April 2015

Thanks for posting back. Would you mind sending in a pull request for these code changes? One minor change would be to rename preserveTitleHTML to preserve_title_html to be consistent with how the other config keys are named.

https://github.com/daylightstudio/FUEL-CMS-Search
https://help.github.com/articles/using-pull-requests/

almostcompletely · April 2015

Will do - when I get some bandwidth...

I've come across another issue, which I have a fix for. My site is about to go beta, and there are lots of pages to index. Because it's a beta, we have a robots.txt in place to stop the site from being spidered for now. Unfortunately, if a robots.txt exists, the search plugin checks it and removes the disallowed pages from the index rather than indexing them.

I've added this to my search config file
$config['search']['ignore_robots'] = TRUE;

...and wrapped the contents of check_robots_txt() in ./libraries/Fuel_search.php like this:

if ($this->config('ignore_robots') == false) {
   // ... the existing contents of check_robots_txt() ...
} else {
   return TRUE;
}
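
The same idea as a standalone sketch, for anyone who wants to try the logic outside the module. The is_indexable() function, its parameters, and the stand-in robots check are all hypothetical; the real check_robots_txt() has its own signature and does the actual robots.txt parsing.

```php
<?php
// Hypothetical stand-in for the module's robots.txt check, for illustration only.
function is_indexable(string $url, array $config, callable $robots_allows): bool
{
    // With ignore_robots switched on, skip the robots.txt lookup entirely.
    if (!empty($config['ignore_robots'])) {
        return true;
    }
    // Otherwise defer to whatever robots.txt says about this URL.
    return (bool) $robots_allows($url);
}

// A beta site whose robots.txt disallows everything (assumed for the demo).
$deny_all = function (string $url): bool {
    return false;
};

var_dump(is_indexable('/species/panthera-leo', ['ignore_robots' => true], $deny_all));
var_dump(is_indexable('/species/panthera-leo', ['ignore_robots' => false], $deny_all));
```

With the flag on, the page is indexed even though robots.txt would block it; with the flag off, the robots.txt answer is respected.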

admin · April 2015

Sounds good... and yes, please provide a pull request when you have time as these look like good additions to the module that others may benefit from.

almostcompletely · April 2015

Hopefully you should have a pull request ready...
