Using the search_files Module with Drupal 6 on an OSX Server
A current project has had me setting up a large database and community-driven site for a client. We're using Drupal 6 because of its stability, features, extensibility, etc., as well as its great user and community-contributed content. One of the key requirements of this site is the ability to search through large amounts of pre-existing data, most of which is in PDF format. Off to find a proper module for extending Drupal's core search.
After spending much time researching the search_files module, I eventually threw my hands in the air and switched over to the Google Custom Search Engine. Google CSE allows integrating a search box into your site, and there's of course an existing Drupal module for easy integration into the core search. Google CSE, however, seems to be having major issues with (my luck) file searching! Sure, it works for some, but after too many days of fiddling with it, forcing re-indexing, allowing for automated re-indexing, tweaking my sitemap.xml, and as many other tricks as I could think of, I was still unable to get Google CSE to take notice of any of my node attachments. It was time to head back to search_files. Here's the key, though, for any of you reading this post, to actually getting this running on an OSX server:
• Grab the copy of pdftotext from Carsten Blum's Mac Development page
• Install, of course. Check out the readme. The installer puts pdftotext (and no other helpers or viewers) at /usr/local/bin/pdftotext
• Install the search_files module (link above) and enable all three modules -- search_files, search_attachments, search_directories
• Head to your search_files settings page and configure the proper directory for your files (e.g. sites/default/files)
• Change the PDF helper to: /usr/local/bin/pdftotext %file% -
• Now go to your standard search settings page, re-index the site, and run cron a few times. You should be getting attachments in your search results now!
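
If attachments still don't show up, it can help to confirm that the helper command itself produces text before blaming the index. The following is only a rough sketch in Python, not the module's own code: it assumes that %file% in the helper string is replaced with the path of the attachment, and it relies on the trailing "-" telling pdftotext to write the text to standard output. The function names and the example.pdf path are made up for illustration.

```python
import os
import shlex
import subprocess


def build_helper_command(template, pdf_path):
    """Expand a helper template such as '/usr/local/bin/pdftotext %file% -'.

    Assumption: %file% stands for the path of the attachment being indexed.
    The template is split into arguments first, so a path containing spaces
    stays a single argument.
    """
    return [arg.replace("%file%", pdf_path) for arg in shlex.split(template)]


def extract_text(template, pdf_path):
    """Run the helper and return its output, or None if it cannot be run."""
    command = build_helper_command(template, pdf_path)
    if not os.path.isfile(command[0]):
        return None  # helper binary is not installed at that path
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        return None  # helper failed, e.g. the PDF does not exist
    return result.stdout


if __name__ == "__main__":
    helper = "/usr/local/bin/pdftotext %file% -"
    # Hypothetical attachment path; point this at a real PDF on your server.
    text = extract_text(helper, "sites/default/files/example.pdf")
    if text is None:
        print("No text extracted; check the helper path and the PDF path.")
    else:
        print(text[:200])
```

If this prints readable text for one of your PDFs, the helper line is fine and any remaining problem is more likely in the directory setting or in how often cron has run.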


Search files
Submitted by dgarciad (not verified) on Fri, 05/08/2009 - 07:02.
Hi
I have been using the search_files module for a while on our server and it fits our needs perfectly.
However, for the past few weeks the system has stopped indexing because the search tables in the database have grown too large and we get many timeouts. We are now dealing with more than 10,000 files and the database size is around 140 MB.
Have you experienced this kind of problem?
Regards
Check logs / memory
Submitted by Anonymous on Mon, 04/05/2010 - 10:46.
Check your Drupal logs for starters; most likely the "recent entries" page will show you things like "mysql error", "mysql server went away", etc. If you're indexing that much content you'll probably want to schedule cron runs away from the times when you experience high traffic, and likewise you'll want more memory on that machine. It's the only way to deal with the amount of data that MySQL is going through.