Extract Text From HTML
Handy! I should try this for searching.
“In the course of improving this website's search engine, I wrote a routine that would extract the text from an article given a URL, strip out the HTML, and then convert all of the white space and carriage returns into single spaces. This was done to compress the size of the text involved, which was then stored in the database and used for full-text searches.”
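For my own notes, here is a rough sketch of what such a routine might look like in Python, using only the standard library. This is not the quoted author's code: the names `html_to_text` and `fetch_text` are made up for illustration, skipping the contents of `<script>` and `<style>` is my own assumption, and the UTF-8 default is a guess that a real page may not honor.

```python
import re
from html.parser import HTMLParser
from urllib.request import urlopen


class _TextExtractor(HTMLParser):
    """Collects text nodes, ignoring anything inside <script> or <style>."""

    _SKIP = {"script", "style"}  # assumption: this content is useless for search

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turns &amp; into & and so on
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self._SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join with a space so words in adjacent elements do not run together.
        return " ".join(self._chunks)


def html_to_text(html):
    """Strip the markup, then collapse every run of whitespace
    (spaces, tabs, carriage returns, newlines) into a single space."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", parser.text()).strip()


def fetch_text(url, encoding="utf-8"):
    """Hypothetical helper: download a page and return its compacted text.
    The encoding default is an assumption, not read from the response."""
    with urlopen(url) as response:
        html = response.read().decode(encoding, errors="replace")
    return html_to_text(html)
```

Joining the text chunks with a space keeps neighboring paragraphs from fusing into one word, at the cost of the occasional stray space before punctuation that follows an inline tag. For a full-text search index that trade seems acceptable.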