by hadra momo - 7 years ago (2015-05-31)
I created a class that uses curl (HTTP transport) to fetch content from certain URLs, but I only want to extract some of the paragraphs.
My objective is to index some web sites, but I don't want to end up with big databases. How can I process the retrieved content?
This class parses the document as a string, so you can fetch the whole web page with curl or with file_get_contents (if your PHP configuration allows passing URLs to fopen, that is, allow_url_fopen is enabled). It can then return an array representing the entire document, or all occurrences of a specific element such as <p> paragraphs. What you do with the information after that, like saving it to a database, is up to you.
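To give a rough idea of what extracting only the paragraphs looks like, here is a minimal sketch that uses PHP's built-in DOMDocument extension rather than the recommended class, whose API is not shown here. The function name extractParagraphs and the sample HTML are made up for illustration; in practice the string would come from your curl class or from file_get_contents.

```php
<?php
// Hypothetical helper (not the recommended class's API): return the text
// of every <p> element found in an HTML string.
function extractParagraphs(string $html): array
{
    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML, so collect parser warnings
    // internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $paragraphs = [];
    foreach ($doc->getElementsByTagName('p') as $node) {
        // Collapse runs of whitespace and skip empty paragraphs.
        $text = trim(preg_replace('/\s+/', ' ', $node->textContent));
        if ($text !== '') {
            $paragraphs[] = $text;
        }
    }
    return $paragraphs;
}

// Sample input; in practice $html would be the page you retrieved.
$html = '<html><body><h1>Title</h1><p>First paragraph.</p>'
      . '<div><p>Second <b>paragraph</b>.</p></div></body></html>';

print_r(extractParagraphs($html));
```

Storing only the returned strings, instead of the full page source, is one way to keep the index database small.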