
Best crawler for specific Web sites: How can I choose the pertinent paragraphs when indexing a specific site?



by hadra momo - 8 months ago (2015-05-31) crawler

How can I choose the pertinent paragraphs when indexing a specific site?

I created a class that uses curl (HTTP transport) to get content from certain URLs, but I only want to keep some of the paragraphs. My objective is to index some web sites without ending up with a big database. How can I process the retrieved content?


1 Recommendation

HTML Parser: Parse HTML using DOMDocument

by Dave Smith package author 5955 - 8 months ago (2015-06-01) Comment

This class parses a document supplied as a string, so you can fetch the whole web page with curl, or with file_get_contents if your PHP configuration allows URLs to be opened through fopen (allow_url_fopen enabled). It can then return an array of the entire document, or of every occurrence of a specific element such as <p> paragraphs. What you do with the information after that, like saving it to a database, is up to you.
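
To illustrate the idea, here is a minimal sketch that does not use the recommended package (its API is not shown in this thread) but calls PHP's built-in DOMDocument directly. The function names fetch_html and extract_paragraphs, the 15 second timeout and the 40 character minimum length are all assumptions made for the example, not part of any package.

```php
<?php
// Hypothetical helpers for illustration only; names and defaults are assumptions.

/**
 * Download a page with cURL and return its body, or null on failure.
 */
function fetch_html(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow HTTP redirects
        CURLOPT_TIMEOUT        => 15,   // assumed timeout in seconds
    ]);
    $body = curl_exec($ch);
    return is_string($body) ? $body : null;
}

/**
 * Return the text of every <p> element whose text is at least
 * $minLength bytes long, with runs of whitespace collapsed.
 */
function extract_paragraphs(string $html, int $minLength = 40): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $paragraphs = [];
    foreach ($doc->getElementsByTagName('p') as $node) {
        $text = trim(preg_replace('/\s+/', ' ', $node->textContent));
        if (strlen($text) >= $minLength) {
            $paragraphs[] = $text;
        }
    }
    return $paragraphs;
}
```

A call such as extract_paragraphs(fetch_html($url) ?? '') would then give an array of paragraph strings to store. Skipping very short paragraphs is only one possible way to keep the index small; filtering by keyword or by the paragraph's position in the page would work along the same lines.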
