Extract the HTML from a page loaded in TWebBrowser

Question:

How can I get the HTML from a web page that I loaded in TWebBrowser? I want to clip some web contents.

Answer:

You can use the Document property. It has a lot of interesting properties:

    Document.All
    Document.bgColor
    Document.Body.innerHTML
    Document.Body.Style.overflowX
    Document.Body.Style.overflowY
    Document.Body.Style.zoom
    Document.cookie
    Document.documentElement.innerHTML
    Document.documentElement.innerText
    Document.FileSize
    Document.Frames
    Document.Images
    Document.LastModified
    Document.Links
    Document.Location.Protocol
    Document.ParentWindow
    Document.ParentWindow.ScrollBy(iX: Integer; iY: Integer)
    Document.Selection
    Document.Title
    Document.URL

of which Body.innerHTML will serve our purpose.

The only limitation of this solution is that it gives us the HTML as the web browser displays it, which may be different from what 'View Source' in Internet Explorer would show. If the original HTML file included JavaScript that generates content dynamically, like this:

    <script language='JavaScript'>
      document.write('Hello Visitor');
    </script>

then the handler below will return the generated output 'Hello Visitor' but not the original JavaScript. To get the original file, you need to look in the browser cache or use something other than TWebBrowser.

    // tested with Delphi 6, should work in Delphi 5 as well
    uses
      HTTPApp, MSHTML;

    procedure TForm1.WebBrowser1DocumentComplete(Sender: TObject;
      const pDisp: IDispatch; var URL: OleVariant);
    var
      Document: IHTMLDocument2;
      s: string;
    begin
      // extract the day's total earnings etc
      // the page has finished loading, so its document can be queried
      Document := WebBrowser1.Document as IHTMLDocument2;
      s := Document.Body.innerHTML;
      // process this string to extract contents
    end;
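
The "process this string" step is left open above. As a rough sketch only, one simple way to clip a value out of the returned string is to search for the text that surrounds it. The helper ExtractBetween, the marker strings and the sample HTML here are all made up for illustration; they are not part of TWebBrowser or MSHTML. Note also that the markup returned by innerHTML may not match the page source character for character (older Internet Explorer versions, for instance, tend to return tag names in upper case), so check the actual string before choosing markers.

```delphi
program ExtractDemo;

{$APPTYPE CONSOLE}

// Hypothetical helper (not part of any library): returns the text found
// between StartMarker and EndMarker in Source, or '' if either is missing.
function ExtractBetween(const Source, StartMarker, EndMarker: string): string;
var
  StartPos, EndPos: Integer;
  Rest: string;
begin
  Result := '';
  StartPos := Pos(StartMarker, Source);
  if StartPos = 0 then
    Exit;
  // Drop everything up to and including the start marker,
  // then look for the end marker in what remains.
  Rest := Copy(Source, StartPos + Length(StartMarker), MaxInt);
  EndPos := Pos(EndMarker, Rest);
  if EndPos = 0 then
    Exit;
  Result := Copy(Rest, 1, EndPos - 1);
end;

var
  Html: string;
begin
  // Stand-in for the string obtained from Document.Body.innerHTML.
  Html := '<P>Total: <B id=total>42.50</B></P>';
  WriteLn(ExtractBetween(Html, '<B id=total>', '</B>'));
end.
```

In the event handler, the same call would be made on s instead of the sample string. Plain substring matching like this is fragile if the page layout changes; walking the Document.All collection may be a sturdier choice for anything beyond a quick clip.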