Web scraping with Lazarus.

To fetch web page data you can use fpHttpClient. To process the resulting HTML string you can use regular expressions or sax_html.

Here is a short program that downloads a page and prints the text of its table cells:

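program webscraping;
{$mode objfpc}{$H+}
uses Classes, fpHttpClient, sax_html, dom_html, dom;
var
  HTMLString, url: string;
  src: TStringStream;
  doc: THTMLDocument;
  els: TDOMNodeList;
  i: Integer;
begin
  url := 'http://www.securitychina.com.cn/2018blh/Exhibitors_Detail.asp?NF=2018&UserID=3074';
  HTMLString := TFPHTTPClient.SimpleGet(url);  // download the page as a string
  src := TStringStream.Create(HTMLString);
  try
    ReadHTMLFile(doc, src);  // parse the HTML into a DOM tree
  finally
    src.Free;
  end;
  els := doc.GetElementsByTagName('td');
  try
    for i := 0 to Integer(els.Count) - 1 do
      WriteLn(els[i].TextContent);  // print the text of each <td> cell
  finally
    els.Free;
    doc.Free;
  end;
  ReadLn;  // keep the console window open
end.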
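
The regular-expression route mentioned earlier could look roughly like the sketch below. It uses the RegExpr unit that ships with Free Pascal. ExtractCells and the Sample string are hypothetical names made up for illustration and do not come from the original program. Treat this as a rough sketch: a pattern like this will misbehave on nested tables or malformed markup, which is where the sax_html approach is more robust.

```pascal
program regexsketch;
{$mode objfpc}{$H+}
uses
  Classes, RegExpr;

// Hypothetical helper: returns the raw inner text of every <td> cell in Html.
// The caller owns (and must free) the returned list.
function ExtractCells(const Html: string): TStringList;
var
  re: TRegExpr;
begin
  Result := TStringList.Create;
  re := TRegExpr.Create;
  try
    re.Expression := '<td[^>]*>(.*?)</td>';  // non-greedy capture of the cell body
    re.ModifierI := True;  // match <TD> as well as <td>
    re.ModifierS := True;  // let "." span line breaks inside a cell
    if re.Exec(Html) then
      repeat
        Result.Add(re.Match[1]);
      until not re.ExecNext;
  finally
    re.Free;
  end;
end;

const
  // Made-up sample markup standing in for a downloaded page.
  Sample = '<table><tr><td>Alpha</td><td class="x">Beta</td></tr></table>';
var
  cells: TStringList;
  i: Integer;
begin
  cells := ExtractCells(Sample);
  try
    for i := 0 to cells.Count - 1 do
      WriteLn(cells[i]);
  finally
    cells.Free;
  end;
end.
```

In a real run, the string returned by SimpleGet would be passed to ExtractCells in place of Sample. Any markup nested inside a cell (links, line breaks, and so on) is returned as-is and would need a further cleanup pass.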

 


Reposted from blog.csdn.net/qq_24499417/article/details/105214223