Can I write a crawler in Javascript?

bdy120602

In addition to the question in the subject line, if the answer is yes,
is it possible to locate keywords as part of the functionality of said
crawler (bot, spider)?

Basically, I would like to write a stand-alone form (javascript app.)
to perform a site-specific keyword search.

Can I do the aforementioned in Javascript?

Thanks.
 
Lee

(e-mail address removed) said:
> In addition to the question in the subject line, if the answer is yes,
> is it possible to locate keywords as part of the functionality of said
> crawler (bot, spider)?
>
> Basically, I would like to write a stand-alone form (javascript app.)
> to perform a site-specific keyword search.
>
> Can I do the aforementioned in Javascript?

1. Don't assume that people reading your post can see the subject line.
State your entire question in the body.

2. What you think of as Javascript is almost certainly client-side
code, and it cannot see anything on the server. It is possible on
some servers to execute Javascript on the server, but it's not
something you're likely to want to try.

 
Thomas 'PointedEars' Lahn

> [Subject: Can I write a crawler in Javascript?]
> In addition to the question in the subject line,

In the future, please place your question in the message body, not (only) in
the Subject header. The latter is intended for a short description of the
message body instead.

> if the answer is yes, is it possible to locate keywords as part of the
> functionality of said crawler (bot, spider)?
>
> Basically, I would like to write a stand-alone form (javascript app.) to
> perform a site-specific keyword search.
>
> Can I do the aforementioned in Javascript?

You can write a crawler with server-side J(ava)Script/ECMAScript and since
that script would provide access to the full HTTP response message you could
also locate keywords.
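For illustration only, here is a minimal sketch of the keyword-locating step once the response body is in hand. It is written in present-day JavaScript and uses only core language features. The names `htmlToText` and `findKeyword` are hypothetical, not part of any library, and the regular-expression tag stripping is a rough stand-in for a real HTML parser.

```javascript
// Hypothetical helper: reduce an HTML response body to plain text.
// Regular expressions only approximate HTML; a real crawler would
// use a proper parser.
function htmlToText(html) {
  return html
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, " ") // drop script/style blocks
    .replace(/<[^>]+>/g, " ")                                 // drop the remaining tags
    .replace(/\s+/g, " ")                                     // collapse whitespace
    .trim();
}

// Hypothetical helper: return the character offsets of every
// case-insensitive, non-overlapping occurrence of `keyword` in `text`.
function findKeyword(text, keyword) {
  const haystack = text.toLowerCase();
  const needle = keyword.toLowerCase();
  const positions = [];
  if (needle === "") return positions; // an empty keyword matches nothing
  let at = haystack.indexOf(needle);
  while (at !== -1) {
    positions.push(at);
    at = haystack.indexOf(needle, at + needle.length);
  }
  return positions;
}
```

Calling `findKeyword(htmlToText(body), "build")` on a fetched page body would then give the offsets of the keyword in the visible text. How the body is fetched depends on the server-side environment and is left out here.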

If by "site-specific" you mean "on *your* site" then it would even be
possible to use client-side J(ava)Script/ECMAScript. Such a keyword-based
search is implemented in SELFHTML, a well-known German Web development
documentation site (also available in other languages, but not in English
anymore): <http://de.selfhtml.org/navigation/suche/index.htm>. However, the
indexing itself would have to be done by a server-side application.


PointedEars
 
timothytoe

> In addition to the question in the subject line, if the answer is yes,
> is it possible to locate keywords as part of the functionality of said
> crawler (bot, spider)?
>
> Basically, I would like to write a stand-alone form (javascript app.)
> to perform a site-specific keyword search.
>
> Can I do the aforementioned in Javascript?
>
> Thanks.

My question is "why?" Is it because you're familiar with JavaScript
and not server-side languages like PHP, C#, or Ruby?

The typical solution would be to have a PHP crawler on the server. You
can get the info to the server via a form or a JavaScript ajax call.

Even if you could do it in Javascript from the client, cross-site
security in the browser may block your attempts.
 
Evertjan.

Lee wrote on 04 jan 2008 in comp.lang.javascript:
> 2. What you think of as Javascript is almost certainly client-side
> code, and it cannot see anything on the server. It is possible on
> some servers to execute Javascript on the server, but it's not
> something you're likely to want to try.

Why?

Writing serverside javascript is a joy.

Many functions can be written for clientside
and serverside use without any conversion,
like dual input verification of data.
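As an assumed example of such a dual-use function, here is a small keyword validator. `validateKeyword` and its length limits are hypothetical. The point is only that it touches no DOM and no server API, so the same source could be loaded unchanged in a page and in a server-side script.

```javascript
// Hypothetical validator for a search keyword. It relies only on core
// language features, so it can run on the client and on the server.
function validateKeyword(input) {
  if (typeof input !== "string") {
    return { ok: false, reason: "keyword must be a string" };
  }
  const keyword = input.trim();
  if (keyword.length < 2) {
    return { ok: false, reason: "keyword is too short" };
  }
  if (keyword.length > 64) {
    return { ok: false, reason: "keyword is too long" };
  }
  if (/[<>]/.test(keyword)) {
    return { ok: false, reason: "keyword must not contain markup" };
  }
  return { ok: true, keyword: keyword };
}
```

The client would call it before submitting the form, and the server would call it again on the received value, since client-side checks can always be bypassed.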

Or do you mean not wanting to try writing a crawler?
 
Lee

Evertjan. said:
> Lee wrote on 04 jan 2008 in comp.lang.javascript:
>
> Why?
>
> Writing serverside javascript is a joy.
>
> Many functions can be written for clientside
> and serverside use without any conversion,
> like dual input verification of data.
>
> Or do you mean not wanting to try writing a crawler?

I mean that somebody who has to ask whether it's possible
to write a crawler in Javascript probably doesn't want to
try to write server-side Javascript.

At least not until they've had more experience in writing
client-side Javascript and have gained working knowledge
of client/server differences.

 
Evertjan.

Lee wrote on 05 jan 2008 in comp.lang.javascript:
> Evertjan. said:
>
> I mean that somebody who has to ask whether it's possible
> to write a crawler in Javascript probably doesn't want to
> try to write server-side Javascript.
>
> At least not until they've had more experience in writing
> client-side Javascript and have gained working knowledge
> of client/server differences.

Agree!
 
jt2190

> In addition to the question in the subject line, if the answer is yes,
> is it possible to locate keywords as part of the functionality of said
> crawler (bot, spider)?
>
> Basically, I would like to write a stand-alone form (javascript app.)
> to perform a site-specific keyword search.
>
> Can I do the aforementioned in Javascript?
>
> Thanks.

(I'm assuming that you want something that will run completely inside
of your web browser, and not use Adobe AIR, a Java applet, Firefox
plugin, or anything like that.) I am certain that you can do this.
You'd have to have the web crawler/search logic in one window/frame,
and have it pilot a second window/frame to various web pages, and
search their contents. This probably wouldn't be too fast, but if you
were only searching a limited number of pages, it'd probably be fast
enough. For bonus points you could try building an index of crawled
content, and searching that.
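The "bonus points" idea can be sketched roughly as an inverted index. This is only an illustration in present-day JavaScript: `buildIndex` and `searchIndex` are hypothetical names, and `pages` is assumed to be an array of `{ url, text }` records that some earlier crawl step has already produced.

```javascript
// Build an inverted index: each lowercase word maps to the set of page
// URLs that contain it. `pages` is a hypothetical array of
// { url, text } records produced by an earlier crawl.
function buildIndex(pages) {
  const index = new Map();
  for (const page of pages) {
    // Split on anything that is not an ASCII letter or digit.
    const words = page.text.toLowerCase().match(/[a-z0-9]+/g) || [];
    for (const word of words) {
      if (!index.has(word)) index.set(word, new Set());
      index.get(word).add(page.url);
    }
  }
  return index;
}

// Look up a single word; returns a sorted array of matching URLs.
function searchIndex(index, word) {
  const hits = index.get(word.toLowerCase());
  return hits ? Array.from(hits).sort() : [];
}
```

Note that this matches whole words only, so a search for "build" would not report a page that contains only "rebuild". Searching the index is then a single lookup instead of a rescan of every page.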

James Tikalsky
 
Thomas 'PointedEars' Lahn

jt2190 said:
>> In addition to the question in the subject line, if the answer is yes,
>> is it possible to locate keywords as part of the functionality of said
>> crawler (bot, spider)?
>>
>> Basically, I would like to write a stand-alone form (javascript app.)
>> to perform a site-specific keyword search.
>>
>> Can I do the aforementioned in Javascript?

[...]

> (I'm assuming that you want something that will run completely inside
> of your web browser, and not use Adobe AIR, a Java applet, Firefox
> plugin, or anything like that.) I am certain that you can do this.
> You'd have to have the web crawler/search logic in one window/frame,
> and have it pilot a second window/frame to various web pages, and
> search their contents. This probably wouldn't be too fast, [...]

It would not be possible due to the Same Origin Policy, unless the search
would be limited to the OP's site and they would not use frame-breaking
scripts in it at all. But even then one would have to determine when a
document was fully loaded before its content could be accessed. That would
require proprietary markup or the equivalent thereof as it appears that
standards-compliant events do not work there, and that would limit the
number of UAs where this approach could work.
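To make the Same Origin Policy restriction concrete: two documents count as same-origin only when scheme, host and port all match. The sketch below illustrates that comparison in present-day JavaScript using the standard `URL` constructor. `sameOrigin` is a hypothetical name, and this is not how a browser implements the check internally.

```javascript
// Two URLs share an origin when scheme, host and port all match.
// This mirrors the comparison in spirit; it is an illustration,
// not the browser's actual implementation.
function sameOrigin(a, b) {
  const ua = new URL(a);
  const ub = new URL(b);
  return ua.protocol === ub.protocol &&
         ua.hostname === ub.hostname &&
         ua.port === ub.port; // default ports are normalized to ""
}
```

A script in a page from `http://example.com/` piloting a frame could therefore read a document from `http://example.com/page2.html`, but not one from `http://www.example.com/` or `https://example.com/`.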


PointedEars
 
bdy120602

Thank you for all your answers. OK, then, which language would be the
easiest to write such an application? Someone mentioned C sharp. You
can disregard whether or not I know the language. Simply: The language
that makes writing such a thing the easiest; however, I would like to
also know if I need a compiler, or other tool. Also, I'm developing
this application as someone without server-side access; without a
server for that matter.

For client-side scripting, would VB work? I know that language better
than most.

Also, I would need it to stand by itself, like a regular
application.

So, an application to perform a site specific text search.

Sorry if I wasn't particular enough.

I look forward to your answers.

Thanks,

Danny


 
Martin Gregorie

> Thank you for all your answers. OK, then, which language would be the
> easiest to write such an application?

Look at Java because the standard class libraries contain most of the
code needed for following URLs as well as accessing and parsing HTML.

I had no trouble writing a URL checker for my set of private reference
pages using these classes. It's a sort of primitive crawler that parses a
page, extracts the URLs from anchor tags and checks whether the target
object exists.
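The link-extraction step described above is sketched here in JavaScript, the language of this thread, rather than with the Java classes mentioned. It is an illustration under assumptions: `extractLinks` is a hypothetical name, the regular expression only handles quoted `href` values inside anchor tags, and the standard `URL` constructor of a modern runtime is used to resolve relative links.

```javascript
// Extract the targets of <a href="..."> tags from an HTML string and
// resolve them against the page's own URL. The regular expression is a
// simplification that only handles quoted href values.
function extractLinks(html, baseUrl) {
  const links = [];
  const anchor = /<a\b[^>]*?\bhref\s*=\s*(["'])(.*?)\1/gi;
  let match;
  while ((match = anchor.exec(html)) !== null) {
    try {
      const url = new URL(match[2], baseUrl);
      url.hash = ""; // a fragment identifier points into the same document
      links.push(url.href);
    } catch (e) {
      // Skip values that cannot be parsed as a URL.
    }
  }
  return links;
}
```

A checker along the lines described would then request each returned URL and record whether the target exists. That request step depends on the runtime and is not shown.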
 
Tom Rubaj

> In addition to the question in the subject line, if the answer is yes,
> is it possible to locate keywords as part of the functionality of said
> crawler (bot, spider)?
>
> Basically, I would like to write a stand-alone form (javascript app.)
> to perform a site-specific keyword search.
>
> Can I do the aforementioned in Javascript?
>
> Thanks.

still not sure I understand what you want to do.

You want a client-side js-only crawler? I mean "crawler-crawler"? A
script that would analyze a number of web pages, index them and make
them available for the user to browse through with a search engine of
some sort?
It's theoretically possible, but it's a rather puzzling idea : /
Generally, web-crawlers use extensive server-side indexes and search
engines like Apache Lucene to operate. A client-side web-crawler
sounds just... wrong.

Or do you just want to perform a text search of the content of the
currently displayed page?
You can do that by walking the DOM tree of the document, and, say,
replacing .innerHTML of the tags that match the search criteria with
the same text but bolded or something...
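The DOM walk can be sketched as below. This is a minimal illustration, not a finished highlighter: `findTextNodes` is a hypothetical name, and it only collects the matching text nodes, leaving the bolding step open. It uses just the standard `nodeType`, `nodeValue` and `childNodes` properties.

```javascript
// Walk a DOM subtree depth-first and collect the text nodes whose
// content contains `keyword` (case-insensitive). Only `nodeType`,
// `nodeValue` and `childNodes` are used.
function findTextNodes(root, keyword) {
  const TEXT_NODE = 3; // the value of Node.TEXT_NODE in the DOM
  const needle = keyword.toLowerCase();
  const matches = [];
  (function walk(node) {
    if (node.nodeType === TEXT_NODE) {
      if (node.nodeValue.toLowerCase().indexOf(needle) !== -1) {
        matches.push(node);
      }
      return; // text nodes have no children
    }
    const children = node.childNodes || [];
    for (let i = 0; i < children.length; i++) {
      walk(children[i]);
    }
  })(root);
  return matches;
}
```

In a browser this would be called as `findTextNodes(document.body, "build")`. Working on text nodes rather than on `.innerHTML` avoids matching the keyword inside tag names or attribute values, although text inside `script` or `style` elements would still be reported unless those elements are skipped.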

Could you write exactly what you want to use the said crawler for and
how you imagine it should work?
 
bdy120602

> still not sure I understand what you want to do.
>
> You want a client-side js-only crawler? I mean "crawler-crawler"? A
> script that would analyze a number of web pages, index them and make
> them available for the user to browse through with a search engine of
> some sort?
> It's theoretically possible, but it's a rather puzzling idea : /
> Generally, web-crawlers use extensive server-side indexes and search
> engines like apache lucene to operate. A client-side web-crawler
> sounds just... wrong.
>
> Or do you just want to perform a text search of the content of the
> currently displayed page?
> You can do that by walking the DOM tree of the document, and, say,
> replacing .innerHTML of the tags that match the search criteria with
> the same text but bolded or something...
>
> Could you write exactly what you want to use the said crawler for and
> how you imagine it should work?

Sure I can. BTW: thank you for your patience and continued interest. I
have ideas, but it's hard to articulate them sometimes.

Here's a hypothetical:

There's a site, www.alp59r.com. The site contains approximately 500
pages. I would like to conduct a search of the entire site for the
word "build." I want the search to look at all the text, so at a code-
level, if that makes sense.

I don't have access to the site to insert any application to conduct
it from "the inside."

I hope this helps.
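For reference, the hypothetical above amounts to a breadth-first crawl restricted to one site. The sketch below is an illustration in present-day JavaScript, meant to run as a stand-alone program outside a browser page, where the Same Origin Policy does not apply. `searchSite` is a hypothetical name, and `fetchPage` is an assumed caller-supplied function that returns a page's HTML; how it performs the HTTP request is left open. "Code-level" is interpreted here as searching the raw markup, tags included.

```javascript
// Breadth-first crawl of one site. `fetchPage` is a hypothetical async
// function (url) => HTML string supplied by the caller.
async function searchSite(startUrl, keyword, fetchPage, maxPages = 500) {
  const origin = new URL(startUrl).origin;
  const needle = keyword.toLowerCase();
  const queue = [new URL(startUrl).href];
  const seen = new Set(queue);
  const hits = [];
  let fetched = 0;

  while (queue.length > 0 && fetched < maxPages) {
    const url = queue.shift();
    let html;
    try {
      html = await fetchPage(url);
    } catch (e) {
      continue; // unreachable page: skip it
    }
    fetched++;

    // "Code-level" search: look at the raw markup, tags included.
    if (html.toLowerCase().includes(needle)) hits.push(url);

    // Queue same-site links that have not been seen yet. Matching any
    // quoted href attribute is a simplification; it also picks up
    // <link> elements such as stylesheets.
    const href = /\bhref\s*=\s*(["'])(.*?)\1/gi;
    let m;
    while ((m = href.exec(html)) !== null) {
      let next;
      try {
        next = new URL(m[2], url);
      } catch (e) {
        continue; // not a parseable URL
      }
      next.hash = "";
      if (next.origin === origin && !seen.has(next.href)) {
        seen.add(next.href);
        queue.push(next.href);
      }
    }
  }
  return hits; // URLs of the pages whose markup contains the keyword
}
```

The `maxPages` limit and the `seen` set keep the crawl from running away or looping. A real tool would also want to pause between requests and honor the site's robots.txt.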
 
Tom Rubaj

OK, I guess I get it, more or less, now.
You can't put such crawler logic in client-only code, sorry.
You need some kind of server-side application to prepare the search
index and do the searching for the client. Only then can you
query the server-side web service using some active client-side
technology like Ajax or the XMLHttpRequest object and JSON.

Creating your own crawler and a good indexing and searching mechanism
is a serious task, so you might want to use Google for it instead. A
search phrase like "searchedWord site:siteName" ("madonna
site:amazon.com") should do the trick. You might want to use that:
if I got you right, you're browsing generally accessible web pages, and
Google has them crawled and indexed as well as can be.
I'm sure if you browse through the Google search API documentation you
can find some Ajax- or JSON-friendly web service that you can call from
the client's browser.


 
bdy120602

> OK, I guess I get it, more or less, now.
> You can't put such crawler logic in client-only code, sorry.
> You need some kind of server-side application to prepare the search
> index and do the searching for the client. Only then can you
> query the server-side web service using some active client-side
> technology like Ajax or the XMLHttpRequest object and JSON.
>
> Creating your own crawler and a good indexing and searching mechanism
> is a serious task, so you might want to use Google for it instead. A
> search phrase like "searchedWord site:siteName" ("madonna
> site:amazon.com") should do the trick. You might want to use that:
> if I got you right, you're browsing generally accessible web pages, and
> Google has them crawled and indexed as well as can be.
> I'm sure if you browse through the Google search API documentation you
> can find some Ajax- or JSON-friendly web service that you can call from
> the client's browser.


I believe I've still failed to communicate my goal properly. I have a
few programs that search an entire site for a keyword, and I run
them from my own machine; "Light Web Searcher" and "Teleport Pro" are
two examples. My objective is to create a bare-bones software application
that is identical in function to the aforementioned pieces of
software.

What language would I be able to accomplish that with?

Thanks,

Danny
 
