Any suggestions for finding all src attributes in a document?

Dr J R Stockton · Feb 13, 2011

Starting from a variable to which has been assigned a document or
ContentDocument, how best can I locate all elements therein that have a
src attribute that refers to a file?

For anchors and links, there are convenient prefabricated collections;
for IDs, I know nothing better than walking the whole DOM tree and
looking. Must I do the same for src attributes, or is there anything
better?

Probably all will refer to files, present or absent; I can probably
check the format of the attribute, or the tag of the containing element,
easily enough.

David Mark · Feb 14, 2011

> Starting from a variable to which has been assigned a document or
> ContentDocument, how best can I locate all elements therein that have a
> src attribute that refers to a file?

You mean the file protocol? Or do you mean all elements that have a
SRC attribute?

> For anchors and links, there are convenient prefabricated collections;

Anchors and links are of no concern here. Perhaps you meant images?

> for IDs, I know nothing better than walking the whole DOM tree and
> looking.

I expect you mean for other elements that are allowed to have SRC
attributes, but lack convenient collections (e.g. SCRIPT). Use gEBTN
(getElementsByTagName).

> Must I do the same for src attributes, or is there anything
> better?

No, you don't have to walk the DOM in its entirety. Not every element
can have a SRC attribute (and for some types it is not optional).
What are you trying to figure out exactly?

> Probably all will refer to files, present or absent;

I don't follow.

> I can probably
> check the format of the attribute, or the tag of the containing element,
> easily enough.

Yes, probably. But you should use the src *property*. And if you use
gEBTN, you will already know the tag name.
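A minimal sketch of the getElementsByTagName() approach, reading the src *property* as suggested. The SRC_TAGS list and the name collectSrcByTagName are hypothetical choices for illustration; the list is not an exhaustive catalogue of element types that may carry src.

```javascript
// Hypothetical, non-exhaustive list of element types that commonly carry src.
var SRC_TAGS = ["img", "script", "iframe", "frame", "input", "embed"];

// Collects { tagName, src } pairs using one getElementsByTagName() call per
// element type, reading the src *property* rather than the attribute.
function collectSrcByTagName(doc) {
  var found = [];
  for (var t = 0; t < SRC_TAGS.length; ++t) {
    var elements = doc.getElementsByTagName(SRC_TAGS[t]);
    for (var i = 0, len = elements.length; i < len; ++i) {
      var src = elements[i].src;
      // Elements of these types without a src attribute typically report "".
      if (typeof src == "string" && src != "") {
        found.push({ tagName: SRC_TAGS[t], src: src });
      }
    }
  }
  return found;
}
```

In a browser this would be called as collectSrcByTagName(document). Because the tag name is known at each step, there is no need to inspect tagName afterwards.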

Thomas 'PointedEars' Lahn · Feb 14, 2011

Dr J R Stockton said:
> Starting from a variable to which has been assigned a document or
> ContentDocument, how best can I locate all elements therein that have a
> src attribute that refers to a file?

Define: file.

> For anchors and links, there are convenient prefabricated collections;
> for IDs, I know nothing better than walking the whole DOM tree and
> looking.

IDs?

> Must I do the same for src attributes, or is there anything better?

So far only in !MSHTML, you can use DOM 3 XPath for both HTML and X(HT)ML
documents:

var doc = …;

var result = doc.evaluate('//*[starts-with(@src, "file://")]',
  doc, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);

for (var i = 0, len = result.snapshotLength; i < len; ++i)
{
  var element = result.snapshotItem(i);
  // … use element …
}

<https://developer.mozilla.org/en/introduction_to_using_xpath_in_javascript>

Assuming the Document object referred to by `doc' was served with an XML
MIME media type, you can use XPath via the MSXML DOM as well:

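// If `doc' is an MSXML 3.0 DOMDocument, its default selection language is
// the older XSLPattern; switch to XPath so that starts-with() is available.
doc.setProperty("SelectionLanguage", "XPath");

var result = doc.selectNodes('//*[starts-with(@src, "file://")]');
for (var i = 0, len = result.length; i < len; ++i)
{
  var node = result.item(i);
  // … use node …
}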

(untested)

<http://msdn.microsoft.com/en-gb/library/hcebdtae(v=VS.90).aspx>

HTH

PointedEars

Dr J R Stockton · Feb 16, 2011

In comp.lang.javascript message
<5eb36da3-0b63-44c6-8342-75ed068d4896@k38g2000vbn.googlegroups.com>,
Sun, 13 Feb 2011 16:11:59, David Mark posted:

> You mean the file protocol? Or do you mean all elements that have a
> SRC attribute?

All elements of the document that have a SRC attribute. AFAIR, that
attribute is normally a string which looks like what one in HTML writes
between the quotes that follow 'a href=', but always is in the full
absolute form including protocol part. Those that are otherwise I do
not want to find; but if they occur they can be eliminated at a later
stage.

> Anchors and links are of no concern here. Perhaps you meant images?

They are not of concern, but illustrate by their collections what would
be a nice way to find all SRC elements.

> I expect you mean for other elements that are allowed to have SRC
> attributes, but lack convenient collections (e.g. SCRIPT). Use gEBTN.

No. I mean for finding all elements that have IDs; or rather, for
finding the ID values, as I do not need the element.

> No, you don't have to walk the DOM in its entirety. Not every element
> can have a SRC attribute (and for some types it is not optional).
> What are you trying to figure out exactly?

I don't want to have to know the tag names of all elements that can USE
an HTML SRC property. And, surely, one can apply a SRC property to any
element by script?

I am trying to find all SRC attribute values in the document that name
required files, and am prepared to receive any that do not refer to files.
(Throughout, the files can be present or absent.)

> I don't follow.

Evidently.

> Yes, probably. But you should use the src *property*. And if you use
> gEBTN, you will already know the tag name.

All the ones of interest are strings; it should suffice to use typeof
and note non-strings for future consideration.

But the best answer is the one which appeared immediately after my
article "above" was committed to NNTP. The tree-walk which recursively
locates all IDs was recently added to the code, and slows it by an
unimportant amount. It will therefore be, and is, entirely satisfactory
and efficient to test elements for having a SRC during the same tree-
walk.
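A single recursive walk that records both ID values and src attribute values might look like the following sketch. It assumes only the standard node interface (nodeType, childNodes, getAttribute()); the function names are hypothetical. It reads attributes rather than properties so that, in conforming DOMs, values come back as written in the markup, subject to the MSHTML getAttribute() caveat discussed in this thread.

```javascript
// Recursively visits `node` and its descendants, pushing every id value
// into `ids` and every non-empty src attribute value into `srcs`.
function walk(node, ids, srcs) {
  if (node.nodeType == 1) {                 // ELEMENT_NODE
    var id = node.getAttribute("id");
    if (id) {
      ids.push(id);
    }
    var src = node.getAttribute("src");     // null (or "" in old DOMs) when absent
    if (src != null && src != "") {
      srcs.push(src);
    }
  }
  var kids = node.childNodes;
  for (var i = 0, len = kids ? kids.length : 0; i < len; ++i) {
    walk(kids[i], ids, srcs);
  }
}

// Starts the walk at a Document (or any other node) and returns both lists.
function collectIdsAndSrcs(root) {
  var result = { ids: [], srcs: [] };
  walk(root, result.ids, result.srcs);
  return result;
}
```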

As you presumably know, the collection document.links holds link objects
which have various properties such as href, protocol and pathname; now I
need to parse the SRC attribute into a sufficiently matching object.
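A common trick at the time was to assign the value to the href of a scratch A element and read its protocol, pathname and so on. A present-day sketch can use the WHATWG URL constructor instead (available in current browsers and in Node.js); this is a swapped-in technique, not one discussed in the thread, and parseSrc is a hypothetical name. Relative values are resolved against a base URI, for which the document's base URI (e.g. document.baseURI in modern browsers) would be the natural choice.

```javascript
// Splits a src value into link-like parts. Relative values are resolved
// against baseURI; absolute values (e.g. file:///...) ignore it.
// Throws a TypeError if the value cannot be parsed as a URL.
function parseSrc(value, baseURI) {
  var u = new URL(value, baseURI);
  return {
    href: u.href,
    protocol: u.protocol,   // e.g. "file:" or "http:"
    host: u.host,
    pathname: u.pathname,
    search: u.search,
    hash: u.hash
  };
}
```

For example, parseSrc("img/a.png?x=1#top", "http://example.com/dir/page.html") yields a pathname of "/dir/img/a.png" and a protocol of "http:".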

Ry Nohryb · Feb 17, 2011

> All elements of the document that have a SRC attribute. AFAIR, that
> attribute is normally a string which looks like what one in HTML writes
> between the quotes that follow 'a href=', but always is in the full
> absolute form including protocol part. Those that are otherwise I do
> not want to find; but if they occur they can be eliminated at a later
> stage.

var hrefs= document.querySelectorAll('*[href]');
var srcs= document.querySelectorAll('*[src]');

(...)

> No. I mean for finding all elements that have IDs;

var ids= document.querySelectorAll('*[id]');

> or rather, for
> finding the ID values, as I do not need the element.

var ids= document.querySelectorAll('*[id]');
function mapper (e) { return e.id }
ids= [].map.call(ids, mapper).sort();

Ry Nohryb · Feb 17, 2011

>> All elements of the document that have a SRC attribute. AFAIR, that
>> attribute is normally a string which looks like what one in HTML writes
>> between the quotes that follow 'a href=', but always is in the full
>> absolute form including protocol part. Those that are otherwise I do
>> not want to find; but if they occur they can be eliminated at a later
>> stage.


> var hrefs= document.querySelectorAll('*[href]');
> var srcs= document.querySelectorAll('*[src]');

var hrefs_and_srcs= document.querySelectorAll('*[href],*[src]');
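The combined selector returns a single NodeList in document order. A minimal sketch, assuming only getAttribute() on each element, that turns such a list into plain values (urlAttributeValues is a hypothetical helper name):

```javascript
// Converts the result of querySelectorAll('*[href],*[src]') (or any
// array-like list of elements) into { tagName, attr, value } records.
function urlAttributeValues(list) {
  var out = [];
  var names = ["href", "src"];
  for (var i = 0, len = list.length; i < len; ++i) {
    for (var n = 0; n < names.length; ++n) {
      var value = list[i].getAttribute(names[n]);
      if (value != null) {
        out.push({ tagName: list[i].tagName, attr: names[n], value: value });
      }
    }
  }
  return out;
}
```

In a browser: urlAttributeValues(document.querySelectorAll('*[href],*[src]')).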

Thomas 'PointedEars' Lahn · Feb 19, 2011

Dr J R Stockton said:
> David Mark posted:
>
> All elements of the document that have a SRC attribute.

HTML attribute names are declared and usually referred to all-lowercase.

> AFAIR, that attribute is normally a string which looks like what one in
> HTML writes between the quotes that follow 'a href=',

It is of type URI.

> but always is in the full absolute form including protocol part.

No, it does not need to.
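Whether a src value is absolute can be decided from its leading URI scheme. A minimal sketch of the later filtering step mentioned in the question, assuming the raw attribute text as input and using the scheme syntax of RFC 3986 (splitByAbsoluteness is a hypothetical name):

```javascript
// RFC 3986: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), then ":".
var SCHEME_RE = /^[A-Za-z][A-Za-z0-9+.\-]*:/;

// Sorts src values into those that start with a scheme (absolute URIs)
// and those that do not (relative references such as "img/a.png").
function splitByAbsoluteness(values) {
  var absolute = [];
  var relative = [];
  for (var i = 0; i < values.length; ++i) {
    if (SCHEME_RE.test(values[i])) {
      absolute.push(values[i]);
    } else {
      relative.push(values[i]);
    }
  }
  return { absolute: absolute, relative: relative };
}
```

A network-path reference such as //cdn.example.com/c.js counts as relative here, and a bare Windows path such as C:\x.js would be mistaken for a URI with scheme "C".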

> Those that are otherwise I do not want to find; but if they occur they can
> be eliminated at a later stage.

That implies that in the end you cannot use properties of element objects,
but must refer to attributes of elements. There is a problem with this
approach as MSHTML does not always differentiate between those two kinds
(the well known IE getAttribute() bug).
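One commonly cited workaround for that MSHTML behaviour is the proprietary second argument of getAttribute(): in IE, a flag value of 2 asks for the value exactly as it was set in the source. The sketch below assumes that flag behaves as described and relies on other DOMs ignoring the extra argument; rawSrc and resolvedSrc are hypothetical names.

```javascript
// Returns the src attribute text as written in the markup, or null.
// The second argument (2) is an MSHTML-only flag ("return the value exactly
// as set in the source"); standard getAttribute() ignores the extra argument.
function rawSrc(el) {
  var value = el.getAttribute("src", 2);
  return (value == null || value === "") ? null : String(value);
}

// Returns the src *property*, which browsers usually resolve to an absolute
// URI, or null if the element object has no non-empty string-valued src.
function resolvedSrc(el) {
  return (typeof el.src == "string" && el.src != "") ? el.src : null;
}
```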

> They are not of concern, but illustrate by their collections what would
> be a nice way to find all SRC elements.

The term "SRC elements" is wrong in this context. A "SRC element" would be
of the form <SRC …>, potentially followed by content and </SRC>.

Links, as in "`A' element", do not have a `src' attribute in HTML; they have
a `href' attribute instead.

> No. I mean for finding all elements that have IDs; or rather, for
> finding the ID values, as I do not need the element.

The corresponding XPath expression would be

//*[(@src or @href) and @id]/@id

which can be refined to include only elements whose `src' or `href'
attribute is of a certain form (to avoid filtering the results later).
For MSHTML you are on your own, though, and should use
getElementsByTagName(), as David suggested.

Another possibility would be querySelectorAll(), with which you can use CSS
selectors (in particular, CSS attribute selectors) to retrieve a NodeList
of elements. Be aware, though, that implementations vary due to varying
levels of CSS support in browsers' layout engines, and that it is not
strictly defined whether a CSS attribute selector applies to an element if
the element type has the attribute declared but not specified;
implementations differ there as well.

> I don't want to have to know the tag names of all elements that can USE
> an HTML SRC property.

HTML elements do not have properties. If you traverse the document tree
with the DOM, you do not have to know the tag names (better: element type
names) of all elements that have a `src' _attribute_; elements that do not
have it usually do not expose the corresponding attribute property for
their element objects. As a result, a `typeof' test should suffice.
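The typeof test can be combined with getElementsByTagName("*"), which returns all elements in document order without an explicit recursive walk. A minimal sketch, assuming (as described above) that element objects of types without a src attribute lack a src property; non-string values are set aside, as the original poster proposed. srcElementsByTypeof is a hypothetical name.

```javascript
// Visits every element once and sorts them by the type of their src property:
// non-empty strings are kept, other defined values are set aside for review.
function srcElementsByTypeof(doc) {
  var all = doc.getElementsByTagName("*");
  var withSrc = [];
  var odd = [];
  for (var i = 0, len = all.length; i < len; ++i) {
    var type = typeof all[i].src;
    if (type == "string") {
      // "" usually means the type supports src but no value is set.
      if (all[i].src != "") {
        withSrc.push(all[i]);
      }
    } else if (type != "undefined") {
      odd.push(all[i]);
    }
  }
  return { withSrc: withSrc, odd: odd };
}
```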

> And, surely, one can apply a SRC property to any element by script?

Not necessarily. And AISB, doing so is strongly recommended against.

> I am trying to find all SRC attribute values in the document that name
> required files, and am prepared to receive any that do not refer to files.
> (Throughout, the files can be present or absent.)

//*[@src]/@src

>> I don't follow.


> Evidently.

Which might have to do with your not expressing yourself in an
understandable manner, inventing terminology where unambiguous standard
terminology exists already.

> All the ones of interest are strings; it should suffice to use typeof
> and note non-strings for future consideration.

So you already know that. Why ask the question, then?

> But the best answer is the one which appeared immediately after my
> article "above" was committed to NNTP.

Are you sure?

> The tree-walk which recursively locates all IDs was recently added to the
> code, and slows it by an unimportant amount.

IMHO, it is still a kludge compared to the approach using W3C DOM Level 3
XPath, which is available in all major DOMs (regardless of Content-Type)
except the MSHTML DOM. You really only need it for IE and older versions of
other UAs nowadays.

> It will therefore be, and is, entirely satisfactory and efficient to test
> elements for having a SRC during the same tree-walk.

Well, YMMV.

PointedEars

Similar Threads

Locating all event handlers and scripts in a document	2	Aug 10, 2006
Finding all instances of a string in an XML file	0	Jun 21, 2013
capture ALL image urls used in document including background images	6	Mar 13, 2007
Finding all empty directories in a subversion checkout	4	Mar 3, 2011
FAQ 2.6 What modules and extensions are available for Perl? What is CPAN? What does CPAN/src/... mea	0	Mar 7, 2011
Finding all the links in a Unix file/directory path	3	May 12, 2009
finding the XPath of a node	3	Mar 18, 2007
Creating an object that can track when its attributes are modified	11	Mar 6, 2013
