Help with datascraping script

Joined
Aug 26, 2024
Messages
2
Reaction score
1
Hello everybody. I'm going to be completely honest: I've recently started coding through ChatGPT, and this whole script was written by ChatGPT, with Copilot used for cross-referencing. I want to learn coding, and this is how I'm doing that. Anyway, I had ChatGPT write me a script to automate a task that I need to get done at work, and it keeps failing. It fails either by spitting out a completely blank Excel sheet, or by filling a column that should contain names and titles with "no relevant info found" and "failed to retrieve." I'm trying to see if there is a way to adjust the script that ChatGPT and Copilot haven't tried, so that I can have a more successful program. I've attached the script below, along with a screenshot of my current results. If anyone would like to help me out, I'd really appreciate it! :)
 

Attachments

  • file for forum.txt
    5.5 KB · Views: 14
  • image.png
    418.1 KB · Views: 14
Joined
Jul 4, 2023
Messages
453
Reaction score
54
To read data from a website, you need to know the structure of the HTML that the page uses to mark up that data. For example, suppose we have the following code:

HTML:
<div>
  <p>
    Lorem ipsum dolor sit amet. Quo voluptatem cupiditate aut enim obcaecati a minus sapiente ut incidunt nemo ex nihil earum et odit ducimus sit necessitatibus praesentium. Sed consectetur enim 33 cupiditate minus non velit reprehenderit et ipsum dolore. Qui consequuntur mollitia sit eligendi consequatur eum accusantium suscipit. Est nemo optio ad aliquam aspernatur et aliquid recusandae aut sapiente fuga in maxime voluptas 33 repudiandae praesentium.
  </p>
  <p class="lorem">
    Lorem ipsum dolor sit amet. Quo voluptatem cupiditate aut enim obcaecati a minus sapiente ut incidunt nemo ex nihil earum et odit ducimus sit necessitatibus praesentium. Sed consectetur enim 33 cupiditate minus non velit reprehenderit et ipsum dolore. <b>Qui consequuntur mollitia</b> sit eligendi consequatur eum accusantium suscipit. Est nemo optio ad aliquam aspernatur et aliquid recusandae aut sapiente fuga in maxime voluptas 33 repudiandae praesentium.
  </p>
  <p>
    Lorem ipsum dolor sit amet. Quo voluptatem cupiditate aut enim obcaecati a minus sapiente ut incidunt nemo ex nihil earum et odit ducimus sit necessitatibus praesentium. Sed consectetur enim 33 cupiditate minus non velit reprehenderit et ipsum dolore. Qui consequuntur mollitia sit eligendi consequatur eum accusantium suscipit. Est nemo optio ad aliquam aspernatur et aliquid recusandae aut sapiente fuga in maxime voluptas 33 repudiandae praesentium.
  </p>
</div>

and we want to get the text between the <b> tags inside the "middle" <p> tag.
In this case we can use JavaScript, e.g. document.querySelector, with a CSS selector that matches exactly that element:

HTML:
<div>
  <p>
    Lorem ipsum dolor sit amet. Quo voluptatem cupiditate aut enim obcaecati a minus sapiente ut incidunt nemo ex nihil earum et odit ducimus sit necessitatibus praesentium. Sed consectetur enim 33 cupiditate minus non velit reprehenderit et ipsum dolore. Qui consequuntur mollitia sit eligendi consequatur eum accusantium suscipit. Est nemo optio ad aliquam aspernatur et aliquid recusandae aut sapiente fuga in maxime voluptas 33 repudiandae praesentium.
  </p>
  <p class="lorem">
    Lorem ipsum dolor sit amet. Quo voluptatem cupiditate aut enim obcaecati a minus sapiente ut incidunt nemo ex nihil earum et odit ducimus sit necessitatibus praesentium. Sed consectetur enim 33 cupiditate minus non velit reprehenderit et ipsum dolore. <b>Qui consequuntur mollitia</b> sit eligendi consequatur eum accusantium suscipit. Est nemo optio ad aliquam aspernatur et aliquid recusandae aut sapiente fuga in maxime voluptas 33 repudiandae praesentium.
  </p>
  <p>
    Lorem ipsum dolor sit amet. Quo voluptatem cupiditate aut enim obcaecati a minus sapiente ut incidunt nemo ex nihil earum et odit ducimus sit necessitatibus praesentium. Sed consectetur enim 33 cupiditate minus non velit reprehenderit et ipsum dolore. Qui consequuntur mollitia sit eligendi consequatur eum accusantium suscipit. Est nemo optio ad aliquam aspernatur et aliquid recusandae aut sapiente fuga in maxime voluptas 33 repudiandae praesentium.
  </p>
</div>

<script>
  alert(
    document.querySelector('p.lorem b').textContent + '\n' +
    document.querySelector('p.lorem b').outerHTML
  );
</script>
IMO your Python code, which performs a similar task, has nothing equivalent to this: it does not target the specific structure (the HTML code) of the particular page it reads.
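
For comparison, here is a rough Python sketch of the same idea, using only the standard library's html.parser. It is not a fix for the attached script. SAMPLE_HTML is a shortened stand-in for the example page above, and the names BoldInLoremParser and first_bold_in_lorem are made up for illustration. The point is only that the code has to say which tag and class it is looking for on that particular page.

```python
from html.parser import HTMLParser

# Shortened stand-in for the example page above (assumption: same structure).
SAMPLE_HTML = """
<div>
  <p>Lorem ipsum dolor sit amet.</p>
  <p class="lorem">Sed consectetur enim. <b>Qui consequuntur mollitia</b> sit eligendi.</p>
  <p>Est nemo optio ad aliquam.</p>
</div>
"""


class BoldInLoremParser(HTMLParser):
    """Collects the text of every <b> found inside a <p class="lorem">."""

    def __init__(self):
        super().__init__()
        self.in_lorem_p = False  # are we inside <p class="lorem">?
        self.in_bold = False     # are we inside a <b> within that <p>?
        self.matches = []        # text of each matching <b>

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "p":
            self.in_lorem_p = "lorem" in classes
        elif tag == "b" and self.in_lorem_p:
            self.in_bold = True
            self.matches.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_lorem_p = False
            self.in_bold = False
        elif tag == "b":
            self.in_bold = False

    def handle_data(self, data):
        if self.in_bold:
            self.matches[-1] += data


def first_bold_in_lorem(html):
    """Rough equivalent of document.querySelector('p.lorem b').textContent."""
    parser = BoldInLoremParser()
    parser.feed(html)
    parser.close()
    return parser.matches[0].strip() if parser.matches else None


print(first_bold_in_lorem(SAMPLE_HTML))
```

If the function returns None, the selector did not match anything on the page, which is the kind of situation where a scraper would end up writing an empty cell or a "not found" placeholder. Third-party libraries such as BeautifulSoup accept CSS selectors directly, which is shorter, but the principle is the same: the selector has to be written for the HTML of the page you are actually reading.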
 
