Question on the best approach for gathering tweets based on time

  • Thread starter Panagiotis Atmatzidis

Panagiotis Atmatzidis

Hello,

I want to know what is the best way to gather tweets from a specific
date till 'time.now'.

I have a database into which I dumped the user's whole tweet history.
Tweets are stored in a sqlite3 database. My db fields are tweet.created_at,
tweet.text and tweet.id, plus an integer as key.

I use tweet.id to perform a match test before accepting new tweets into
the database.

However, the script currently tries to dump all possible
tweets from twitter's API every time, does the match and adds the ones
that are missing (which are the new ones, of course). The procedure, as you
can imagine, causes big delays.

The created_at date string looks like this: "Tue Jul 06 10:08:23 +0000 2010"

Time matters, I can't deal only with dates.

I have a couple of solutions in mind, but I'd like to know from more
experienced users which way to approach this:

1) Convert the 'created_at' string to YYYY-MM-DD date? This could be
tricky because there's also the exact time of the tweet to consider.
(didn't try it out yet)

2) Using sqlite3's "id integer primary key" which uses the biggest
number for the latest entry and extract date from there?

3) Any smarter way?

Thanks
 
Robert Klemme

It seems this is rather a question for the twitter API. If that API
provides some id for every tweet and if it provides a mechanism to
query "all tweets after <id>" then the solution is obvious: store the
twitter id in your table and do the fetch accordingly.
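
A minimal sketch of that idea, using only Ruby's standard library. The
endpoint URL is a placeholder, the "since_id" parameter name is an
assumption to be checked against the API documentation, and the method
names and the table and column names in the comments are hypothetical:

```ruby
require "uri"

# Hypothetical endpoint, used only for illustration; substitute the real
# user_timeline URL from the Twitter API documentation.
TIMELINE_URL = "https://api.example.invalid/statuses/user_timeline.json"

# Returns the largest tweet id already stored, or nil when nothing is stored.
# With sqlite3 the same value could come from a query such as
#   SELECT MAX(tweet_id) FROM tweets
# (table and column names are hypothetical).
def newest_stored_id(stored_ids)
  stored_ids.max
end

# Builds the request URI. "since_id" is an assumed parameter name;
# it is left out on the very first run, when no tweet is stored yet.
def timeline_uri(screen_name, since_id = nil)
  params = { "screen_name" => screen_name }
  params["since_id"] = since_id if since_id
  uri = URI(TIMELINE_URL)
  uri.query = URI.encode_www_form(params)
  uri
end

stored_ids = [17_840_001, 17_840_250, 17_839_990]
puts timeline_uri("example_user", newest_stored_id(stored_ids))
```

This way only tweets newer than the newest stored one are requested, so
the match test against the whole history is no longer needed on every run.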

http://apiwiki.twitter.com/w/page/22554679/Twitter-API-Documentation

Turns out that id does exist (see XML format):
http://dev.twitter.com/doc/get/statuses/public_timeline

And there is also the "since" query type:
http://dev.twitter.com/doc/get/statuses/user_timeline
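
With the id-based fetch the timestamp is not needed to find new tweets,
but if the exact time matters for later queries, the created_at string
from the question can probably be parsed with Time.strptime from the
standard library. This is only a sketch; the constant and method names
are made up for the example, and the format string assumes every
created_at value looks like the one quoted:

```ruby
require "time"

# Assumed layout of created_at, e.g. "Tue Jul 06 10:08:23 +0000 2010".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"

# Parses a created_at string into a Time object in UTC.
def parse_created_at(string)
  Time.strptime(string, CREATED_AT_FORMAT).utc
end

t = parse_created_at("Tue Jul 06 10:08:23 +0000 2010")
puts t.iso8601  # ISO 8601 text; sorts chronologically when compared as a string
puts t.to_i     # seconds since the Unix epoch; fits an INTEGER column
```

Either representation keeps the time of day and can be compared
directly in a WHERE clause, which the original string cannot.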

Happy coding!

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
 
