Crawling data from Twitter without using the Twitter API

Things needed: first, install Docker.

If you are crawling massive amounts of data, you probably need a MongoDB or MySQL instance running on the machine.

If you want to do analytics on the crawled data, you might want to use an Elasticsearch + Kibana setup.


incron and rclone for syncing screenshots to Google Drive

I frequently take screenshots on my laptop and use them to recall what I had read and worked on (kind of like Anki notes). I wanted my screenshots synced to Google Drive automatically, so that I can look at them on my phone when I am on the go.

Since Ubuntu stores screenshots in the Pictures folder, I used incron (inotify cron) to watch that folder for newly created files and rclone to sync them to Google Drive.

To set up rclone, refer to

To set up incrontab, refer to

>> incrontab -e

Add the following line:

/home/jkl/Pictures IN_CREATE /usr/bin/rclone sync /home/jkl/Pictures google:photos

Move/Mirror/Migrate all your Bitbucket repos to GitHub

1. Add SSH keys to your Bitbucket and GitHub accounts.

2. Generate a GitHub token with permission to read repo info and read/write permission to create repos (you can grant every permission except delete).

3. Use this script ‘’ @ or ( to mirror the repos from Bitbucket to GitHub.
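The referenced script is not reproduced here. As a rough sketch of the same idea, plain git can mirror one repository at a time: `git clone --mirror` copies every branch and tag, and `git push --mirror` replays them onto the new remote. The helper names `mirror_repo` and `create_github_repo` are hypothetical, and the GitHub call assumes the token from step 2 is exported as `GITHUB_TOKEN` and uses the REST endpoint `POST /user/repos`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# create_github_repo NAME  (hypothetical helper)
# Creates an empty private repo via the GitHub REST API.
# Assumes GITHUB_TOKEN holds the token from step 2; needs network access.
create_github_repo() {
  curl -fsS \
    -H "Authorization: token ${GITHUB_TOKEN:?set GITHUB_TOKEN first}" \
    -d "{\"name\": \"$1\", \"private\": true}" \
    https://api.github.com/user/repos > /dev/null
}

# mirror_repo SRC_URL DST_URL  (hypothetical helper)
# Copies all branches and tags from SRC to DST.
# DST should be an existing, empty repository.
mirror_repo() {
  local src=$1 dst=$2 tmp
  tmp=$(mktemp -d)
  git clone --quiet --mirror "$src" "$tmp/repo.git"    # bare copy with every ref
  git -C "$tmp/repo.git" push --quiet --mirror "$dst"  # push all refs as-is
  rm -rf "$tmp"
}
```

For one repo (USER and REPO are placeholders): `create_github_repo REPO && mirror_repo git@bitbucket.org:USER/REPO.git git@github.com:USER/REPO.git`. Mirroring every repo would additionally need the list of repo names from Bitbucket, which this sketch does not cover.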

Upload a site to

1. Crawl and save each webpage separately.

2. Make a WARC file and upload it to

I uploaded a WARC file but why doesn’t it show up in Wayback Machine?

To ensure content integrity, items with WARC files must have the mediatype set to “web” and be under the Archive Team collection in order for it to be ingested by the Wayback Machine.



Browsing a WARC file locally:


Tools: S3 API key ->


Download a site/page from for a given time

modify and use this script ->
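If the source here is the Wayback Machine from the section above (an assumption, since the link is missing), a single page can be fetched for a given time by building a snapshot URL of the form `https://web.archive.org/web/<timestamp>/<url>`. The `id_` flag after the timestamp asks for the original page bytes without the Wayback toolbar. `wayback_url` and `fetch_snapshot` are hypothetical names, not the script referenced above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# wayback_url TIMESTAMP URL  (hypothetical helper)
# TIMESTAMP is YYYYMMDDhhmmss; a shorter prefix such as 20150101 also works,
# and the Wayback Machine redirects to the closest capture it has.
# "id_" after the timestamp requests the original bytes, without the toolbar.
wayback_url() {
  printf 'https://web.archive.org/web/%sid_/%s\n' "$1" "$2"
}

# fetch_snapshot TIMESTAMP URL OUTFILE  (hypothetical helper; needs network access)
fetch_snapshot() {
  curl -fsSL "$(wayback_url "$1" "$2")" -o "$3"
}
```

Usage: `fetch_snapshot 20150101 http://example.com/ example-2015.html` saves the capture closest to 1 January 2015, if one exists. Saving a whole site from a given time would need a loop over its pages; this sketch handles one page.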


Drive sync options on Ubuntu

1. Insync is the best option available for Ubuntu (if you are willing to pay for it).

2. odrive (you can mount only one account at a time)

Download and install

od="$HOME/.odrive-agent/bin" && curl -L "" --create-dirs -o "$od/" && curl -L "" | tar -xvzf- -C "$od/" && curl -L "" | tar -xvzf- -C "$od/"

Add these to the end of your .bashrc file:

export PATH="/home/jkl/.odrive-agent/bin:$PATH"

alias startodrive="/usr/bin/nohup /home/jkl/.odrive-agent/bin/odriveagent > /dev/null 2>&1 &"

Start the background service (open a new terminal or run source ~/.bashrc first):

startodrive
Go to the odrive site -> Settings -> get your auth token

odrive authenticate your_auth_token

Link your local folder with the corresponding remote drive/folder:

odrive mount /media/jkl/64BCC6E1BCC6ACBC/Users/jkl/Google\ Drive/ /jagadkanihal


backup (send from local to remote)

odrive backup /media/jkl/64BCC6E1BCC6ACBC/Users/jkl/Google\ Drive/ /jagadkanihal


sync (remote to local)

Extension meanings

  • .cloudf  : cloud folder
  • .cloud : cloud file
odrive sync jagadkanihal.cloudf


3. Duplicati (compressed/encrypted one-way backup: local to remote)

Add backup -> step 2, destination -> click on the AuthID (opens Drive, log in, fetches the auth ID) -> "path on server" should be the folder name (if left blank, it will use the main/login folder).


Clipboard append

Appending content to the clipboard without overwriting it.
Here is the script that does this.

#!/bin/bash

# Text under the selection, i.e. currently highlighted
p=$(xclip -selection primary -o)

# Text already in the clipboard (open question: what happens if it holds an image?)
c=$(xclip -selection clipboard -o)

# Separator placed between the old clipboard text and the new selection
sep=$'\n'

echo "$c$sep$p" | xclip -selection clipboard
notify-send --expire-time=1000 "ClipAppend" "Appended with newline"

# Ubuntu's notify-osd ignores --expire-time, so kill it after a second as a workaround
sleep 1
pkill notify-osd
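A small refinement, sketched as a hypothetical `append_with_sep` function: join the old clipboard text and the new selection with a separator (newline by default), but skip the separator when the clipboard is empty so the result does not start with a blank line. Factoring the join out also lets it be tested without an X session.

```shell
#!/usr/bin/env bash

# append_with_sep EXISTING NEW [SEP]   (hypothetical helper)
# Prints EXISTING, SEP, NEW; if EXISTING is empty, prints only NEW.
# SEP defaults to a newline.
append_with_sep() {
  local nl=$'\n'
  local existing=$1 new=$2 sep=${3-$nl}
  if [[ -z $existing ]]; then
    printf '%s' "$new"
  else
    printf '%s%s%s' "$existing" "$sep" "$new"
  fi
}

# Inside the clipboard script it would replace the echo line (needs X and xclip):
#   append_with_sep "$c" "$p" | xclip -selection clipboard
```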

Add a keyboard shortcut to run this script:
System Settings >> Keyboard, then click on the Shortcuts tab. Click on Custom Shortcuts and then on the + sign to define your key combination. In the command box put:


bash -c 'bash ~/ '

Click on "Disabled" on the right side of the entry; once it shows "New accelerator", press your desired keys for the shortcut.

I use ALT+N to append the highlighted content to the clipboard with a newline as the separator.


Linux Filesystem Hierarchy Standard (FHS)

man hier [hierarchy] describes all of this.

/boot -> boot-related files: the Linux kernel binary, GRUB config, etc.

/etc -> [backronym “Editable Text Configuration”] [sysad related] host-specific, system-wide configuration files for applications (e.g. NetworkManager, lightdm), plus the startup/shutdown (start/stop) shell scripts for individual programs.

    • /etc/bash.bashrc : System-wide defaults and aliases used by the bash shell.
    • /etc/profile : System-wide login-shell defaults.
    • /etc/profile.d : Application scripts, sourced by /etc/profile at login.
  1. /etc/skel : Skeleton files (e.g. .bashrc, .profile) copied into each new user's home directory. (If you delete your local config files in ~/ by mistake, you can copy them back from here.)
  2. /etc/crontab : System-wide crontab, a table of commands to run at predefined times or intervals.
  3. /etc/init.d : Service startup scripts.
  4. /etc/resolv.conf : Domain Name Servers (DNS) used by the system.
  5. /etc/network/interfaces : static/DHCP IP address and LAN configuration.
    • /etc/hosts : IP addresses and their corresponding host names.
    • /etc/hosts.allow : Hosts allowed to access services on the local machine.
    • /etc/hosts.deny : Hosts denied access to services on the local machine.
  6. /etc/grub.conf : GRUB bootloader configuration file (on Ubuntu with GRUB 2, the generated config is /boot/grub/grub.cfg; edit /etc/default/grub and run update-grub instead).
  7. /etc/modules : Kernel modules to load at boot time.
  8. /etc/passwd : User account information (name, UID, GID, home directory, login shell). As a security measure, the password hashes themselves are kept in /etc/shadow, which only root can read.
  9. /etc/X11 : Configuration files of the X Window System.
  10. /etc/fstab : Filesystems (disk partitions) to mount at boot and their mount points. (Use blkid to find partition UUIDs.)
  11. /etc/mtab : Currently mounted filesystems (on modern systems a symlink to /proc/self/mounts, so it always matches the kernel's view).
  12. /etc/group : Group definitions (group name, GID, members), used for permission checks such as membership in the sudo group.
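To see the /etc/passwd layout described above, this sketch prints a few fields of the first accounts with awk. It only reads /etc/passwd, which any user may do; reading /etc/shadow would need root.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Each line has 7 colon-separated fields:
#   name:password:UID:GID:GECOS:home:shell
# The password field is normally "x": the real hash is in /etc/shadow.
awk -F: 'NR <= 5 { printf "%-12s uid=%-6s home=%-16s shell=%s\n", $1, $3, $6, $7 }' /etc/passwd

# Look up one account; root always has UID 0
root_uid=$(awk -F: '$1 == "root" { print $3 }' /etc/passwd)
echo "root uid: $root_uid"
```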

/proc -> [Process Information] A pseudo-filesystem (procfs) exposing information about processes and the system, including kernel/module exported variable values; each running process has its own directory /proc/{pid}.

  1. /proc/cpuinfo : CPU information.
  2. /proc/filesystems : Filesystem types supported by the kernel.
  3. /proc/interrupts : Interrupt counts per IRQ and per CPU.
  4. /proc/ioports : I/O port address ranges registered by devices.
  5. /proc/meminfo : Memory usage information.
  6. /proc/modules : Currently loaded kernel modules (what lsmod shows).
  7. /proc/mounts : Mounted filesystems.
  8. /proc/stat : Kernel/system statistics (CPU time, context switches, boot time, etc.).
  9. /proc/swaps : Swap areas in use.
  10. /proc/kallsyms : Kernel symbol table, including symbols of loaded modules.
  11. /proc/version : Kernel version information.
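Because procfs entries are plain text, ordinary tools can read them. A small sketch pulling total memory, CPU count, kernel version and supported filesystem types (Linux only; the CPU count assumes each logical CPU appears as a `processor` line in /proc/cpuinfo, as on common x86 kernels).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Total RAM in kB: second field of the "MemTotal:" line
mem_total_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)

# Logical CPUs: one "processor" line per CPU in /proc/cpuinfo
cpu_count=$(grep -c '^processor' /proc/cpuinfo)

# Kernel version string
kernel=$(cat /proc/version)

# Filesystem types: last column (virtual filesystems are prefixed with "nodev")
fs_types=$(awk '{ print $NF }' /proc/filesystems | tr '\n' ' ')

echo "MemTotal: ${mem_total_kb} kB"
echo "CPUs:     ${cpu_count}"
echo "Kernel:   ${kernel}"
echo "FS types: ${fs_types}"
```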

/var -> [Variable Files] Files whose content is expected to grow live under this directory:

  1. system log files (/var/log/*) [sysad related]
  2. variable state data of packages and databases (/var/lib)
  3. emails (/var/mail)
  4. print queues (/var/spool)
  5. lock files (/var/lock)
  6. temporary files preserved across reboots (/var/tmp)
  7. web server files (/var/www)
  8. cache files (/var/cache)

/dev -> [Device Files] Essential device files needed by the system.

A device file is a special file that appears in the filesystem as an interface to a device driver. Device files let software interact with a driver using standard input/output system calls, which simplifies many tasks and unifies user-space I/O mechanisms.

There are 2 types of devices: character devices and block devices.

Examples: terminal devices, USB, etc., e.g. /dev/tty1 and /dev/usbmon0 (both character devices); disks such as /dev/sda are block devices.

The standard streams are symlinks into /proc:

  1. /dev/stdin -> /proc/self/fd/0
  2. /dev/stdout -> /proc/self/fd/1
  3. /dev/stderr -> /proc/self/fd/2

Pseudo-devices

  1. /dev/null – accepts and discards all input; produces no output (always returns an end-of-file indication on a read) (use this as a sink)
  2. /dev/zero – accepts and discards all input; produces a continuous stream of NUL (zero value) bytes (use this as a source)
  3. /dev/full – produces a continuous stream of NUL (zero value) bytes when read, and fails with a “disk full” error (ENOSPC, “No space left on device”) when written to
  4. /dev/random and /dev/urandom – produce a variable-length stream of random bytes from the kernel's random number generator

/sbin -> [system binaries] essential OS system binaries, e.g. fsck, init, route, insmod, rmmod, modinfo, modprobe

/bin -> [user binaries] essential command binaries needed in single-user mode and available to all users, e.g. cat, cp, mkdir, su (in su's permissions rwsr-xr-x, the s is the setuid bit: the program runs with its owner's, i.e. root's, privileges)
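To see where the s comes from, a short sketch that sets the setuid bit on a throwaway temp file (not on a real binary) and prints the symbolic mode with GNU stat; `ls -l /bin/su` shows the same pattern.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway file: never set setuid on a real program just to experiment
f=$(mktemp)

# 4755 = setuid bit (4) + rwxr-xr-x (755)
chmod 4755 "$f"

# GNU stat: %A prints the symbolic mode; the owner's x is shown as s
mode=$(stat -c %A "$f")
echo "$mode"   # -rwsr-xr-x

rm -f "$f"
```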

/lib* -> [system libraries] essential shared libraries for the binaries in /bin/ and /sbin/ (kernel modules live in /lib/modules)

/usr -> Secondary hierarchy for read-only user data; contains the majority of (multi-)user utilities (libraries, docs) and applications [apt-get installs packages to /usr/bin and /usr/lib]

  • /sbin -> non-essential system binaries for sysads, e.g. cron, sshd, useradd
  • /bin -> non-essential command binaries for all users, e.g. at, awk, cc, less, scp
  • /lib* -> libraries for the binaries in /usr/bin/ and /usr/sbin/
  • /include -> standard include (header) files for C/C++ (gcc) programs
  • /src -> kernel source code and header files
  • /share -> architecture-independent (shared) data: man pages, info files, etc.
  • /games -> games
  • /local -> tertiary hierarchy for data local to this host, e.g. programs you install from source. It has its own subdirectories:
    • /bin -> compiled binaries, e.g. openssl
    • /lib -> shared libraries for the binaries in /usr/local/bin, e.g. libopencv
    • /include -> header files for apps/packages installed in /usr/local
    • /etc -> configuration files for apps in /usr/local/bin
    • /share -> man and info pages for apps/packages in /usr/local/

/media -> [removable media] temporary mount directory for removable media

/mnt -> [mount point] temporary mount directory; sysads can mount other filesystems here.

/sys -> [system information] sysfs is mounted at /sys. It exposes information about devices, drivers, and some kernel features.

/tmp -> [temporary files] usually deleted on reboot, whereas files in /var/tmp are preserved across reboots.

/opt -> [optional] optional, self-contained (stand-alone) application software packages (like C:\Program Files in Windows), e.g. /opt/google/

/run -> [runtime info] information about the running system since last boot, e.g. currently logged-in users and running daemons.

/srv -> [service] site-specific data served by this system, such as data and scripts for web servers, data offered by FTP servers, and repositories for version control systems.

/root    ->  [root:~] root user’s home directory

/home -> [jaggi:~] Home directories for all users to store their personal files and personal settings eg: /home/jaggi

References – (pics, some text is directly copied from these sites)



Set up Qt Creator for kernel module development (kernel programming)


Why use Qt Creator?
Qt Creator is one of the best IDEs for C/C++ development. It has excellent autocompletion, and it makes walking through the code of large projects (like the Linux kernel) easier.

1. Import an existing Makefile project into Qt Creator
If you are using your own build system, create a project by selecting “Import of Makefile-based project”. This creates some files in your project directory, including one named <project_name>.includes. In that file, list the paths you want to include, one per line. Adding the kernel's include paths there enables autocompletion for kernel sources (e.g. module.h) in Qt Creator.

All this really does is tell Qt Creator where to look for files to index for autocompletion. Your own build system still has to handle the include paths in its own way.

For example, for project ‘dummy’, the include file is ‘dummy.includes’, which should contain the path(s) to the kernel source.
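A sketch that writes such a file for the running kernel. It assumes the kernel headers package is installed (so /lib/modules/$(uname -r)/build exists) and an x86 machine; on other architectures replace arch/x86 accordingly. The project name dummy follows the example above, and the exact set of paths is an assumption that is merely enough for indexing headers like linux/module.h.

```shell
#!/usr/bin/env bash
set -euo pipefail

project=dummy                              # Qt Creator names the file <project>.includes
ksrc="/lib/modules/$(uname -r)/build"      # kernel build tree from the headers package

# One include path per line; arch/x86 assumes an x86 machine
{
  echo "$ksrc/include"
  echo "$ksrc/arch/x86/include"
  echo "$ksrc/arch/x86/include/generated"
} > "$project.includes"

cat "$project.includes"
```

Run it in the project directory, then reopen the project so Qt Creator re-indexes.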




2. Makefile and build (compile)

Example Makefile (for the module source file jaggi-klm.c)

# kbuild builds jaggi-klm.ko from jaggi-klm.c
obj-m += jaggi-klm.o

# M= must be the directory holding this Makefile and jaggi-klm.c (run make from there)
all:
	$(MAKE) -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules

clean:
	$(MAKE) -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean

Running make builds the module file with extension .ko (kernel object).


3. Then load/unload the module using:

sudo insmod  _____.ko
sudo rmmod   _____.ko




IITB email on Gmail/Thunderbird

On the Gmail app

CSE mail

There is a typo in the SMTP part: instead of imap it should be smtp.

GPO mail

Outgoing: SMTP, STARTTLS (all certificates accepted)


On Thunderbird

CSE mail (same as above)

There is a typo in the SMTP part: instead of imap it should be smtp.


Mail not sending? See

For setting up logging, see the Linux/Unix section of

GPO mail

Incoming: IMAP, SSL/TLS, port 143, plain password



Gmail on Thunderbird: !topic/gmail/3jO9r667zRg

Most probable cause: the IITB proxy/firewall is not allowing mail ports to Google.