How to contribute
I am always happy to receive pull requests.
If something is missing, such as attributes for an object, feel free to either add it yourself or open an issue. If you choose to change it directly, be aware that it might be changed back during the review process.
So here are some things you can do to help out:
- implement a new page, for example SoundCloud
- look at the open issues or at any of the [project boards](https://gitea.elara.ws/music-kraken/music-kraken-core/projects) for open tasks.
Before you start, please make sure to at least read the core concepts to prevent misunderstandings.
Add a new Page
All scraping code is encapsulated in a child class of the Page class.
You need to implement the following functions:
Search and URL functions
- get_source_type: returns the type of data object (for example Song or Album) that the given source belongs to.
- general_search: returns a list of data objects found with a simple query.
Only implement the following functions if the page allows advanced search:
- song_search: returns a list of Song objects.
- album_search: returns a list of Album objects.
- artist_search: returns a list of Artist objects.
- label_search: returns a list of Label objects.
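As a rough illustration of the search functions, here is a minimal sketch of a page class. The classes Song, Album, Source and Page are simplified stand-ins defined only so the example runs on its own. ExamplePage, the URL patterns and all method signatures are assumptions and not the project's real API, so check the preset for the actual signatures.

```python
from dataclasses import dataclass
from typing import List, Optional, Type


# Simplified stand-ins for the project's data objects and base class
# (hypothetical, defined here only so the sketch is self-contained).
@dataclass
class Song:
    title: str


@dataclass
class Album:
    title: str


@dataclass
class Source:
    url: str


class Page:
    def get_source_type(self, source: Source) -> Optional[Type]:
        return None

    def general_search(self, search_query: str) -> List[object]:
        return []


class ExamplePage(Page):
    """Hypothetical page for a site hosted at example.com."""

    def get_source_type(self, source: Source) -> Optional[Type]:
        # Decide from the URL which kind of data object the source points to.
        if "/track/" in source.url:
            return Song
        if "/album/" in source.url:
            return Album
        return None

    def general_search(self, search_query: str) -> List[object]:
        # A real page would query the site here; this returns canned results.
        return [
            Song(title=f"{search_query} (song)"),
            Album(title=f"{search_query} (album)"),
        ]

    def song_search(self, song: Song) -> List[Song]:
        # Only needed if the site supports searching for songs specifically.
        return [Song(title=song.title)]
```

The advanced search functions for albums, artists and labels would follow the same pattern as song_search in this sketch.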
Fetch functions
These functions all take a Source object as input and return a more detailed object. The result is automatically merged with the other existing objects.
fetch_song
fetch_album
fetch_artist
fetch_label
If the scraped page does not have, for example, labels, simply don't implement the functions for Label.
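A fetch function could look roughly like the following sketch. The data classes are simplified stand-ins, and ExamplePage, the field names (title, song_list) and the URL layout are assumptions made for illustration. The real classes and signatures are the ones in the preset.

```python
from dataclasses import dataclass, field
from typing import List


# Simplified stand-ins for the project's data objects (hypothetical).
@dataclass
class Source:
    url: str


@dataclass
class Song:
    title: str = ""


@dataclass
class Album:
    title: str = ""
    song_list: List[Song] = field(default_factory=list)


class ExamplePage:
    """Hypothetical page for a site without labels: no fetch_label is defined."""

    def fetch_album(self, source: Source) -> Album:
        # A real implementation would request source.url and parse the response.
        # Here the album id is simply read from the end of the URL.
        album_id = source.url.rstrip("/").split("/")[-1]
        return Album(
            title=f"Album {album_id}",
            song_list=[Song(title="Intro")],
        )

    def fetch_song(self, source: Source) -> Song:
        song_id = source.url.rstrip("/").split("/")[-1]
        return Song(title=f"Song {song_id}")
```

The functions only return the more detailed object. Merging it with existing objects is not done in the page class.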
Downloading
The function that actually downloads something is download_song_to_target. Its arguments are source, target and desc. If you want to remove certain intervals from the song, you can return a list of those intervals from the function get_skip_intervals.
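The two download functions might be sketched as follows. Only the function names and the argument names source, target and desc come from this guide. The Source and Target stand-ins, the boolean return value, the meaning of desc, the arguments of get_skip_intervals and the (start, end) tuples in seconds are all assumptions.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import List, Tuple


# Simplified stand-ins for the project's classes (hypothetical).
@dataclass
class Source:
    url: str


@dataclass
class Target:
    file_path: Path


class ExamplePage:
    def download_song_to_target(self, source: Source, target: Target, desc: str = "") -> bool:
        # desc is assumed to be a short description, e.g. for progress output.
        # A real page would download the audio from source.url instead.
        audio_bytes = b"fake audio data"
        target.file_path.parent.mkdir(parents=True, exist_ok=True)
        target.file_path.write_bytes(audio_bytes)
        return True  # assumed success flag

    def get_skip_intervals(self, source: Source) -> List[Tuple[float, float]]:
        # Assumed format: (start, end) in seconds, here cutting a 5 second intro.
        return [(0.0, 5.0)]


# Usage with a temporary directory as download location.
with tempfile.TemporaryDirectory() as tmp_dir:
    example_target = Target(file_path=Path(tmp_dir) / "artist" / "song.mp3")
    example_ok = ExamplePage().download_song_to_target(
        Source("https://example.com/track/1"), example_target, desc="song"
    )
    example_size = example_target.file_path.stat().st_size
```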
Step by step guide
- Create a new file with the name your_page.py in the page module.
- Then copy the contents of the preset over to your file.
- All the functions you need to implement can be found in the preset.
Important notes
- There is no need to check whether you have, for example, added a source of a song twice. I do a lot of post-processing on the data you scrape in the page classes. You can see exactly what I do in abstract.py.
- Use the connection class as it is laid out in the preset to make requests. It takes care of retrying requests, rotating proxies and consistent use of Tor (if selected in the config). You have:
  - connection.get()
  - connection.post()
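To illustrate why all requests should go through the shared connection, here is a toy sketch. The Connection class below is not the project's implementation: it is a hypothetical stand-in with an injected send function, written only to show what retrying means and how a page would route its requests through connection.get().

```python
from typing import Callable, Optional


class Connection:
    """Toy stand-in for the project's connection class (not its real code)."""

    def __init__(self, send: Callable[[str, str], Optional[str]], tries: int = 3):
        self.send = send    # hypothetical transport: (method, url) -> body or None
        self.tries = tries

    def _request(self, method: str, url: str) -> Optional[str]:
        # Retry until the transport returns a body or the tries are used up.
        for _ in range(self.tries):
            body = self.send(method, url)
            if body is not None:
                return body
        return None

    def get(self, url: str) -> Optional[str]:
        return self._request("GET", url)

    def post(self, url: str) -> Optional[str]:
        return self._request("POST", url)


class ExamplePage:
    def __init__(self, connection: Connection):
        self.connection = connection

    def fetch_page_body(self, url: str) -> Optional[str]:
        # Go through the shared connection instead of calling an HTTP library directly.
        return self.connection.get(url)


# Usage: a transport that fails twice and then succeeds.
attempts = []


def flaky_send(method: str, url: str) -> Optional[str]:
    attempts.append(method)
    return "<html>ok</html>" if len(attempts) >= 3 else None


example_body = ExamplePage(Connection(flaky_send)).fetch_page_body("https://example.com")
```

Because the page never talks to the network directly in this sketch, retry and proxy behaviour stays in one place.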
To get help, or to let me know what you are working on, just write in the development room of our Matrix space. I will try to help you as soon as possible.
Click this invitation (https://matrix.to/#/#music-kraken:matrix.org) to join our Matrix room.