generateRDF

generateRDF.GenerateRDF(path)

The driver function of this script. It calls the other functions to clean and fetch new articles, concatenates them into a larger DataFrame, and generates the knowledge graph (KG).

Parameters:

path (str) – directory to save to

generateRDF.clean_author_names(names)

Cleans the author names provided by both the news API and the dataset. Splits the authors into a list and handles missing names.

Parameters:

names (str) – The author names to be cleaned

Returns:

list of cleaned author names

Return type:

list
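
The source does not show how the names are split, so the following is only a minimal sketch of the behaviour described above. The function name `clean_author_names_sketch`, the choice of separators, and the treatment of missing values as `None` or an empty string are all assumptions made for illustration.

```python
import re


def clean_author_names_sketch(names):
    """Split a raw author string into a list of tidy names (illustrative only)."""
    # Assumption: a missing value arrives as None or an empty/whitespace string.
    if names is None or not str(names).strip():
        return []
    # Assumption: authors are separated by commas, semicolons, ampersands or "and".
    parts = re.split(r",|;|&|\band\b", str(names), flags=re.IGNORECASE)
    # Collapse internal whitespace, then drop fragments that end up empty.
    cleaned = [" ".join(part.split()) for part in parts]
    return [part for part in cleaned if part]


print(clean_author_names_sketch("Jane Doe, John Smith and Ann Lee"))
# ['Jane Doe', 'John Smith', 'Ann Lee']
```

Returning an empty list for a missing value keeps the return type consistent, which may simplify later steps that iterate over the authors.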

generateRDF.clean_body_text(text: str) → str

Cleans the article body text by removing special characters and converting to lower case.

Parameters:

text (str) – the article body text

Returns:

cleaned article text

Return type:

str
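
As a rough illustration of this kind of cleaning, the sketch below lower-cases the text and strips non-alphanumeric characters. It is not the module's actual code: the name `clean_body_text_sketch` is hypothetical, and treating everything other than ASCII letters, digits and whitespace as a "special character" is an assumption.

```python
import re


def clean_body_text_sketch(text: str) -> str:
    """Lower-case the text and remove special characters (illustrative only)."""
    # Lower-case first so the character class below only needs a-z.
    lowered = text.lower()
    # Assumption: "special characters" means anything that is not an
    # ASCII letter, a digit or whitespace; each one becomes a space.
    stripped = re.sub(r"[^a-z0-9\s]", " ", lowered)
    # Collapse the runs of whitespace left behind by the substitution.
    return " ".join(stripped.split())


print(clean_body_text_sketch("Hello, World! It's 2024."))
# hello world it s 2024
```

Replacing special characters with a space rather than deleting them avoids gluing neighbouring words together, at the cost of splitting contractions such as "it's".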

generateRDF.clean_outlet_names(name: str) → str

Cleans the outlet names provided by both the news API and the dataset by removing special characters and converting to lower case.

Parameters:

name (str) – The outlet name to be cleaned

Returns:

cleaned outlet name

Return type:

str

generateRDF.fetch_news(save_path) → DataFrame

Executes a GET request to fetch the daily news articles that must be served to the user. You will need to provide your own API key from newsdata.io.

Parameters:

save_path (str) – directory to save the fetched articles

Returns:

DataFrame of fetched articles

Return type:

pandas.DataFrame

generateRDF.generate_graph(df_final: DataFrame, file_name: str, save_path: str, write_turtle=False)

Takes in a DataFrame of articles and generates the KG according to our ontology definition.

Parameters:

df_final (pd.DataFrame) – articles dataframe

save_path (str)

Returns:

DataFrame of fetched articles

Return type:

pandas.DataFrame