generateRDF
- generateRDF.GenerateRDF(path)
The driver function of this script. It calls the other functions to clean and fetch new articles, concatenates them into a larger DataFrame, and generates the knowledge graph (KG).
- Parameters:
path (str) – directory to save to
- generateRDF.clean_author_names(names)
Cleans the author names provided by both the news API and the dataset. Splits multiple authors into a list and handles missing names.
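The splitting and missing-name handling described above could look roughly like the following. This is a minimal sketch, not the project's code: the function name `clean_author_names_sketch`, the set of separators, and the choice to return an empty list for missing values are all assumptions.

```python
import re


def clean_author_names_sketch(names):
    """Hypothetical stand-in for clean_author_names; not the project's code."""
    # Assumption: missing values (None, NaN floats, blank strings) become [].
    if not isinstance(names, str) or not names.strip():
        return []
    # Assumed separators: comma, semicolon, ampersand, or the word "and".
    parts = re.split(r"\s*(?:,|;|&|\band\b)\s*", names)
    # Trim whitespace and drop empty fragments left by repeated separators.
    return [part.strip() for part in parts if part.strip()]
```

Returning a list in every case, including the missing-name case, keeps downstream code free of `None` checks.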
- generateRDF.clean_body_text(text: str) → str
Cleans the article body text by removing special characters and converting to lower case.
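A minimal sketch of this kind of cleaning, assuming that "special characters" means anything other than ASCII letters, digits, and whitespace. The name `clean_body_text_sketch` and the whitespace collapsing are illustrative choices, not taken from the project.

```python
import re


def clean_body_text_sketch(text: str) -> str:
    """Hypothetical stand-in for clean_body_text; not the project's code."""
    # Lower-case first so the character class only needs a-z.
    lowered = text.lower()
    # Assumption: every character that is not a letter, digit, or
    # whitespace counts as "special" and is replaced by a space.
    no_special = re.sub(r"[^a-z0-9\s]", " ", lowered)
    # Collapse the runs of whitespace left behind by the substitution.
    return re.sub(r"\s+", " ", no_special).strip()
```

Replacing special characters with a space rather than deleting them avoids gluing neighbouring words together when punctuation sits between them.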
- generateRDF.clean_outlet_names(name: str) → str
Cleans the outlet names provided by both the news API and the dataset by removing special characters and converting to lower case.
- Parameters:
name (str) – the outlet name to be cleaned
- Returns:
cleaned outlet name
- Return type:
str
- generateRDF.fetch_news(save_path) → DataFrame
Executes a GET request to fetch the daily news articles to be served to the user. You will need to provide your own API key from newsdata.io.
- Parameters:
save_path (str) – directory to save the fetched articles
- Returns:
DataFrame of fetched articles
- Return type:
pandas.DataFrame
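The request step of fetch_news might be sketched as follows, using only the standard library. The endpoint URL, the `apikey` and `language` query parameters, and the `results` key in the response are assumptions about the newsdata.io API and should be checked against its documentation. All function names here are hypothetical, and saving to `save_path` is left out.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint; verify against the newsdata.io documentation.
NEWS_ENDPOINT = "https://newsdata.io/api/1/news"


def build_news_url(api_key, language="en"):
    """Build the GET URL. The parameter names are assumptions."""
    return NEWS_ENDPOINT + "?" + urlencode({"apikey": api_key, "language": language})


def parse_articles(payload):
    """Return the article rows from a decoded JSON payload (assumed 'results' key)."""
    return list(payload.get("results") or [])


def fetch_news_sketch(api_key, opener=urlopen):
    """Hypothetical stand-in for the request step of fetch_news."""
    # `opener` is injectable so the function can be exercised without network access.
    with opener(build_news_url(api_key)) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return parse_articles(payload)
```

The returned list of dictionaries can be passed directly to `pandas.DataFrame(...)` to obtain the documented return type.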
- generateRDF.generate_graph(df_final: DataFrame, file_name: str, save_path: str, write_turtle=False)
Takes a DataFrame of articles and generates the KG according to our ontology definition.
- Parameters:
df_final (pd.DataFrame) – articles dataframe
save_path (str)
- Returns:
DataFrame of fetched articles
- Return type:
pandas.DataFrame
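To illustrate the general idea of turning article rows into graph triples, here is a minimal sketch that emits N-Triples lines with the standard library only. It is not the project's implementation, which may well use an RDF library. The ontology is not shown in this reference, so the `http://example.org/news/` namespace, the predicate names, and the row keys (`title`, `outlet`, `authors`) are all hypothetical.

```python
from urllib.parse import quote

# Hypothetical namespace; the project's ontology IRIs are not shown here.
BASE = "http://example.org/news/"


def _literal(value):
    """Escape a value as an N-Triples string literal."""
    escaped = (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def article_triples(rows):
    """Turn article rows (dicts) into N-Triples lines, one triple per line."""
    lines = []
    for index, row in enumerate(rows):
        subject = f"<{BASE}article/{index}>"
        lines.append(f"{subject} <{BASE}title> {_literal(row['title'])} .")
        # Percent-encode names so they are safe inside an IRI.
        outlet = quote(row["outlet"])
        lines.append(f"{subject} <{BASE}publishedBy> <{BASE}outlet/{outlet}> .")
        for author in row.get("authors", []):
            lines.append(f"{subject} <{BASE}hasAuthor> <{BASE}author/{quote(author)}> .")
    return lines
```

Joining the returned lines with newlines gives a file that standard RDF tooling can load as N-Triples.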